Reward Model

A neural network trained to predict human preferences, used to provide a scalar reward signal for optimizing language model behavior in RLHF.

Related Concepts: Reinforcement Learning from Human Feedback (RLHF), Reinforcement Learning, Reward Hacking, Direct Preference Optimization, Constitutional AI, AI Alignment, Large Language Models (LLMs), Fine-Tuning
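A minimal sketch of the core idea, assuming a toy linear scorer over hypothetical two-dimensional feature vectors rather than a real neural network: the model assigns a scalar reward to each response, and its weights are fit so that preferred ("chosen") responses score higher than dispreferred ("rejected") ones, using the Bradley-Terry pairwise loss commonly used in RLHF reward modeling.

```python
import math

def reward(features, weights):
    # Scalar reward: dot product of a feature vector with learned weights.
    # (A real reward model would be a neural network over token sequences.)
    return sum(f * w for f, w in zip(features, weights))

def preference_loss(chosen, rejected, weights):
    # Bradley-Terry loss: -log sigmoid(r(chosen) - r(rejected)).
    # Low loss means the model ranks the chosen response above the rejected one.
    margin = reward(chosen, weights) - reward(rejected, weights)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

def train_step(chosen, rejected, weights, lr=0.1):
    # One gradient step on the pairwise loss:
    # dL/dw = -(1 - sigmoid(margin)) * (chosen - rejected)
    margin = reward(chosen, weights) - reward(rejected, weights)
    p = 1.0 / (1.0 + math.exp(-margin))
    return [w + lr * (1.0 - p) * (c - r)
            for w, c, r in zip(weights, chosen, rejected)]

# Toy training loop on a single preference pair (hypothetical feature vectors).
chosen, rejected = [1.0, 0.2], [0.1, 0.9]
weights = [0.0, 0.0]
for _ in range(100):
    weights = train_step(chosen, rejected, weights)

# After training, the chosen response receives the higher scalar reward,
# which is what an RLHF policy would then be optimized against.
assert reward(chosen, weights) > reward(rejected, weights)
```

The same sigmoid-of-margin objective is what Direct Preference Optimization folds directly into the policy update, skipping the explicit reward model; and because the learned scorer is only a proxy for human preferences, optimizing hard against it is what opens the door to reward hacking.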