Reinforcement Learning from Human Feedback (RLHF)

A training technique that aligns LLM outputs with human preferences by using human feedback to guide model behavior.

Related Concepts: Semantic Ablation, AI Alignment, AI Safety, Constitutional AI, Human-in-the-Loop, Large Language Models (LLMs), Reinforcement Learning, Reward Model, Reward Hacking, Direct Preference Optimization, Instruction Tuning, Red Teaming
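To make the definition concrete, below is a minimal, illustrative sketch of one RLHF ingredient: fitting a reward model on human preference pairs with a Bradley-Terry style loss, so that responses labelers preferred score higher than responses they rejected. The `RewardModel` class, tensor shapes, and random "embeddings" are hypothetical stand-ins, not a real pipeline; in practice the scorer would be a language-model backbone applied to full prompt/response pairs, and the learned reward would then guide a reinforcement-learning step on the policy model.

```python
# Hypothetical sketch: reward-model training on preference pairs (one piece of RLHF).
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    def __init__(self, hidden_dim: int = 16):
        super().__init__()
        # Stand-in for an LLM encoder: maps a response embedding to a scalar reward.
        self.scorer = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, response_embedding: torch.Tensor) -> torch.Tensor:
        # Return one scalar reward per response in the batch.
        return self.scorer(response_embedding).squeeze(-1)

model = RewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Toy batch: embeddings of responses a human labeler preferred vs. rejected.
chosen = torch.randn(8, 16)
rejected = torch.randn(8, 16)

# Bradley-Terry preference loss: push reward(chosen) above reward(rejected).
loss = -torch.nn.functional.logsigmoid(model(chosen) - model(rejected)).mean()
loss.backward()
optimizer.step()
```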