Reinforcement Learning from Human Feedback (RLHF)

A training technique that aligns LLM outputs with human preferences by using human feedback to guide model behavior.

Related Concepts: Semantic Ablation, AI Alignment, AI Safety, Constitutional AI, Human-in-the-Loop, Large Language Models (LLMs), Reinforcement Learning, Reward Model, Reward Hacking, Direct Preference Optimization, Instruction Tuning, Red Teaming
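To make the definition concrete, below is a minimal, illustrative sketch of one RLHF ingredient: fitting a reward model on human preference pairs with a Bradley-Terry style loss, so that responses labelers preferred score higher than responses they rejected. The `RewardModel` class, tensor shapes, and random "embeddings" are hypothetical stand-ins, not a real pipeline; in practice the scorer would be a language-model backbone applied to full prompt/response pairs, and the learned reward would then guide a reinforcement-learning step on the policy model.

```python
# Hypothetical sketch: reward-model training on preference pairs (one piece of RLHF).
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    def __init__(self, hidden_dim: int = 16):
        super().__init__()
        # Stand-in for an LLM encoder: maps a response embedding to a scalar reward.
        self.scorer = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, response_embedding: torch.Tensor) -> torch.Tensor:
        # Return one scalar reward per response in the batch.
        return self.scorer(response_embedding).squeeze(-1)

model = RewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Toy batch: embeddings of responses a human labeler preferred vs. rejected.
chosen = torch.randn(8, 16)
rejected = torch.randn(8, 16)

# Bradley-Terry preference loss: push reward(chosen) above reward(rejected).
loss = -torch.nn.functional.logsigmoid(model(chosen) - model(rejected)).mean()
loss.backward()
optimizer.step()
```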