alignment - Concepts
Explore concepts tagged with "alignment"
Total concepts: 11
Concepts
- Constitutional AI - An AI training method that uses a set of principles (a constitution) to guide model behavior and self-improvement.
- Shared Understanding - Common knowledge, perspectives, and mental models that enable effective team collaboration.
- Direct Preference Optimization - A simplified alternative to RLHF that fine-tunes language models directly on human preference data without training a separate reward model.
- Reinforcement Learning from Human Feedback (RLHF) - A training technique that aligns LLM outputs with human preferences by using human feedback to guide model behavior.
- Instruction Tuning - A fine-tuning technique that trains language models to follow natural language instructions by learning from examples of instruction-response pairs.
- Reward Model - A neural network trained to predict human preferences, used to provide a scalar reward signal for optimizing language model behavior in RLHF.
- Strategic Alignment - The process of ensuring that an organization's structure, resources, and activities are consistently directed toward achieving its mission and vision.
- Team Charter - A document defining a team's purpose, goals, roles, and operating principles.
- Reward Hacking - A failure mode in reinforcement learning where an agent exploits flaws in the reward function to achieve high reward without fulfilling the intended objective.
- Shared Vision - A collectively held picture of the future that members of a group genuinely want to create together, generating intrinsic commitment rather than mere compliance.
- Sales Enablement - The practice of equipping sales teams with the content, tools, knowledge, and processes they need to effectively engage buyers and close deals.
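Several of the concepts above (RLHF, Reward Model, Direct Preference Optimization) revolve around learning from human preference pairs. As a minimal illustration of how DPO fine-tunes directly on preferences without a separate reward model, here is a sketch of the DPO loss for a single preference pair; the log-probability values and the `beta` default are hypothetical, chosen only to show the mechanics.

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Inputs are log-probabilities of the chosen and rejected responses
    under the policy being trained (pi_*) and a frozen reference model (ref_*).
    """
    # Implicit reward margin: how much more the policy favors the chosen
    # response over the rejected one, relative to the reference model.
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    # Negative log-sigmoid of the scaled margin (Bradley-Terry likelihood).
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Hypothetical log-probabilities: a policy that prefers the chosen
# response yields a lower loss than one that prefers the rejected response.
low = dpo_loss(pi_chosen=-4.0, pi_rejected=-9.0, ref_chosen=-6.0, ref_rejected=-6.0)
high = dpo_loss(pi_chosen=-9.0, pi_rejected=-4.0, ref_chosen=-6.0, ref_rejected=-6.0)
```

In practice the loss is averaged over a batch of preference pairs and minimized with gradient descent on the policy's parameters, while the reference model stays frozen; `beta` controls how far the policy may drift from the reference.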