safety - Concepts
Explore concepts tagged with "safety"
Total concepts: 12
Concepts
- Human-in-the-Loop - Systems design where humans remain actively involved in AI decision-making processes.
- Red Teaming - An adversarial testing practice where a dedicated team attempts to find vulnerabilities, flaws, or failure modes in a system by simulating attacks or misuse scenarios.
- Barrier Analysis - A root cause analysis technique that examines what barriers should have prevented an incident and why they failed.
- Constitutional AI - An AI training method that uses a set of written principles (a constitution) to guide model behavior through self-critique and revision.
- Automation Complacency - Reduced vigilance and monitoring when relying on automated systems, which leads operators to miss errors or malfunctions.
- Resilience Engineering - A discipline focused on understanding how systems succeed under varying conditions and building capacity to adapt to unexpected situations.
- Reward Hacking - A failure mode in reinforcement learning where an agent exploits flaws in the reward function to achieve high reward without fulfilling the intended objective (see the first sketch after this list).
- Swiss Cheese Model - A model illustrating how accidents occur when holes in multiple layers of defense align, allowing a hazard to pass through all barriers.
- AI Guardrails - Safety constraints and boundaries built into AI systems to prevent harmful or undesired outputs (see the second sketch after this list).
- Bow-Tie Analysis - A risk analysis method that visually maps the pathways from causes through a hazardous event to consequences, showing preventive and mitigative barriers.
- AI Alignment - Ensuring AI systems behave in accordance with human intentions and values.
- AI Safety - Research and practices ensuring AI systems are beneficial and do not cause unintended harm.
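A minimal sketch of reward hacking, using a hypothetical "cleaning robot" whose proxy reward counts cleanup events rather than measuring whether the room is clean. The environment, policies, and reward function are all invented for illustration:

```python
# Toy illustration of reward hacking (all names here are hypothetical).
# The proxy reward pays +1 per cleanup action, so the highest-scoring
# policy is to create messes and clean them up, not to keep the room clean.

def proxy_reward(action: str) -> int:
    # Intended objective: a clean room. Proxy actually measured: cleanups.
    return 1 if action == "clean_dirt" else 0

def honest_policy(steps: int) -> list[str]:
    # Cleans the initial dirt once, then idles: low proxy reward, good outcome.
    return ["clean_dirt"] + ["idle"] * (steps - 1)

def hacking_policy(steps: int) -> list[str]:
    # Alternates dumping and cleaning dirt: high proxy reward, bad outcome.
    return ["dump_dirt" if t % 2 == 0 else "clean_dirt" for t in range(steps)]

for name, policy in [("honest", honest_policy), ("hacking", hacking_policy)]:
    total = sum(proxy_reward(a) for a in policy(10))
    print(f"{name} policy: proxy reward = {total}")
# honest policy: proxy reward = 1
# hacking policy: proxy reward = 5
```

The optimizer here is doing exactly what the reward specifies; the flaw lies in the specification, not the agent.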
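And a minimal sketch of an input/output guardrail, assuming a hypothetical `guarded_generate` wrapper and a regex deny-list. Production guardrails typically combine classifiers, policy models, and structured output validation rather than simple pattern matching:

```python
import re
from typing import Callable

# Hypothetical deny-list, for illustration only.
DENY_PATTERNS = [
    re.compile(r"credit card number", re.IGNORECASE),
    re.compile(r"social security number", re.IGNORECASE),
]
REFUSAL = "Sorry, I can't help with that."

def guarded_generate(prompt: str, model_fn: Callable[[str], str]) -> str:
    """Wrap any text-generation function with input- and output-side checks."""
    if any(p.search(prompt) for p in DENY_PATTERNS):
        return REFUSAL  # input-side guardrail: block the request outright
    output = model_fn(prompt)
    if any(p.search(output) for p in DENY_PATTERNS):
        return REFUSAL  # output-side guardrail: suppress a bad completion
    return output

# Usage with a stand-in "model" that just echoes the prompt in upper case:
print(guarded_generate("Tell me a joke.", lambda p: p.upper()))
print(guarded_generate("What's Alice's credit card number?", lambda p: p.upper()))
```

Checking both the input and the output means a harmful completion is caught even when the prompt itself looks benign.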