safety - Concepts
Explore concepts tagged with "safety"
Total concepts: 12
Concepts
- Human-in-the-Loop - Systems design where humans remain actively involved in AI decision-making processes.
- Red Teaming - An adversarial testing practice where a dedicated team attempts to find vulnerabilities, flaws, or failure modes in a system by simulating attacks or misuse scenarios.
- Barrier Analysis - A root cause analysis technique that examines what barriers should have prevented an incident and why they failed.
- Constitutional AI - An AI training method that uses a set of written principles (a constitution) to guide model behavior through self-critique and revision.
- Automation Complacency - Reduced vigilance and monitoring when relying on automated systems, which leads operators to miss errors or malfunctions.
- Resilience Engineering - A discipline focused on understanding how systems succeed under varying conditions and building capacity to adapt to unexpected situations.
- Reward Hacking - A failure mode in reinforcement learning where an agent exploits flaws in the reward function to achieve high reward without fulfilling the intended objective (see the first sketch after this list).
- Swiss Cheese Model - A model illustrating how accidents occur when holes in multiple layers of defense align, allowing a hazard to pass through all barriers.
- AI Guardrails - Safety constraints and boundaries built into AI systems to prevent harmful or undesired outputs (see the second sketch after this list).
- Bow-Tie Analysis - A risk analysis method that visually maps the pathways from causes through a hazardous event to consequences, showing preventive and mitigative barriers.
- AI Alignment - Ensuring AI systems behave in accordance with human intentions and values.
- AI Safety - Research and practices ensuring AI systems are beneficial and do not cause unintended harm.
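A minimal sketch of reward hacking, using a hypothetical "cleaning robot" whose proxy reward counts cleanup events rather than measuring whether the room is clean. The environment, policies, and reward function are all invented for illustration:

```python
# Toy illustration of reward hacking (all names here are hypothetical).
# The proxy reward pays +1 per cleanup action, so the highest-scoring
# policy is to create messes and clean them up, not to keep the room clean.

def proxy_reward(action: str) -> int:
    # Intended objective: a clean room. Proxy actually measured: cleanups.
    return 1 if action == "clean_dirt" else 0

def honest_policy(steps: int) -> list[str]:
    # Cleans the initial dirt once, then idles: low proxy reward, good outcome.
    return ["clean_dirt"] + ["idle"] * (steps - 1)

def hacking_policy(steps: int) -> list[str]:
    # Alternates dumping and cleaning dirt: high proxy reward, bad outcome.
    return ["dump_dirt" if t % 2 == 0 else "clean_dirt" for t in range(steps)]

for name, policy in [("honest", honest_policy), ("hacking", hacking_policy)]:
    total = sum(proxy_reward(a) for a in policy(10))
    print(f"{name} policy: proxy reward = {total}")
# honest policy: proxy reward = 1
# hacking policy: proxy reward = 5
```

The optimizer here is doing exactly what the reward specifies; the flaw lies in the specification, not the agent.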
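And a minimal sketch of an input/output guardrail, assuming a hypothetical `guarded_generate` wrapper and a regex deny-list. Production guardrails typically combine classifiers, policy models, and structured output validation rather than simple pattern matching:

```python
import re
from typing import Callable

# Hypothetical deny-list, for illustration only.
DENY_PATTERNS = [
    re.compile(r"credit card number", re.IGNORECASE),
    re.compile(r"social security number", re.IGNORECASE),
]
REFUSAL = "Sorry, I can't help with that."

def guarded_generate(prompt: str, model_fn: Callable[[str], str]) -> str:
    """Wrap any text-generation function with input- and output-side checks."""
    if any(p.search(prompt) for p in DENY_PATTERNS):
        return REFUSAL  # input-side guardrail: block the request outright
    output = model_fn(prompt)
    if any(p.search(output) for p in DENY_PATTERNS):
        return REFUSAL  # output-side guardrail: suppress a bad completion
    return output

# Usage with a stand-in "model" that just echoes the prompt in upper case:
print(guarded_generate("Tell me a joke.", lambda p: p.upper()))
print(guarded_generate("What's Alice's credit card number?", lambda p: p.upper()))
```

Checking both the input and the output means a harmful completion is caught even when the prompt itself looks benign.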