safety - Concepts
Explore concepts tagged with "safety"
Total concepts: 24
Concepts
- Constitutional AI - An AI training method that uses a set of principles (a constitution) to guide model behavior and self-improvement.
- AI Agent Permissions - Controls governing what actions, tools, files, and resources AI agents can access, enforcing the principle of least privilege in agentic AI systems.
- AI Alignment - Ensuring AI systems behave in accordance with human intentions and values.
- Human-out-of-the-Loop - A fully autonomous operating model in which AI systems act independently, with no human oversight or intervention in real-time decision-making.
- Human-in-the-Loop - Systems design where humans remain actively involved in AI decision-making processes.
- Plan Continuation Bias - The tendency to continue with an original course of action even when changing circumstances suggest the plan should be revised or abandoned.
- Human-on-the-Loop - A supervisory model where humans monitor AI systems and can intervene when needed, but are not required to approve every individual action or decision.
- Automation Complacency - Reduced vigilance and monitoring when relying on automated systems, leading to failure to detect errors or malfunctions.
- Red Teaming - An adversarial testing practice where a dedicated team attempts to find vulnerabilities, flaws, or failure modes in a system by simulating attacks or misuse scenarios.
- AI Guardrails - Safety constraints and boundaries built into AI systems to prevent harmful or undesired outputs.
- Microsleep - Brief, involuntary episodes of sleep lasting a few seconds that occur when a person is fatigued but trying to stay awake.
- Guardrails - Safety constraints and boundaries that control AI system behavior, preventing harmful, undesired, or out-of-scope outputs and actions.
- Barrier Analysis - A root cause analysis technique that examines what barriers should have prevented an incident and why they failed.
- Bow-Tie Analysis - A risk analysis method that visually maps the pathways from causes through a hazardous event to consequences, showing preventive and mitigative barriers.
- Swiss Cheese Model - A model illustrating how accidents occur when holes in multiple layers of defense align, allowing a hazard to pass through all barriers.
- AI Red Teaming - Systematic adversarial testing of AI systems to discover vulnerabilities, biases, and failure modes before deployment.
- AI Safety - Research and practices ensuring AI systems are beneficial and don't cause unintended harm.
- AI Trust - The confidence users and stakeholders place in AI systems to perform reliably, safely, and in alignment with their expectations and values.
- Resilience Engineering - A discipline focused on understanding how systems succeed under varying conditions and building capacity to adapt to unexpected situations.
- Reward Hacking - A failure mode in reinforcement learning where an agent exploits flaws in the reward function to achieve high reward without fulfilling the intended objective.
- Jailbreaking AI - Techniques used to bypass an AI model's safety guardrails and restrictions to produce outputs it was designed to refuse.
- AI Data Security - Protecting sensitive data when using AI systems, where every interaction, including prompts, uploaded files, tool-call results, and agent memory, is a potential data exposure point.
- Responsible AI - A comprehensive framework for developing and deploying AI systems that are ethical, transparent, fair, accountable, safe, and beneficial to society.
- AI Oversight - The governance mechanisms, processes, and institutions designed to monitor, evaluate, and regulate AI systems throughout their lifecycle.