Safety
AI safety
Broader field covering technical, organizational, and policy risks of AI systems — from mistakes and misuse to longer-term existential concerns.
AI safety is an interdisciplinary field concerned with the risks of AI systems — from concrete present-day failures and misuse to longer-term existential concerns about highly capable systems.
A typical risk taxonomy:
- Misuse: generation of deepfakes, disinformation, malware code, biochemical instructions; deliberate abuse
- Accidents: hallucinations, systems doing the wrong thing in good faith (autonomous cars, medical advice)
- Structural: concentration of power, labor-market impact, surveillance, state-level abuse
- Catastrophic / existential: future systems beyond human oversight
Practical tools include: capability and dangerous-behavior evaluations, red teaming, interpretability (mechanistic analysis of internal representations), production monitoring, access controls, model cards and system cards, responsible scaling policies (Anthropic), preparedness frameworks (OpenAI), and frontier safety frameworks (Google DeepMind).
Institutionally, 2024–2026 brought a wave of national AI safety institutes (UK, US, Singapore, EU, with Croatia indirectly via the EU AISB). The EU AI Act bakes some safety requirements into law — mandatory evaluations for GPAI with systemic risk, transparency obligations, human oversight on high-risk systems.
AI safety is often conflated with AI alignment — alignment is narrower (the model’s internal objectives), while safety encompasses the entire sociotechnical system around the AI.