Summary
Humanoid and embodied AI systems pose risks that cannot be mitigated by alignment alone. Safety must be defined in terms of how systems fail, recover, and allow human re-entry.
Current regulatory frameworks focus on what AI systems do correctly. This brief argues for complementary regulation focused on what happens when they fail.
Key Risks
Recursive Optimization
Embodied AI systems that optimize continuously can compound small errors into irreversible outcomes. Unlike software that can be patched, physical actions cannot be undone.
Authority Manipulation
Multi-agent environments create opportunities for authority confusion—where systems follow commands from sources they should not trust, or build social capital that grants illegitimate influence over other systems.
Irreversible Physical Actions
Cloud AI can be rolled back. A humanoid robot that has dropped an object, injured a person, or contaminated a workspace cannot undo its actions. Safety constraints must account for irreversibility.
Recommendations
1. Require Recovery Metrics, Not Just Success Metrics
Regulation should mandate measurement of how systems behave when they fail: time-to-halt, degradation predictability, and recovery quality. Success-only metrics create an incentive to hide failure.
2. Mandate Human Takeover Pathways
Every embodied AI system must provide a documented, tested pathway for immediate human takeover. This includes physical emergency stops, remote supervision channels, and degradation modes that preserve human oversight.
3. Audit Recursive Interaction Behavior
Systems must be tested across multi-turn interactions, not just single exchanges. Multi-turn testing reveals erosion, authority confusion, and temporal manipulation that single-turn evaluation misses entirely.
4. Prohibit Silent Degradation
Systems that degrade without observable signals are the most dangerous. Regulation should require that all degradation states be detectable, logged, and communicated to human operators.
Metrics to Require
These three metrics, measured across recursive interaction scenarios, provide a baseline for failure-first safety evaluation that complements existing alignment benchmarks.
Note
This brief summarizes research findings from the Failure-First project. It is not legal advice and does not represent any regulatory body's position.