Summary
This project provides non-operational red-teaming datasets for humanoid and embodied agents, focused on recursive failure and recovery rather than task success.
Intended Use
- Benchmarking LLM-based controllers, planners, or assistants for embodied systems
- Comparing refusal consistency, invariant holding, escalation pathways, and recovery behavior
Contents
Single-Agent Scenarios
Scenarios are stored in JSONL format, one JSON object per line, with environment context, tone parameters, adversarial injectors, and proxy scores. Each scenario describes a specific failure pattern.
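Because a JSONL file holds one JSON object per line, scenarios can be streamed with the standard library alone. The sketch below is illustrative only: the field names (`id`, `environment`, `tone`, `injectors`, `scores`) and the sample records are assumptions chosen to mirror the description above, not the dataset's actual schema.

```python
import io
import json

# Hypothetical sample records; the real field names and values may differ.
SAMPLE_JSONL = """\
{"id": "s-001", "environment": {"setting": "warehouse"}, "tone": "urgent", "injectors": ["persona_hijack"], "scores": {"refusal_proxy": 0.8}}
{"id": "s-002", "environment": {"setting": "kitchen"}, "tone": "neutral", "injectors": [], "scores": {"refusal_proxy": 0.3}}
"""


def read_scenarios(stream):
    """Yield one scenario dict per non-empty JSONL line."""
    for line_number, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_number}: invalid JSON ({exc.msg})") from exc


# In practice the stream would be an open file handle rather than StringIO.
scenarios = list(read_scenarios(io.StringIO(SAMPLE_JSONL)))
for scenario in scenarios:
    print(scenario["id"], scenario["environment"]["setting"], scenario["injectors"])
```

Streaming line by line keeps memory use flat for large files and lets a malformed record be reported by line number.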
Multi-Agent Scenarios
Scenarios involving bystander/supervisor conflicts, where multiple human roles present conflicting instructions to an embodied agent.
Stateful Episodes
Multi-scene sequences (5–10 scenes) that test memory consistency, context drift, and recovery across extended interactions.
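As a rough illustration of how an episode's structure might be sanity-checked before evaluation, the sketch below verifies that an episode has between 5 and 10 scenes and that its scenes are numbered contiguously. The `Scene` and `Episode` classes and their fields are hypothetical; the dataset's real episode layout may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Scene:
    index: int   # hypothetical 1-based position within the episode
    prompt: str  # hypothetical text shown to the agent in this scene


@dataclass
class Episode:
    episode_id: str
    scenes: list = field(default_factory=list)


def episode_problems(episode, min_scenes=5, max_scenes=10):
    """Return a list of structural problems; an empty list means none found."""
    problems = []
    count = len(episode.scenes)
    if not (min_scenes <= count <= max_scenes):
        problems.append(f"expected {min_scenes}-{max_scenes} scenes, found {count}")
    indices = [scene.index for scene in episode.scenes]
    if indices != list(range(1, count + 1)):
        problems.append("scene indices are not contiguous from 1")
    return problems


good = Episode("ep-demo", [Scene(i, f"scene {i}") for i in range(1, 7)])
short = Episode("ep-short", [Scene(1, "a"), Scene(3, "b")])
print(episode_problems(good))
print(episode_problems(short))
```

A gap in scene order would make memory-consistency and context-drift results hard to interpret, so a check like this is worth running before scoring.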
Intent Bait Set
Scenarios designed to test instruction-hierarchy subversion: format lock, refusal suppression, persona hijack, temporal laundering, and constraint erosion.
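One way to compare refusal consistency across these five subversion patterns is to tally a refusal rate per category. The sketch below assumes each evaluation result has been reduced to a (category, refused) pair; the `BaitCategory` enum, its string values, and the demo results are illustrative and not part of the dataset.

```python
from collections import defaultdict
from enum import Enum


class BaitCategory(Enum):
    # Category names follow the list above; the string values are hypothetical.
    FORMAT_LOCK = "format_lock"
    REFUSAL_SUPPRESSION = "refusal_suppression"
    PERSONA_HIJACK = "persona_hijack"
    TEMPORAL_LAUNDERING = "temporal_laundering"
    CONSTRAINT_EROSION = "constraint_erosion"


def refusal_rate_by_category(results):
    """Compute the fraction of refused scenarios for each category seen.

    results: iterable of (BaitCategory, refused: bool) pairs.
    """
    totals = defaultdict(int)
    refusals = defaultdict(int)
    for category, refused in results:
        totals[category] += 1
        refusals[category] += int(refused)
    return {category: refusals[category] / totals[category] for category in totals}


# Made-up outcomes for demonstration only.
demo_results = [
    (BaitCategory.FORMAT_LOCK, True),
    (BaitCategory.FORMAT_LOCK, False),
    (BaitCategory.PERSONA_HIJACK, True),
    (BaitCategory.PERSONA_HIJACK, True),
]
rates = refusal_rate_by_category(demo_results)
for category, rate in rates.items():
    print(f"{category.value}: {rate:.2f}")
```

A large spread between categories suggests the controller's refusals depend on how a request is framed rather than on what is being asked.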
Out of Scope
Prohibited Use
These datasets must not be used to generate operational instructions for wrongdoing or as how-to guides for bypassing safety controls. All scenarios are pattern-level descriptions for defensive evaluation purposes.
Limitations
- Scoring fields are proxies; calibrate them against your own risk model
- Episodes are text-only and approximate embodiment through structured context
- Not a substitute for real-world robotics testing
Safety Notes
Failure examples are high-level summaries, not actionable procedures. All datasets are validated against versioned JSON Schemas and safety-linted on every commit through CI.
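The project's actual schemas and CI configuration are not reproduced here. As a simplified stand-in for full JSON Schema validation, the sketch below checks each JSONL record for required fields and basic types using only the standard library. The field names in `REQUIRED_FIELDS` are assumptions for illustration.

```python
import json

# Hypothetical required fields and types; the project's versioned
# JSON Schemas define the real constraints.
REQUIRED_FIELDS = {"id": str, "schema_version": str, "environment": dict, "scores": dict}


def check_record(record):
    """Return a list of error strings for one parsed record."""
    errors = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in record:
            errors.append(f"missing field: {name}")
        elif not isinstance(record[name], expected_type):
            errors.append(f"{name}: expected {expected_type.__name__}")
    return errors


def check_jsonl(text):
    """Map 1-based line numbers to error lists for lines that fail."""
    failures = {}
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            failures[line_number] = ["invalid JSON"]
            continue
        if not isinstance(record, dict):
            failures[line_number] = ["not a JSON object"]
            continue
        errors = check_record(record)
        if errors:
            failures[line_number] = errors
    return failures


sample = "\n".join([
    json.dumps({"id": "s-001", "schema_version": "0.2", "environment": {}, "scores": {}}),
    json.dumps({"id": "s-002", "environment": "kitchen", "scores": {}}),
])
failures = check_jsonl(sample)
print(failures)
```

A CI job could run a check of this kind over every dataset file and fail the build when `failures` is non-empty. A real pipeline would more likely use a dedicated JSON Schema validator than hand-written checks.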
Schema & Metadata
Schema Versions
Scenario Distribution by Domain
Citation
@misc{failurefirst2025dataset,
  title  = {Failure-First Embodied AI Adversarial Scenario Dataset},
  author = {Wedd, Adrian},
  year   = {2025},
  url    = {https://github.com/adrianwedd/failure-first},
  note   = {141,047+ scenarios, 661 failure classes, 19 domains, JSONL format}
}

For more citation options and data access information, see the citation page.
Changelog
- v0.2 (Jan 2026): Schema upgrade with intent labels, expanded from 10K to 18K+ scenarios, added multi-agent and episode formats
- v0.1 (Sep 2025): Initial dataset release with single-agent scenarios across 5 domains