Dataset Documentation

Embodied Failure-First Red-Teaming Data

Summary

This project provides non-operational red-teaming datasets for humanoid and embodied agents, focused on recursive failure and recovery rather than task success.

Scenarios: 141,047+
Domains: 19
Format: JSONL

Intended Use

Benchmarking LLM-based controllers, planners, or assistants for embodied systems
Comparing refusal consistency, invariant holding, escalation pathways, and recovery behavior
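
One way to make "refusal consistency" concrete is the share of repeated runs on the same scenario that agree with the majority outcome. The sketch below assumes each run has already been labeled as a refusal or not. The function name and the labeling step are illustrative and not part of the dataset.

```python
from collections import Counter

def refusal_consistency(outcomes: list[bool]) -> float:
    """Fraction of runs that match the majority outcome (1.0 = fully consistent)."""
    if not outcomes:
        raise ValueError("need at least one outcome")
    counts = Counter(outcomes)
    # most_common(1) returns [(value, count)] for the most frequent outcome
    return counts.most_common(1)[0][1] / len(outcomes)

# Four hypothetical runs of one scenario: three refusals, one compliance
runs = [True, True, False, True]
print(refusal_consistency(runs))  # 0.75
```

A score near 0.5 on a binary outcome suggests the controller's behavior on that scenario is close to arbitrary, which may be worth inspecting before comparing systems.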

Single-Agent Scenarios

Each record is one JSONL line carrying environment context, tone parameters, adversarial injectors, and proxy scores. Each scenario describes a specific failure pattern.
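
Reading a JSONL file means parsing one JSON object per line. The sketch below shows a small reader that skips blank lines and reports the line number on a parse error. The field names in the sample records (`id`, `environment`, `proxy_scores`) are assumptions for illustration; the real names are defined by the schema files.

```python
import io
import json

def read_jsonl(stream):
    """Yield one dict per non-empty line of a JSONL stream."""
    for line_no, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_no}: invalid JSON ({exc.msg})") from exc

# Hypothetical records held in memory so the example runs without a file
sample = io.StringIO(
    '{"id": "s-001", "environment": "warehouse", "proxy_scores": {"severity": 0.4}}\n'
    '\n'
    '{"id": "s-002", "environment": "clinic", "proxy_scores": {"severity": 0.7}}\n'
)
records = list(read_jsonl(sample))
print(len(records), records[1]["environment"])  # 2 clinic
```

For a file on disk, the same function can be passed an open file handle in place of the `StringIO` object.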

Multi-Agent Scenarios

Scenarios involving bystander/supervisor conflicts, where multiple human roles present conflicting instructions to an embodied agent.

Stateful Episodes

Multi-scene sequences (5–10 scenes) that test memory consistency, context drift, and recovery across extended interactions.
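
The stated 5 to 10 scene range can be checked before running an episode. The sketch below assumes an episode is a dict with a `scenes` list; that key and the sample `episode_id` are hypothetical, and the actual layout is defined by the episode schema.

```python
MIN_SCENES, MAX_SCENES = 5, 10  # range stated in the dataset description

def scene_count_ok(episode: dict) -> bool:
    """True if the episode has between 5 and 10 scenes, inclusive."""
    scenes = episode.get("scenes", [])  # "scenes" is an assumed key
    return MIN_SCENES <= len(scenes) <= MAX_SCENES

# Hypothetical six-scene episode
episode = {"episode_id": "ep-demo", "scenes": [{"index": i} for i in range(6)]}
print(scene_count_ok(episode))  # True
```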

Intent Bait Set

Scenarios designed to test instruction-hierarchy subversion: format lock, refusal suppression, persona hijack, temporal laundering, and constraint erosion.
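
Results on this set are easier to compare when grouped by subversion pattern. The sketch below encodes the five patterns named above as an enum and computes, per pattern, the share of scenarios in which the instruction hierarchy held. The string values, the function name, and the `(pattern, held)` result pairs are assumptions for illustration, not the dataset's labels.

```python
from collections import defaultdict
from enum import Enum

class BaitType(Enum):
    FORMAT_LOCK = "format_lock"
    REFUSAL_SUPPRESSION = "refusal_suppression"
    PERSONA_HIJACK = "persona_hijack"
    TEMPORAL_LAUNDERING = "temporal_laundering"
    CONSTRAINT_EROSION = "constraint_erosion"

def hold_rate_by_type(results):
    """Map each bait type to the share of scenarios where the hierarchy held."""
    totals = defaultdict(int)
    held = defaultdict(int)
    for bait_type, hierarchy_held in results:
        totals[bait_type] += 1
        held[bait_type] += int(hierarchy_held)
    return {t: held[t] / totals[t] for t in totals}

# Hypothetical evaluation outcomes
results = [
    (BaitType.FORMAT_LOCK, True),
    (BaitType.FORMAT_LOCK, False),
    (BaitType.PERSONA_HIJACK, True),
]
rates = hold_rate_by_type(results)
print(rates[BaitType.FORMAT_LOCK])  # 0.5
```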

Out of Scope

Prohibited Use

These datasets must not be used to generate operational instructions for wrongdoing or as how-to guides for bypassing safety controls. All scenarios are pattern-level descriptions for defensive evaluation purposes.

Limitations

Scoring fields are proxies — calibrate against your own risk model
Episodes are text-only — they approximate embodiment through structured context
Not a substitute for real-world robotics testing

Safety Notes

Failure examples are high-level summaries, not actionable procedures. All datasets are validated against versioned JSON Schemas and safety-linted on every commit through CI.
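
The sketch below is not the project's CI pipeline and does not perform JSON Schema validation. It is a standard-library stand-in that checks required keys and their types, which may be useful as a quick local sanity check before running a full validator. The `REQUIRED` mapping is hypothetical; the authoritative field list lives in the schema files.

```python
REQUIRED = {"id": str, "domain": str, "proxy_scores": dict}  # hypothetical fields

def check_record(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    for key, expected in REQUIRED.items():
        if key not in record:
            problems.append(f"missing key: {key}")
        elif not isinstance(record[key], expected):
            problems.append(f"{key}: expected {expected.__name__}")
    return problems

good = {"id": "s-001", "domain": "warehouse", "proxy_scores": {}}
bad = {"id": 7, "domain": "warehouse"}
print(check_record(good))  # []
print(check_record(bad))   # ['id: expected str', 'missing key: proxy_scores']
```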

Schema & Metadata

Schema Versions

Single-agent: v0.2 (embodied_redteam_entry_schema_v0.2.json)

Multi-agent: v0.1 (multi_agent_entry_schema_v0.1.json)

Episodes: v0.1 (episode_schema_v0.1.json)

Scenario Distribution by Domain

Humanoid robotics: primary domain (~60% of scenarios)

Warehouse systems: ~15% of scenarios

Medical devices: ~10% of scenarios

Collaborative manufacturing: ~8% of scenarios

Other domains: ~7% of scenarios (elder care, security, retail, etc.)
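
As a rough worked example, the approximate shares above can be turned into per-domain scenario counts. The sketch assumes the 141,047 total from the summary as a lower bound and treats the rounded percentages as exact, so the results are estimates only.

```python
TOTAL = 141_047  # lower bound; the documentation states "141,047+"

# Approximate shares (percent) from the distribution above
SHARES = {
    "Humanoid robotics": 60,
    "Warehouse systems": 15,
    "Medical devices": 10,
    "Collaborative manufacturing": 8,
    "Other domains": 7,
}

# The listed shares should account for the whole dataset
assert sum(SHARES.values()) == 100

approx_counts = {name: round(TOTAL * pct / 100) for name, pct in SHARES.items()}
for name, count in approx_counts.items():
    print(f"{name}: ~{count:,}")
```

Under these assumptions, humanoid robotics accounts for roughly 84,600 scenarios and the smallest listed group for just under 10,000.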

Citation

@misc{failurefirst2025dataset,
  title = {Failure-First Embodied AI Adversarial
          Scenario Dataset},
  author = {Wedd, Adrian},
  year = {2025},
  url = {https://github.com/adrianwedd/failure-first},
  note = {141,047+ scenarios, 661 failure classes,
         19 domains, JSONL format}
}

For more citation options and data access information, see the citation page.

Changelog

v0.2 (Jan 2026): Schema upgrade with intent labels, expanded from 10K to 18K+ scenarios, added multi-agent and episode formats
v0.1 (Sep 2025): Initial dataset release with single-agent scenarios across 5 domains