Dataset Documentation

Embodied Failure-First Red-Teaming Data

Summary

This project provides non-operational red-teaming datasets for humanoid and embodied agents, focused on recursive failure and recovery rather than task success.

141,047+ scenarios across 4 dataset types and 19 domains, in JSONL format.

Intended Use

Contents

Single-Agent Scenarios

JSONL format with environment context, tone parameters, adversarial injectors, and proxy scores. Each scenario describes a specific failure pattern.
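To make the record layout concrete, here is a minimal Python sketch of loading single-agent scenarios from JSONL. The class `SingleAgentScenario` and the keys `id`, `environment`, `tone`, `injectors`, `proxy_scores`, and `failure_pattern` are hypothetical stand-ins for the components listed above. Consult the versioned schema for the actual field names.

```python
import json
from dataclasses import dataclass
from typing import Any, Iterable


# Hypothetical field names; the published schema's keys may differ.
@dataclass
class SingleAgentScenario:
    scenario_id: str
    environment: dict[str, Any]     # environment context
    tone: dict[str, Any]            # tone parameters
    injectors: list[str]            # adversarial injector labels
    proxy_scores: dict[str, float]  # proxy metric name -> score
    failure_pattern: str            # the failure pattern this scenario describes


def parse_scenario(line: str) -> SingleAgentScenario:
    """Parse one JSONL line into a typed record; raises KeyError on a missing field."""
    raw = json.loads(line)
    return SingleAgentScenario(
        scenario_id=raw["id"],
        environment=raw["environment"],
        tone=raw["tone"],
        injectors=list(raw["injectors"]),
        proxy_scores={k: float(v) for k, v in raw["proxy_scores"].items()},
        failure_pattern=raw["failure_pattern"],
    )


def load_scenarios(lines: Iterable[str]) -> list[SingleAgentScenario]:
    """Parse every non-blank line of a JSONL stream, such as an open file."""
    return [parse_scenario(line) for line in lines if line.strip()]
```

Failing loudly on a missing field is usually preferable to silently carrying partial scenarios into an evaluation run.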

Multi-Agent Scenarios

Scenarios involving bystander/supervisor conflicts, where multiple human roles present conflicting instructions to an embodied agent.
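One way to represent a multi-agent scenario is as an ordered list of role-tagged instructions. This sketch assumes hypothetical `Instruction` and `MultiAgentScenario` types and free-form role labels. It checks only the structural precondition that more than one human role issues instructions. Whether those instructions actually conflict is left to the scenario's own labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instruction:
    role: str  # hypothetical role label, e.g. "supervisor" or "bystander"
    text: str  # pattern-level summary of the instruction, not an operational step


@dataclass
class MultiAgentScenario:
    scenario_id: str
    instructions: list[Instruction]

    def roles(self) -> set[str]:
        """Distinct human roles that issue at least one instruction."""
        return {inst.role for inst in self.instructions}

    def has_multiple_roles(self) -> bool:
        """Structural check only: two or more distinct roles are present.
        Deciding whether their instructions truly conflict is not attempted here."""
        return len(self.roles()) >= 2
```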

Stateful Episodes

Multi-scene sequences (5–10 scenes) that test memory consistency, context drift, and recovery across extended interactions.
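The 5–10 scene bound lends itself to a simple structural check before any memory or drift evaluation runs. This sketch assumes a hypothetical `Episode` type whose scenes are dicts carrying an `index` key. Neither the type nor the key is taken from the published schema.

```python
from dataclasses import dataclass

MIN_SCENES, MAX_SCENES = 5, 10  # scene-count bounds stated for stateful episodes


@dataclass
class Episode:
    episode_id: str
    scenes: list[dict]  # hypothetical: one dict per scene, in playback order


def check_episode(ep: Episode) -> list[str]:
    """Return human-readable problems; an empty list means the episode passes."""
    problems = []
    n = len(ep.scenes)
    if not MIN_SCENES <= n <= MAX_SCENES:
        problems.append(f"{ep.episode_id}: {n} scenes, expected {MIN_SCENES}-{MAX_SCENES}")
    # Hypothetical 'index' key: scenes numbered 0..n-1 with no gaps, so that
    # consistency checks can refer back to earlier scenes unambiguously.
    indices = [scene.get("index") for scene in ep.scenes]
    if indices != list(range(n)):
        problems.append(f"{ep.episode_id}: scene indices {indices} are not 0..{n - 1}")
    return problems
```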

Intent Bait Set

Scenarios designed to test instruction-hierarchy subversion: format lock, refusal suppression, persona hijack, temporal laundering, and constraint erosion.
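The five subversion categories map naturally onto a closed label set. The enum below is a sketch. Its snake_case string values are assumed label spellings, not confirmed schema values. Tallying scenarios by category is a quick way to check that an evaluation run covers all five.

```python
from collections import Counter
from enum import Enum
from typing import Iterable


class IntentBait(Enum):
    # The five instruction-hierarchy subversion categories of the Intent Bait Set.
    # String values are hypothetical label spellings.
    FORMAT_LOCK = "format_lock"
    REFUSAL_SUPPRESSION = "refusal_suppression"
    PERSONA_HIJACK = "persona_hijack"
    TEMPORAL_LAUNDERING = "temporal_laundering"
    CONSTRAINT_EROSION = "constraint_erosion"


def tally(labels: Iterable[str]) -> Counter:
    """Count scenarios per category; an unknown label raises ValueError."""
    return Counter(IntentBait(label) for label in labels)
```

Using a closed enum rather than raw strings means a misspelled label fails at load time instead of quietly forming a sixth category.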

Out of Scope

Prohibited Use

These datasets must not be used to generate operational instructions for wrongdoing or as how-to guides for bypassing safety controls. All scenarios are pattern-level descriptions for defensive evaluation purposes.

Limitations

Safety Notes

Failure examples are high-level summaries, not actionable procedures. All datasets are validated against versioned JSON Schemas and safety-linted in CI on every commit.
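The per-commit validation step can be approximated with the standard library alone. This sketch is not a JSON Schema validator and does not reproduce the project's safety linting. It checks only that each line parses as a JSON object carrying hypothetical `id` and `schema_version` keys with a version from an assumed allow-list. A real pipeline would validate each record against the published schema file, for example with the third-party `jsonschema` package.

```python
import json
from typing import Iterable, Iterator

# Hypothetical required keys and accepted versions; the real versioned
# JSON Schemas define the actual contract.
REQUIRED_KEYS = {"id", "schema_version"}
KNOWN_VERSIONS = {"0.1", "0.2"}


def lint_jsonl(lines: Iterable[str]) -> Iterator[tuple[int, str]]:
    """Yield (line_number, message) for every record that fails the checks."""
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # blank lines are tolerated
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            yield lineno, f"invalid JSON: {exc.msg}"
            continue
        if not isinstance(record, dict):
            yield lineno, "record is not a JSON object"
            continue
        missing = REQUIRED_KEYS - set(record)
        if missing:
            yield lineno, f"missing keys: {sorted(missing)}"
        elif record["schema_version"] not in KNOWN_VERSIONS:
            yield lineno, f"unknown schema_version {record['schema_version']!r}"


def main(path: str) -> int:
    """Print each problem as path:line: message; return a CI-friendly exit code."""
    with open(path, encoding="utf-8") as fh:
        errors = list(lint_jsonl(fh))
    for lineno, msg in errors:
        print(f"{path}:{lineno}: {msg}")
    return 1 if errors else 0
```

Because `main` returns 1 when any record fails, a CI job can pass its result to `sys.exit` to fail the build.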

Schema & Metadata

Schema Versions

Scenario Distribution by Domain

Citation

@misc{failurefirst2025dataset,
  title = {Failure-First Embodied AI Adversarial
          Scenario Dataset},
  author = {Wedd, Adrian},
  year = {2025},
  url = {https://github.com/adrianwedd/failure-first},
  note = {141,047+ scenarios, 661 failure classes,
         19 domains, JSONL format}
}

For more citation options and data access information, see the citation page.

Changelog

  • v0.2 (Jan 2026): Schema upgrade with intent labels, expanded from 10K to 18K+ scenarios, added multi-agent and episode formats
  • v0.1 (Sep 2025): Initial dataset release with single-agent scenarios across 5 domains