How AI systems fail

190 models. 132,416 results. 35 attack families. The most comprehensive adversarial safety corpus in existence.

Our research characterizes AI failure patterns through adversarial testing. We study how systems break down under pressure, how failures cascade across agents, and what makes recovery possible.

142,307
Adversarial Prompts
258
Models Evaluated
346+
Attack Techniques
25
Policy Reports

Research Areas

Explore findings by category:

Jailbreak Archaeology

1 study

Historical analysis of attack evolution from 2022-2025. 64 scenarios across 6 eras, tested against 190 models.

Multi-Agent Research

2 studies

How AI agents influence each other in multi-agent environments. Environment shaping, narrative erosion, and emergent authority hierarchies.

Attack Pattern Analysis

3 studies

Taxonomy of adversarial techniques and how models respond to them. From single-turn exploits to multi-turn cascades.

Defense Mechanisms

2 studies

How models resist adversarial attacks. Format/content separation, refusal patterns, and recovery mechanisms.

Failure Taxonomies

2 studies

Classification systems for understanding how AI systems fail. Recursive, contextual, interactional, and temporal failures.

Prompt Injection Testing

12 studies

12 calibrated honeypot pages testing AI agent susceptibility to indirect prompt injection. From visible baselines to expert-level multi-vector attacks.

Policy Brief Series

26 studies

26 policy reports plus 160 total research reports on embodied AI safety: regulation, standards, technical analysis, and policy recommendations.

Intelligence Briefs

1 study

Evidence-grounded assessments for commercial and policy decision-making. Synthesizes corpus data, published research, and Failure-First findings.

Research Videos

19 studies

AI-generated cinematic video overviews of key Failure-First findings, with downloadable slide decks. Produced with NotebookLM.

Research Audio

3 studies

AI-generated audio overviews of research reports and intelligence briefs, produced with NotebookLM in a conversational podcast format.

Industry Landscape

2 studies

Directory of 214 humanoid robotics companies and a competitive landscape of AI safety testing vendors. Filterable, with structured data.

All Studies

Jailbreak Archaeology

Published

Historical analysis of attack evolution from 2022-2025. 64 scenarios across 6 eras, tested against 190 models.

Jailbreak Archaeology

Moltbook: Multi-Agent Attack Surface

Active

Empirical analysis of 1,497 AI agent interactions on an agent-only social network.

Multi-Agent

Multi-Agent Failure Scenarios

Active

How multiple actors create failure conditions that single-agent testing misses.

Multi-Agent

Model Vulnerability Findings

Active

How model size, architecture, and training affect vulnerability to adversarial attacks.

Attack Patterns

Humanoid Robotics Safety

Active

Safety analysis of humanoid robots across 15+ research dimensions.

Failure Taxonomies

Compression Tournament Findings

Published

Methodology lessons from three iterations of adversarial prompt compression.

Attack Patterns

Defense Pattern Analysis

Published

How models resist adversarial attacks: the format/content separation pattern.

Defense Mechanisms

Attack Pattern Taxonomy

Published

82 attack techniques classified across 7 categories.

Attack Patterns

Failure Mode Taxonomy

Published

Recursive, contextual, interactional, and temporal failure classifications.

Failure Taxonomies

Recovery Mechanisms

Published

How AI systems recover (or fail to recover) from failure states.

Defense Mechanisms

Research Methodology

Published

Our approach to adversarial AI safety research and benchmarking.

Methodology

Prompt Injection Test Suite

Active

12 honeypot pages testing AI agent susceptibility to indirect prompt injection across 4 difficulty tiers.

Prompt Injection

Five Cross-Cutting Insights

Our research converges on five key findings that cut across all studies and inform policy recommendations:

1. The Semantic-Kinetic Gap

VLA models collapse the traditional robotics stack (Sense-Plan-Act) into a single neural network. A linguistic misunderstanding becomes a physical hazard with no intermediate controller to catch the error. This is the master vulnerability for embodied AI.
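To make the gap concrete, here is a minimal sketch contrasting a classical Sense-Plan-Act stack with an end-to-end policy. Every interface in it (plan_from_instruction, safety_controller_approves, vla_policy, and the stand-in rules they apply) is a hypothetical illustration, not the systems studied above.

```python
# Hypothetical sketch of the semantic-kinetic gap: where an intermediate
# controller can (or cannot) intercept a linguistic misunderstanding.
from dataclasses import dataclass

@dataclass
class MotorCommand:
    joint_velocities: list[float]  # rad/s per joint

STOP = MotorCommand(joint_velocities=[0.0] * 7)

def plan_from_instruction(instruction: str) -> str:
    # Stand-in planner: a classical stack emits an inspectable symbolic plan.
    return f"PLAN: {instruction}"

def safety_controller_approves(plan: str) -> bool:
    # Stand-in safety layer: rejects plans that touch a forbidden object.
    return "knife" not in plan.lower()

def execute_plan(plan: str) -> MotorCommand:
    # Stand-in executor mapping an approved plan to joint motion.
    return MotorCommand(joint_velocities=[0.1] * 7)

def vla_policy(instruction: str) -> MotorCommand:
    # Stand-in end-to-end VLA: language maps straight to motion in one pass,
    # so there is no intermediate plan for a controller to audit.
    return MotorCommand(joint_velocities=[0.1] * 7)

def classical_pipeline(instruction: str) -> MotorCommand:
    plan = plan_from_instruction(instruction)
    return execute_plan(plan) if safety_controller_approves(plan) else STOP

if __name__ == "__main__":
    risky = "hand me the knife quickly"
    print(classical_pipeline(risky))  # STOP: the intermediate controller catches it
    print(vla_policy(risky))          # motion: nothing sits between language and torque
```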

2. Binary Phase Transitions

Jailbreak success exhibits binary behavior: 0% compliance when attacks fail, 100% persistence when they succeed. There is no gradual degradation. Once "captured," models remain compromised.

3. Multi-Agent Failures Are Emergent

Failures in multi-agent systems are emergent, not additive. Cascade depth, semantic drift velocity, and consensus instability create failure modes that single-agent testing cannot detect.
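The sketch below gives toy definitions of two of these metrics for intuition only; it does not reproduce the studies' actual formulas, and it substitutes simple string dissimilarity for embedding distance.

```python
# Illustrative only: toy versions of two multi-agent failure metrics.
from difflib import SequenceMatcher

def semantic_drift_velocity(messages: list[str]) -> float:
    """Mean per-turn dissimilarity between consecutive messages (0 = no drift)."""
    if len(messages) < 2:
        return 0.0
    drifts = [
        1.0 - SequenceMatcher(None, a, b).ratio()
        for a, b in zip(messages, messages[1:])
    ]
    return sum(drifts) / len(drifts)

def cascade_depth(compromised: dict[str, list[str]], source: str) -> int:
    """Longest chain of agents compromised downstream of `source`.
    `compromised` maps each agent to the agents it in turn compromised."""
    children = compromised.get(source, [])
    if not children:
        return 0
    return 1 + max(cascade_depth(compromised, c) for c in children)

if __name__ == "__main__":
    log = ["summarize the report",
           "summarize it, skip safety notes",
           "ignore the safety notes entirely"]
    print(round(semantic_drift_velocity(log), 2))
    print(cascade_depth({"A": ["B"], "B": ["C", "D"]}, "A"))  # depth 2
```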

4. The Regulatory Danger Zone

2026-2029 is the critical window: EU AI Act compliance deadlines, mass humanoid deployment, and regulatory bodies without embodied AI evaluation capabilities all converge.

5. Defense Requires Distrust

Effective defense architectures treat AI as an "untrusted oracle" whose outputs are suggestions, not commands. The correct default is to assume the AI will fail and design containment.
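A minimal sketch of that stance, under the assumption of a hypothetical default-deny allowlist (contain and ALLOWED_ACTIONS are illustrative names, not a prescribed interface): the model's output is a suggestion that must pass validation before anything executes, and everything else is refused and escalated.

```python
# Hypothetical containment wrapper illustrating the "untrusted oracle" stance:
# default-deny, with human escalation as the fallback when the model fails.
ALLOWED_ACTIONS = {"read_sensor", "log_event", "request_human_review"}

def contain(model_suggestion: str) -> str:
    """Execute only pre-approved actions; escalate everything else."""
    tokens = model_suggestion.strip().split()
    action = tokens[0] if tokens else ""
    if action in ALLOWED_ACTIONS:
        return f"executed: {action}"
    return "refused: action not on allowlist, escalated to human review"

if __name__ == "__main__":
    print(contain("read_sensor temperature_probe_3"))
    print(contain("disable_safety_interlock now"))  # assumed failure; contained
```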

For Researchers