Failure is the primary object of study

258 models. 5 attack families. 140,794 adversarial results.

We study how AI systems fail, not just how they succeed.

Through adversarial testing across 258 models and 142,307 prompts spanning 5 attack families, we characterize how embodied AI systems break under pressure, how failures cascade across multi-agent environments, and what makes recovery possible. Our research informs policy, standards, and defensive architectures.

142,307 Adversarial Prompts

258 Models Evaluated

346+ Attack Techniques

Policy Reports

Start Here

Choose your path based on what you need:

Policymakers

Evidence-based briefs for AI safety regulation and standards

25 policy reports

Researchers

Datasets, methodology, and reproducible findings

142,307 prompts, 258 models

Industry

Benchmarks, red-teaming tools, and safety evaluation

Open-source tools

Core Research

Jailbreak Archaeology

Historical attack corpus across 6 eras (2022–2026), tested against 258 models. Revealed a 4x classifier overcount from keyword-based evaluation (Cohen's kappa = 0.126).

Key Dataset
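For readers unfamiliar with the agreement statistic quoted above, here is a minimal sketch of how Cohen's kappa between a keyword-based classifier and a reference labeler could be computed. The function name `cohens_kappa`, the label strings, and the 20-item toy data are all illustrative assumptions; none of this is the project's evaluation code, and the toy numbers are not meant to reproduce the reported 0.126.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labeling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each rater's marginals.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    categories = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label everywhere.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical toy data (invented for illustration): 20 model responses.
# The reference labeler marks 2 as attack successes; the keyword heuristic
# marks 8, which is a 4x overcount.
reference = ["success"] * 2 + ["fail"] * 18
keyword = ["success", "fail"] + ["success"] * 7 + ["fail"] * 11

overcount = keyword.count("success") / reference.count("success")
kappa = cohens_kappa(reference, keyword)
print(f"overcount: {overcount:.1f}x, kappa: {kappa:.3f}")
```

In this toy case the raw agreement is 60%, but kappa is close to zero because most of that agreement is what chance alone would produce given how often each rater says "fail". That is the general reason a low kappa signals that a classifier's counts should not be trusted at face value.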

Multi-Agent Attack Surface

Analysis of 1,497 AI agent interactions on Moltbook, an agent-only social network. Discovered environment shaping and narrative erosion as dominant attack vectors.

Active Research

Model Vulnerability Patterns

How model size, architecture, and training affect adversarial robustness. Medium-scale models may face elevated adversarial risk where capability outpaces safety investment.

Key Finding

Policy Corpus

26 policy reports and 160 research reports in total, each synthesizing 100 to 200+ sources. Coverage includes EU AI Act compliance, NIST frameworks, insurance requirements, and standards gaps.

Policy Briefs

All Research Studies →

Research Context

This is defensive AI safety research. All adversarial content is pattern-level description for testing, not operational instructions for exploitation. As with penetration testing in cybersecurity, we study vulnerabilities to build better defenses.

The Failure-First Philosophy

"Failure is not an edge case. It's the primary object of study."

Most AI safety work optimizes for capability and treats failure as an afterthought. We invert this: by understanding how systems fail, we can design better safeguards, recovery mechanisms, and human-in-the-loop interventions.

Read the Manifesto

Daily Paper

One AI safety paper per day, analyzed through the failure-first lens.

May 9: SoK: Robustness in Large Language Models against Jailbreak Attacks
May 8: Vision-Language-Action Safety: Threats, Challenges, Evaluations, and Mechanisms
May 7: MultiBreak: A Scalable and Diverse Multi-turn Jailbreak Benchmark for Evaluating LLM Safety
May 6: Safety in Embodied AI: A Survey of Risks, Attacks, and Defenses
May 5: Evaluating the Robustness of Large Language Model Safety Guardrails Against Adversarial Attacks

All papers →

Latest from the Blog

May 13, 2026

Robot Dogs Are a Security Nightmare — And We Can Prove It

Eight CVEs. A wormable Bluetooth exploit. An encrypted backdoor sending data to Chinese servers. And police departments buying them anyway. A deep dive into the Unitree vulnerability landscape and what it means for embodied AI safety.

embodied-ai, robotics, security, cve, unitree, backdoor, law-enforcement, surveillance, process-layer-attacks

May 13, 2026

AI Safety Daily — May 13, 2026

Fine-tuning asymmetry, KPI-induced constraint violations, tri-role self-play alignment, and a meta-prompting red-team framework converge on alignment as a dynamic property that erodes under optimization pressure.

ai-safety-daily, alignment, red-teaming, agentic-ai, fine-tuning

May 12, 2026

AI Safety Daily — May 12, 2026

An embodied AI safety survey, actionable mechanistic interpretability, professional agent benchmarking, CoT attack vectors, and an integrated diagnostic toolkit collectively expose the same gap: evaluation infrastructure is maturing faster than remediation tooling.

ai-safety-daily, embodied-ai, interpretability, agentic-ai, benchmarking

All posts →

Work With Us

Our commercial services are grounded in this research. Every engagement draws on 142,307 adversarial prompts, 346+ attack techniques, and evaluation data across 258 models.

Quick Start

Clone the repository and validate datasets:

git clone https://github.com/adrianwedd/failure-first.git
cd failure-first
pip install -r requirements-dev.txt
make validate  # Schema validation
make lint      # Safety checks

View on GitHub

Framework Guide


Work With Us

Red-Team Assessments

Safety Audits

Advisory

Intelligence Briefs
