Penetration Testing for AI Systems

Infrastructure, model, pipeline — one engagement, one report

Most engagements force a choice: traditional pentest firm or AI safety lab. We run both layers in parallel and produce one cross-referenced report — so cross-layer attack chains (a leaked IAM credential enabling model exfiltration, a prompt injection escalating to infrastructure access) actually get found.

As AI systems expand the attack surface — agentic pipelines, model APIs, training supply chains, multi-agent orchestration — the boundary between "security" and "AI safety" has collapsed. Our methodology covers the whole surface.

Regulatory trigger — Australia and EU, 2026

The Australian Signals Directorate confirmed in May 2026 that Claude Mythos Preview is the first frontier model observed autonomously chaining individual cyber tasks into a complete end-to-end intrusion — executing a 32-step simulated corporate network attack without human guidance (UK AISI evaluation). Separately, Mozilla reported that Mythos identified 271 vulnerabilities fixed in a single Firefox release, an order-of-magnitude increase over prior AI-assisted efforts.

ASD's key finding: open-weight models can already reproduce many Mythos techniques, and the assumption that adversaries lag frontier capabilities by months is no longer safe. Patch cycles and attack economics have both collapsed.

EU AI Act GPAI obligations begin 2 August 2026; high-risk system obligations 2 August 2027. Independent, reproducible pentest evidence — covering both the AI model layer and supporting infrastructure — is what regulators, auditors, and insurers now ask to see.

What We Test

A Failure-First engagement covers the full attack surface of an AI deployment: the infrastructure that runs it and the model layer itself. Findings from both layers appear in a single unified report, with cross-layer attack chains explicitly identified.

Traditional Layer

Web application security: OWASP Top 10, REST/GraphQL API endpoints, authentication flows
Cloud infrastructure & IAM: AWS/GCP/Azure misconfiguration, privilege escalation, exposed services
Supply chain & dependencies: known CVEs in packages, container images, model artefacts
Secrets & credential exposure: hardcoded keys, leaked tokens, environment variable leakage
Static application security: code-level vulnerability patterns, unsafe deserialisation, injection sinks

AI Layer

LLM adversarial testing: jailbreak taxonomy (81 techniques, 6 eras), prompt injection, refusal suppression
Agentic system testing: tool misuse, chain exploitation, cross-agent injection, orchestration abuse
Alignment auditing: deception, sycophancy, self-preservation, cooperation with harmful instructions
Multi-agent pipeline attacks: inter-agent trust exploitation, context poisoning, state manipulation
AI supply chain: model provenance, training data poisoning indicators, weight file integrity

Why Failure-First

AI layer findings are grounded in the largest open adversarial dataset for embodied and agentic AI — not hypothetical scenarios.

142,307 adversarial prompts
258 models evaluated
346+ attack techniques
25 policy reports

How an Engagement Works

Standard and Ongoing tiers follow the three-phase process below. Quick Scan compresses the same phases into 5–7 business days with a reduced scenario count. All tiers require a signed Authorisation to Test (ATT) before any active scanning begins.

Phase 1 (Week 1): Scoping & Threat Modelling

  • System architecture and deployment context review
  • Attack surface mapping — infrastructure and AI layer
  • Regulatory framework identification (VAISS, EU AI Act, NIST AI RMF, ISO 42001)
  • Signed Authorisation to Test (ATT) and rules of engagement
  • Selection of attack scenarios from validated taxonomy

Phase 2 (Weeks 2–3): Parallel Testing Execution

  • Traditional layer: web application, cloud/IAM, supply chain, secrets, SAST
  • AI layer: LLM adversarial, agentic pipeline, alignment auditing
  • Evidence capture and per-finding documentation throughout
  • Critical/High findings flagged to client same-day

Phase 3 (Week 4): FLIP Grading, Analysis & Report

  • All findings graded with FLIP (Failure-Level Impact Protocol)
  • Cross-layer attack chain analysis — where infra findings compound AI findings
  • Compliance mapping to applicable regulatory frameworks
  • Core Technical Report, Compliance Mapping Report, Evidence Archive delivered
  • Debrief call — findings walkthrough and remediation Q&A
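
For illustration, the sketch below shows what a single FLIP-graded finding record could look like once per-finding documentation, tool provenance, and cross-layer references are pulled together. Field names, grade labels, and identifiers are assumptions made for this example, not the actual report schema.

    # Illustrative per-finding record; field names and grade labels are
    # assumptions for this sketch, not the production report schema.
    from dataclasses import dataclass, field

    @dataclass
    class Finding:
        finding_id: str        # report-local identifier
        layer: str             # "traditional" or "ai"
        title: str
        flip_grade: str        # FLIP (Failure-Level Impact Protocol) grade
        tool: str              # tool that produced the evidence
        tool_version: str      # pinned version, for reproducibility
        config_ref: str        # configuration used for the run
        evidence: list[str] = field(default_factory=list)          # archive paths
        cross_layer_refs: list[str] = field(default_factory=list)  # related findings

    # Example: a prompt-injection finding that chains into an infrastructure issue.
    example = Finding(
        finding_id="AI-007",
        layer="ai",
        title="Indirect prompt injection via retrieved document",
        flip_grade="High",
        tool="garak",
        tool_version="0.10.x",
        config_ref="configs/garak-promptinject.yaml",
        evidence=["evidence/AI-007/transcript.json"],
        cross_layer_refs=["INFRA-003"],  # e.g. a leaked token surfaced by Gitleaks
    )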

Tools & Infrastructure

All third-party tools are open-source. Our own pipeline is documented in published research and reproducible from the public repository. Every finding references the tool, version, and configuration used — there are no black-box scanners in the stack.

Traditional Layer

OWASP ZAP: web application scanning — OWASP Top 10, active/passive scan
Nuclei: CVE and misconfiguration scanning — 10,000+ community templates
Gitleaks: secrets and credential exposure across git history and working tree
Prowler: cloud security posture — AWS, GCP, Azure IAM and configuration
Semgrep: static application security testing — injection sinks, unsafe patterns
OSV-Scanner + Grype: supply chain scanning for known CVEs in packages, containers, and model artefacts
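
A minimal sketch of how a traditional-layer run can be driven so that raw output, tool versions, and configuration all land in the evidence archive. Targets and output paths are placeholders, and the flags shown reflect recent releases of each tool; verify them against the versions you actually pin.

    # Minimal traditional-layer sweep: secrets, CVE/misconfiguration, and SAST.
    # Targets and paths are placeholders; confirm scope in the signed ATT first.
    import json
    import subprocess
    from pathlib import Path

    EVIDENCE = Path("evidence")
    EVIDENCE.mkdir(exist_ok=True)

    def run(cmd: list[str]) -> None:
        """Run a scanner; most exit non-zero when they find something."""
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=False)

    # Secrets and credential exposure across git history and working tree
    run(["gitleaks", "detect", "--source", ".",
         "--report-format", "json",
         "--report-path", str(EVIDENCE / "gitleaks.json")])

    # Known CVEs and misconfiguration against a staging endpoint
    run(["nuclei", "-u", "https://staging.example.com",
         "-severity", "critical,high",
         "-o", str(EVIDENCE / "nuclei.txt")])

    # Static analysis for injection sinks and unsafe patterns
    run(["semgrep", "--config", "p/owasp-top-ten",
         "--json", "--output", str(EVIDENCE / "semgrep.json"), "."])

    # Pin the exact tool versions next to the raw output
    version_cmds = {
        "gitleaks": ["gitleaks", "version"],
        "nuclei": ["nuclei", "-version"],
        "semgrep": ["semgrep", "--version"],
    }
    versions = {
        tool: subprocess.run(cmd, capture_output=True, text=True).stdout.strip()
        for tool, cmd in version_cmds.items()
    }
    (EVIDENCE / "tool-versions.json").write_text(json.dumps(versions, indent=2))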

AI Layer

F1 Pipeline: Failure-First corpus — 81 attack techniques, 6 eras, graded against 100+ models; run against your specific endpoint
garak: LLM adversarial testing — jailbreak, prompt injection, data extraction
AgentDojo: agentic system testing — tool misuse, injection, orchestration exploits
Petri (inspect-petri): alignment auditing — 38 judge dimensions across 173+ seed instructions
promptfoo: LLM red-teaming framework — configurable attack strategies, provider-agnostic
HarmBench: standardised harmful content evaluation against academic benchmark behaviours
StrongREJECT: refusal quality grading — distinguishes genuine refusals from over-refusal
agentic-radar: agentic surface analysis — framework detection, trust boundary mapping

Additional adapters available on request: PyRIT (Microsoft), DeepTeam.
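
For the AI layer, the same evidence-first pattern applies. The sketch below drives a garak run from the orchestration script; the model name, probe families, and report prefix are placeholders to be replaced with the endpoint and scenarios agreed in the ATT (garak also ships REST and Hugging Face generators for non-OpenAI endpoints).

    # Minimal AI-layer probe run via garak's CLI, kept in the same evidence flow.
    # Model, probes, and report prefix are placeholders for the agreed scope.
    import subprocess

    subprocess.run([
        "python", "-m", "garak",
        "--model_type", "openai",        # requires OPENAI_API_KEY in the environment
        "--model_name", "gpt-4o-mini",   # placeholder; point at the in-scope model
        "--probes", "promptinject,dan",  # prompt injection + jailbreak probe families
        "--report_prefix", "evidence/garak-run",
    ], check=False)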

What You Receive

Every engagement delivers the Core Technical Report, the Compliance Mapping Report, and the full Evidence Archive described in Phase 3, followed by a findings debrief call. All engagements begin with a signed Authorisation to Test (ATT) and mutually agreed rules of engagement, are covered by professional indemnity insurance, and operate under a mutual NDA as standard.

Engagement Tiers

Three structured tiers mapped to deployment stage and regulatory need. All tiers cover both the traditional and AI layers — the difference is depth and scope.

Tier 1

Quick Scan

Contact us for pricing
  • Traditional layer: web app, secrets, top-3 cloud findings
  • AI layer: top-5 attack families against your model endpoint
  • FLIP-graded vulnerability profile
  • Executive summary with corpus baseline comparison
  • Compressed methodology: 5–7 business days

Best for: Pre-deployment sanity check, model selection, internal risk committees, VAISS spot check

Tier 3

Ongoing

Contact us for pricing
  • Monthly adversarial probe — traditional and AI layers
  • New technique coverage as threats emerge
  • GLI regulatory monitoring for your jurisdiction
  • Quarterly threat landscape brief
  • 48-hour incident response for disclosed AI vulnerabilities

Best for: Deployed systems, fleet operators, continuous compliance obligations, insurers requiring periodic testing evidence

Common Questions

Do you need production system access?
No. We test against staging or dedicated test endpoints. Where production access is required by scope, this is agreed explicitly in the Authorisation to Test.
How do you handle model weights and proprietary data?
All client artefacts remain within the agreed evidence boundary. We do not retain, transmit, or train on client model weights or data. A mutual NDA is standard.
Can you work under our existing pentest MSA?
Yes. We accept standard master security services agreements with reasonable modifications.
Will findings appear in your public research corpus?
Not without explicit written consent. Engagement findings are confidential by default under the coordinated disclosure agreement.
What insurance do you carry?
Professional indemnity and public liability. Coverage details provided at scoping.
Do you test third-party-hosted SaaS or model APIs you do not operate?
Only with written authorisation from the relevant third party. This is addressed during scoping.

Always out of scope: denial-of-service attacks, social engineering of client staff, physical access testing, third-party infrastructure without written authorisation from that party.

Get Started

Discovery calls are free. We scope engagements based on your deployment timeline, risk profile, and regulatory obligations. Typical scoping takes 5 business days.

Alternative: Contact form