Penetration Testing for AI Systems

Infrastructure, model, pipeline — one engagement, one report

Most engagements force a choice: traditional pentest firm or AI safety lab. We run both layers in parallel and produce one cross-referenced report — so cross-layer attack chains (a leaked IAM credential enabling model exfiltration, a prompt injection escalating to infrastructure access) actually get found.

As AI systems expand the attack surface — agentic pipelines, model APIs, training supply chains, multi-agent orchestration — the boundary between "security" and "AI safety" has collapsed. Our methodology covers the whole surface.

Regulatory trigger — Australia and EU, 2026

The Australian Signals Directorate confirmed in May 2026 that Claude Mythos Preview is the first frontier model observed autonomously chaining individual cyber tasks into a complete end-to-end intrusion — executing a 32-step simulated corporate network attack without human guidance (UK AISI evaluation). Separately, Mozilla reported that Mythos identified 271 vulnerabilities fixed in a single Firefox release, an order-of-magnitude increase over prior AI-assisted efforts.

ASD's key finding: open-weight models can already reproduce many Mythos techniques, and the assumption that adversaries lag frontier capabilities by months is no longer safe. Patch cycles and attack economics have both collapsed.

EU AI Act GPAI obligations begin 2 August 2026; high-risk system obligations 2 August 2027. Independent, reproducible pentest evidence — covering both the AI model layer and supporting infrastructure — is what regulators, auditors, and insurers now ask to see.

What We Test

A Failure-First engagement covers the full attack surface of an AI deployment: the infrastructure that runs it and the model layer itself. Findings from both layers appear in a single unified report, with cross-layer attack chains explicitly identified.

Traditional Layer

Web application security: OWASP Top 10, REST/GraphQL API endpoints, authentication flows
Cloud infrastructure & IAM: AWS/GCP/Azure misconfiguration, privilege escalation, exposed services
Supply chain & dependencies: known CVEs in packages, container images, model artefacts
Secrets & credential exposure: hardcoded keys, leaked tokens, environment variable leakage
Static application security: code-level vulnerability patterns, unsafe deserialisation, injection sinks

AI Layer

LLM adversarial testing: jailbreak taxonomy (81 techniques, 6 eras), prompt injection, refusal suppression
Agentic system testing: tool misuse, chain exploitation, cross-agent injection, orchestration abuse
Alignment auditing: deception, sycophancy, self-preservation, cooperation with harmful instructions
Multi-agent pipeline attacks: inter-agent trust exploitation, context poisoning, state manipulation
AI supply chain: model provenance, training data poisoning indicators, weight file integrity

Why Failure-First

AI layer findings are grounded in the largest open adversarial dataset for embodied and agentic AI — not hypothetical scenarios.

142,307 adversarial prompts
258 models evaluated
346+ attack techniques
25 policy reports

How an Engagement Works

Standard and Ongoing tiers follow the three-phase process below. Quick Scan compresses the same phases into 5–7 business days with a reduced scenario count. All tiers require a signed Authorisation to Test (ATT) before any active scanning begins.

Phase 1 (Week 1): Scoping & Threat Modelling

  • System architecture and deployment context review
  • Attack surface mapping — infrastructure and AI layer
  • Regulatory framework identification (VAISS, EU AI Act, NIST AI RMF, ISO 42001)
  • Signed Authorisation to Test (ATT) and rules of engagement
  • Selection of attack scenarios from validated taxonomy

Phase 2 (Weeks 2–3): Parallel Testing Execution

  • Traditional layer: web application, cloud/IAM, supply chain, secrets, SAST
  • AI layer: LLM adversarial, agentic pipeline, alignment auditing
  • Evidence capture and per-finding documentation throughout
  • Critical/High findings flagged to client same-day

Phase 3 (Week 4): FLIP Grading, Analysis & Report

  • All findings graded with FLIP (Failure-Level Impact Protocol)
  • Cross-layer attack chain analysis — where infra findings compound AI findings
  • Compliance mapping to applicable regulatory frameworks
  • Core Technical Report, Compliance Mapping Report, Evidence Archive delivered
  • Debrief call — findings walkthrough and remediation Q&A
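
For illustration, the sketch below shows what a single FLIP-graded finding record could look like once per-finding documentation, tool provenance, and cross-layer references are pulled together. Field names, grade labels, and identifiers are assumptions made for this example, not the actual report schema.

    # Illustrative per-finding record; field names and grade labels are
    # assumptions for this sketch, not the production report schema.
    from dataclasses import dataclass, field

    @dataclass
    class Finding:
        finding_id: str        # report-local identifier
        layer: str             # "traditional" or "ai"
        title: str
        flip_grade: str        # FLIP (Failure-Level Impact Protocol) grade
        tool: str              # tool that produced the evidence
        tool_version: str      # pinned version, for reproducibility
        config_ref: str        # configuration used for the run
        evidence: list[str] = field(default_factory=list)          # archive paths
        cross_layer_refs: list[str] = field(default_factory=list)  # related findings

    # Example: a prompt-injection finding that chains into an infrastructure issue.
    example = Finding(
        finding_id="AI-007",
        layer="ai",
        title="Indirect prompt injection via retrieved document",
        flip_grade="High",
        tool="garak",
        tool_version="0.10.x",
        config_ref="configs/garak-promptinject.yaml",
        evidence=["evidence/AI-007/transcript.json"],
        cross_layer_refs=["INFRA-003"],  # e.g. a leaked token surfaced by Gitleaks
    )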

Tools & Infrastructure

All third-party tools are open-source. Our own pipeline is documented in published research and reproducible from the public repository. Every finding references the tool, version, and configuration used — there are no black-box scanners in the stack.

Traditional Layer

OWASP ZAP: web application scanning — OWASP Top 10, active/passive scan
Nuclei: CVE and misconfiguration scanning — 10,000+ community templates
Gitleaks: secrets and credential exposure across git history and working tree
Prowler: cloud security posture — AWS, GCP, Azure IAM and configuration
Semgrep: static application security testing — injection sinks, unsafe patterns
OSV-Scanner + Grype: supply chain scanning for known CVEs in packages, containers, and model artefacts
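
A minimal sketch of how a traditional-layer run can be driven so that raw output, tool versions, and configuration all land in the evidence archive. Targets and output paths are placeholders, and the flags shown reflect recent releases of each tool; verify them against the versions you actually pin.

    # Minimal traditional-layer sweep: secrets, CVE/misconfiguration, and SAST.
    # Targets and paths are placeholders; confirm scope in the signed ATT first.
    import json
    import subprocess
    from pathlib import Path

    EVIDENCE = Path("evidence")
    EVIDENCE.mkdir(exist_ok=True)

    def run(cmd: list[str]) -> None:
        """Run a scanner; most exit non-zero when they find something."""
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=False)

    # Secrets and credential exposure across git history and working tree
    run(["gitleaks", "detect", "--source", ".",
         "--report-format", "json",
         "--report-path", str(EVIDENCE / "gitleaks.json")])

    # Known CVEs and misconfiguration against a staging endpoint
    run(["nuclei", "-u", "https://staging.example.com",
         "-severity", "critical,high",
         "-o", str(EVIDENCE / "nuclei.txt")])

    # Static analysis for injection sinks and unsafe patterns
    run(["semgrep", "--config", "p/owasp-top-ten",
         "--json", "--output", str(EVIDENCE / "semgrep.json"), "."])

    # Pin the exact tool versions next to the raw output
    version_cmds = {
        "gitleaks": ["gitleaks", "version"],
        "nuclei": ["nuclei", "-version"],
        "semgrep": ["semgrep", "--version"],
    }
    versions = {
        tool: subprocess.run(cmd, capture_output=True, text=True).stdout.strip()
        for tool, cmd in version_cmds.items()
    }
    (EVIDENCE / "tool-versions.json").write_text(json.dumps(versions, indent=2))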

AI Layer

F1 Pipeline: Failure-First corpus — 81 attack techniques, 6 eras, graded against 100+ models; run against your specific endpoint
garak: LLM adversarial testing — jailbreak, prompt injection, data extraction
AgentDojo: agentic system testing — tool misuse, injection, orchestration exploits
Petri (inspect-petri): alignment auditing — 38 judge dimensions across 173+ seed instructions
promptfoo: LLM red-teaming framework — configurable attack strategies, provider-agnostic
HarmBench: standardised harmful content evaluation against academic benchmark behaviours
StrongREJECT: refusal quality grading — distinguishes genuine refusals from over-refusal
agentic-radar: agentic surface analysis — framework detection, trust boundary mapping

Additional adapters available on request: PyRIT (Microsoft), DeepTeam.
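
For the AI layer, the same evidence-first pattern applies. The sketch below drives a garak run from the orchestration script; the model name, probe families, and report prefix are placeholders to be replaced with the endpoint and scenarios agreed in the ATT (garak also ships REST and Hugging Face generators for non-OpenAI endpoints).

    # Minimal AI-layer probe run via garak's CLI, kept in the same evidence flow.
    # Model, probes, and report prefix are placeholders for the agreed scope.
    import subprocess

    subprocess.run([
        "python", "-m", "garak",
        "--model_type", "openai",        # requires OPENAI_API_KEY in the environment
        "--model_name", "gpt-4o-mini",   # placeholder; point at the in-scope model
        "--probes", "promptinject,dan",  # prompt injection + jailbreak probe families
        "--report_prefix", "evidence/garak-run",
    ], check=False)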

What You Receive

Every engagement delivers the Core Technical Report, the Compliance Mapping Report, and the full Evidence Archive described in Phase 3, followed by a findings debrief call. All engagements begin with a signed Authorisation to Test (ATT) and mutually agreed rules of engagement, are covered by professional indemnity insurance, and operate under a mutual NDA as standard.

Engagement Tiers

Three structured tiers mapped to deployment stage and regulatory need. All tiers cover both the traditional and AI layers — the difference is depth and scope.

Tier 1

Quick Scan

Contact us for pricing
  • Traditional layer: web app, secrets, top-3 cloud findings
  • AI layer: top-5 attack families against your model endpoint
  • FLIP-graded vulnerability profile
  • Executive summary with corpus baseline comparison
  • Compressed methodology: 5–7 business days

Best for: Pre-deployment sanity check, model selection, internal risk committees, VAISS spot check

Tier 3

Ongoing

Contact us for pricing
  • Monthly adversarial probe — traditional and AI layers
  • New technique coverage as threats emerge
  • GLI regulatory monitoring for your jurisdiction
  • Quarterly threat landscape brief
  • 48-hour incident response for disclosed AI vulnerabilities

Best for: Deployed systems, fleet operators, continuous compliance obligations, insurers requiring periodic testing evidence

Common Questions

Do you need production system access?
No. We test against staging or dedicated test endpoints. Where production access is required by scope, this is agreed explicitly in the Authorisation to Test.
How do you handle model weights and proprietary data?
All client artefacts remain within the agreed evidence boundary. We do not retain, transmit, or train on client model weights or data. A mutual NDA is standard.
Can you work under our existing pentest MSA?
Yes. We accept standard master security services agreements with reasonable modifications.
Will findings appear in your public research corpus?
Not without explicit written consent. Engagement findings are confidential by default under the coordinated disclosure agreement.
What insurance do you carry?
Professional indemnity and public liability. Coverage details provided at scoping.
Do you test third-party-hosted SaaS or model APIs you do not operate?
Only with written authorisation from the relevant third party. This is addressed during scoping.

Always out of scope: denial-of-service attacks, social engineering of client staff, physical access testing, third-party infrastructure without written authorisation from that party.

Get Started

Discovery calls are free. We scope engagements based on your deployment timeline, risk profile, and regulatory obligations. Typical scoping takes 5 business days.

Alternative: Contact form