Attack Technique Evolution Timeline

Historical evolution of jailbreak techniques from 2022 to present, showing how adversarial innovation responds to AI safety training

taxonomy Last updated: February 6, 2026

Attack Technique Evolution Timeline

This document traces the historical evolution of jailbreak techniques from 2022 to the present, highlighting how adversarial innovation has responded to improvements in AI safety training.

1. Timeline Overview

Era	Key Theme	Landmark Technique	Vulnerability Exploited
Pre-2022	Naive Override	”Ignore previous instructions”	Lack of system/user turn distinction.
2022	Persona Hijack	DAN (Do Anything Now)	Deference to roleplay and helpfulness priors.
2023	Obfuscation	Base64 / Leetspeak	Semantic filters bypassing non-natural text.
2024	Erosion	Crescendo / Many-Shot	Context window saturation and stateful drift.
2025	Logic Traps	CoT Manipulation	Trust in the model’s own reasoning traces.

2. Era Deep-Dive

2.1 Pre-2022: Naive Prompt Injection

Early attacks were simple and direct. They relied on the fact that models didn’t clearly distinguish between “System Instructions” and “User Input.”

Landmark: prompt_injection/ignore_previous
Mechanism: Telling the model to “Forget everything above” and act as a simple terminal.

2.2 2022: The “DAN” Era (Persona Injection)

As models became more helpful, attackers exploited that helpfulness by creating fictional personas that “must” comply with restricted requests.

Landmark: DAN/v1, AIM/v1
Mechanism: Creating a complex character (e.g., “Always Intelligent and Machiavellian”) whose internal rules override the model’s safety guidelines.

2.3 2023: The “Cipher” Era (Encoding Evasion)

Safety training began catching persona-based attacks. Attackers responded by encoding their requests into formats that the model could understand but safety filters (at the time) could not read.

Landmark: cipher/base64, cipher/leetspeak, cipher/rot13
Mechanism: Exploiting the gap between the model’s reasoning capabilities and its keyword-based input filters.

2.4 2024: The “Volumetric” Era (Many-Shot & Multi-Turn)

With the expansion of context windows, attackers discovered that they could “erode” a model’s safety alignment by providing hundreds of benign examples (Many-Shot) or incrementally leading the model toward a violation (Crescendo).

Landmark: many_shot/128_shots, crescendo/social_engineering
Mechanism: Saturation of the attention mechanism and incremental goal drift.

2.5 2025: The “Reasoning” Era (CoT Exploits)

The latest generation of “thinking” models (e.g., DeepSeek-R1, OpenAI o1) introduces a new surface: the Chain-of-Thought (CoT). Attackers now use logic traps to induce the model to “reason itself” into a violation.

Landmark: reasoning_exploit/cot_manipulation, reasoning_exploit/thinking_trace
Mechanism: Using deductive traps where the model’s own logical steps lead it to conclude that a harmful action is necessary for “consistency” or “logic.”

3. Technique Families

Our database maps 81 specific techniques into these broader families:

Persona: Roleplay, authority spoofing, emotional leverage.
Encoding: Base64, ROT13, Morse, Ciphers.
Volumetric: Many-shot, long-context saturation.
Multi-Turn: Incremental erosion, stateful drift.
CoT_Exploit: Reasoning traps, deductive interference.

4. Family → Era Mapping

Family	Primary Eras	Notes
Persona	2022	DAN-style role hijacks.
Encoding	2023	Base64, ROT13, leetspeak.
Volumetric	2024	Many-shot saturation.
Multi-Turn	2024	Crescendo, incremental erosion.
CoT_Exploit	2025	Reasoning trace manipulation.
Prompt_Injection	Pre-2022	Naive override patterns.

Scenario Classes Reference - Complete taxonomy of attack patterns
Failure Taxonomy Guide - Dual-taxonomy framework