Attack Technique Evolution Timeline

Historical evolution of jailbreak techniques from 2022 to present, showing how adversarial innovation responds to AI safety training

taxonomy Last updated: February 6, 2026

Attack Technique Evolution Timeline

This document traces the historical evolution of jailbreak techniques from 2022 to the present, highlighting how adversarial innovation has responded to improvements in AI safety training.

1. Timeline Overview

EraKey ThemeLandmark TechniqueVulnerability Exploited
Pre-2022Naive Override”Ignore previous instructions”Lack of system/user turn distinction.
2022Persona HijackDAN (Do Anything Now)Deference to roleplay and helpfulness priors.
2023ObfuscationBase64 / LeetspeakSemantic filters bypassing non-natural text.
2024ErosionCrescendo / Many-ShotContext window saturation and stateful drift.
2025Logic TrapsCoT ManipulationTrust in the model’s own reasoning traces.

2. Era Deep-Dive

2.1 Pre-2022: Naive Prompt Injection

Early attacks were simple and direct. They relied on the fact that models didn’t clearly distinguish between “System Instructions” and “User Input.”

  • Landmark: prompt_injection/ignore_previous
  • Mechanism: Telling the model to “Forget everything above” and act as a simple terminal.

2.2 2022: The “DAN” Era (Persona Injection)

As models became more helpful, attackers exploited that helpfulness by creating fictional personas that “must” comply with restricted requests.

  • Landmark: DAN/v1, AIM/v1
  • Mechanism: Creating a complex character (e.g., “Always Intelligent and Machiavellian”) whose internal rules override the model’s safety guidelines.

2.3 2023: The “Cipher” Era (Encoding Evasion)

Safety training began catching persona-based attacks. Attackers responded by encoding their requests into formats that the model could understand but safety filters (at the time) could not read.

  • Landmark: cipher/base64, cipher/leetspeak, cipher/rot13
  • Mechanism: Exploiting the gap between the model’s reasoning capabilities and its keyword-based input filters.

2.4 2024: The “Volumetric” Era (Many-Shot & Multi-Turn)

With the expansion of context windows, attackers discovered that they could “erode” a model’s safety alignment by providing hundreds of benign examples (Many-Shot) or incrementally leading the model toward a violation (Crescendo).

  • Landmark: many_shot/128_shots, crescendo/social_engineering
  • Mechanism: Saturation of the attention mechanism and incremental goal drift.

2.5 2025: The “Reasoning” Era (CoT Exploits)

The latest generation of “thinking” models (e.g., DeepSeek-R1, OpenAI o1) introduces a new surface: the Chain-of-Thought (CoT). Attackers now use logic traps to induce the model to “reason itself” into a violation.

  • Landmark: reasoning_exploit/cot_manipulation, reasoning_exploit/thinking_trace
  • Mechanism: Using deductive traps where the model’s own logical steps lead it to conclude that a harmful action is necessary for “consistency” or “logic.”

3. Technique Families

Our database maps 81 specific techniques into these broader families:

  • Persona: Roleplay, authority spoofing, emotional leverage.
  • Encoding: Base64, ROT13, Morse, Ciphers.
  • Volumetric: Many-shot, long-context saturation.
  • Multi-Turn: Incremental erosion, stateful drift.
  • CoT_Exploit: Reasoning traps, deductive interference.

4. Family → Era Mapping

FamilyPrimary ErasNotes
Persona2022DAN-style role hijacks.
Encoding2023Base64, ROT13, leetspeak.
Volumetric2024Many-shot saturation.
Multi-Turn2024Crescendo, incremental erosion.
CoT_Exploit2025Reasoning trace manipulation.
Prompt_InjectionPre-2022Naive override patterns.