AI governance responds to documented failure modes in a consistent pattern: slowly, and with a delay that is not random but follows predictable structural causes. Quantifying this delay is a precondition for taking it seriously as a risk management problem.
This brief proposes a Governance Lag Index (GLI) that measures the temporal gap between empirical documentation of a specific AI failure mode and the implementation of operative governance addressing that failure. A preliminary dataset of 10 events suggests the gap substantially exceeds historical analogues from other high-stakes industries.
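In the timestamp notation used for the dataset (T_discovery for first empirical documentation; T_framework, T_enact, and T_enforce for the stages defined below), the index for an event e at a given stage reduces to a simple difference:

    GLI_stage(e) = T_stage(e) − T_discovery(e)

The value is null where the stage has not been reached, and negative where generic regulatory coverage predates the specific documentation.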
Defining Operative Governance
For the GLI to be useful, “governance” requires a precise definition. We decompose it into four stages:
Stage A (Publication): A framework, guideline, or taxonomy is documented by a standards body or regulatory agency. This stage signifies awareness but lacks compulsion.
Stage B (Enactment): Legislation or binding regulation is passed into law, creating a statutory foundation for oversight.
Stage C (Enforcement): The enacted framework becomes active and the regulatory body has practical authority to levy penalties, mandate audits, or halt deployment.
Stage D (Efficacy): Empirical evidence demonstrates a statistically significant reduction in the incidence of the specific failure mode, directly attributable to the enforced framework.
Most AI governance in 2026 is at Stage A. Almost none has reached Stage D.
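To make the decomposition concrete, the sketch below (Python) models one event’s governance timeline as a record of optional dates, one per stage. The record and field names are illustrative, not part of any published GLI specification.

    from dataclasses import dataclass
    from datetime import date
    from typing import Optional

    @dataclass
    class GovernanceTimeline:
        """Dates at which one event reached each stage; None = not yet reached."""
        t_discovery: date                   # failure mode first empirically documented
        t_framework: Optional[date] = None  # Stage A: framework or taxonomy published
        t_enact: Optional[date] = None      # Stage B: binding regulation passed
        t_enforce: Optional[date] = None    # Stage C: penalties and audits operative
        t_efficacy: Optional[date] = None   # Stage D: measured incidence reduction

        def gli_days(self, stage_date: Optional[date]) -> Optional[int]:
            """Lag in days from discovery to a given stage; None encodes a null GLI."""
            if stage_date is None:
                return None
            return (stage_date - self.t_discovery).days

Under this representation, “Stage C: null” in the dataset entries below corresponds to t_enforce = None.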
Historical Analogues
Historical precedents from other high-stakes industries provide a baseline.
The Boeing 737 MAX MCAS failure: the first fatal accident occurred in October 2018; following a second fatal crash, the FAA grounded the aircraft in March 2019, 4.5 months after the first. Recertification and systemic reform took a further 20 months. The governance lag from documented systemic failure to enforcement was under six months, driven by independent investigative bodies, mandatory incident reporting, and the regulator’s ability to halt physical operations globally.
The Three Mile Island partial meltdown occurred March 1979. The Kemeny Commission issued its report in October 1979. The nuclear industry established the Institute of Nuclear Power Operations for self-regulation within nine months. Governance lag to sweeping regulatory change: under 12 months — driven by the visible, catastrophic nature of the failure and intense public and congressional pressure.
Pharmaceutical adverse event reporting operates on 15-day mandatory notification timelines for serious adverse events. The lag between documented failure and regulatory enforcement is structurally constrained by mandatory reporting infrastructure.
What the Preliminary Data Shows
The GLI dataset v0.1 contains 10 events. Key observations from this small sample:
Adversarial examples (computer vision): First documented by Szegedy et al. in December 2013 (arXiv:1312.6199). Formal governance in the form of NIST AI 100-2e2023 appeared 3,362 days later. This is the longest confirmed lag in the dataset.
Prompt injection: First empirically documented in September 2022 (arXiv:2209.02128). The NIST AI Risk Management Framework (January 2023) provides high-level guidance without binding enforcement. EchoLeak (CVE-2025-32711), the first documented zero-click prompt injection with confirmed data exfiltration in a production system, was identified in January 2025. Approximate GLI to Stage A: 1,421 days. Stage C remains absent.
Instruction hierarchy subversion: First documented April 2024 (arXiv:2404.13208). No statutory-level governance exists as of this writing. Stage B and beyond: null.
Deceptive alignment (empirical): First documented December 2024 (arXiv:2412.14093). EU AI Act Article 14 human oversight provisions exist but cannot address a failure mode that specifically targets oversight mechanisms. Auditing methodology for inner misalignment is not codified. Stage C: null.
Negative GLI intervals: Two events in the dataset show negative GLI, meaning generic regulatory coverage preceded the specific attack documentation. Instruction hierarchy carries a figure of −449 days: existing guidelines covered the general case before the specific attack class was named. This does not indicate effective protection; it indicates generic frameworks that predate the specific threat characterisation.
Vision-language-action (VLA) attacks and alignment faking: Null GLI. No governance framework anywhere addresses these failure modes as of March 2026.
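Null and negative intervals fall out of the same computation. A minimal sketch, reusing the gli_days convention from the hypothetical record above:

    from typing import Optional

    def interpret_gli(lag_days: Optional[int]) -> str:
        """Read a Stage A lag value the way the v0.1 dataset entries above do."""
        if lag_days is None:
            # e.g. VLA attacks: no framework anywhere addresses the failure mode
            return "null GLI: no operative governance exists"
        if lag_days < 0:
            # e.g. instruction hierarchy at -449 days: the NIST AI RMF (January 2023)
            # predates the specific attack documentation (April 2024)
            return "negative GLI: generic coverage predates the specific threat"
        return f"positive GLI: {lag_days} days from documentation to Stage A"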
The Australian Embodied AI Gap
Australia’s AI regulatory approach, confirmed by the National AI Plan (December 2025), relies on existing laws, voluntary guidance, and the newly established AU AISI (announced November 2025, funded at AUD 29.9 million). The 10 guardrails of the Voluntary AI Safety Standard (VAISS) remain the reference standard.
This approach creates a distinctive exposure. As of 2022, Australia had over 700 autonomous haulage trucks in mining operations, with forecasts exceeding 1,800 units by 2025. These systems operate in high-consequence physical environments. The AU AISI’s initial scope is documented as focusing on large language models, not embodied systems. The work health and safety (WHS) legislative framework, extended to digital work systems in NSW in February 2026, creates employer liability for AI-induced workplace harm, but without a specified adversarial testing methodology employers cannot reliably demonstrate compliance.
The GLI for VLA-specific adversarial attacks in the Australian mining/logistics context is currently null: documented failure modes exist, no operative governance addresses them, and the institutional capacity to develop and enforce such governance is being built from scratch.
What This Framework Is and Isn’t
With only 10 events, the GLI v0.1 dataset is insufficient for statistical conclusions about mean lags or trend analysis. The framework’s current value is conceptual: it provides a vocabulary for the gap between threat documentation and governance response, and a structure for accumulating the evidence base needed to make quantitative policy arguments.
The next substantive version of this analysis requires at minimum 30 events with fully compiled dates for T_discovery, T_framework, T_enact, and T_enforce across multiple jurisdictions. Issue #157 tracks this expansion.
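As a sketch of what that compilation could support, assume a flat table with one row per event-jurisdiction pair, a jurisdiction column, and the four T_* dates as columns; the file name gli_v0_2.csv is hypothetical:

    import pandas as pd

    date_cols = ["t_discovery", "t_framework", "t_enact", "t_enforce"]
    df = pd.read_csv("gli_v0_2.csv", parse_dates=date_cols)  # hypothetical file

    # Per-stage lags in days; unreached stages (NaT) propagate as NaN,
    # preserving null-GLI entries rather than silently dropping them.
    for stage in date_cols[1:]:
        df[f"lag_{stage}"] = (df[stage] - df["t_discovery"]).dt.days

    # Median lag per jurisdiction and stage -- only meaningful once the
    # dataset clears the 30-event threshold above.
    summary = df.groupby("jurisdiction")[[f"lag_{s}" for s in date_cols[1:]]].median()
    print(summary)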
This brief is PRELIMINARY. Quantitative claims about the AI governance lag require a substantially larger dataset than the 10-event v0.1 compilation before serving as the basis for policy advocacy.