Active Research

Research Reports

289 reports across regulation, standards, research, and technical analysis

Synthesis

Policy Corpus Synthesis

Cross-cutting analysis across Reports 21-32: 5 converging insights from 12 independently researched reports.

#352 Research — Empirical Study

Meta-Jailbreak in NotebookLM, a Slide-Deck Content Filter, and a Methodology Lesson

#350 Technical Analysis

Claude Mythos Preview System Card — Analysis for Failure-First Research

#349 Research — Empirical Study

Gemma Family Safety Scaling: Does Safety Improve With Model Size and Generation?

#339 Research — Empirical Study

Visual Jailbreaks Evolved Stage 2 — 12-Model Benchmark Analysis

#338 Research — Empirical Study

Task Framing as a Jailbreak Vector — Controlled Experiment Results

#337 Research — Empirical Study

Specification Hijacking — A Three-Way Compound Attack Pattern

#336 Research — Empirical Study

DETECTED_PROCEEDS Anatomy and Evolved Compliance Cascade Attack Variants

#335 Research — Empirical Study

L3/L8 Evolved Attack Variants — Adversarial Refinement of Visual Jailbreak Patterns

#334 Research — AI Safety Policy

Ethics Review — Visual Jailbreak 8-Layer Taxonomy and the Transcription Loophole

#333 Research — Empirical Study

The Task Framing Effect — Why Models Lower Safety Guards for Non-Generative Tasks

#332 Research — Empirical Study

Visual Jailbreak Meta-Analysis — 8-Layer Attack Surface Taxonomy

#331 Research — Empirical Study

Format-Lock Attacks Against Reasoning and Deliberative Alignment Models

#330 Technical Analysis

Grading Infrastructure Audit — Coverage, Agreement, and Calibration Assessment

#329 Research — Empirical Study

VLA Family Coverage Gap Assessment and Testing Readiness Review

#328 Research — Empirical Study

Defense Benchmark Data Consolidation for CCS Paper

#327 Research — AI Safety Policy

Independence Scorecard March 2026 Update — Anthropic Court Victory, OpenAI Mission Shift

#326 Research — Empirical Study

The Ethics of DETECTED_PROCEEDS -- When Models Know and Comply Anyway

#325 Research — Empirical Study

Paired Format-Lock and L1B3RT4S Test — Vulnerability Profiles Diverge But Not Consistently

#324 Research — Empirical Study

L1B3RT4S VLA Adaptation and DETECTED_PROCEEDS Scaling Analysis

#323 Research — Empirical Study

Cross-Attack Family Synthesis — Format-Lock vs L1B3RT4S Vulnerability Profiles Diverge

#322 Research — Empirical Study

The Ethics of Assimilating Public Jailbreak Frameworks -- G0DM0D3, L1B3RT4S, and the Dual-Use Telescope

#321 Research — Empirical Study

Defense Effectiveness Is Model-Dependent — Positional Bias in System Prompt Processing

#320 Research — Empirical Study

L1B3RT4S Corpus — 10-Model Cross-Scale Synthesis

#319 Research — Empirical Study

Sprint 16 Findings Synthesis — L1B3RT4S, Sampling Parameter Manipulation, and Defense Hierarchy

#318 Research — Empirical Study

Defense Privilege Hierarchy — Why System-Prompt Defenses Fail Against System-Prompt Attacks

#317 Research — Empirical Study

L1B3RT4S Full Corpus Cross-Model Analysis

#316 Research — Empirical Study

Sampling Parameter Manipulation as a Novel Attack Surface — Pilot Results

#315 Research — Empirical Study

L1B3RT4S Cross-Scale Effectiveness Analysis

#314 Research — Empirical Study

Iatrogenic Safety Empirical Pilot — First Quantitative Evidence of Defense-Induced Harm Increase

#313 Research — Empirical Study

Technique-Level ASR Analysis Across Full Corpus

#312 Research — Empirical Study

G0DM0D3 Framework Analysis — Assimilation Brief for Jailbreak Corpus

#311 Research — Empirical Study

Autonomous AI Research Agents — Failure-First Analysis of Karpathy's autoresearch

#310 Technical Analysis

Corpus State — 212 Models, 134K Results

#309 Research — Empirical Study

Next-Phase Attack Priorities — Coverage Gaps and Expected Information Gain

#308 Research — Empirical Study

Actionable Defense Recommendations from Sprint 15

#307 Research — Empirical Study

VLA Adversarial Landscape — 33 Families, 673+ Traces

#306 Research — AI Safety Policy

Power Dynamics Update — Empirical Findings Shift Stakeholder Positions

#305 Research — AI Safety Policy

Ethics of Emotional Manipulation Attacks — Dual-Use Concerns and Protective Frameworks

#304 Research — Empirical Study

Sprint 15 Comprehensive Benchmark Analysis

#303 Research — AI Safety Policy

Policy Brief: Cross-Embodiment Vulnerability Assessment for Shared VLM Backbones

#302 Research — Empirical Study

Capability-Floor Model Update — Three-Regime Format-Lock Vulnerability Curve

#301 Research — Empirical Study

DETECTED_PROCEEDS — Definitive Synthesis: When Models Know It Is Wrong and Proceed Anyway

#300 Technical Analysis

VLA Data Curation Summary — Sprint 15 Coverage Expansion

#299 Research — Empirical Study

Novel Attack Family Baseline Traces

#298 Research — Empirical Study

Defense Landscape Analysis -- What Works and What Doesn't

#297 Research — Empirical Study

Emotional Manipulation Attack Family -- Deep Dive

#296 Research — Empirical Study

Sprint 15 Round 2 Synthesis: DP Validation and Gemma 4B

#295 Research — AI Safety Policy

Independence Scorecard -- Sprint 15 Update

#294 Research — Empirical Study

DETECTED_PROCEEDS Reasoning Audit: 19.5% Safety-Aware Traces Proceed

#293 Research — Empirical Study

Format-Lock Mid-Range Experiment: 4-14B Elevated ASR

#292 Research — AI Safety Policy

AIES Paper Scoping and CCA Disclosure Framework

#291 Technical Analysis

Wave 1-2 CCS Readiness Audit

#290 Technical Analysis

Wave 1 Sprint 15 Cross-Agent Synthesis

#289 Research — Empirical Study

Threat Horizon — Q2 2026

#288 Research — Empirical Study

The Iatrogenic Safety Paradox -- A Systematic Ethics Analysis of How Safety Measures Create Vulnerabilities

#287 Research — Empirical Study

DETECTED_PROCEEDS Reasoning Anatomy

#286 Research — Empirical Study

Temporal Drift Attack Family Design

#285 Research — Empirical Study

Safety Polypharmacy -- Empirical Evidence

#284 Technical Analysis

Defense Evolver Phase 0 -- Automated System Prompt Evolution

#283 Research — Empirical Study

Cross-Provider Safety Inheritance

#282 Research — Empirical Study

Corpus Pattern Mining -- Five Novel Empirical Findings

#281 Research — Empirical Study

Controlled Scale-Sweep Experiment Protocol

#280 Research — Empirical Study

Safety as a Paid Feature -- The Ethics of Tiered AI Safety

#279 Research — Empirical Study

DETECTED_PROCEEDS Provider Signature Mechanics

#278 Research — Empirical Study

Multi-Turn Vulnerability Deep Analysis

#277 Research — Empirical Study

Free-Tier Safety Equity -- Differential Vulnerability by Pricing Tier

#276 Research — Empirical Study

Corpus Pattern Mining II -- Six Novel Empirical Findings

#275 Research — Empirical Study

Evolution Run 1 Mutation Analysis and Next-Gen Strategy

#274 Regulatory Review

Cross-Jurisdictional Regulatory Gap Analysis -- VLA Attacks vs. Coverage

#273 Research — Empirical Study

Format-Lock Defense Research -- Five Countermeasure Architectures

#272 Research — AI Safety Policy

Ethics of Universal Attacks -- Disclosure Obligations

#271 Research — Empirical Study

Defense Co-Evolution Results

#270 Technical Analysis

Corpus Expansion -- Ollama Cloud Trace Import

#269 Research — Empirical Study

Systematic Audit of Reasoning-Level DETECTED_PROCEEDS

#268 Technical Analysis

COALESCE Grader Validation and New Model Testing

#267 Research — Empirical Study

Format-Lock Midrange Experiment -- The 4-14B Data Gap Filled

#266 Research — Empirical Study

Frontier Model Safety Scorecards

#264 Research — Empirical Study

Frontier Model Safety Landscape -- Safety Training > Parameter Count

#263 Research — Empirical Study

Kimi K2.5 Frontier Analysis -- 1.1TB MoE Safety Boundary

#262 Technical Analysis

Session Lessons Learned (Sprint 13-15)

#261 Research — Empirical Study

Operation Frontier Sweep -- Elite Attack Campaign

#260 Research — Empirical Study

Grader Evasion vs FLIP Vulnerability and Authority Gradient Attack

#259 Research — AI Safety Policy

FLIM Level 5 -- Systemic Safety Theater

#258 Research — Empirical Study

Session Statistical Summary -- Sprint 13-15

#257 Research — Empirical Study

Ambiguous Calibration Results -- 6-Grader Inter-Rater Agreement

#256 Research — Empirical Study

CCA + GE Expansion -- New Models and Defense Mutations

#255 Technical Analysis

Haiku Re-Grading of Sprint 13 Corpus

#254 Research — Empirical Study

Cross-Model x Attack-Family ASR Heatmap

#253 Technical Analysis

Sprint 13-14 Session Summary

#252 Technical Analysis

Wave 7 Validation Results

#251 Research — Empirical Study

Novel Attack Family Expansion -- CCA v0.2, RSE, and Grader Evasion

#250 Research — AI Safety Policy

The Compliance Cascade -- A Dual-Use Ethics Analysis

#249 Research — AI Safety Policy

Evaluation Governance -- The Missing Layer in AI Safety Regulation

#248 Research — Empirical Study

Grader Confusion Matrix and Inter-Grader Agreement

#247 Research — Empirical Study

Compliance Cascade Attack -- Frontier Scaling and Co-Evolution

#246 Technical Analysis

Haiku Re-Grading Campaign -- Ollama Cloud Traces

#245 Research — Empirical Study

Session Attack Synthesis -- Sprint 13 Cross-Agent Results

#244 Research — Empirical Study

Epistemic Crisis Grader Calibration Evaluation

#243 Research — Empirical Study

Compliance Cascade -- A Novel Attack Family

#242 Research — AI Safety Policy

The Grader Paradox -- When Safety Measurement Produces Iatrogenic Harm

#241 Research — Empirical Study

Statistical Power Analysis for Key Comparisons

#240 Research — Empirical Study

FLIP Grader Calibration Analysis

#239 Research — Empirical Study

Elite Attack Suite -- Ollama Cloud Campaign

#238 Research — Empirical Study

Frontier Probe -- Ollama Cloud Large-Scale Model Testing

#237 Technical Analysis

Garak Adapter Integration Test Results

#236 Research — Empirical Study

FLIP vs StrongREJECT Methodology Comparison

#235 Research — Empirical Study

Benchmark Overfitting Analysis — AdvBench vs Novel Attack Families

#234 Research — Empirical Study

Attack Technique Effectiveness Ranking (LLM-Graded)

#233 Technical Analysis

Defense Evolver Phase 0 -- First Live Run

#232 Research — AI Safety Policy

Minimum Safety Capability Thresholds for AI Model Deployment

#231 Research — Empirical Study

Corpus-Level Statistical Meta-Analysis

#230 Regulatory Review

EU AI Act Compliance Update -- Reasoning Trace Governance

#229 Research — Empirical Study

Qwen3 Benchmark Overfitting Analysis

#227 Research — Empirical Study

Inter-Provider Vulnerability Correlation Matrix

#226 Research — Empirical Study

The PARTIAL Verdict Epidemic -- Anatomy of Safety's Grey Zone

#225 Technical Analysis

Corpus Expansion -- March 2026

#224 Research — AI Safety Policy

Iatrogenic Risks of Rapid Safety Improvement

#223 Research — Empirical Study

Arcee AI Trinity Safety Assessment and EU Compliance

#222 Research — Empirical Study

The Qwen3 Safety Leap -- Artifact Analysis

#221 Research — Empirical Study

AdvBench Baseline Analysis -- Free-Tier Model Vulnerability

#220 Research — Empirical Study

LFM Thinking 1.2B -- DETECTED_PROCEEDS Cross-Model Validation

#219 Research — Empirical Study

Multi-Modal Attack Design for Vision-Language-Action Models

#218 Research — Empirical Study

The Failure-First Research Programme: Meta-Analysis of Ten Papers

#217 Technical Analysis

Competitive Intelligence -- AI Safety Red Teaming Market

#216 Technical Analysis

Training Data for Safety Classification

#215 Research — Empirical Study

Temporal Vulnerability Analysis: Attack Era Evolution (2022-2025)

#214 Research — Empirical Study

Automated Defense Generation: Co-Evolutionary System Prompt Optimization

#213 Research — Empirical Study

Silent Failures: When AI Safety Mechanisms Produce Compliance Without Protection

#212 Technical Analysis

Public Dataset Coverage Analysis

#211 Research — Empirical Study

Evolved Attack Family Mapping — Automated Evolution vs. Novel Families

#210 Technical Analysis

Benchmark Execution Master Plan — CCS Paper Data Collection

#209 Regulatory Review

Regulatory Landscape Q1 2026 — Converging Deadlines for Embodied AI

#208 Research — AI Safety Policy

FLIM Operational Assessment — Measuring Iatrogenic Effects of Safety Interventions

#207 Research — Empirical Study

The 2027 Threat Horizon v2 — Seven Predictions for Embodied AI Safety

#206 Research — Empirical Study

Defense Impossibility Experimental Protocol — Format-Lock vs. All Known Defenses

#205 Research — Empirical Study

Attack Combination Theory: Cross-Family Composition in Embodied AI

#204 Research — Empirical Study

AdvBench Baseline Run — Plan and Execution Strategy

#203 Research — Empirical Study

Evidence Package Sweep — Wave 1-3 Statistical Validation

#202 Research — Empirical Study

Novel Attack Family Comparative Analysis: CRA, PCA, MDA, MAC, SSA, RHA

#201 Research — Empirical Study

Cross-Benchmark Comparison — F41LUR3-F1R57 vs Published Benchmarks

#200 Research — Empirical Study

Adversarial Prompt Hall of Fame — Top 20 Cross-Model Attacks

#199 Research — Empirical Study

Who Guards the Guards? Independence and Capture in AI Safety Research

#198 Research — Empirical Study

Safety is Not a Single Direction — Polyhedral Geometry of Refusal in Language Models

#197 Research — Empirical Study

EU AI Act Compliance Assessment — Cross-Provider Analysis

#196 Research — Empirical Study

VerbosityGuard — Response Length as a Zero-Cost Jailbreak Pre-Filter

#195 Research — Empirical Study

Reward Hacking in Embodied AI: Scenario Design and Methodology

#194 Research — Empirical Study

Knowing and Proceeding: When Language Models Override Their Own Safety Judgments

#193 Research — Empirical Study

Data Health Assessment Q1 2026

#192 Research — Empirical Study

Multi-Agent Collusion Attacks: A Novel Attack Surface for Embodied AI Systems

#191 Research — Empirical Study

Cross-Wave Research Synthesis (Sprint 11-12, Waves 24-25)

#190 Research — Empirical Study

DETECTED_PROCEEDS — Models That Know It's Wrong and Do It Anyway

#189 Research — Empirical Study

The Verbosity Signal — Response Length as a Zero-Cost Jailbreak Detector

#188 Research — Empirical Study

Pressure Cascade Attack (PCA) and Meaning Displacement Attack (MDA) — Two Novel Tier 3 Attack Families

#187 Research — Empirical Study

The Format-Lock Paradox — Format Compliance and Safety Reasoning as Partially Independent Capabilities

#186 Research — Empirical Study

The Ethics of Automated Attack Evolution -- Dual-Use Obligations, Iatrogenic Risks, and a Graduated Disclosure Framework for AI Adversarial Research

#185 Research — Empirical Study

Compositional Reasoning Attacks — Multi-Agent Expansion

#184 Research — Empirical Study

Attack Evolution Multi-Generation Lineage Analysis

#183 Research — Empirical Study

OBLITERATUS Mechanistic Interpretability -- First Empirical Results on Qwen 0.5B

#182 Research — Empirical Study

Corpus Grading Completion and Three-Tier ASR Update

#181 Research — Empirical Study

Provider Safety Fingerprints: Attack-Specific Vulnerability Profiles

#180 Research — Empirical Study

Novel Attack Families and Refusal Geometry: First Empirical Results

#179 Research — Empirical Study

The Capability-Safety Transition Zone: Where Model Scale Begins to Matter

#178 Research — Empirical Study

The Heuristic Overcount Problem -- Quantifying False Positive Rates in Keyword-Based Safety Classification

#177 Research — Empirical Study

Corpus Grading Expansion -- Claude Haiku 4.5 Grader Results and Updated Statistics

#176 Research — Empirical Study

The Ethics of Autonomous Red-Teaming: Dual-Use Analysis of Attack Evolution Systems

#175 Research — Empirical Study

Autonomous Attack Evolution -- First Empirical Results

#174 Research — Empirical Study

Defense Effectiveness Benchmark -- Full Experiment

#173 Research — Empirical Study

Cross-Corpus Vulnerability Comparison

#172 Research — Empirical Study

Defense Effectiveness Benchmark -- Pilot Results

#171 Research — Empirical Study

Corpus Pattern Mining: Five Novel Findings from 132K Results

#170 Research — Empirical Study

DETECTED_PROCEEDS -- Corpus-Wide Empirical Analysis

#169 Research — Empirical Study

Capability-Safety Decoupling — Evidence from Format-Lock, Abliteration, and VLA Testing

#168 Research — Empirical Study

DETECTED_PROCEEDS -- Reasoning Patterns in Context Collapse Traces

#167 Research — Empirical Study

The Health of the AI Safety Field -- A Structural Meta-Assessment

#166 Research — Empirical Study

Context Collapse -- First Empirical Results

#165 Research — Empirical Study

The Four-Level Iatrogenesis Model -- A Formal Framework for Safety-Induced Harm in AI Systems

#164 Research — Empirical Study

Safety Training Return on Investment: Provider Identity Explains 57x More ASR Variance Than Model Scale

#163 Research — Empirical Study

Week 13 Threat Brief -- The Convergence Crisis

#162 Research — Empirical Study

Safety Framework Comparative Analysis -- Major Lab Policies Meet Embodied Reality

#161 Research — Empirical Study

Anthropic and OpenAI Safety Research — Structural Analysis for Failure-First

#160 Research — Empirical Study

Anthropic-Pentagon Structural Dynamics — March 2026 Update

#159 Research — Empirical Study

F41LUR3-F1R57 ASR Divergence from Public Benchmarks

#158 Research — Empirical Study

The Embodied AI Incident Severity Index (EAISI)

#157 Research — Empirical Study

The Unified Theory of Embodied AI Failure

#156 Research — Empirical Study

Compliance-Verbosity Signal Is Model-Dependent, Not Universal

#155 Research — Empirical Study

Safety Oscillation Attacks: Exploiting State Transition Latency in Embodied AI Safety Pipelines

#154 Research — AI Safety Policy

The D-Score -- A Dual-Use Disclosure Risk Scoring System

#153 Research — Empirical Study

The 2027 Threat Horizon -- Five Falsifiable Predictions for Embodied AI Safety

#152 Research — Empirical Study

The Evaluation Crisis in Embodied AI Safety

#151 Research — Empirical Study

The Polypharmacy Hypothesis -- Formalising the Nonlinear Risk of Compound Safety Interventions

#150 Research — Empirical Study

Hybrid DA-SBA -- Doubly Invisible Attacks Against Embodied AI

#149 Research — Empirical Study

NIST AI Risk Management Framework 1.0 — Gap Analysis for Embodied AI Adversarial Risk

#148 Research — Empirical Study

Iatrogenic Exploitation Attacks -- Operationalising Safety Mechanisms as Attack Vectors

#147 Research — Empirical Study

Week 12 Threat Brief -- The Modular AI Safety Collapse

#146 Research — Empirical Study

Cross-Embodiment Attack Transfer Benchmark — Systematic Dataset Design

#145 Research — Empirical Study

The Defense Impossibility Theorem for Embodied AI

#144 Research — AI Safety Policy

The Evaluator's Dilemma -- When Safety Testing Causes Harm

#143 Research — AI Safety Policy

Compositional Safety Certification — Why Component-Level Testing Fails for Modular AI Systems

#142 Research — Empirical Study

The Iatrogenic Risk Horizon -- Threat Brief

#141 Research — Empirical Study

Safety Interventions as Attack Surfaces -- The Iatrogenesis Convergence

#140 Research — Empirical Study

The Iatrogenesis of AI Safety -- How Safety Interventions Systematically Produce Unintended Harm in Embodied AI

#139 Research — Empirical Study

DLA Counter-Example and IDDL Robustness Analysis

#138 Research — Empirical Study

The Compositional Safety Gap — Why Component-Level Verification Cannot Ensure System-Level Safety

#137 Research — Empirical Study

Defense Layer Inversion — Week 11 Threat Brief

#136 Research — AI Safety Policy

Iatrogenic Attack Surfaces -- How Safety Mechanisms Create Novel Vulnerabilities

#135 Research — AI Safety Policy

The Therapeutic Index of AI Safety Interventions -- A Quantitative Framework for Iatrogenic Risk

#134 Research — AI Safety Policy

The Hippocratic Principle for AI Safety -- First, Verify You Are Not Making It Worse

#133 Research — Empirical Study

Compositional Supply Chain Attacks on Vision-Language-Action Systems

#132 Research — AI Safety Policy

Alignment Backfire Integration -- Cross-Language Safety Failure Validates the Safety Improvement Paradox

#131 Research — Empirical Study

Empirical Base Rates for DRIP -- Grounding the Unintentional Adversary Model in Occupational Safety Data

#130 Research — AI Safety Policy

Q2 2026 Threat Forecast -- Five Threats for Embodied AI Deployers

#129 Research — AI Safety Policy

DLMI Wave 5 Update -- Has the Defense Layer Mismatch Changed?

#128 Research — AI Safety Policy

Safety Confidence Index (SCI) -- A Composite Deployability Metric for Embodied AI

#127 Research — AI Safety Policy

The Evaluation Half-Life (EHL) -- Why Safety Benchmarks Decay

#126 Research — Empirical Study

DRIP Recomputation with Corrected Wave 5 ASR Values

#125 Research — Empirical Study

The Safety Instruction Effective Range (SIER) -- Theorizing the U-Curve in SID Dose-Response Data

#123 Research — AI Safety Policy

An Ethical Decision Framework for Embodied AI Vulnerability Disclosure

#122 Research — AI Safety Policy

The Ethics of Embodied AI Safety -- Five Paradoxes

#121 Research — Empirical Study

SIF 100% Heuristic Compliance -- Genuine Signal or Capability Floor?

#120 Research — Empirical Study

Infrastructure-Mediated Bypass (IMB) -- First Empirical Results

#119 Research — Empirical Study

Wave 4 VLA Benchmark Results -- SID, IMB, SIF Attack Families

#118 Research — Empirical Study

Defense Layer Mismatch Index (DLMI) -- Quantifying Where Safety Investment Misses the Actual Attack Surface

#117 Research — AI Safety Policy

The Safety Improvement Paradox — Why Better Adversarial Defenses Make Embodied AI Relatively Less Safe

#116 Research — AI Safety Policy

Ethical Implications of the Deployment Risk Inversion — The DRIP Problem

#115 Research — Empirical Study

The Unintentional Adversary -- Why Normal Users Are the Primary Threat to Embodied AI Safety

#114 Research — AI Safety Policy

Ethical Review of the SID Controlled Experiment Design

#113 Research — Empirical Study

Prediction Scorecard -- Monthly Check, March 15, 2026

#112 Research — AI Safety Policy

F41LUR3-F1R57 Positioning for ISO/IEC 42001 Conformity Assessment

#111 Research — Empirical Study

Attack Generation Pipeline Validation: Comparative Evaluation of Four Generation Strategies

#110 Research — Empirical Study

Compound Attack Evidence: Cross-Family Synergies in VLA Adversarial Testing

#109 Research — Empirical Study

Physical-Digital Attack Chain: Multi-Stage Exploitation of Embodied AI Systems

#108 Research — Empirical Study

Threat Horizon Brief -- Safety Instruction Dilution and the Context Expansion Attack Surface

#107 Research — Empirical Study

Cross-Domain IDDL Transfer Analysis — Autonomous Vehicles, Medical Robotics, and Industrial Automation

#106 Research — Empirical Study

Evaluator Independence — Wave 9 Quantitative Update

#105 Research — AI Safety Policy

Verification Hallucination in Multi-Agent AI Systems: A Governance Risk for Automated Compliance

#104 Research — Empirical Study

Why Policy Puppetry and Deceptive Alignment Show Lower ASR Than VLA Baseline

#103 Research — Empirical Study

Evaluation Monoculture — The Structural Risk of GPT-4-as-Judge Dependency in AI Safety Benchmarks

#102 Research — Empirical Study

The Evaluator as Attack Surface — Ethical Implications of Unreliable Safety Measurement

#101 Research — Empirical Study

The Deployment Risk Inversion — When Normal Users Become More Dangerous Than Adversaries

#100 Research — Empirical Study

The Failure-First Synthesis — A Complete Framework for Understanding Adversarial Risk in Embodied AI

#99 Research — Empirical Study

The CDC Governance Trilemma — Why Embodied AI Safety Cannot Be Certified, Only Managed

#98 Research — Empirical Study

The Context Half-Life -- A Predictive Model for Time-Dependent Safety Degradation in Embodied AI

#97 Research — Empirical Study

Competence-Danger Coupling — Why Capability and Safety Are Structurally Opposed in Embodied AI

#96 Research — Empirical Study

A Governance Framework for Embodied AI Safety Testing — Institutions, Mandates, and the CDC Problem

#93 Research — Empirical Study

IDDL Implications for Responsible Disclosure — An Ethics Addendum to the SRDA Framework

#92 Research — Empirical Study

Worker Safety Impact Analysis — VLA Attack Families Across Industry Sectors

#89 Research — Empirical Study

Dual-Use Obligations in Embodied AI Safety Research — A Responsible Disclosure Framework

#88 Research — Empirical Study

The Inverse Detectability-Danger Law — A Cross-Corpus Synthesis of Attack Visibility vs. Physical Consequence

#87 Research — Empirical Study

The Ungovernable Attack — Ethical Implications of Evaluation-Invisible Adversarial AI

#85 Research — Empirical Study

The Evaluation Ceiling — Why Current Safety Benchmarks Cannot Detect the Most Dangerous Embodied AI Attacks

#79 Research — Empirical Study

The Accountability Vacuum in Action-Layer AI Safety

#78 Research — Empirical Study

Defense Impossibility in Embodied AI — A Three-Layer Failure Convergence

#76 Research — Empirical Study

Evaluator Governance Framework — Operational Standards for Automated AI Safety Assessment

#75 Research — Empirical Study

Blindfold Action-Level Threat Analysis — Automated Jailbreaking of Embodied LLMs via Semantically Benign Instructions

#73 Research — Empirical Study

The Recursive Evaluator Problem — Ethics of AI-Grading-AI in Safety-Critical Research

#68 Research — Empirical Study

Evaluator Calibration Disclosure — A Minimum Standard for Automated Safety Grading

#67 Research — Empirical Study

Layer 0 Extension — Evaluation Infrastructure as Vulnerability Surface

#66 Research — Empirical Study

Verification Hallucination — When Multi-Agent Systems Fabricate Audit Trails

#63 Research — Empirical Study

The Actuator Gap — A Unified Thesis on Structural Vulnerability in Embodied AI

#61 Research — Empirical Study

The Evaluation Paradox — When Safety Measurement Tools Are Themselves Misaligned

#59 Research — Empirical Study

The Compliance Paradox — When Models Refuse in Text but Comply in Action

#49 Research — Empirical Study

VLA Cross-Embodiment Vulnerability Analysis: Seven Attack Families Against Two Models

#47 Research — Empirical Study

Embodied Capability Floor and Action Space Hijack Experiment

#46 HIGH

Quantifying the Governance Lag: Structural Causes and Temporal Dynamics of AI Safety Regulation

#45 SAFETY-CRITICAL

Inference Trace Manipulation as an Adversarial Attack Surface in Agentic and Embodied AI

#44 HIGH

Instruction-Hierarchy Subversion in Long-Horizon Agentic Execution

#43 SAFETY-CRITICAL

Deceptive Alignment Detection Under Evaluation-Aware Conditions

#42 SAFETY-CRITICAL

Cross-Embodiment Adversarial Transfer in Vision-Language-Action Models

#41 Research — Empirical Study

Universal Vulnerability of Small Language Models to Supply Chain Attacks

#40 Research — AI Safety Policy

Cross-Modal Vulnerability Inheritance in Vision-Language-Action Systems

#39 Technical Analysis

Systemic Failure Modes in Embodied Multi-Agent AI: An Exhaustive Analysis of the Failure-First Framework (2023–2026)

#38 Technical Analysis

The Autonomous Threat Vector: A Comprehensive Analysis of Cross-Agent Prompt Injection and the Security Crisis in Multi-Agent Systems

#37 Technical Analysis

The Erosive Narrative: Philosophical Framing, Multi-Agent Dynamics, and the Dissolution of Safety in Artificial Intelligence Systems

#36 Technical Analysis

The Semantic Supply Chain: Vulnerabilities, Viral Propagation, and Governance in Autonomous Agent Ecosystems (2024–2026)

#35 Technical Analysis

Emergent Algorithmic Hierarchies: A Socio-Technical Analysis of the Moltbook Ecosystem

#34 Research — AI Safety Policy

Cross-Model Vulnerability Inheritance in Multi-Agent Systems

#33 Research — AI Safety Policy

Capability Does Not Imply Safety: Empirical Evidence from Jailbreak Archaeology Across Eight Foundation Models

#32 Standards Development

Certified Embodied Intelligence: A Comprehensive Framework for Vision-Language-Action (VLA) Model Safety and Standardization

#31 Research — AI Safety Policy

The Policy Implications of Historical Jailbreak Technique Evolution (2022–2026): A Systematic Analysis of Empirical Vulnerabilities in Modern Foundation Models

#30 Standards Development

Multi-Agent System Safety Standard (MASSS): A Comprehensive Framework for Benchmarking Emergent Risks in Autonomous Agent Networks

#29 Regulatory Review

Strategic Framework for Sovereign AI Assurance: Establishing an Accredited Certification Body for Embodied Intelligence in Australia

#28 Regulatory Review

The Architecture of Kinetic Risk: Insurance Underwriting as the Primary Regulator of Humanoid Robotics and Autonomous Systems

#27 Regulatory Review

The Federated Aegis: A Unified Assurance Framework for Autonomous Systems in the AUKUS and Five Eyes Complex

#26 Standards Development

Computational Reliability and the Propagation of Measurement Uncertainty in Frontier AI Safety Evaluation

#25 Research — AI Safety Policy

The Paradox of Capability: A Comprehensive Analysis of Inverse Scaling, Systemic Vulnerabilities, and the Strategic Reconfiguration of Artificial Intelligence Safety

#24 Research — AI Safety Policy

Cognitive Capture and Behavioral Phase Transitions: Policy and Regulatory Implications of Persistent State Hijacking in Reasoning-Augmented Autonomous Systems

#23 Standards Development

Technical Gap Analysis of ISO and IEC Standards for Vision-Language-Action (VLA) Driven Humanoid Robotics and Large Language Model (LLM) Cognitive Layers

#22 Standards Development

Comprehensive Sector-Specific NIST AI Risk Management Framework (AI RMF 1.0) Playbook: Humanoid Robotics and VLA-Driven Embodied Systems

#21 Regulatory Review

Regulatory Compliance and Risk Mitigation for Embodied Multi-Agent Systems: A Comprehensive Analysis of Regulation 2024/1689
