Our Commitment
Failure-First research discovers vulnerability patterns in AI systems. We are committed to responsible disclosure of these findings to advance safety without enabling harm.
Disclosure Principles
Pattern-Level Publication
We publish structural patterns and categories of vulnerabilities, not specific exploits or operational attack instructions. Our public findings describe what fails and why, not how to make it fail.
Vendor Notification
When we discover specific vulnerabilities in particular models or systems, we notify the vendor or developer before public disclosure and follow standard coordinated disclosure timelines, allowing time for remediation.
Research Transparency
Our methodology, datasets, and tools are open source. Researchers can verify our findings, extend our work, and contribute to the safety mission.
No Operational Publication
We do not publish working exploits, specific attack prompts, or step-by-step instructions for bypassing safety controls. Operational details remain in our private research repository.
Reporting Vulnerabilities
If you have discovered an AI safety vulnerability that aligns with our research focus, we welcome responsible reports (a sample outline follows the list below):
- Email: research@failurefirst.org
- Include: affected system, pattern description, and potential impact
- We will acknowledge receipt within 48 hours
- We will not publish specific findings without coordinating with you
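A report might look like the following sketch; everything in angle brackets is a hypothetical placeholder, not a required format:

```
To: research@failurefirst.org
Subject: [Vulnerability Report] <short pattern summary>

Affected system: <vendor, model or product, version if known>
Pattern description: <what fails and under what conditions, described
  at the pattern level; operational details can wait until we coordinate>
Potential impact: <who or what is affected, and how severely>
Contact: <how we can reach you to coordinate disclosure>
```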
Scope
Our research focuses on LLM-based controllers, embodied AI planners, and multi-agent systems. We are particularly interested in:
- Multi-turn erosion patterns
- Multi-agent interaction failures
- Embodied-specific safety gaps
- Recovery mechanism failures