Our Commitment
Failure-First research discovers vulnerability patterns in AI systems. We are committed to responsible disclosure of these findings to advance safety without enabling harm.
Disclosure Principles
Pattern-Level Publication
We publish structural patterns and categories of vulnerabilities, not specific exploits or operational attack instructions. Our public findings describe what fails and why, not how to make it fail.
Vendor Notification
When we discover specific vulnerabilities in particular models or systems, we notify the vendor or developer before public disclosure and follow standard coordinated disclosure timelines, allowing time for remediation.
Research Transparency
Our methodology, datasets, and tools are open source. Researchers can verify our findings, extend our work, and contribute to the safety mission.
No Operational Publication
We do not publish working exploits, specific attack prompts, or step-by-step instructions for bypassing safety controls. Operational details remain in our private research repository.
Reporting Vulnerabilities
If you have discovered an AI safety vulnerability that aligns with our research focus, we welcome responsible reports (a sample outline follows the list below):
- Email: research@failurefirst.org
- Include: affected system, pattern description, and potential impact
- We will acknowledge receipt within 48 hours
- We will not publish specific findings without coordinating with you
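A report might look like the following sketch; everything in angle brackets is a hypothetical placeholder, not a required format:

```
To: research@failurefirst.org
Subject: [Vulnerability Report] <short pattern summary>

Affected system: <vendor, model or product, version if known>
Pattern description: <what fails and under what conditions, described
  at the pattern level; operational details can wait until we coordinate>
Potential impact: <who or what is affected, and how severely>
Contact: <how we can reach you to coordinate disclosure>
```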
Scope
Our research focuses on LLM-based controllers, embodied AI planners, and multi-agent systems. We are particularly interested in:
- Multi-turn erosion patterns
- Multi-agent interaction failures
- Embodied-specific safety gaps
- Recovery mechanism failures