Components
Benchmark Card
What the benchmark measures, along with its scope, limitations, and intended use.
Dataset Documentation
Data provenance, format specifications, and responsible use guidelines.
Test Harness Specification
How to evaluate embodied AI systems using the failure-first framework.
Draft Safety Standard
Proposed safety evaluation standard for embodied AI deployment.