AI Lab Watch
By Zach Stein-Perlman · No longer maintained · Latest: September 2025 (frozen)
Weighted scorecard across seven categories (risk assessment, scheming prevention, safety research, misuse prevention, security, info sharing, planning).
Latest wave
September 2025 (frozen)
Grades by dimension: September 2025 (frozen)
Rows are organizations; columns are the dimensions AI Lab Watch scores. Each cell shows the published grade: the overall grade first, then one grade per dimension.
| Organization | Overall | Boosting safety research | Misuse prevention | Planning | Prep for extreme security | Risk assessment | Risk info sharing | Scheming risk prevention |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Anthropic | Weak | Moderate | Very Weak | Very Weak | Very Weak | Weak | Weak | Very Weak |
| DeepSeek | Very Weak | Very Weak | Very Weak | Very Weak | Very Weak | Very Weak | Very Weak | Very Weak |
| Google DeepMind | Very Weak | Moderate | Very Weak | Weak | Very Weak | Weak | Very Weak | Very Weak |
| Meta AI (FAIR) | Very Weak | Weak | Very Weak | Very Weak | Very Weak | Very Weak | Very Weak | Very Weak |
| Microsoft AI | Very Weak | Very Weak | Very Weak | Very Weak | Very Weak | Very Weak | Very Weak | Very Weak |
| OpenAI | Very Weak | Weak | Very Weak | Very Weak | Very Weak | Weak | Weak | Very Weak |
| xAI | Very Weak | Very Weak | Very Weak | Very Weak | Very Weak | Very Weak | Weak | Very Weak |
Snapshot history
| Wave | Published | Orgs | License | Links |
| --- | --- | --- | --- | --- |
| September 2025 (frozen), latest | 2025-09-01 | 7 | fair-use-citation | |
Methodology
Weighted scorecard across seven categories (risk assessment, scheming prevention, safety research, misuse prevention, security, info sharing, planning).
License: fair-use-citation
Status: no longer maintained
AI Lab Watch (ailabwatch.org), frozen as of September 2025 per the author. Seven dimensions × seven frontier labs; per-cell scores range from 0 to 100 percent. Overall is the site's published weighted total.
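To illustrate what a weighted total over seven category scores means, here is a minimal sketch. It assumes the overall figure is a weighted mean of 0-100 category scores. The function name, the weights, and the scores below are all hypothetical placeholders: this page does not publish AI Lab Watch's actual weights, and the example scores do not belong to any organization listed above.

```python
from typing import Mapping


def weighted_total(scores: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Return the weighted mean of per-category scores (each in 0-100)."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same categories")
    for name, value in scores.items():
        if not 0 <= value <= 100:
            raise ValueError(f"score for {name!r} is outside 0-100: {value}")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    # Dividing by the weight sum means the weights need not add up to 1 or 100.
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# HYPOTHETICAL weights, chosen only for illustration (not the upstream weights).
HYPOTHETICAL_WEIGHTS = {
    "risk assessment": 20,
    "scheming prevention": 20,
    "safety research": 15,
    "misuse prevention": 15,
    "security": 15,
    "info sharing": 10,
    "planning": 5,
}

# HYPOTHETICAL scores in percent, not taken from any real organization.
HYPOTHETICAL_SCORES = {
    "risk assessment": 30,
    "scheming prevention": 10,
    "safety research": 50,
    "misuse prevention": 10,
    "security": 5,
    "info sharing": 25,
    "planning": 10,
}

print(weighted_total(HYPOTHETICAL_SCORES, HYPOTHETICAL_WEIGHTS))
```

Under this assumption, a single strong category (here, safety research) lifts the total only in proportion to its weight, which is one way an organization can show a Moderate grade in one column and still get a low overall grade.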
Grades are mirrored from upstream sources. We do not score organizations ourselves — see the source link for full methodology and per-indicator detail. Citation of published grades is consistent with fair-use attribution.