Longterm Wiki

AI Lab Watch

No longer maintained

By Zach Stein-Perlman · Latest: September 2025 (frozen) · Source ↗

Weighted scorecard across seven categories (risk assessment, scheming prevention, safety research, misuse prevention, security, info sharing, planning).

Organizations: 7 · Dimensions: 8 · Snapshots: 1 · Latest wave: September 2025 (frozen)

Grades by dimension — September 2025 (frozen)

Rows are organizations; columns are the dimensions AI Lab Watch scores. Cell text is the published grade (overall first, then per-pillar).

| Organization | Overall | Boosting safety research | Misuse prevention | Planning | Prep for extreme security | Risk assessment | Risk info sharing | Scheming risk prevention |
|---|---|---|---|---|---|---|---|---|
| Anthropic | Weak | Moderate | Very Weak | Very Weak | Very Weak | Weak | Weak | Very Weak |
| DeepSeek | Very Weak | Very Weak | Very Weak | Very Weak | Very Weak | Very Weak | Very Weak | Very Weak |
| Google DeepMind | Very Weak | Moderate | Very Weak | Weak | Very Weak | Weak | Very Weak | Very Weak |
| Meta AI (FAIR) | Very Weak | Weak | Very Weak | Very Weak | Very Weak | Very Weak | Very Weak | Very Weak |
| Microsoft AI | Very Weak | Very Weak | Very Weak | Very Weak | Very Weak | Very Weak | Very Weak | Very Weak |
| OpenAI | Very Weak | Weak | Very Weak | Very Weak | Very Weak | Weak | Weak | Very Weak |
| xAI | Very Weak | Very Weak | Very Weak | Very Weak | Very Weak | Very Weak | Weak | Very Weak |

Snapshot history

| Wave | Published | Orgs | License | Links |
|---|---|---|---|---|
| September 2025 (frozen) [latest] | 2025-09-01 | 7 | fair-use-citation | |

Methodology

Weighted scorecard across seven categories (risk assessment, scheming prevention, safety research, misuse prevention, security, info sharing, planning).

Source ↗ · Methodology ↗ · License: fair-use-citation · Status: no longer maintained

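AI Lab Watch (ailabwatch.org) is frozen as of September 2025, per the author. It covers seven dimensions × seven frontier labs, with per-cell scores expressed as percentages from 0 to 100. Overall is the site's published weighted total; counting it alongside the seven dimensions gives the eight graded columns shown on this page.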
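As a rough illustration of what a "weighted total" over 0-100 category scores means, the sketch below computes a weighted mean. The function name, the weights, and the scores are all hypothetical placeholders chosen for the example; they are not AI Lab Watch's actual weights or any lab's actual scores, and the upstream methodology may combine categories differently.

```python
from typing import Mapping


def weighted_total(scores: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Return the weighted mean of per-category percentage scores (0-100)."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same categories")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# HYPOTHETICAL weights: illustration only, not the upstream weighting.
EXAMPLE_WEIGHTS = {
    "risk assessment": 2.0,
    "scheming prevention": 1.0,
    "safety research": 1.0,
    "misuse prevention": 1.0,
    "security": 1.0,
    "info sharing": 1.0,
    "planning": 1.0,
}

# HYPOTHETICAL scores for an imaginary lab: not real data.
EXAMPLE_SCORES = {
    "risk assessment": 20.0,
    "scheming prevention": 0.0,
    "safety research": 40.0,
    "misuse prevention": 10.0,
    "security": 0.0,
    "info sharing": 30.0,
    "planning": 0.0,
}

if __name__ == "__main__":
    print(f"overall: {weighted_total(EXAMPLE_SCORES, EXAMPLE_WEIGHTS):.1f}%")
```

Dividing by the sum of the weights keeps the result on the same 0-100 scale as the inputs, regardless of how the weights themselves are scaled.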

Grades are mirrored from upstream sources. We do not score organizations ourselves — see the source link for full methodology and per-indicator detail. Citation of published grades is consistent with fair-use attribution.