AI Lab Watch
AI Lab Watch is a scorecard and tracking resource by Zach Stein-Perlman that evaluates major AI companies (Anthropic, DeepMind, OpenAI, Meta, xAI, Microsoft, DeepSeek) on safety-relevant behaviors across categories like risk assessment, scheming prevention, safety research, and misuse prevention.
Metadata
Importance: 62/100 | homepage
Summary
AI Lab Watch is a publicly maintained scorecard tracking what major AI companies are doing to address safety risks, scoring them across weighted categories including risk assessment, scheming risk prevention, safety research, misuse prevention, and security preparedness. Created by Zach Stein-Perlman, it provides comparative transparency on AI lab safety practices. As of September 2025, the site is no longer actively maintained.
Key Points
- Scores 7 major AI labs (Anthropic, DeepMind, OpenAI, Meta, xAI, Microsoft, DeepSeek) on safety practices with weighted criteria.
- Categories include risk assessment, scheming risk prevention, boosting safety research, misuse prevention, extreme security prep, risk info sharing, and planning.
- Anthropic leads with 28% weighted score; DeepSeek scores lowest at 1%.
- Criteria are weighted by importance for safety and signal quality; author endorses qualitative judgments over raw numbers.
- Site was maintained by Zach Stein-Perlman but is no longer updated as of September 2025.
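The weighted scoring can be sketched in a few lines of Python. This is only an illustration: it assumes the overall score is a plain weighted average of the category scores, and that the per-company columns in the cached scorecard follow the same order as its "Overall score" listing. The names `COMPANIES`, `SCORECARD`, `PUBLISHED`, and `weighted_scores` are hypothetical and not part of the site. Because the inputs are the rounded percentages shown on the page, the recomputed totals may differ slightly from the published ones.

```python
# Company order is ASSUMED to match the "Overall score" listing on the page.
COMPANIES = ["Anthropic", "DeepMind", "OpenAI", "Meta", "xAI", "Microsoft", "DeepSeek"]

# category -> (weight in percent, rounded scores in COMPANIES order),
# copied from the cached scorecard snapshot.
SCORECARD = {
    "Risk assessment": (27, [44, 29, 34, 6, 6, 1, 0]),
    "Scheming risk prevention": (21, [4, 8, 3, 2, 2, 2, 2]),
    "Boosting safety research": (14, [70, 52, 35, 25, 0, 13, 8]),
    "Misuse prevention": (12, [14, 4, 9, 0, 1, 0, 0]),
    "Prep for extreme security": (12, [3, 5, 0, 0, 0, 0, 0]),
    "Risk info sharing": (8, [35, 13, 32, 0, 28, 0, 0]),
    "Planning": (6, [14, 26, 0, 0, 0, 1, 0]),
}

# Overall weighted scores as published on the page (percent).
PUBLISHED = dict(zip(COMPANIES, [28, 20, 18, 5, 4, 3, 1]))


def weighted_scores(scorecard=SCORECARD, companies=COMPANIES):
    """Return each company's weighted average score in percent.

    ASSUMPTION: the site combines categories as a simple weighted average;
    the page does not spell out the exact formula.
    """
    total_weight = sum(weight for weight, _ in scorecard.values())
    return {
        name: sum(weight * scores[i] for weight, scores in scorecard.values()) / total_weight
        for i, name in enumerate(companies)
    }


if __name__ == "__main__":
    for name, value in weighted_scores().items():
        print(f"{name:10s} recomputed {value:5.1f}%  published {PUBLISHED[name]}%")
```

Under these assumptions the recomputed totals land within one percentage point of each published figure; the remaining gaps are consistent with the category scores having been rounded before display.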
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Evals-Based Deployment Gates | Approach | 66.0 |
Cached Content Preview
HTTP 200 | Fetched May 4, 2026 | 4 KB
AI Lab Watch
Up to date as of September 15

| Category | Weight | Anthropic | DeepMind | OpenAI | Meta | xAI | Microsoft | DeepSeek |
|---|---|---|---|---|---|---|---|---|
| Weighted score (overall) | | 28% | 20% | 18% | 5% | 4% | 3% | 1% |
| Risk assessment | 27% | 44% | 29% | 34% | 6% | 6% | 1% | 0% |
| Scheming risk prevention | 21% | 4% | 8% | 3% | 2% | 2% | 2% | 2% |
| Boosting safety research | 14% | 70% | 52% | 35% | 25% | 0% | 13% | 8% |
| Misuse prevention | 12% | 14% | 4% | 9% | 0% | 1% | 0% | 0% |
| Prep for extreme security | 12% | 3% | 5% | 0% | 0% | 0% | 0% | 0% |
| Risk info sharing | 8% | 35% | 13% | 32% | 0% | 28% | 0% | 0% |
| Planning | 6% | 14% | 26% | 0% | 0% | 0% | 1% | 0% |

Risk assessment: AI companies should do model evals and uplift experiments to determine whether models have dangerous capabilities or how close they are. They should also prepare to check whether models will act well in high-stakes situations.

Scheming risk prevention: AIs show signs that if they were more capable, they would sometimes scheme, i.e. fake alignment and subvert safety measures in order to gain power. AI companies should prepare for risks from models scheming, especially during internal deployment: if they can't reliably prevent scheming, they should prepare to catch some schemers and deploy potentially scheming models safely.

Boosting safety research: AI companies should do (extreme-risk-focused) safety research, and they should publish it to boost safety at other AI companies. Additionally, they should assist external safety researchers by sharing deep model access and mentoring.

Misuse prevention: AI companies should prepare to prevent catastrophic misuse for deployments via API, once models are capable of enabling catastrophic harm.

Prep for extreme security: AI companies should prepare to protect model weights and code by the time AI massively boosts R&D, even from top-priority operations by the top cyber-capable institutions.

Risk info sharing: AI companies should share information on incidents, risks, and capabilities, but not share some capabilities research.

Planning: AI companies should plan for the possibility that dangerous capabilities appear soon and safety isn't easy: both for evaluating and improving safety of their systems and for using their systems to make the world safer.

Update: as of September 2025, I'm no longer maintaining this website. I may return or hand it off to someone else in the future.
I'm Zach Stein-Perlman. I'm worried about future powerful AIs causing an existential catastrophe. Here at AI Lab Watch, I track what AI companies are doing in terms of safety.
In this scorecard, I collect actions AI companies can take to improve safety and public information on what they're doing.
Click on
... (truncated, 4 KB total)
Resource ID: 19cdea07b22fd06a | Stable ID: sid_WLCNHHs66w