Longterm Wiki

Grades from external scorecards. We mirror published grades only; each source's methodology is available at the link in its panel header. See the scorecards directory for cross-organization comparisons.

SaferAI Ratings

By SaferAI · Published 2025-10-01 · Source ↗

Overall: Very Weak

Risk Analysis and Evaluation: Very Weak
Risk Governance: Very Weak
Risk Identification: Very Weak
Risk Treatment: Very Weak

Foundation Model Transparency Index

By Stanford CRFM · Published 2025-12-01 · Source ↗

Overall: 39

Agent Protocols: 100
AI bug bounty: 100
Amount of usage: 0
AUP enforcement frequency: 0
AUP enforcement process: 100
Basic model properties: 0
Benchmarked inference: 0
Benefits Assessment: 0
Capabilities evaluation: 100
Capabilities taxonomy: 100
Carbon emissions for final training run: 0
Change log: 100
Classification of usage data: 0
Code access: 0
Compute hardware for final training run: 0
Compute provider: 100
Compute usage for final training run: 0
Compute usage including R&D: 0
Consumer/enterprise usage: 0
Crawling: 0
Data acquisition methods: 100
Data domain composition: 0
Data laborer practices: 0
Data language composition: 0
Data processing methods: 100
Data processing purpose: 0
Data processing techniques: 0
Data replicability: 0
Data retention and deletion policy: 0
Data size: 0
Deeper model properties: 0
Detection of machine-generated content: 100
Development duration for final training run: 0
Distribution channels with usage data: 100
Documentation for responsible use: 100
Downstream: 50
Acceptable use policy: 80
Accountability: 33.3
Downstream mitigations: 100
Impact: 0
Model Behavior Policy: 75
Post-deployment monitoring: 57.1
Usage data: 20
Energy usage for final training run: 0
Enterprise mitigations: 100
Enterprise users: 0
External data access: 0
External developer mitigations: 100
External products and services: 0
External reproducibility of capabilities evaluation: 0
External reproducibility of mitigations evaluation: 0
External reproducibility of risks evaluation: 0
External risk evaluation: 100
Feedback mechanisms: 0
Foundation model roadmap: 100
Geographic statistics: 0
Government commitments: 0
Government use: 0
Instructions for data generation: 0
Intermediate tokens: 100
Internal compute allocation: 0
Internal product and service mitigations: 100
Internal products and services: 0
Licensed data compensation: 0
Licensed data sources: 0
Misuse incident reporting protocol: 0
Mitigations efficacy: 0
Mitigations taxonomy: 100
Mitigations taxonomy mapped to risk taxonomy: 100
Model: 50
Capabilities: 50
Model cost: 0
Model dependencies: 0
Model access: 50
Model information: 0
Model Mitigations: 60
Model objectives: 100
Release: 75
Model response characteristics: 100
Risks: 40
Model stages: 100
Model theft prevention measures: 100
New human-generated data sources: 0
Notice of usage data used in training: 0
Open weights: 0
Organization chart: 0
Oversight mechanism: 100
Permitted and prohibited users: 100
Permitted, restricted, and prohibited model behaviors: 100
Permitted, restricted, and prohibited uses: 100
Post-deployment coordination with government: 100
Pre-deployment risk evaluation: 0
Public datasets: 0
Quantization: 0
Regional policy variations: 100
Release stages: 100
Researcher credits: 0
Responsible disclosure policy: 100
Risks evaluation: 0
Risks taxonomy: 100
Risk thresholds: 100
Safe harbor: 0
Security incident reporting protocol: 100
Specialized access: 100
Synthetic data purpose: 100
Synthetic data sources: 0
System prompt: 0
Terms of use: 100
Top distribution channels: 100
Train-test overlap: 0
Upstream: 17.6
Compute: 11.1
Data Acquisition: 16.7
Data Processing: 33.3
Data Properties: 0
Methods: 66.7
Other resources: 0
Usage data used in training: 0
Users of internal products and services: 0
Versioning protocol: 0
Water usage for final training run: 0
Whistleblower protection: 0

Grade trajectory (3 waves)

v1.2 (December 2025, latest): 39
v1.1 (May 2024): 81.2
v1.0 (October 2023): 12

Seoul Commitment Tracker

By The Midas Project · Published 2025-02-10 · Source ↗

Overall: Fulfilled

Halting Procedures: Partial
Risk Evaluation: Partial
Risk Mitigations: Partial
Risk Thresholds: Partial
Safety Investment: Fulfilled

Amazon