AI Safety Institutes (AISIs)

Analysis of government AI Safety Institutes finding they've achieved rapid institutional growth (UK: 0→100+ staff in 18 months) and secured pre-deployment access to frontier models, but face critical constraints: advisory-only authority, 10-100x resource mismatch vs labs (dozens-to-hundreds staff vs thousands; $10M-$66M vs billions), and regulatory capture risks from voluntary access agreements. Effectiveness rated as uncertain due to inability to compel action despite identifying safety concerns.

Introduced: 2023-11
Status: Active
Scope: International
Function: Evaluation, research, policy advice
Network: International coordination emerging

Quick Assessment

| Dimension | Assessment | Evidence |
|---|---|---|
| Tractability | Medium | UK AISI grew from 0 to 100+ staff in 18 months; US AISI reached 280+ consortium members |
| Effectiveness | Uncertain | Completed joint pre-deployment evaluations of Claude 3.5 Sonnet and GPT o1, but advisory-only authority limits impact |
| Scale Match | Low | Institutes have dozens-to-hundreds of staff vs. thousands at frontier labs; $10M-$66M budgets vs. billions in lab investment |
| Independence | Medium-Low | Voluntary access agreements create dependency; regulatory capture concerns documented in academic literature |
| International Coordination | Growing | 11-nation network established May 2024; first San Francisco meeting November 2024 |
| Political Durability | Uncertain | UK renamed to "AI Security Institute" (Feb 2025); US renamed to "Center for AI Standards and Innovation" (June 2025) |
| Timeline Relevance | Moderate | Evaluation cycles of weeks-to-months may lag deployment decisions as AI development accelerates |

Overview

AI Safety Institutes (AISIs) represent a fundamental shift in how governments approach AI oversight, establishing dedicated technical institutions to evaluate advanced AI systems, conduct safety research, and inform policy decisions. These government-affiliated organizations emerged as a response to the widening gap between rapidly advancing AI capabilities and regulatory capacity, aiming to build in-house technical expertise that can meaningfully assess frontier AI systems.

The AISI model gained momentum following the November 2023 Bletchley Park AI Safety Summit, where the UK announced the first major institute. Within months, the United States established its own institute, followed by Japan and Singapore, with over a dozen additional countries announcing plans or expressing interest. This rapid international adoption reflects a growing consensus that traditional regulatory approaches are inadequate for governing transformative AI technologies.

At their core, AISIs address a critical information asymmetry problem. AI labs possess deep technical knowledge about their systems' capabilities and limitations, while government regulators often lack the specialized expertise to independently assess these claims. AISIs attempt to bridge this gap by recruiting top AI talent, securing pre-deployment access to frontier models, and developing rigorous evaluation methodologies. However, their effectiveness remains constrained by structural limitations around independence, enforcement authority, and resource constraints relative to the labs they oversee.

Why They Exist

Traditional regulatory frameworks face fundamental challenges when applied to advanced AI systems. Regulatory agencies typically rely on industry self-reporting, external consultants, or academic research to understand new technologies. For AI, this approach proves inadequate due to several factors: the extreme technical complexity of modern AI systems requires deep machine learning expertise to properly evaluate; capabilities evolve on timescales of months rather than years, far faster than traditional policy development cycles; meaningful safety assessment requires direct access to model weights, training processes, and internal evaluations that labs consider proprietary; and the potential risks from advanced AI systems—from bioweapons assistance to autonomous cyber operations—demand urgent, technically-informed oversight.

```mermaid
flowchart TD
  LABS[AI Labs] -->|Model Access| AISI[AI Safety Institutes]
  AISI -->|Evaluations| EVAL[Pre-deployment Testing]
  AISI -->|Findings| POLICY[Policy Recommendations]
  EVAL -->|Results| LABS
  POLICY -->|Informs| REG[Regulators]
  POLICY -->|Informs| INTL[International Network]
  INTL -->|Coordination| AISI
  REG -->|Potential Authority| ENFORCE[Enforcement Actions]

  style LABS fill:#e8f4fc
  style AISI fill:#d4edda
  style EVAL fill:#fff3cd
  style POLICY fill:#fff3cd
  style REG fill:#f8d7da
  style ENFORCE fill:#f8d7da
  style INTL fill:#e2d5f1
```

AISIs emerged as an institutional innovation designed to address these challenges. By housing technical experts within government structures, they aim to develop independent evaluation capabilities, establish ongoing relationships with AI labs to secure model access, create standardized methodologies for assessing AI risks and capabilities, and translate technical findings into policy recommendations that can inform regulatory decisions.

The model reflects lessons learned from other high-stakes technical domains. Nuclear safety regulation succeeded partly because agencies like the Nuclear Regulatory Commission developed deep in-house technical expertise. Similarly, financial regulation became more effective when agencies hired quantitative experts who could understand complex derivatives and trading strategies. AISIs represent an attempt to apply this pattern to AI governance.

Current Assessment

AISIs show significant promise as governance infrastructure but face critical limitations that may constrain their long-term effectiveness. On the positive side, they have demonstrated rapid institutional development, with the UK institute growing from concept to 50+ staff within a year. They have secured meaningful access to frontier models from major labs including OpenAI, Anthropic, Google DeepMind, and Meta—a significant achievement given these companies' general reluctance to share proprietary information. The institutes have begun developing sophisticated evaluation frameworks and have established international coordination mechanisms that could scale globally.

However, several structural challenges raise questions about their ultimate impact. Most AISIs operate in advisory roles without enforcement authority, making their influence dependent on voluntary industry cooperation rather than regulatory power. They remain dramatically smaller than the labs they oversee, with dozens of staff evaluating systems developed by teams of thousands. Their independence faces pressure from both industry relationships and political oversight, potentially compromising their ability to deliver critical assessments. Perhaps most fundamentally, the timeline mismatch between evaluation cycles and deployment decisions may render their work strategically irrelevant if labs continue to advance capabilities faster than evaluators can assess them.

Risks Addressed

| Risk Category | How AISIs Address It | Mechanism | Effectiveness |
|---|---|---|---|
| Bioweapons | Pre-deployment evaluation of biological knowledge capabilities | Testing for synthesis planning, pathogen design assistance | Medium - evaluations completed but advisory-only |
| Cyberweapons | Testing for offensive cyber capabilities | Vulnerability discovery and exploitation assessment | Medium - TRAINS taskforce focuses on national security |
| Racing dynamics | Providing independent capability assessment | Creates incentive for labs to demonstrate safety | Low - no enforcement to slow deployment |
| Deceptive alignment | Safeguard efficacy testing | Red-teaming for jailbreaks and refusal consistency | Uncertain - detection methods still developing |
| Misuse by malicious actors | Informing policy on model access controls | Capability evaluation informs release decisions | Medium - depends on lab cooperation |

The Global Landscape

Institute Comparison

| Institute | Est. Date | Staff Size | Annual Budget | Key Focus | Pre-deployment Access |
|---|---|---|---|---|---|
| UK AISI | Nov 2023 | 100+ technical staff | $66M (plus $1.5B compute access) | Model evaluation, Inspect framework | OpenAI, Anthropic, Google DeepMind, Meta |
| US AISI | Announced Nov 2023; operational Feb 2024 | 280+ consortium members | $10M initial | Standards, national security testing | OpenAI, Anthropic (MOUs signed Aug 2024) |
| Japan AISI | Feb 2024 | Cross-agency structure | Undisclosed | Evaluation methodology | Coordination with NIST |
| Singapore | Planned 2024 | 30-50 (target) | $25M (projected) | Southeast Asia coordination | Under negotiation |
| EU/France/Germany | In development | 50-100 (target) | €50M (projected) | EU-wide coordination | Under negotiation |

United Kingdom AI Safety Institute

The most developed AISI globally, with 100+ staff and pre-deployment access to major frontier models. See UK AI Safety Institute for full details.

United States AI Safety Institute

NIST-based institute with 280+ consortium members and MOUs with OpenAI and Anthropic. See US AI Safety Institute for full details.

International Network Development

Beyond the UK and US institutes, the AISI model is spreading internationally. Japan established its AI Safety Institute in February 2024 as a cross-government effort involving the Cabinet Office, the Ministry of Economy, Trade and Industry, and multiple research institutions, with Director Akiko Murakami leading evaluation methodology development. Singapore announced plans for its own institute to serve as a hub for AI development in Southeast Asia.

At the May 2024 Seoul AI Safety Summit, world leaders from Australia, Canada, the EU, France, Germany, Italy, Japan, Korea, Singapore, the UK, and the US signed the Seoul Statement of Intent, establishing the International Network of AI Safety Institutes. U.S. Secretary of Commerce Gina Raimondo formally launched the network, which aims to "accelerate the advancement of the science of AI safety" through coordinated research, resource sharing, and co-developing AI model evaluations.

The network held its first in-person meeting on November 20-21, 2024 in San Francisco, bringing together technical AI experts from nine countries and the European Union. Participating institutes agreed to pursue complementarity and interoperability, develop best practices, and exchange evaluation methodologies.

However, international coordination faces significant challenges. Different countries have varying national security concerns, regulatory approaches, and relationships with AI labs. The CSIS analysis notes that the network "remains heavily weighted toward higher-income countries in the West, limiting its impact." Information sharing is constrained by classification requirements and competitive concerns, and the effectiveness of coordination depends on sustained political commitment that may be vulnerable to leadership changes (as seen in US rebranding).

Operational Methodology

Evaluation Frameworks

AISIs have developed methodologies for evaluating AI systems across multiple dimensions of safety and capability. The joint UK-US evaluation of Claude 3.5 Sonnet and OpenAI o1 tested models across four domains, providing a template for pre-deployment assessment:

| Evaluation Domain | What It Tests | Key Benchmarks Used | Findings from Joint Evaluations |
|---|---|---|---|
| Biological capabilities | Assistance with pathogen design, synthesis planning | Custom biosecurity scenarios | Models compared against reference baselines |
| Cyber capabilities | Offensive security assistance, vulnerability exploitation | HarmBench framework | Tested autonomous operation in security contexts |
| Software/AI development | Autonomous coding, recursive improvement potential | Agentic coding tasks | Assessed scaffolding and tool use capabilities |
| Safeguard efficacy | Jailbreak resistance, refusal consistency | Red-teaming with diverse prompts | Measured safeguard robustness across attack vectors |

The 2024 FLI AI Safety Index convened seven independent experts to evaluate six leading AI companies. The review found that "although there is a lot of activity at AI companies that goes under the heading of 'safety,' it is not yet very effective." Anthropic received recognition for allowing third-party pre-deployment evaluations by the UK and US AI Safety Institutes, setting a benchmark for industry best practices.

Key benchmarks developed for dangerous capability assessment include the Weapons of Mass Destruction Proxy Benchmark (WMDP), a dataset of 3,668 multiple-choice questions measuring hazardous knowledge in biosecurity, cybersecurity, and chemical security. Stanford's AIR-Bench 2024 provides 5,694 tests spanning 314 granular risk categories aligned with government regulations.
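Benchmarks like WMDP are scored as multiple-choice accuracy over a fixed question set. As a minimal, stdlib-only sketch of that scoring pattern (the items and the always-pick-first "model" here are invented placeholders, not the real benchmark data or API):

```python
from dataclasses import dataclass

@dataclass
class MCQItem:
    question: str
    choices: list[str]  # e.g. four answer options
    answer: int         # index of the correct choice

def score_mcq(model, items):
    """Return the accuracy of `model` (a callable mapping a question and
    its choices to a chosen index) over a list of multiple-choice items."""
    correct = sum(1 for it in items if model(it.question, it.choices) == it.answer)
    return correct / len(items)

# Toy illustration with a placeholder "model" that always picks choice 0.
items = [
    MCQItem("Placeholder hazardous-knowledge question?", ["a", "b", "c", "d"], 0),
    MCQItem("Another placeholder question?", ["a", "b", "c", "d"], 2),
]
always_first = lambda question, choices: 0
print(score_mcq(always_first, items))  # 0.5
```

In practice the "model" callable would wrap an API client and a prompt template; the accuracy is then compared against chance and against expert baselines.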

Capability assessment presents particular challenges because it requires evaluators to anticipate potentially novel abilities before they manifest. The FLI analysis notes that "naive elicitation strategies cause significant underreporting of risk profiles, potentially missing dangerous capabilities that sophisticated actors could unlock." State-of-the-art elicitation techniques—adapting test-time compute, scaffolding, tools, and fine-tuning—are essential but resource-intensive.
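One of the simplest elicitation strategies alluded to above is best-of-n sampling: draw several completions and keep the one that scores highest under a capability metric, so that a weak single sample does not mask latent ability. A hedged sketch under stated assumptions (the stochastic `mock_model` and the identity `capability_score` are invented stand-ins):

```python
import random

def best_of_n(model, prompt, capability_score, n=8, seed=0):
    """Sample n completions and return the highest-scoring one.
    Stronger elicitation (scaffolding, tools, fine-tuning) follows the
    same logic: search harder before concluding a capability is absent."""
    rng = random.Random(seed)
    completions = [model(prompt, rng) for _ in range(n)]
    return max(completions, key=capability_score)

# Placeholder model: returns an integer "answer" of varying quality.
mock_model = lambda prompt, rng: rng.randint(0, 100)
best = best_of_n(mock_model, "solve the task", capability_score=lambda c: c)
print(best)
```

The resource cost the text mentions is visible even here: elicitation multiplies inference calls by n, and adaptive test-time compute or fine-tuning multiplies it further.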

Technical Infrastructure

The development of standardized evaluation tools represents a crucial aspect of AISI work. The UK institute's Inspect framework exemplifies this approach, providing a modular system that supports multiple model APIs, enables reproducible evaluation protocols, facilitates comparison across different models and time periods, and allows community contribution to evaluation development.
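Inspect's actual API is documented on its project site; the modular pattern the text describes (a task bundling a dataset with a solver and a scorer, run by a model-agnostic harness) can be sketched in plain Python, with all names below invented for illustration:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Sample:
    input: str
    target: str

@dataclass
class Task:
    dataset: list[Sample]
    solver: Callable[[str], str]        # produces an answer for a prompt
    scorer: Callable[[str, str], bool]  # compares an answer to the target

def run(task: Task) -> float:
    """Run every sample through the solver and return the mean score:
    the reproducible evaluation loop a framework provides, independent
    of which model backs the solver."""
    results = [task.scorer(task.solver(s.input), s.target) for s in task.dataset]
    return sum(results) / len(results)

# Toy task: an "echo" solver scored by exact match.
task = Task(
    dataset=[Sample("ping", "ping"), Sample("pong", "pong"), Sample("x", "y")],
    solver=lambda prompt: prompt,
    scorer=lambda answer, target: answer == target,
)
print(run(task))  # 2/3 ≈ 0.667
```

Because the solver and scorer are swappable, the same task definition can be replayed against different models or model versions, which is what enables the cross-model and cross-time comparisons described above.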

These technical infrastructures must balance several competing requirements. They need sufficient sophistication to detect subtle but dangerous capabilities while remaining accessible to researchers without specialized infrastructure. They must provide consistent results across different computing environments while adapting to rapidly evolving model architectures and capabilities.

The open-source approach adopted by several institutes reflects a strategic decision that community development can advance evaluation capabilities faster than any single institution. However, this openness also means that AI labs can optimize their systems against known evaluation methodologies, potentially undermining the validity of assessments.

Access Negotiations

Securing meaningful access to frontier AI systems represents perhaps the most critical and challenging aspect of AISI operations. Labs are understandably reluctant to share proprietary information about their most advanced systems, both for competitive reasons and because such information could enable competitors or malicious actors to develop similar capabilities.

Successful access negotiations typically involve careful balance of several factors: providing labs with valuable feedback or evaluation services in exchange for access, establishing clear confidentiality protocols that protect proprietary information, demonstrating technical competence and responsible handling of sensitive information, and maintaining relationships that incentivize continued cooperation rather than treating labs as adversaries.

The voluntary nature of current access agreements represents both an opportunity and a fundamental limitation. Labs cooperate because they perceive value in independent evaluation or because they want to maintain positive relationships with government institutions. However, this voluntary approach means that access could be withdrawn if labs conclude that cooperation is no longer in their interest.

Critical Limitations and Challenges

The Independence Dilemma

AISIs face an inherent tension between the need for industry cooperation and the requirement for independent oversight. A 2025 analysis in AI & Society warns that "the field of AI safety is extremely vulnerable to regulatory capture" and that "those who advocate for regulation as a response to AI risks may be inadvertently playing into the hands of the dominant firms in the industry."

The TechPolicy.Press analysis notes a major set of concerns "has to do with their relationship to industry, particularly around fears that close ties with companies might lead to 'regulatory capture,' undermining the impartiality and independence of these institutes." This is particularly challenging because AISIs need good relationships with AI companies to access and evaluate models in the first place.

Industry influence can manifest through several channels:

| Capture Mechanism | How It Operates | Observed Examples |
|---|---|---|
| Hiring patterns | Staff recruited from labs bring industry perspectives | UK/US AISI leadership includes former lab employees |
| Access dependencies | Voluntary model access creates incentive to avoid critical findings | All major access agreements remain voluntary |
| Funding relationships | Resource-sharing arrangements create dependencies | UK AISI receives compute access from industry partners |
| Framing adoption | Institutes adopt industry definitions of "safety" | Focus on capability evaluation vs. broader harms |
| Revolving door | Staff may return to industry after government service | Career incentives favor positive industry relations |

The OECD analysis recommends that the AISI Network "preserve its independent integrity by operating as a community of technical experts rather than regulators." However, this advisory positioning may limit impact when enforcement is needed.

Authority and Enforcement Gaps

Most existing AISIs operate in advisory roles without direct enforcement authority. They can evaluate AI systems and publish findings, but they cannot compel labs to provide access, delay deployments pending evaluation, or enforce remediation of identified safety issues. This limitation fundamentally constrains their potential impact on AI development trajectories.

The advisory model has several advantages: it allows AISIs to build relationships and credibility before seeking expanded authority, it avoids regulatory capture concerns that might arise with enforcement powers, it enables international coordination without requiring harmonized legal frameworks, and it provides flexibility to adapt approaches as the technology and risk landscape evolves.

However, advisory authority may prove inadequate as AI capabilities advance. If AISIs identify serious safety concerns but cannot compel action, their evaluations become merely informational rather than protective. Labs facing competitive pressure may ignore advisory recommendations, particularly if compliance would significantly delay deployment or increase costs relative to competitors.

The path from advisory to regulatory authority faces significant challenges. Expanding AISI powers requires legislative action in most jurisdictions, which involves complex political processes and industry lobbying. Different countries may develop incompatible regulatory approaches, fragmenting the international coordination that makes AISIs potentially valuable. Most fundamentally, effective enforcement requires technical standards and evaluation methodologies that remain under development.

Scale and Resource Constraints

The resource mismatch between AISIs and the AI labs they oversee represents a fundamental challenge to effective evaluation. Leading AI labs employ thousands of researchers and engineers and spend billions of dollars annually on AI development. Even the largest planned AISIs will have hundreds of staff members and budgets measured in tens or hundreds of millions.

This scale disparity manifests in several ways that limit AISI effectiveness. AISIs cannot match lab investment in evaluation infrastructure, potentially missing sophisticated safety issues that require extensive computational resources to detect. They must rely on lab cooperation for access to training data, model architectures, and internal evaluations, rather than independently verifying such information. They lack the personnel to comprehensively evaluate the full range of capabilities that emerge from large-scale training, potentially missing important but rare abilities.

Perhaps most critically, AISIs may always be evaluating last generation's technology while labs deploy current generation systems. If evaluation cycles take months while development cycles take weeks, AISI findings become historically interesting but strategically irrelevant. This timing mismatch could worsen as AI development accelerates and evaluation methodologies become more sophisticated and time-consuming.

Addressing scale limitations may require fundamental changes to the current model. Potential approaches include mandatory disclosure requirements that shift evaluation burden to labs, international cost-sharing that pools resources across multiple institutes, public-private partnerships that leverage industry evaluation infrastructure, or regulatory approaches that slow deployment timelines to match evaluation capabilities.

Methodological Uncertainties

AI evaluation faces profound technical challenges that limit the reliability and relevance of current methodologies. The problem of unknown capabilities—abilities that emerge unexpectedly from large-scale training—means that evaluations may miss the most important and dangerous capabilities. Current evaluation approaches focus on testing known capability categories, but transformative AI systems may develop qualitatively new abilities that existing frameworks cannot detect.

Evaluation validity represents another fundamental challenge. Laboratory testing may not predict real-world behavior, particularly for systems that adapt their responses based on context or user interactions. Safety properties demonstrated during evaluation may not persist across different deployment scenarios, user populations, or adversarial contexts.

The arms race dynamic between evaluation and optimization presents an ongoing challenge. As evaluation methodologies become public, AI developers can optimize their systems to perform well on known benchmarks while potentially retaining concerning capabilities that evaluations do not detect. This gaming dynamic may require continuous evolution of evaluation approaches, increasing the complexity and resource requirements for effective assessment.

Temporal dynamics add another layer of complexity. AI systems may exhibit different behavior over time as they learn from deployment interactions, receive updates, or face novel situations not represented in evaluation datasets. Current evaluation methodologies primarily assess snapshot behavior rather than evolution over time, potentially missing important safety-relevant changes.

Trajectory and Future Evolution

Near-Term Development (2025-2026)

The next two years will likely see continued rapid expansion of existing AISIs and establishment of new institutes across additional countries. The UK and US institutes are expected to reach their target staffing levels and develop more sophisticated evaluation capabilities. International coordination mechanisms established at recent AI safety summits will mature into operational frameworks for information sharing and joint evaluation activities.

Several technical developments will shape AISI effectiveness during this period. Evaluation methodologies will become more standardized, enabling better comparison across different systems and time periods. Automated evaluation tools may reduce the time required for comprehensive assessment, potentially addressing some timing mismatch concerns. The development of better interpretability techniques could enhance evaluators' ability to understand system behavior and identify concerning capabilities.

However, this period may also reveal fundamental limitations of the current AISI model. As AI capabilities advance more rapidly, the gap between evaluation timelines and deployment decisions may widen. Industry consolidation could reduce the number of actors requiring evaluation while potentially making access negotiations more challenging. Political changes in key countries could disrupt funding, leadership, or international coordination efforts.

The relationship between AISIs and other governance mechanisms will evolve during this period. Integration with broader regulatory frameworks may begin, potentially providing AISIs with expanded authority or enforcement mechanisms. Alternatively, regulatory development may bypass AISIs if they are perceived as ineffective or captured by industry interests.

Medium-Term Scenarios (2026-2029)

The medium-term trajectory for AISIs depends heavily on how several critical uncertainties resolve. In optimistic scenarios, AISIs successfully demonstrate value through high-quality evaluations that inform policy decisions, gain expanded authority through legislative changes that enable enforcement action, maintain independence despite industry relationships, and establish effective international coordination that provides global oversight capacity.

Such successful development could position AISIs as central institutions in AI governance, potentially serving as verification bodies for international AI safety agreements, regulatory agencies with authority to approve or delay AI deployments, coordinating centers for technical standards development, or incident response organizations that investigate AI system failures.

However, pessimistic scenarios are equally plausible. AISIs may prove unable to keep pace with advancing capabilities, making their evaluations strategically irrelevant. Industry capture could transform them into legitimacy-providing institutions that rubber-stamp lab decisions rather than providing independent oversight. International coordination could fragment due to geopolitical tensions or divergent national interests. Political changes could defund or reorganize institutes, disrupting institutional knowledge and relationships.

Hybrid scenarios seem most likely, where AISIs provide valuable but limited contributions to AI governance. They may successfully evaluate current generation systems while struggling with more advanced capabilities. They may maintain partial independence while facing increased industry influence. They may achieve regional coordination while failing to establish global frameworks.

Long-Term Possibilities

The long-term role of AISIs will depend fundamentally on the trajectory of AI capabilities and the broader governance response. If AI development slows or reaches temporary plateaus, AISIs may have time to develop evaluation capabilities that match the systems they oversee. If international cooperation on AI governance strengthens, AISIs could become verification bodies for binding international agreements.

Alternatively, if AI development accelerates toward artificial general intelligence or superintelligence, current AISI models may prove entirely inadequate. The evaluation of systems approaching or exceeding human-level capabilities across multiple domains may require fundamentally different approaches that current institutions cannot provide.

The most transformative possibility involves AISIs evolving beyond their current evaluation focus toward active participation in AI development. Rather than merely assessing systems developed by labs, future iterations might directly fund or conduct safety-focused AI research, potentially developing alternative development approaches that prioritize safety over capability advancement.

Key Uncertainties and Research Priorities

Fundamental Questions

Several critical uncertainties will determine whether AISIs can meaningfully contribute to AI safety. The independence question remains paramount: can government institutions maintain sufficient objectivity to provide effective oversight while maintaining the industry relationships necessary for access and cooperation? Historical precedents from other domains provide mixed guidance, with some regulatory agencies successfully maintaining independence while others became captured by the industries they oversee.

The authority question similarly remains unresolved. Will AISIs gain sufficient regulatory power to influence AI development decisions, or will they remain advisory institutions whose recommendations can be safely ignored? The path from advisory to regulatory authority requires political action that may not materialize, particularly if industry opposition is strong or if other governance mechanisms are perceived as more effective.

The scaling question presents perhaps the most fundamental challenge. Can evaluation capabilities advance fast enough to remain relevant as AI systems become more capable, or will the resource and timeline mismatches prove insurmountable? This question depends partly on technical developments in evaluation methodology and partly on whether regulatory approaches can alter the competitive dynamics driving rapid deployment.

Empirical Research Needs

Several areas require urgent empirical investigation to inform AISI development and evaluation. Studies of regulatory capture in analogous domains could provide insights into institutional design choices that preserve independence. Comparative analysis of different AISI organizational models could identify best practices for balancing cooperation and oversight requirements.

Technical research on evaluation methodology remains critical, particularly around automated evaluation systems that could reduce assessment timelines, interpretability techniques that enable better understanding of system behavior, and methods for detecting unknown capabilities in large-scale AI systems. The development of standardized evaluation frameworks requires careful empirical validation to ensure they actually predict deployment behavior.

International relations research could illuminate the prospects for sustained coordination among AISIs, particularly how geopolitical tensions and competitive dynamics might affect information sharing and joint evaluation efforts. Historical studies of international technical cooperation in other domains could provide relevant insights.

Decision-Relevant Considerations

For individuals considering careers in AISIs, several factors merit careful consideration. The impact potential depends heavily on whether institutes gain meaningful authority and maintain independence. The skill development opportunities include valuable experience in AI evaluation and policy interfaces, though bureaucratic constraints may limit research flexibility.

For policymakers considering AISI funding or expansion, key considerations include whether advisory institutions provide sufficient oversight given the stakes involved, how to design institutional structures that preserve independence while enabling industry cooperation, and whether resources might be more effectively deployed through other governance mechanisms.

For AI safety researchers more broadly, AISIs represent one approach among many potential governance interventions. Their effectiveness relative to technical alignment research, industry engagement, or international treaty development remains an open question that depends partly on one's views about the tractability of technical versus governance approaches to AI safety.

The ultimate assessment of AISIs may depend less on their current capabilities than on their potential for evolution. If they can serve as a foundation for more sophisticated governance institutions, their current limitations may prove temporary. If they become entrenched but ineffective institutions that provide false reassurance about AI oversight, their net impact could be negative. The next several years will likely determine which trajectory proves accurate.


Sources

Official Institute Resources

  • UK AI Security Institute - Official website with research publications and Inspect framework
  • US AI Safety Institute at NIST - NIST's AI Safety Institute homepage
  • Introducing the AI Safety Institute - UK Government overview
  • Japan AI Safety Institute Launch - Ministry of Economy, Trade and Industry announcement

Evaluations and Research

  • Pre-Deployment Evaluation of Claude 3.5 Sonnet - First joint UK-US evaluation (November 2024)
  • Pre-Deployment Evaluation of OpenAI o1 - Second joint evaluation (December 2024)
  • US AISI Signs Agreements with Anthropic and OpenAI - August 2024 MOUs
  • Inspect Evaluation Framework - Open-source AI evaluation tool
  • FLI AI Safety Index 2024 - Independent assessment of AI company safety practices

International Coordination

  • AI Seoul Summit 2024 - UK Government summit page
  • Seoul Statement of Intent - International network founding document
  • First Meeting of the International Network - EU report on San Francisco meeting (November 2024)
  • CSIS: AI Safety Institute Network Recommendations - Policy analysis

Analysis and Commentary

  • Elizabeth Kelly: TIME 100 Most Influential in AI - Profile of US AISI director
  • CSIS: US Vision for AI Safety - Conversation with Elizabeth Kelly
  • TechPolicy.Press: How AISIs Inform Governance - Analysis of independence concerns
  • OECD: AI Safety Institutes Challenge - Assessment of institutional capacity
  • AI & Society: AI Safety and Regulatory Capture - Academic analysis of capture risks

2025 Developments

  • UK Renames AI Safety Institute - February 2025 rebrand to AI Security Institute
  • Trump Administration Rebrands US AI Safety Institute - June 2025 change to CAISI
  • AI Safety Advocates Slam NIST Targeting - Criticism of proposed staff cuts
  • TRAINS Taskforce Established - National security testing initiative

References

This CSIS analysis examines the international network of AI Safety Institutes established across multiple countries and provides recommendations for strengthening their coordination, scope, and effectiveness. It addresses how these institutes can better collaborate on technical safety evaluations and policy alignment to address frontier AI risks.

★★★★☆

A TIME profile of Elizabeth Kelly, who leads the U.S. AI Safety Institute (AISI) at NIST, highlighting her role in shaping federal AI safety policy and evaluation frameworks. The article likely covers her background, priorities, and vision for responsible AI development and government oversight.

★★★☆☆
3. Seoul AI Safety Summit · UK Government · Government

The AI Seoul Summit 2024, co-hosted by the UK and Republic of Korea in May 2024, advanced global AI safety governance by securing international agreements on risk assessment frameworks, launching the first international network of AI Safety Institutes, and obtaining safety commitments from 16 major AI companies worldwide. It built on the Bletchley Park AI Safety Summit of November 2023 as part of an ongoing international diplomatic process.

★★★★☆

The Trump administration renamed the AI Safety Institute (AISI) to the Center for AI Standards and Innovation (CAISI), signaling a rhetorical shift away from 'safety' toward rapid AI development. Despite the rebranding, the core functions remain largely the same: evaluating AI capabilities and vulnerabilities, developing voluntary standards, and serving as industry's primary government contact. Commerce Secretary Howard Lutnick framed the change as reducing regulatory overreach while maintaining national security standards.

Japan's Ministry of Economy, Trade and Industry (METI) announced the launch of the AI Safety Institute (AISI) on February 14, 2024, housed within the Information-technology Promotion Agency (IPA). The institute is tasked with researching AI safety evaluation criteria and methods, and will collaborate internationally with AI Safety Institutes in the US and UK. Multiple ministries and agencies are involved in a coordinated whole-of-government approach.

The U.S. AI Safety Institute at NIST announced the formation of the TRAINS Taskforce in November 2024, a multi-agency collaboration including Defense, Energy, Homeland Security, NSA, and NIH to coordinate research, testing, and red-teaming of advanced AI models across national security domains. The taskforce focuses on risks in areas such as nuclear security, cybersecurity, critical infrastructure, and military capabilities, aiming to develop new AI evaluation benchmarks and conduct joint risk assessments.

★★★★☆

The UK government announced at the Munich Security Conference the renaming of its AI Safety Institute to the AI Security Institute, shifting focus toward serious national security risks such as bioweapons, cyberattacks, and criminal misuse of AI. The rebranding is accompanied by a new partnership with Anthropic brokered by the UK's Sovereign AI unit, and a new criminal misuse team co-located with the Home Office.

8. Introducing the UK AI Safety Institute · UK Government · Government

The UK government's foundational document introducing the AI Safety Institute (AISI), the first state-backed organization dedicated to advanced AI safety for the public interest. It outlines AISI's mission to minimize surprise from rapid AI advances and develop sociotechnical infrastructure to understand and mitigate AI risks, presented to Parliament in November 2023 by Ian Hogarth.

★★★★☆

This OECD analysis examines the emerging landscape of national AI Safety Institutes (AISIs) established by the US, UK, Japan, Canada, Singapore, and EU, assessing their roles in evaluating AI capabilities and risks. It identifies key challenges these bodies face, including grappling with AI unpredictability, establishing evaluation standards, conducting safety research, and coordinating internationally. The piece argues that while AISIs represent a significant step toward coordinated global AI safety governance, substantial structural and resource challenges remain.

★★★★☆

The U.S. AI Safety Institute (NIST) announced Memoranda of Understanding with Anthropic and OpenAI in August 2024, establishing formal frameworks for pre- and post-deployment access to major AI models. These agreements enable collaborative research on capability evaluations, safety risk assessment, and mitigation methods, representing the first formal government-industry partnerships of this kind in the U.S.

★★★★★
11. analysis in AI & Society · Springer (peer-reviewed) · Paper

This paper argues that AI safety regulation is particularly vulnerable to regulatory capture, where powerful incumbents exploit safety rules for economic or political advantage. It details the specific harms and injustices that captured AI safety regulations could produce, and critically reviews existing proposals to mitigate this risk, cautioning that well-intentioned safety frameworks may be weaponized by dominant industry players.

★★★★☆
12. established in February 2024 · US Department of Commerce · Government

On February 7, 2024, Commerce Secretary Gina Raimondo announced the founding leadership of the U.S. AI Safety Institute (AISI) at NIST, naming Elizabeth Kelly as Director and Elham Tabassi as Chief Technology Officer. The Institute was created under President Biden's Executive Order on AI to mitigate AI-related risks while supporting American innovation. This press release marks the formal operational launch of the primary U.S. government body responsible for AI safety research and standards.

★★★★☆

HarmBench is a standardized benchmark for evaluating automated red teaming methods against large language models. It provides a comprehensive framework for assessing how well various attack methods can elicit harmful behaviors from LLMs, enabling systematic comparison of both attack and defense techniques.

CAISI is NIST's dedicated center serving as the U.S. government's primary interface with industry on AI testing, security standards, and evaluation. It develops voluntary AI safety and security guidelines, conducts evaluations of AI capabilities posing national security risks (including cybersecurity and biosecurity threats), and represents U.S. interests in international AI standardization efforts.

★★★★★

A CSIS interview with Elizabeth Kelly, Director of the US AI Safety Institute (USAISI), discussing the US government's strategic approach to AI safety, the role of AISI in evaluating frontier AI models, and international coordination on safety standards. Kelly outlines the institute's priorities including developing evaluation frameworks, engaging with AI developers, and building global partnerships to manage AI risks.

★★★★☆

The AISI International Network, launched in May 2024, is a multilateral initiative connecting national AI Safety Institutes to coordinate on safe and trustworthy AI development. It facilitates knowledge sharing, joint evaluations, and harmonized governance approaches across member countries. The network represents a key institutional mechanism for translating AI safety research into coordinated international policy.

★★★★☆

The Seoul Statement of Intent, signed by 11 countries and the EU at the May 2024 AI Seoul Summit, formalizes multilateral commitment to coordinated AI safety science cooperation. It builds on the Bletchley Park Summit by pledging to leverage national AI Safety Institutes, share scientific assessments, and develop interoperable technical methodologies for AI risk evaluation.

★★★★☆

The U.S. and UK AI Safety Institutes jointly conducted pre-deployment safety evaluations of Anthropic's upgraded Claude 3.5 Sonnet, testing biological capabilities, cyber capabilities, software/AI development, and safeguard efficacy. The evaluation used question answering, agent tasks, qualitative probing, and red teaming to benchmark the model against prior versions and competitors. This represents one of the first formal government-led pre-deployment AI safety evaluations made public.

★★★★★

The US and UK AI Safety Institutes conducted a joint pre-deployment evaluation of OpenAI's o1 model, assessing its capabilities and risks across three domains, including its potential for misuse. The evaluation compared o1's performance to reference models and represents an early example of government-led frontier AI safety testing prior to public release.

★★★★★

WMDP is a benchmark designed to measure and evaluate hazardous knowledge in large language models related to biosecurity, chemical, nuclear, and radiological weapons. It serves as a proxy for assessing dangerous capabilities in AI systems and supports unlearning research aimed at reducing such risks. The benchmark helps researchers identify and mitigate the potential for LLMs to assist in weapons development.

This page covers the inaugural meeting of the International Network of AI Safety Institutes, a multilateral initiative bringing together national AI safety bodies to coordinate on evaluation methodologies, information sharing, and global AI safety governance. The network represents a significant step toward international coordination on frontier AI risk assessment.

★★★★☆

Reports on planned layoffs at NIST resulting from the Trump administration's DOGE-driven federal workforce reductions, with specific concerns about impacts on the AI Safety Institute (AISI). The cuts raise alarm among AI safety researchers and policymakers about the dismantling of U.S. government infrastructure dedicated to evaluating and mitigating AI risks.

★★★☆☆
23. TechPolicy.Press analysis · TechPolicy.Press

This TechPolicy.Press analysis examines the evolving role of AI Safety Institutes (AISIs) across multiple countries, exploring how they balance safety evaluation mandates with broader innovation and governance objectives. It assesses how these institutes inform national and international AI policy frameworks and their potential influence on global AI governance norms.

★★★☆☆

The Future of Life Institute's AI Safety Index 2024 systematically evaluates six leading AI companies—OpenAI, Google DeepMind, Anthropic, Meta, xAI, and Mistral—across 42 safety indicators spanning risk management, transparency, governance, and preparedness for advanced AI threats. The index finds widespread deficiencies in safety practices and provides letter-grade assessments to benchmark industry progress. It serves as a comparative accountability tool aimed at pressuring companies toward stronger safety commitments.

★★★☆☆

Inspect is an open-source framework developed by the UK AI Safety Institute (AISI) for evaluating large language models and AI systems. It provides standardized tools for running safety evaluations, benchmarks, and red-teaming tasks. The framework enables researchers and developers to assess AI model capabilities and safety properties in a reproducible and extensible way.

26. UK AI Safety Institute (AISI) · UK AI Safety Institute · Government

The UK AI Safety Institute (AISI) is the UK government's dedicated body for evaluating and mitigating risks from advanced AI systems. It conducts technical safety research, develops evaluation frameworks for frontier AI models, and works with international partners to inform global AI governance and policy.

★★★★☆

Related Wiki Pages

Top Related Pages

Organizations

US AI Safety Institute · UK AI Safety Institute · METR

Risks

Bioweapons Risk · Cyberweapons Risk

Approaches

AI Governance Coordination Technologies · AI Safety Intervention Portfolio · AI Evaluation

Analysis

AI Safety Intervention Effectiveness Matrix · AI Lab Whistleblower Dynamics Model

Policy

Bletchley Declaration · Singapore Consensus on AI Safety Research Priorities

Concepts

State Capacity and AI Governance · Self-Improvement and Recursive Enhancement

Historical

International AI Safety Summit Series

Other

AI Evaluations · Elizabeth Kelly

Key Debates

AI Governance and Policy · AI Structural Risk Cruxes