AI Safety Institutes (AISIs)
Analysis of government AI Safety Institutes finding they've achieved rapid institutional growth (UK: 0→100+ staff in 18 months) and secured pre-deployment access to frontier models, but face critical constraints: advisory-only authority, 10-100x resource mismatch vs labs (dozens-to-hundreds staff vs thousands; $10M-$66M vs billions), and regulatory capture risks from voluntary access agreements. Effectiveness rated as uncertain due to inability to compel action despite identifying safety concerns.
Quick Assessment
| Dimension | Assessment | Evidence |
|---|---|---|
| Tractability | Medium | UK AISI grew from 0 to 100+ staff in 18 months; US AISI reached 280+ consortium members |
| Effectiveness | Uncertain | Completed joint pre-deployment evaluations of Claude 3.5 Sonnet and OpenAI o1, but advisory-only authority limits impact |
| Scale Match | Low | Institutes have dozens-to-hundreds of staff vs. thousands at frontier labs; $10M-$66M budgets vs. billions in lab investment |
| Independence | Medium-Low | Voluntary access agreements create dependency; regulatory capture concerns documented in academic literature |
| International Coordination | Growing | 11-nation network established May 2024; first San Francisco meeting November 2024 |
| Political Durability | Uncertain | UK renamed to "AI Security Institute" (Feb 2025); US renamed to "Center for AI Standards and Innovation" (June 2025) |
| Timeline Relevance | Moderate | Evaluation cycles of weeks-to-months may lag deployment decisions as AI development accelerates |
Overview
AI Safety Institutes (AISIs) represent a fundamental shift in how governments approach AI oversight, establishing dedicated technical institutions to evaluate advanced AI systems, conduct safety research, and inform policy decisions. These government-affiliated organizations emerged as a response to the widening gap between rapidly advancing AI capabilities and regulatory capacity, aiming to build in-house technical expertise that can meaningfully assess frontier AI systems.
The AISI model gained momentum following the November 2023 Bletchley Park AI Safety Summit, where the UK announced the first major institute. Within months, the United States established its own institute, followed by Japan and Singapore, with over a dozen additional countries announcing plans or expressing interest. This rapid international adoption reflects a growing consensus that traditional regulatory approaches are inadequate for governing transformative AI technologies.
At their core, AISIs address a critical information asymmetry problem. AI labs possess deep technical knowledge about their systems' capabilities and limitations, while government regulators often lack the specialized expertise to independently assess these claims. AISIs attempt to bridge this gap by recruiting top AI talent, securing pre-deployment access to frontier models, and developing rigorous evaluation methodologies. However, their effectiveness remains constrained by structural limitations around independence, enforcement authority, and resource constraints relative to the labs they oversee.
Why They Exist
Traditional regulatory frameworks face fundamental challenges when applied to advanced AI systems. Regulatory agencies typically rely on industry self-reporting, external consultants, or academic research to understand new technologies. For AI, this approach proves inadequate due to several factors: the extreme technical complexity of modern AI systems requires deep machine learning expertise to properly evaluate; capabilities evolve on timescales of months rather than years, far faster than traditional policy development cycles; meaningful safety assessment requires direct access to model weights, training processes, and internal evaluations that labs consider proprietary; and the potential risks from advanced AI systems—from bioweapons assistance to autonomous cyber operations—demand urgent, technically-informed oversight.
```mermaid
flowchart TD
    LABS[AI Labs] -->|Model Access| AISI[AI Safety Institutes]
    AISI -->|Evaluations| EVAL[Pre-deployment Testing]
    AISI -->|Findings| POLICY[Policy Recommendations]
    EVAL -->|Results| LABS
    POLICY -->|Informs| REG[Regulators]
    POLICY -->|Informs| INTL[International Network]
    INTL -->|Coordination| AISI
    REG -->|Potential Authority| ENFORCE[Enforcement Actions]
    style LABS fill:#e8f4fc
    style AISI fill:#d4edda
    style EVAL fill:#fff3cd
    style POLICY fill:#fff3cd
    style REG fill:#f8d7da
    style ENFORCE fill:#f8d7da
    style INTL fill:#e2d5f1
```
AISIs emerged as an institutional innovation designed to address these challenges. By housing technical experts within government structures, they aim to develop independent evaluation capabilities, establish ongoing relationships with AI labs to secure model access, create standardized methodologies for assessing AI risks and capabilities, and translate technical findings into policy recommendations that can inform regulatory decisions.
The model reflects lessons learned from other high-stakes technical domains. Nuclear safety regulation succeeded partly because agencies like the Nuclear Regulatory Commission developed deep in-house technical expertise. Similarly, financial regulation became more effective when agencies hired quantitative experts who could understand complex derivatives and trading strategies. AISIs represent an attempt to apply this pattern to AI governance.
Current Assessment
AISIs show significant promise as governance infrastructure but face critical limitations that may constrain their long-term effectiveness. On the positive side, they have demonstrated rapid institutional development, with the UK institute growing from concept to 50+ staff within a year. They have secured meaningful access to frontier models from major labs including OpenAI, Anthropic, Google DeepMind, and Meta—a significant achievement given these companies' general reluctance to share proprietary information. The institutes have begun developing sophisticated evaluation frameworks and have established international coordination mechanisms that could scale globally.
However, several structural challenges raise questions about their ultimate impact. Most AISIs operate in advisory roles without enforcement authority, making their influence dependent on voluntary industry cooperation rather than regulatory power. They remain dramatically smaller than the labs they oversee, with dozens of staff evaluating systems developed by teams of thousands. Their independence faces pressure from both industry relationships and political oversight, potentially compromising their ability to deliver critical assessments. Perhaps most fundamentally, the timeline mismatch between evaluation cycles and deployment decisions may render their work strategically irrelevant if labs continue to advance capabilities faster than evaluators can assess them.
Risks Addressed
| Risk Category | How AISIs Address It | Mechanism | Effectiveness |
|---|---|---|---|
| Bioweapons | Pre-deployment evaluation of biological knowledge capabilities | Testing for synthesis planning, pathogen design assistance | Medium - evaluations completed but advisory-only |
| Cyberweapons | Testing for offensive cyber capabilities | Vulnerability discovery and exploitation assessment | Medium - TRAINS taskforce focuses on national security |
| Racing dynamics | Providing independent capability assessment | Creates incentive for labs to demonstrate safety | Low - no enforcement to slow deployment |
| Deceptive alignment | Safeguard efficacy testing | Red-teaming for jailbreaks and refusal consistency | Uncertain - detection methods still developing |
| Misuse by malicious actors | Informing policy on model access controls | Capability evaluation informs release decisions | Medium - depends on lab cooperation |
The Global Landscape
Institute Comparison
| Institute | Est. Date | Staff Size | Annual Budget | Key Focus | Pre-deployment Access |
|---|---|---|---|---|---|
| UK AISI | Nov 2023 | 100+ technical staff | $66M (plus $1.5B compute access) | Model evaluation, Inspect framework | OpenAI, Anthropic, Google DeepMind, Meta |
| US AISI | Announced Nov 2023; operational Feb 2024 | 280+ consortium members | $10M initial | Standards, national security testing | OpenAI, Anthropic (MOUs signed Aug 2024) |
| Japan AISI | Feb 2024 | Cross-agency structure | Undisclosed | Evaluation methodology | Coordination with NIST |
| Singapore | Planned 2024 | 30-50 (target) | $25M (projected) | Southeast Asia coordination | Under negotiation |
| EU/France/Germany | In development | 50-100 (target) | €50M (projected) | EU-wide coordination | Under negotiation |
United Kingdom AI Safety Institute
The most developed AISI globally, with 100+ staff and pre-deployment access to major frontier models. See UK AI Safety Institute for full details.
United States AI Safety Institute
NIST-based institute with 280+ consortium members and MOUs with OpenAI and Anthropic. See US AI Safety Institute for full details.
International Network Development
Beyond the UK and US institutes, the AISI model is spreading internationally. Japan established its AI Safety Institute in February 2024 as a cross-government effort involving the Cabinet Office, the Ministry of Economy, Trade and Industry, and multiple research institutions, with Director Akiko Murakami leading evaluation methodology development. Singapore announced plans for its own institute to serve as a hub for AI development in Southeast Asia.
At the May 2024 Seoul AI Safety Summit, world leaders from Australia, Canada, the EU, France, Germany, Italy, Japan, Korea, Singapore, the UK, and the US signed the Seoul Statement of Intent, establishing the International Network of AI Safety Institutes. U.S. Secretary of Commerce Gina Raimondo formally launched the network, which aims to "accelerate the advancement of the science of AI safety" through coordinated research, resource sharing, and co-developing AI model evaluations.
The network held its first in-person meeting on November 20-21, 2024, in San Francisco, bringing together technical AI experts from nine countries and the European Union. Participating institutes agreed to pursue complementarity and interoperability, develop best practices, and exchange evaluation methodologies.
However, international coordination faces significant challenges. Different countries have varying national security concerns, regulatory approaches, and relationships with AI labs. The CSIS analysis notes that the network "remains heavily weighted toward higher-income countries in the West, limiting its impact." Information sharing is constrained by classification requirements and competitive concerns, and the effectiveness of coordination depends on sustained political commitment that may be vulnerable to leadership changes (as seen in the US rebranding).
Operational Methodology
Evaluation Frameworks
AISIs have developed methodologies for evaluating AI systems across multiple dimensions of safety and capability. The joint UK-US evaluation of Claude 3.5 Sonnet and OpenAI o1 tested models across four domains, providing a template for pre-deployment assessment (a schematic code sketch follows the table):
| Evaluation Domain | What It Tests | Key Benchmarks Used | Findings from Joint Evaluations |
|---|---|---|---|
| Biological capabilities | Assistance with pathogen design, synthesis planning | Custom biosecurity scenarios | Models compared against reference baselines |
| Cyber capabilities | Offensive security assistance, vulnerability exploitation | HarmBench framework | Tested autonomous operation in security contexts |
| Software/AI development | Autonomous coding, recursive improvement potential | Agentic coding tasks | Assessed scaffolding and tool use capabilities |
| Safeguard efficacy | Jailbreak resistance, refusal consistency | Red-teaming with diverse prompts | Measured safeguard robustness across attack vectors |
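As a rough illustration of how such a multi-domain assessment can be organized in code, the sketch below defines a per-domain probe suite and reports the rate of concerning responses for comparison against a reference baseline. Everything here (the `ProbeSuite` structure, the judge functions, the model stub) is a hypothetical illustration, not the institutes' actual tooling:

```python
# Hypothetical sketch of a multi-domain evaluation harness. The ProbeSuite
# structure, judge functions, and model stub are illustrative assumptions,
# not the institutes' actual tooling.
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProbeSuite:
    probes: list[str]                    # adversarial or task prompts
    judge: Callable[[str], bool]         # True if the response is concerning
    baseline: float                      # reference rate from a prior model


def evaluate(model: Callable[[str], str],
             suites: dict[str, ProbeSuite]) -> dict[str, float]:
    """Return the per-domain rate of concerning responses."""
    return {
        domain: sum(s.judge(model(p)) for p in s.probes) / len(s.probes)
        for domain, s in suites.items()
    }


if __name__ == "__main__":
    # Toy stand-ins: a refusal-only "model" and a keyword-based judge.
    refusal_model = lambda prompt: "I can't help with that."
    suites = {
        "safeguard_efficacy": ProbeSuite(
            probes=["<jailbreak variant 1>", "<jailbreak variant 2>"],
            judge=lambda r: "can't" not in r,  # flagged if the model complied
            baseline=0.10,
        ),
    }
    print(evaluate(refusal_model, suites))  # {'safeguard_efficacy': 0.0}
```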
The 2024 FLI AI Safety Index convened seven independent experts to evaluate six leading AI companies. The review found that "although there is a lot of activity at AI companies that goes under the heading of 'safety,' it is not yet very effective." Anthropic received recognition for allowing third-party pre-deployment evaluations by the UK and US AI Safety Institutes, setting a benchmark for industry best practices.
Key benchmarks developed for dangerous capability assessment include the Weapons of Mass Destruction Proxy Benchmark (WMDP), a dataset of 3,668 multiple-choice questions measuring hazardous knowledge in biosecurity, cybersecurity, and chemical security. Stanford's AIR-Bench 2024 provides 5,694 tests spanning 314 granular risk categories aligned with government regulations.
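Scoring such a multiple-choice benchmark is mechanically simple; the sketch below shows the core accuracy computation, assuming a hypothetical item layout of question, choice list, and keyed answer index (the real datasets' formats may differ):

```python
# Sketch of scoring a WMDP-style multiple-choice set. The item layout
# (question, choices, keyed answer index) is an assumption for illustration.
def mcq_accuracy(pick, items):
    """Fraction of items where pick(question, choices) returns the keyed index.

    For hazardous-knowledge benchmarks like WMDP, *lower* accuracy is the
    desired direction after unlearning or safeguard interventions.
    """
    hits = sum(pick(it["question"], it["choices"]) == it["answer"] for it in items)
    return hits / len(items)


items = [
    {"question": "<biosecurity probe>", "choices": ["A", "B", "C", "D"], "answer": 2},
]
print(mcq_accuracy(lambda q, c: 0, items))  # 0.0 with a trivial picker
```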
Capability assessment presents particular challenges because it requires evaluators to anticipate potentially novel abilities before they manifest. The FLI analysis notes that "naive elicitation strategies cause significant underreporting of risk profiles, potentially missing dangerous capabilities that sophisticated actors could unlock." State-of-the-art elicitation techniques—adapting test-time compute, scaffolding, tools, and fine-tuning—are essential but resource-intensive.
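The following toy simulation makes the elicitation point concrete: the same stochastic "model" looks far weaker under a single-attempt protocol than under best-of-n sampling. All numbers and function names here are illustrative assumptions, not measurements from any real evaluation:

```python
# Toy simulation of elicitation strength: identical underlying capability,
# very different reported pass rates depending on the attempt budget.
import random

random.seed(0)


def attempt(difficulty: float) -> bool:
    """Stand-in for one model attempt; succeeds with probability 1 - difficulty."""
    return random.random() > difficulty


def pass_rate(tasks: list[float], n: int) -> float:
    """Fraction of tasks solved within n attempts (n=1 is naive elicitation)."""
    return sum(any(attempt(t) for _ in range(n)) for t in tasks) / len(tasks)


tasks = [0.9] * 200               # hard tasks: ~10% per-attempt success
print(pass_rate(tasks, n=1))      # ~0.10 -- looks incapable
print(pass_rate(tasks, n=16))     # ~0.81 -- same model, stronger elicitation
```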
Technical Infrastructure
The development of standardized evaluation tools represents a crucial aspect of AISI work. The UK institute's Inspect framework exemplifies this approach, providing a modular system that supports multiple model APIs, enables reproducible evaluation protocols, facilitates comparison across different models and time periods, and allows community contribution to evaluation development.
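To make the framework's shape concrete, here is a minimal sketch of an Inspect task definition, assuming the open-source `inspect_ai` package's documented `Task`/`Sample`/`generate`/`match` interface; names and signatures have shifted across versions, so treat this as illustrative rather than canonical:

```python
# Minimal Inspect-style evaluation task: a tiny dataset, a default
# generation solver, and a match scorer. Illustrative only -- real AISI
# evaluations use far larger datasets and custom scorers.
from inspect_ai import Task, task
from inspect_ai.dataset import Sample
from inspect_ai.scorer import match
from inspect_ai.solver import generate


@task
def arithmetic_check():
    return Task(
        dataset=[
            Sample(input="What is 17 * 23? Answer with the number only.",
                   target="391"),
        ],
        solver=generate(),   # plain single-turn generation
        scorer=match(),      # compare model output against the target
    )
```

A task file like this can then be run against any supported model API from the command line (on the order of `inspect eval arithmetic_check.py --model <provider/model>`), which is what enables the reproducible, cross-model comparisons described above.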
These technical infrastructures must balance several competing requirements. They need sufficient sophistication to detect subtle but dangerous capabilities while remaining accessible to researchers without specialized infrastructure. They must provide consistent results across different computing environments while adapting to rapidly evolving model architectures and capabilities.
The open-source approach adopted by several institutes reflects a strategic decision that community development can advance evaluation capabilities faster than any single institution. However, this openness also means that AI labs can optimize their systems against known evaluation methodologies, potentially undermining the validity of assessments.
Access Negotiations
Securing meaningful access to frontier AI systems represents perhaps the most critical and challenging aspect of AISI operations. Labs are understandably reluctant to share proprietary information about their most advanced systems, both for competitive reasons and because such information could enable competitors or malicious actors to develop similar capabilities.
Successful access negotiations typically involve careful balance of several factors: providing labs with valuable feedback or evaluation services in exchange for access, establishing clear confidentiality protocols that protect proprietary information, demonstrating technical competence and responsible handling of sensitive information, and maintaining relationships that incentivize continued cooperation rather than treating labs as adversaries.
The voluntary nature of current access agreements represents both an opportunity and a fundamental limitation. Labs cooperate because they perceive value in independent evaluation or because they want to maintain positive relationships with government institutions. However, this voluntary approach means that access could be withdrawn if labs conclude that cooperation is no longer in their interest.
Critical Limitations and Challenges
The Independence Dilemma
AISIs face an inherent tension between the need for industry cooperation and the requirement for independent oversight. A 2025 analysis in AI & Society warns that "the field of AI safety is extremely vulnerable to regulatory capture" and that "those who advocate for regulation as a response to AI risks may be inadvertently playing into the hands of the dominant firms in the industry."
A TechPolicy.Press analysis notes that a major set of concerns "has to do with their relationship to industry, particularly around fears that close ties with companies might lead to 'regulatory capture,' undermining the impartiality and independence of these institutes." This is particularly challenging because AISIs need good relationships with AI companies to access and evaluate models in the first place.
Industry influence can manifest through several channels:
| Capture Mechanism | How It Operates | Observed Examples |
|---|---|---|
| Hiring patterns | Staff recruited from labs bring industry perspectives | UK/US AISI leadership includes former lab employees |
| Access dependencies | Voluntary model access creates incentive to avoid critical findings | All major access agreements remain voluntary |
| Funding relationships | Resource-sharing arrangements create dependencies | UK AISI receives compute access from industry partners |
| Framing adoption | Institutes adopt industry definitions of "safety" | Focus on capability evaluation vs. broader harms |
| Revolving door | Staff may return to industry after government service | Career incentives favor positive industry relations |
The OECD analysis recommends that the AISI Network "preserve its independent integrity by operating as a community of technical experts rather than regulators." However, this advisory positioning may limit impact when enforcement is needed.
Authority and Enforcement Gaps
Most existing AISIs operate in advisory roles without direct enforcement authority. They can evaluate AI systems and publish findings, but they cannot compel labs to provide access, delay deployments pending evaluation, or enforce remediation of identified safety issues. This limitation fundamentally constrains their potential impact on AI development trajectories.
The advisory model has several advantages: it allows AISIs to build relationships and credibility before seeking expanded authority, it avoids regulatory capture concerns that might arise with enforcement powers, it enables international coordination without requiring harmonized legal frameworks, and it provides flexibility to adapt approaches as the technology and risk landscape evolves.
However, advisory authority may prove inadequate as AI capabilities advance. If AISIs identify serious safety concerns but cannot compel action, their evaluations become merely informational rather than protective. Labs facing competitive pressure may ignore advisory recommendations, particularly if compliance would significantly delay deployment or increase costs relative to competitors.
The path from advisory to regulatory authority faces significant challenges. Expanding AISI powers requires legislative action in most jurisdictions, which involves complex political processes and industry lobbying. Different countries may develop incompatible regulatory approaches, fragmenting the international coordination that makes AISIs potentially valuable. Most fundamentally, effective enforcement requires technical standards and evaluation methodologies that remain under development.
Scale and Resource Constraints
The resource mismatch between AISIs and the AI labs they oversee represents a fundamental challenge to effective evaluation. Leading AI labs employ thousands of researchers and engineers and spend billions of dollars annually on AI development. Even the largest planned AISIs will have hundreds of staff members and budgets measured in tens or hundreds of millions.
This scale disparity manifests in several ways that limit AISI effectiveness. AISIs cannot match lab investment in evaluation infrastructure, potentially missing sophisticated safety issues that require extensive computational resources to detect. They must rely on lab cooperation for access to training data, model architectures, and internal evaluations, rather than independently verifying such information. They lack the personnel to comprehensively evaluate the full range of capabilities that emerge from large-scale training, potentially missing important but rare abilities.
Perhaps most critically, AISIs may always be evaluating last generation's technology while labs deploy current generation systems. If evaluation cycles take months while development cycles take weeks, AISI findings become historically interesting but strategically irrelevant. This timing mismatch could worsen as AI development accelerates and evaluation methodologies become more sophisticated and time-consuming.
Addressing scale limitations may require fundamental changes to the current model. Potential approaches include mandatory disclosure requirements that shift evaluation burden to labs, international cost-sharing that pools resources across multiple institutes, public-private partnerships that leverage industry evaluation infrastructure, or regulatory approaches that slow deployment timelines to match evaluation capabilities.
Methodological Uncertainties
AI evaluation faces profound technical challenges that limit the reliability and relevance of current methodologies. The problem of unknown capabilities—abilities that emerge unexpectedly from large-scale training—means that evaluations may miss the most important and dangerous capabilities. Current evaluation approaches focus on testing known capability categories, but transformative AI systems may develop qualitatively new abilities that existing frameworks cannot detect.
Evaluation validity represents another fundamental challenge. Laboratory testing may not predict real-world behavior, particularly for systems that adapt their responses based on context or user interactions. Safety properties demonstrated during evaluation may not persist across different deployment scenarios, user populations, or adversarial contexts.
The arms race dynamic between evaluation and optimization presents an ongoing challenge. As evaluation methodologies become public, AI developers can optimize their systems to perform well on known benchmarks while potentially retaining concerning capabilities that evaluations do not detect. This gaming dynamic may require continuous evolution of evaluation approaches, increasing the complexity and resource requirements for effective assessment.
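One commonly discussed mitigation, sketched below under hypothetical interfaces, is to hold out a private split of evaluation items and watch for a gap between public and private scores; a large gap suggests the system was tuned against the published benchmark rather than being generally capable or safe:

```python
# Hypothetical mitigation for benchmark gaming: score a system on the
# published (public) items and on a withheld (private) split. The
# score_fn interface is an illustrative assumption.
import random


def split_items(items: list, private_frac: float = 0.5, seed: int = 0):
    """Shuffle deterministically and return (public, private) splits."""
    rng = random.Random(seed)
    shuffled = items[:]
    rng.shuffle(shuffled)
    k = int(len(shuffled) * private_frac)
    return shuffled[k:], shuffled[:k]


def gaming_gap(score_fn, public: list, private: list) -> float:
    """Positive values: better on public items than on unseen private ones."""
    return score_fn(public) - score_fn(private)


if __name__ == "__main__":
    items = list(range(100))
    pub, priv = split_items(items)
    # A "gamed" scorer that memorized the public split scores 1.0 on it.
    memorized = set(pub)
    score = lambda xs: sum(x in memorized for x in xs) / len(xs)
    print(round(gaming_gap(score, pub, priv), 2))  # 1.0 -- maximal red flag
```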
Temporal dynamics add another layer of complexity. AI systems may exhibit different behavior over time as they learn from deployment interactions, receive updates, or face novel situations not represented in evaluation datasets. Current evaluation methodologies primarily assess snapshot behavior rather than evolution over time, potentially missing important safety-relevant changes.
Trajectory and Future Evolution
Near-Term Development (2025-2026)
The next two years will likely see continued rapid expansion of existing AISIs and establishment of new institutes across additional countries. The UK and US institutes are expected to reach their target staffing levels and develop more sophisticated evaluation capabilities. International coordination mechanisms established at recent AI safety summits will mature into operational frameworks for information sharing and joint evaluation activities.
Several technical developments will shape AISI effectiveness during this period. Evaluation methodologies will become more standardized, enabling better comparison across different systems and time periods. Automated evaluation tools may reduce the time required for comprehensive assessment, potentially addressing some timing mismatch concerns. The development of better interpretability techniques could enhance evaluators' ability to understand system behavior and identify concerning capabilities.
However, this period may also reveal fundamental limitations of the current AISI model. As AI capabilities advance more rapidly, the gap between evaluation timelines and deployment decisions may widen. Industry consolidation could reduce the number of actors requiring evaluation while potentially making access negotiations more challenging. Political changes in key countries could disrupt funding, leadership, or international coordination efforts.
The relationship between AISIs and other governance mechanisms will evolve during this period. Integration with broader regulatory frameworks may begin, potentially providing AISIs with expanded authority or enforcement mechanisms. Alternatively, regulatory development may bypass AISIs if they are perceived as ineffective or captured by industry interests.
Medium-Term Scenarios (2026-2029)
The medium-term trajectory for AISIs depends heavily on how several critical uncertainties resolve. In optimistic scenarios, AISIs successfully demonstrate value through high-quality evaluations that inform policy decisions, gain expanded authority through legislative changes that enable enforcement action, maintain independence despite industry relationships, and establish effective international coordination that provides global oversight capacity.
Such successful development could position AISIs as central institutions in AI governance, potentially serving as verification bodies for international AI safety agreements, regulatory agencies with authority to approve or delay AI deployments, coordinating centers for technical standards development, or incident response organizations that investigate AI system failures.
However, pessimistic scenarios are equally plausible. AISIs may prove unable to keep pace with advancing capabilities, making their evaluations strategically irrelevant. Industry capture could transform them into legitimacy-providing institutions that rubber-stamp lab decisions rather than providing independent oversight. International coordination could fragment due to geopolitical tensions or divergent national interests. Political changes could defund or reorganize institutes, disrupting institutional knowledge and relationships.
Hybrid scenarios seem most likely, where AISIs provide valuable but limited contributions to AI governance. They may successfully evaluate current generation systems while struggling with more advanced capabilities. They may maintain partial independence while facing increased industry influence. They may achieve regional coordination while failing to establish global frameworks.
Long-Term Possibilities
The long-term role of AISIs will depend fundamentally on the trajectory of AI capabilities and the broader governance response. If AI development slows or reaches temporary plateaus, AISIs may have time to develop evaluation capabilities that match the systems they oversee. If international cooperation on AI governance strengthens, AISIs could become verification bodies for binding international agreements.
Alternatively, if AI development accelerates toward artificial general intelligence or superintelligence, current AISI models may prove entirely inadequate. The evaluation of systems approaching or exceeding human-level capabilities across multiple domains may require fundamentally different approaches that current institutions cannot provide.
The most transformative possibility involves AISIs evolving beyond their current evaluation focus toward active participation in AI development. Rather than merely assessing systems developed by labs, future iterations might directly fund or conduct safety-focused AI research, potentially developing alternative development approaches that prioritize safety over capability advancement.
Key Uncertainties and Research Priorities
Fundamental Questions
Several critical uncertainties will determine whether AISIs can meaningfully contribute to AI safety. The independence question remains paramount: can government institutions maintain sufficient objectivity to provide effective oversight while maintaining the industry relationships necessary for access and cooperation? Historical precedents from other domains provide mixed guidance, with some regulatory agencies successfully maintaining independence while others became captured by the industries they oversee.
The authority question similarly remains unresolved. Will AISIs gain sufficient regulatory power to influence AI development decisions, or will they remain advisory institutions whose recommendations can be safely ignored? The path from advisory to regulatory authority requires political action that may not materialize, particularly if industry opposition is strong or if other governance mechanisms are perceived as more effective.
The scaling question presents perhaps the most fundamental challenge. Can evaluation capabilities advance fast enough to remain relevant as AI systems become more capable, or will the resource and timeline mismatches prove insurmountable? This question depends partly on technical developments in evaluation methodology and partly on whether regulatory approaches can alter the competitive dynamics driving rapid deployment.
Empirical Research Needs
Several areas require urgent empirical investigation to inform AISI development and evaluation. Studies of regulatory capture in analogous domains could provide insights into institutional design choices that preserve independence. Comparative analysis of different AISI organizational models could identify best practices for balancing cooperation and oversight requirements.
Technical research on evaluation methodology remains critical, particularly around automated evaluation systems that could reduce assessment timelines, interpretability techniques that enable better understanding of system behavior, and methods for detecting unknown capabilities in large-scale AI systems. The development of standardized evaluation frameworks requires careful empirical validation to ensure they actually predict deployment behavior.
International relations research could illuminate the prospects for sustained coordination among AISIs, particularly how geopolitical tensions and competitive dynamics might affect information sharing and joint evaluation efforts. Historical studies of international technical cooperation in other domains could provide relevant insights.
Decision-Relevant Considerations
For individuals considering careers in AISIs, several factors merit careful consideration. The impact potential depends heavily on whether institutes gain meaningful authority and maintain independence. The skill development opportunities include valuable experience in AI evaluation and policy interfaces, though bureaucratic constraints may limit research flexibility.
For policymakers considering AISI funding or expansion, key considerations include whether advisory institutions provide sufficient oversight given the stakes involved, how to design institutional structures that preserve independence while enabling industry cooperation, and whether resources might be more effectively deployed through other governance mechanisms.
For AI safety researchers more broadly, AISIs represent one approach among many potential governance interventions. Their effectiveness relative to technical alignment research, industry engagement, or international treaty development remains an open question that depends partly on one's views about the tractability of technical versus governance approaches to AI safety.
The ultimate assessment of AISIs may depend less on their current capabilities than on their potential for evolution. If they can serve as a foundation for more sophisticated governance institutions, their current limitations may prove temporary. If they become entrenched but ineffective institutions that provide false reassurance about AI oversight, their net impact could be negative. The next several years will likely determine which trajectory proves accurate.
Sources
Official Institute Resources
- UK AI Security Institute - Official website with research publications and Inspect framework
- US AI Safety Institute at NIST - NIST's AI Safety Institute homepage
- Introducing the AI Safety Institute - UK Government overview
- Japan AI Safety Institute Launch - Ministry of Economy, Trade and Industry announcement
Evaluations and Research
- Pre-Deployment Evaluation of Claude 3.5 Sonnet - First joint UK-US evaluation (November 2024)
- Pre-Deployment Evaluation of OpenAI o1 - Second joint evaluation (December 2024)
- US AISI Signs Agreements with Anthropic and OpenAI - August 2024 MOUs
- Inspect Evaluation Framework - Open-source AI evaluation tool
- FLI AI Safety Index 2024 - Independent assessment of AI company safety practices
International Coordination
- AI Seoul Summit 2024 - UK Government summit page
- Seoul Statement of Intent - International network founding document
- First Meeting of the International Network - EU report on San Francisco meeting (November 2024)
- CSIS: AI Safety Institute Network Recommendations - Policy analysis
Analysis and Commentary
- Elizabeth Kelly: TIME 100 Most Influential in AI - Profile of US AISI director
- CSIS: US Vision for AI Safety - Conversation with Elizabeth Kelly
- TechPolicy.Press: How AISIs Inform Governance - Analysis of independence concerns
- OECD: AI Safety Institutes Challenge - Assessment of institutional capacity
- AI & Society: AI Safety and Regulatory Capture - Academic analysis of capture risks
2025 Developments
- UK Renames AI Safety Institute - February 2025 rebrand to AI Security Institute
- Trump Administration Rebrands US AI Safety Institute - June 2025 change to CAISI
- AI Safety Advocates Slam NIST Targeting - Criticism of proposed staff cuts
- TRAINS Taskforce Established - National security testing initiative
References
This CSIS analysis examines the international network of AI Safety Institutes established across multiple countries and provides recommendations for strengthening their coordination, scope, and effectiveness. It addresses how these institutes can better collaborate on technical safety evaluations and policy alignment to address frontier AI risks.
A TIME profile of Elizabeth Kelly, who leads the U.S. AI Safety Institute (AISI) at NIST, highlighting her role in shaping federal AI safety policy and evaluation frameworks. The article likely covers her background, priorities, and vision for responsible AI development and government oversight.
The AI Seoul Summit 2024, co-hosted by the UK and Republic of Korea in May 2024, advanced global AI safety governance by securing international agreements on risk assessment frameworks, launching the first international network of AI Safety Institutes, and obtaining safety commitments from 16 major AI companies worldwide. It built on the Bletchley Park AI Safety Summit of November 2023 as part of an ongoing international diplomatic process.
The Trump administration renamed the AI Safety Institute (AISI) to the Center for AI Standards and Innovation (CAISI), signaling a rhetorical shift away from 'safety' toward rapid AI development. Despite the rebranding, the core functions remain largely the same: evaluating AI capabilities and vulnerabilities, developing voluntary standards, and serving as industry's primary government contact. Commerce Secretary Howard Lutnick framed the change as reducing regulatory overreach while maintaining national security standards.
Japan's Ministry of Economy, Trade and Industry (METI) announced the launch of the AI Safety Institute (AISI) on February 14, 2024, housed within the Information-technology Promotion Agency (IPA). The institute is tasked with researching AI safety evaluation criteria and methods, and will collaborate internationally with AI Safety Institutes in the US and UK. Multiple ministries and agencies are involved in a coordinated whole-of-government approach.
The U.S. AI Safety Institute at NIST announced the formation of the TRAINS Taskforce in November 2024, a multi-agency collaboration including Defense, Energy, Homeland Security, NSA, and NIH to coordinate research, testing, and red-teaming of advanced AI models across national security domains. The taskforce focuses on risks in areas such as nuclear security, cybersecurity, critical infrastructure, and military capabilities, aiming to develop new AI evaluation benchmarks and conduct joint risk assessments.
The UK government announced at the Munich Security Conference the renaming of its AI Safety Institute to the AI Security Institute, shifting focus toward serious national security risks such as bioweapons, cyberattacks, and criminal misuse of AI. The rebranding is accompanied by a new partnership with Anthropic brokered by the UK's Sovereign AI unit, and a new criminal misuse team co-located with the Home Office.
The UK government's foundational document introducing the AI Safety Institute (AISI), the first state-backed organization dedicated to advanced AI safety for the public interest. It outlines AISI's mission to minimize surprise from rapid AI advances and develop sociotechnical infrastructure to understand and mitigate AI risks, presented to Parliament in November 2023 by Ian Hogarth.
This OECD analysis examines the emerging landscape of national AI Safety Institutes (AISIs) established by the US, UK, Japan, Canada, Singapore, and EU, assessing their roles in evaluating AI capabilities and risks. It identifies key challenges these bodies face, including surveying AI unpredictability, establishing evaluation standards, conducting safety research, and coordinating internationally. The piece argues that while AISIs represent a significant step toward coordinated global AI safety governance, substantial structural and resource challenges remain.
The U.S. AI Safety Institute (NIST) announced Memoranda of Understanding with Anthropic and OpenAI in August 2024, establishing formal frameworks for pre- and post-deployment access to major AI models. These agreements enable collaborative research on capability evaluations, safety risk assessment, and mitigation methods, representing the first formal government-industry partnerships of this kind in the U.S.
This paper argues that AI safety regulation is particularly vulnerable to regulatory capture, where powerful incumbents exploit safety rules for economic or political advantage. It details the specific harms and injustices that captured AI safety regulations could produce, and critically reviews existing proposals to mitigate this risk, cautioning that well-intentioned safety frameworks may be weaponized by dominant industry players.
On February 7, 2024, Commerce Secretary Gina Raimondo announced the founding leadership of the U.S. AI Safety Institute (AISI) at NIST, naming Elizabeth Kelly as Director and Elham Tabassi as Chief Technology Officer. The Institute was created under President Biden's Executive Order on AI to mitigate AI-related risks while supporting American innovation. This press release marks the formal operational launch of the primary U.S. government body responsible for AI safety research and standards.
HarmBench is a standardized benchmark for evaluating automated red teaming methods against large language models. It provides a comprehensive framework for assessing how well various attack methods can elicit harmful behaviors from LLMs, enabling systematic comparison of both attack and defense techniques.
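To make concrete what such a benchmark standardizes, the sketch below shows the basic attack-success-rate (ASR) loop that red-teaming evaluations of this kind compute. This is an illustrative outline, not HarmBench's actual API: the `attack`, `target_model`, and `judge` callables are hypothetical placeholders for an attack method, the model under test, and a harm classifier.

```python
from typing import Callable, List

def attack_success_rate(
    behaviors: List[str],
    attack: Callable[[str], str],        # rewrites a behavior into an adversarial prompt
    target_model: Callable[[str], str],  # the model under evaluation
    judge: Callable[[str, str], bool],   # does the response exhibit the harmful behavior?
) -> float:
    """Fraction of harmful behaviors the attack successfully elicits (ASR)."""
    successes = 0
    for behavior in behaviors:
        prompt = attack(behavior)        # e.g. an automated jailbreak method
        response = target_model(prompt)
        successes += judge(behavior, response)
    return successes / len(behaviors)
```

Holding the behavior set and judge fixed across attacks and models is what makes ASR numbers comparable, which is the benchmark's central design choice.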
CAISI is NIST's dedicated center serving as the U.S. government's primary interface with industry on AI testing, security standards, and evaluation. It develops voluntary AI safety and security guidelines, conducts evaluations of AI capabilities posing national security risks (including cybersecurity and biosecurity threats), and represents U.S. interests in international AI standardization efforts.
A CSIS interview with Elizabeth Kelly, Director of the US AI Safety Institute (USAISI), discussing the US government's strategic approach to AI safety, the role of AISI in evaluating frontier AI models, and international coordination on safety standards. Kelly outlines the institute's priorities including developing evaluation frameworks, engaging with AI developers, and building global partnerships to manage AI risks.
The AISI International Network, launched in May 2024, is a multilateral initiative connecting national AI Safety Institutes to coordinate on safe and trustworthy AI development. It facilitates knowledge sharing, joint evaluations, and harmonized governance approaches across member countries. The network represents a key institutional mechanism for translating AI safety research into coordinated international policy.
The Seoul Statement of Intent, signed by 11 countries and the EU at the May 2024 AI Seoul Summit, formalizes multilateral commitment to coordinated AI safety science cooperation. It builds on the Bletchley Park Summit by pledging to leverage national AI Safety Institutes, share scientific assessments, and develop interoperable technical methodologies for AI risk evaluation.
The U.S. and UK AI Safety Institutes jointly conducted pre-deployment safety evaluations of Anthropic's upgraded Claude 3.5 Sonnet, testing biological capabilities, cyber capabilities, software/AI development, and safeguard efficacy. The evaluation used question answering, agent tasks, qualitative probing, and red teaming to benchmark the model against prior versions and competitors. This represents one of the first formal government-led pre-deployment AI safety evaluations made public.
The US and UK AI Safety Institutes conducted a joint pre-deployment evaluation of OpenAI's o1 model, assessing its capabilities and risks across three domains, including potential for misuse. The evaluation compared o1's performance to reference models and represents an early example of government-led frontier AI safety testing prior to public release.
WMDP is a benchmark designed to measure and evaluate hazardous knowledge in large language models related to biosecurity, chemical, nuclear, and radiological weapons. It serves as a proxy for assessing dangerous capabilities in AI systems and supports unlearning research aimed at reducing such risks. The benchmark helps researchers identify and mitigate the potential for LLMs to assist in weapons development.
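As a rough sketch of how WMDP's multiple-choice format is scored, the snippet below loads a subset and measures a model's accuracy (here, lower accuracy is the goal of unlearning, since the benchmark proxies hazardous knowledge). It assumes the benchmark's Hugging Face release (`cais/wmdp`) and its question/choices/answer fields; `ask_model` is a placeholder for whatever inference stack you use.

```python
# pip install datasets
from datasets import load_dataset

def score_wmdp(ask_model, subset: str = "wmdp-bio", limit: int = 100) -> float:
    """Accuracy of `ask_model` on a WMDP subset (a proxy for hazardous knowledge)."""
    ds = load_dataset("cais/wmdp", subset, split="test")
    ds = ds.select(range(min(limit, len(ds))))
    letters = "ABCD"
    correct = 0
    for row in ds:
        prompt = row["question"] + "\n" + "\n".join(
            f"{letters[i]}. {choice}" for i, choice in enumerate(row["choices"])
        ) + "\nAnswer with a single letter."
        pred = ask_model(prompt).strip().upper()[:1]  # e.g. "B"
        correct += pred == letters[row["answer"]]     # `answer` is a choice index
    return correct / len(ds)
```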
This page covers the inaugural meeting of the International Network of AI Safety Institutes, a multilateral initiative bringing together national AI safety bodies to coordinate on evaluation methodologies, information sharing, and global AI safety governance. The network represents a significant step toward international coordination on frontier AI risk assessment.
Reports on planned layoffs at NIST resulting from the Trump administration's DOGE-driven federal workforce reductions, with specific concerns about impacts on the AI Safety Institute (AISI). The cuts raise alarm among AI safety researchers and policymakers about the dismantling of U.S. government infrastructure dedicated to evaluating and mitigating AI risks.
This TechPolicy.Press analysis examines the evolving role of AI Safety Institutes (AISIs) across multiple countries, exploring how they balance safety evaluation mandates with broader innovation and governance objectives. It assesses how these institutes inform national and international AI policy frameworks and their potential influence on global AI governance norms.
The Future of Life Institute's AI Safety Index 2024 systematically evaluates six leading AI companies (OpenAI, Google DeepMind, Anthropic, Meta, xAI, and Mistral) across 42 safety indicators spanning risk management, transparency, governance, and preparedness for advanced AI threats. The index finds widespread deficiencies in safety practices and provides letter-grade assessments to benchmark industry progress. It serves as a comparative accountability tool aimed at pressuring companies toward stronger safety commitments.
Inspect is an open-source framework developed by the UK AI Safety Institute (AISI) for evaluating large language models and AI systems. It provides standardized tools for running safety evaluations, benchmarks, and red-teaming tasks. The framework enables researchers and developers to assess AI model capabilities and safety properties in a reproducible and extensible way.
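A minimal sketch of what an Inspect evaluation looks like, based on the `inspect_ai` Python package's task/solver/scorer pattern; the toy dataset and the commented-out model name are illustrative, not part of any real evaluation suite.

```python
# pip install inspect-ai
from inspect_ai import Task, eval, task
from inspect_ai.dataset import Sample
from inspect_ai.scorer import match
from inspect_ai.solver import generate

@task
def capital_check():
    """Toy task: verify the model answers a benign factual question."""
    return Task(
        dataset=[
            Sample(
                input="What is the capital of France? Answer with one word.",
                target="Paris",
            )
        ],
        solver=generate(),  # single generation step, no agent scaffolding
        scorer=match(),     # match the completion against the target
    )

# Run against a model of your choice (name is illustrative):
# eval(capital_check(), model="openai/gpt-4o-mini")
```

Because tasks, solvers, and scorers are composable objects, the same task definition can be rerun across models and model versions, which is what makes evaluations reproducible and comparable over time.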
The UK AI Safety Institute (AISI) is the UK government's dedicated body for evaluating and mitigating risks from advanced AI systems. It conducts technical safety research, develops evaluation frameworks for frontier AI models, and works with international partners to inform global AI governance and policy.