Adam Gleave - AI2050
ai2050.schmidtsciences.org/fellow/adam-gleave
This is a fellowship profile page for Adam Gleave, a notable AI safety researcher; for deeper engagement with his work, his publications and FAR.AI (the AI safety research institute he co-founded and leads) are more substantive resources.
Metadata
Importance: 35/100 · homepage
Summary
Profile of Adam Gleave as an AI2050 Fellow funded by Schmidt Sciences, highlighting his research focus on AI safety and alignment. Gleave is known for his work on reward modeling, adversarial policies, and evaluation of AI systems. This page serves as a brief professional overview within the AI2050 fellowship program.
Key Points
- Adam Gleave is an AI2050 Fellow supported by Schmidt Sciences' initiative to fund long-term AI research.
- His research focuses on AI alignment, reward modeling, and understanding failure modes in reinforcement learning systems.
- He is the co-founder and CEO of FAR.AI, an AI safety research institute, and a board member of METR (Model Evaluation and Threat Research), an organization focused on evaluating dangerous AI capabilities.
- His work on adversarial policies demonstrated that RL agents can be exploited by adversarially trained opponent policies.
- The AI2050 fellowship supports researchers working on problems with implications for the next few decades of AI development.
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| FAR AI | Organization | 76.0 |
Cached Content Preview
HTTP 200 · Fetched Mar 20, 2026 · 2 KB
Affiliation: Co-founder & CEO, FAR.AI | Hard Problem: Assurance

# Adam Gleave

2025 Early Career Fellow

Adam Gleave is the co-founder and CEO of FAR.AI, an AI safety research institute working to ensure advanced AI is safe and beneficial for everyone. Adam's research focuses on securing advanced AI systems. Outside of FAR.AI, Adam is a board member of the Safe AI Forum (SAIF), Model Evaluation and Threat Research (METR), and the London Initiative for Safe AI (LISA), and an advisor for Timaeus and the AI Risk Mitigation Fund. Prior to founding FAR.AI, Adam received his PhD from UC Berkeley under the supervision of Stuart Russell, and previously worked at Google DeepMind with Jan Leike and Geoffrey Irving and at several quantitative trading firms.

**AI2050 Project**

Gleave's project develops techniques to detect and eliminate hidden behaviors in advanced AI systems. Just as security researchers find and fix vulnerabilities in software, his team is creating methods to audit AI models for concealed objectives that could lead to harmful actions. Through a "red-team/blue-team" approach, they will first create models with sophisticated hidden behaviors, then develop tools to identify and remove them. This work addresses risks from both malicious actors inserting backdoors and unintentional AI misalignment. The resulting methods will help ensure that increasingly powerful AI systems remain transparent and trustworthy, allowing society to benefit from AI advances while managing potential risks.
Resource ID: a8b645a52178a332 | Stable ID: ZmQ0ZDU3Mz