Adam Gleave - AI2050
ai2050.schmidtsciences.org/fellow/adam-gleave
This is a fellowship profile page for Adam Gleave, a notable AI safety researcher; for deeper engagement with his work, his publications and FAR.AI (the AI safety research institute he co-founded and leads) are more substantive resources.
Metadata
Importance: 35/100 · homepage
Summary
Profile of Adam Gleave as an AI2050 Fellow funded by Schmidt Sciences, highlighting his research focus on AI safety and alignment. Gleave is known for his work on reward modeling, adversarial policies, and evaluation of AI systems. This page serves as a brief professional overview within the AI2050 fellowship program.
Key Points
- Adam Gleave is an AI2050 Fellow supported by Schmidt Sciences' initiative to fund long-term AI research.
- His research focuses on AI alignment, reward modeling, and understanding failure modes in reinforcement learning systems.
- He is the co-founder and CEO of FAR.AI, an AI safety research institute, and a board member of METR (Model Evaluation and Threat Research), an organization focused on evaluating dangerous AI capabilities.
- His work on adversarial policies demonstrated that RL agents can be exploited by adversarially trained opponent policies.
- The AI2050 fellowship supports researchers working on problems with implications for the next few decades of AI development.
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| FAR AI | Organization | 76.0 |
Cached Content Preview
HTTP 200 · Fetched Mar 20, 2026 · 2 KB
Affiliation: Co-founder & CEO, FAR.AI | Hard Problem: Assurance

# Adam Gleave

2025 Early Career Fellow

Adam Gleave is the co-founder and CEO of FAR.AI, an AI safety research institute working to ensure advanced AI is safe and beneficial for everyone. Adam's research focuses on securing advanced AI systems. Outside of FAR.AI, Adam is a board member of the Safe AI Forum (SAIF), Model Evaluation and Threat Research (METR), and the London Initiative for Safe AI (LISA), and an advisor for Timaeus and the AI Risk Mitigation Fund. Prior to founding FAR.AI, Adam received his PhD from UC Berkeley under the supervision of Stuart Russell, and previously worked at Google DeepMind with Jan Leike and Geoffrey Irving and at several quantitative trading firms.

**AI2050 Project**

Gleave's project develops techniques to detect and eliminate hidden behaviors in advanced AI systems. Just as security researchers find and fix vulnerabilities in software, his team is creating methods to audit AI models for concealed objectives that could lead to harmful actions. Through a "red-team/blue-team" approach, they will first create models with sophisticated hidden behaviors, then develop tools to identify and remove them. This work addresses risks from both malicious actors inserting backdoors and unintentional AI misalignment. The resulting methods will help ensure that increasingly powerful AI systems remain transparent and trustworthy, allowing society to benefit from AI advances while managing potential risks.
Resource ID: a8b645a52178a332 | Stable ID: ZmQ0ZDU3Mz