Adam Gleave | FAR.AI
Credibility Rating
4/5
High (4): High quality. Established institution or organization with editorial oversight and accountability.
Rating inherited from publication venue: FAR AI
Author index page for Adam Gleave at FAR.AI; useful for finding his specific papers on adversarial policies and reward modeling rather than as a standalone resource.
Metadata
Importance: 30/100 | homepage
Summary
Author page for Adam Gleave, co-founder and CEO of FAR.AI (Foundational Research for AI Safety), listing his published research and contributions to the field. Gleave is a prominent AI safety researcher known for work on adversarial policies, reward modeling, and scalable oversight.
Key Points
- Adam Gleave is the co-founder and CEO of FAR.AI, focused on technical AI safety problems
- His work spans adversarial robustness, reward learning, and evaluation of AI systems
- FAR.AI is an independent AI safety research organization producing technical alignment research
- This page serves as an index to his published papers and blog posts on AI safety topics
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| FAR AI | Organization | 76.0 |
Cached Content Preview
HTTP 200 | Fetched Mar 20, 2026 | 20 KB

# Adam Gleave
Co-founder & CEO
FAR.AI
Adam Gleave is the CEO of FAR.AI. He completed his PhD in artificial intelligence (AI) at UC Berkeley, advised by [Stuart Russell](https://people.eecs.berkeley.edu/~russell/). His goal is to develop techniques necessary for advanced automated systems to verifiably act according to human preferences, even in situations unanticipated by their designer. He is particularly interested in improving methods for value learning, and robustness of deep RL. For more information, visit his [website](https://gleave.me/).
# News & Publications
[**Concept Influence: Leveraging Interpretability to Improve Performance and Efficiency in Training Data Attribution**](https://www.far.ai/news/concept-data-attribution-02-2026)
February 19, 2026
[**Prefill-level Jailbreak: A Black-Box Risk Analysis of Large Language Models**](https://www.far.ai/research/prefill-level-jailbreak-a-black-box-risk-analysis-of-large-language-models)
February 19, 2026
[**The Obfuscation Atlas: Mapping Where Honesty Emerges in RLVR with Deception Probes**](https://www.far.ai/research/the-obfuscation-atlas-mapping-where-honesty-emerges-in-rlvr-with-deception-probes)
February 17, 2026
[**Revisiting Frontier LLMs’ Attempts to Persuade on Extreme Topics: GPT and Claude Improved, Gemini Worsened**](https://www.far.ai/news/revisiting-attempts-to-persuade)
February 11, 2026
[**Large language models can effectively convince people to believe conspiracies**](https://www.far.ai/research/large-language-models-can-effectively-convince-people-to-believe-conspiracies)
January 9, 2026
[**AI in 2025: Faster Progress, Harder Problems**](https://www.far.ai/news/san-diego-2025-opening-remarks)
December 16, 2025
[**Frontier LLMs Attempt to Persuade into Harmful Topics**](https://www.far.ai/news/attempt-to-persuade-eval)
August 21, 2025
[**A Toolkit for Estimating the Safety-Gap between Safety Trained and Helpful Only LLMs**](https://www.far.ai/news/safety-gap-toolkit)
July 31, 2025
[**Jailbreak-Tuning: Models Efficiently Learn Jailbreak Susceptibility**](https://www.far.ai/research/jailbreak-tuning-models-efficiently-learn-jailbreak-susceptibility)
July 15, 2025
... (truncated, 20 KB total)
Resource ID: ca68437469b0fe97 | Stable ID: OTY2ZDE1YW