Longterm Wiki

Expert Positions (2 topics)

Topic	View	Estimate	Confidence	Date	Source	Source check
Current Approaches Scale	Uncertain	45%	medium	Jul 2023	OpenAI Superalignment (2023)
Will We Get Adequate Warning?	Concerned	40%	medium	2023	—

Education

PhD in Machine Learning, Australian National University


Quick Assessment

Dimension	Assessment
Primary Role	VP of Alignment Science at Anthropic (2024–present)
Key Contributions	Co-authored early RLHF research; led the Agent Alignment Team at Google DeepMind; co-led OpenAI's Superalignment team; developed reward modeling frameworks
Key Publications	"Deep Reinforcement Learning from Human Preferences" (NeurIPS 2017); "Scalable agent alignment via reward modeling" (arXiv 2018); "AI Safety Gridworlds" (arXiv 2017); "Recursively Summarizing Books with Human Feedback" (arXiv 2021)
Career Trajectory	PhD, Australian National University (2016) → FHI postdoc (2016) → Senior Research Scientist, Google DeepMind (2016–2021) → Head of Alignment / Superalignment co-lead, OpenAI (January 2021 – May 2024) → Anthropic (2024–present)
Notable Event	Departed OpenAI on May 16, 2024; posted publicly on X about his stated reasons for leaving

Overview

Jan Leike is an AI alignment researcher who has held senior roles at Google DeepMind, OpenAI, and Anthropic. He completed a PhD in reinforcement learning theory at the Australian National University in 2016 under the supervision of Marcus Hutter, and subsequently held a brief research fellowship at the Future of Humanity Institute. At DeepMind, he led the Agent Alignment Team and contributed to early RLHF research. He joined OpenAI in January 2021 to lead alignment research, and in July 2023 co-led the formation of the Superalignment team alongside Ilya Sutskever, with a stated goal of solving the core technical challenges of superintelligence alignment within four years. He departed OpenAI on May 16, 2024, posting a public thread on X that gave his reasons for leaving. He subsequently joined Anthropic, where he heads the Alignment Science team. TIME magazine listed him among the 100 most influential people in AI in both 2023 and 2024.
