Quick Assessment
| Dimension | Assessment |
|---|---|
| Primary Role | VP of Alignment Science at Anthropic (2024–present) |
| Key Contributions | Co-authored early RLHF research; led the Agent Alignment Team at Google DeepMind; co-led OpenAI's Superalignment team; developed reward modeling frameworks |
| Key Publications | "Deep Reinforcement Learning from Human Preferences" (NeurIPS 2017); "Scalable agent alignment via reward modeling" (arXiv 2018); "AI Safety Gridworlds" (arXiv 2017); "Recursively Summarizing Books with Human Feedback" (arXiv 2021) |
| Career Trajectory | PhD, Australian National University (2016) → FHI postdoc (2016) → Senior Research Scientist, Google DeepMind (2016–2021) → Head of Alignment / Superalignment co-lead, OpenAI (January 2021 – May 2024) → Anthropic (2024–present) |
| Notable Event | Departed OpenAI on May 16, 2024; posted a public thread on X giving his reasons for leaving |
Overview
Jan Leike is an AI alignment researcher who has held senior roles at Google DeepMind, OpenAI, and Anthropic. He completed a PhD in reinforcement learning theory at the Australian National University in 2016 under the supervision of Marcus Hutter, and subsequently held a brief research fellowship at the Future of Humanity Institute. At DeepMind, he led the Agent Alignment Team and contributed to early RLHF research. He joined OpenAI in January 2021 to lead alignment research, and in July 2023 co-led the formation of the Superalignment team alongside Ilya Sutskever, with the stated goal of solving the core technical challenges of superintelligence alignment within four years. He departed OpenAI on May 16, 2024, and posted a public thread on X giving his reasons for leaving. He subsequently joined Anthropic, where he heads the Alignment Science team. TIME magazine listed him among the 100 most influential people in AI in both 2023 and 2024.