VP of Alignment Science at Anthropic (2024–present)
Key Contributions
Co-authored early RLHF research; led the Agent Alignment Team at Google DeepMind; co-led OpenAI's Superalignment team; developed reward modeling frameworks
Key Publications
"Deep Reinforcement Learning from Human Preferences" (NeurIPS 2017); "Scalable agent alignment via reward modeling" (arXiv 2018); "AI Safety Gridworlds" (arXiv 2017); "Recursively Summarizing Books with Human Feedback" (arXiv 2021)
Career Trajectory
PhD, Australian National University (2016) → FHI postdoc (2016) → Senior Research Scientist, Google DeepMind (2016–2021) → Head of Alignment / Superalignment co-lead, OpenAI (January 2021 – May 2024) → Anthropic (2024–present)
Notable Event
Departed OpenAI on May 16, 2024; posted publicly on X about his stated reasons for leaving
Overview
Jan Leike is an AI alignment researcher who has held senior roles at Google DeepMind, OpenAI, and Anthropic. He completed a PhD in reinforcement learning theory at Australian National University in 2016 under the supervision of Marcus Hutter, and subsequently held a brief research fellowship at the Future of Humanity Institute. At DeepMind, he led the Agent Alignment Team and contributed to early RLHF research. He joined OpenAI in January 2021 to lead alignment research, and in July 2023 co-led the formation of the Superalignment team alongside Ilya Sutskever, with a stated goal of solving superintelligence alignment within four years. He departed OpenAI on May 16, 2024, posting a public thread on X explaining his stated reasons for leaving. He subsequently joined Anthropic, where he heads the Alignment Science team. TIME magazine listed him among the 100 most influential people in AI in both 2023 and 2024.