Victoria Krakovna – Research Publications
vkrakovna.wordpress.com/research/
Victoria Krakovna is a senior safety researcher at DeepMind; this page indexes her publications and is a useful entry point into DeepMind's technical safety research agenda, spanning side effects, power-seeking, and frontier-model evaluation.
Metadata
Importance: 72/100 · homepage
Summary
This is the research publications page of Victoria Krakovna (DeepMind safety researcher), listing her papers on side-effect avoidance, power-seeking, goal misgeneralization, tampering incentives, dangerous-capability evaluation, and stealth and situational awareness in frontier models. The page is a comprehensive index of her contributions to technical AI safety research from 2017 to 2025.
Key Points
- Covers foundational work on side-effect penalties and relative reachability, including the influential AI Safety Gridworlds benchmark (2017).
- Includes research on power-seeking tendencies in trained agents and quantifying the stability of non-power-seeking behavior.
- Features papers on tampering incentives (REALab, Decoupled Approval) and goal misgeneralization in RL agents.
- Recent work (2024–2025) focuses on evaluating frontier models for dangerous capabilities, stealth, and situational/scheming awareness.
- Several papers are products of MATS (ML Alignment Theory Scholars) mentorship projects, reflecting Krakovna's mentoring role in the safety research community.
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Corrigibility Failure | Risk | 62.0 |
Cached Content Preview
HTTP 200 · Fetched Mar 20, 2026 · 12 KB
### Papers
[Evaluating Frontier Models for Stealth and Situational Awareness](https://arxiv.org/abs/2505.01420). Mary Phuong, Roland Zimmermann, Ziyue Wang, David Lindner, Victoria Krakovna, Sarah Cogan, Allan Dafoe, Lewis Ho, Rohin Shah. arXiv, 2025. ([blog post](https://deepmindsafetyresearch.medium.com/evaluating-and-monitoring-for-ai-scheming-d3448219a967))
[Evaluating Frontier Models for Dangerous Capabilities](https://arxiv.org/abs/2403.13793). Mary Phuong, Matthew Aitchison, Elliot Catt, Sarah Cogan, Alexandre Kaskasoli, Victoria Krakovna, David Lindner, Matthew Rahtz, Toby Shevlane, et al. arXiv, 2024.
[Limitations of Agents Simulated by Predictive Models](https://arxiv.org/abs/2402.05829). Raymond Douglas, Jacek Karwowski, Chan Bae, Andis Draguns, Victoria Krakovna (MATS project). arXiv, 2024.
[Quantifying stability of non-power-seeking in artificial agents](https://arxiv.org/abs/2401.03529). Evan Ryan Gunter, Yevgeny Liokumovich, Victoria Krakovna (MATS project). arXiv, 2024.
[Power-seeking can be probable and predictive for trained agents](https://arxiv.org/abs/2304.06528). Victoria Krakovna and Janos Kramar. arXiv, 2023. ([blog post](https://www.alignmentforum.org/posts/fLpuusx9wQyyEBtkJ/power-seeking-can-be-probable-and-predictive-for-trained))
[Goal Misgeneralization: Why Correct Specifications Aren’t Enough For Correct Goals](https://arxiv.org/abs/2210.01790). Rohin Shah, Vikrant Varma, Ramana Kumar, Mary Phuong, Victoria Krakovna, Jonathan Uesato, Zac Kenton. arXiv, 2022.
[Avoiding Side Effects By Considering Future Tasks](https://papers.nips.cc/paper/2020/file/dc1913d422398c25c5f0b81cab94cc87-Paper.pdf). Victoria Krakovna, Laurent Orseau, Richard Ngo, Miljan Martic, Shane Legg. Neural Information Processing Systems, 2020. ([arXiv](https://arxiv.org/abs/2010.07877), [code](https://github.com/deepmind/deepmind-research/tree/master/side_effects_penalties), [AN summary](https://mailchi.mp/051273eb96eb/an-122arguing-for-agi-driven-existential-risk-from-first-principles))
[Avoiding Tampering Incentives in Deep RL via Decoupled Approval](https://arxiv.org/abs/2011.08827). Jonathan Uesato, Ramana Kumar, Victoria Krakovna, Tom Everitt, Richard Ngo, Shane Legg. arXiv, 2020. ([blog post](https://medium.com/@deepmindsafetyresearch/realab-conceptualising-the-tampering-problem-56caab69b6d3), [AN summary](https://mailchi.mp/77121cab0cff/an-126-avoiding-wireheading-by-decoupling-action-feedback-from-action-effects))
[REALab: An Embedded Perspective on Tampering](https://arxiv.org/abs/2011.08820). Ramana Kumar, Jonathan Uesato, Richard Ngo, Tom Everitt, Victoria Krakovna, Shane Legg. arXiv, 2020. ([blog post](https://medium.com/@deepmindsafetyresearch/realab-conceptualising-the-tampering-problem-56caab69b6d3))
[Modeling AGI Safety Frameworks with Causal Influence Diagrams](https://arxiv.org/abs/1906.08663). Tom Everitt, Ramana Kumar, Victoria Krakovna, Shane Legg. IJCAI AI Safety workshop, 2019. ([AN summary](https://mailchi.mp/2abdf1
... (truncated, 12 KB total)
Resource ID: 45af23d90ccfc785 | Stable ID: ZWYzYWY0Yz