Web · turntrout.com/research
Alex Turner is a notable AI safety researcher whose work on instrumental convergence and power-seeking AI has been influential; this page serves as a hub for his published research and ongoing projects.
Metadata
Importance: 62/100 · homepage
Summary
This is the research homepage of Alex Turner (TurnTrout), an AI safety researcher known for work on instrumental convergence, power-seeking behavior, and corrigibility. The page catalogs his research chronologically. It covers impact measures (AUP), a formal theory of power-seeking tendencies, shard theory, mechanistic interpretability, steering vectors, consistency training, and research he has mentored.
Key Points
- Turner has published influential work on instrumental convergence and on why AI systems may develop self-preservation drives.
- His research focuses on power-seeking behavior in AI and on formal frameworks for understanding corrigibility.
- Turner has expressed reservations about certain assumptions in mainstream AI alignment approaches.
- His work includes both theoretical analysis and practical implications for building safer AI systems.
- As of November 2023, he is a research scientist on Google DeepMind's scalable alignment team and is active in the broader AI safety research community.
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Power-Seeking AI | Risk | 67.0 |
Cached Content Preview
HTTP 200 · Fetched Mar 20, 2026 · 42 KB
# My Research
By Alex Turner
Published on October 27th, 2024
> Table of Contents
>
> 1. [Low-impact AI](https://turntrout.com/research#low-impact-ai)
>    1. [Defining a new impact measure: AUP](https://turntrout.com/research#defining-a-new-impact-measure-aup)
>    2. [Scaling the AUP technique to harder tasks](https://turntrout.com/research#scaling-the-aup-technique-to-harder-tasks)
>    3. [Reflections on impact measures](https://turntrout.com/research#reflections-on-impact-measures)
> 2. [A formal theory of power-seeking tendencies](https://turntrout.com/research#a-formal-theory-of-power-seeking-tendencies)
>    1. [Reflections on the power-seeking theory](https://turntrout.com/research#reflections-on-the-power-seeking-theory)
> 3. [Shard theory](https://turntrout.com/research#shard-theory)
>    1. [Looking back on shard theory](https://turntrout.com/research#looking-back-on-shard-theory)
> 4. [Mechanistic interpretability](https://turntrout.com/research#mechanistic-interpretability)
> 5. [Steering vectors](https://turntrout.com/research#steering-vectors)
>    1. [Reflections on steering vector work](https://turntrout.com/research#reflections-on-steering-vector-work)
> 6. [Consistency training (against internal model activations)](https://turntrout.com/research#consistency-training-against-internal-model-activations)
> 7. [Selected research I’ve mentored](https://turntrout.com/research#selected-research-i-ve-mentored)
>    1. [Unsupervised capability elicitation](https://turntrout.com/research#unsupervised-capability-elicitation)
>    2. [Gradient routing](https://turntrout.com/research#gradient-routing)
>    3. [Distillation robustifies unlearning](https://turntrout.com/research#distillation-robustifies-unlearning)
>    4. [Output supervision can obfuscate the CoT](https://turntrout.com/research#output-supervision-can-obfuscate-the-cot)
> 8. [Footnotes](https://turntrout.com/research#footnote-label)
Over the years, I’ve worked on lots of research problems. Every time, I felt invested in my work. The work felt beautiful. Even though many days have passed since I have daydreamed about instrumental convergence, I’m proud of what I’ve accomplished and discovered.
 While not _technically_ a part of my research, I’ve included a photo of myself anyways.
As of November 2023, I am a research scientist on Google DeepMind’s scalable alignment team in the Bay Area.[1](https://turntrout.com/research#user-content-fn-disclaim) I lead [a MATS mentorship team called “Team Shard.” If you want to break into the alignment field, consider applying to work with me.](https://turntrout.com/team-shard) My [Google Scholar is here.](https://scholar.google.com/citations?user=thAHiVcAAAAJ)
This page is chronological. For my most recent work, [navigate to the end of the page!](https://turntrout.com/research#footn
... (truncated, 42 KB total)
Resource ID: d773c5dd9ea6b3c3 | Stable ID: MWI1YTc4Mz