Longterm Wiki


Web resource: turntrout.com/research

Alex Turner is a notable AI safety researcher whose work on instrumental convergence and power-seeking AI has been influential; this page serves as a hub for his published research and ongoing projects.

Metadata

Importance: 62/100 · homepage

Summary

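This is the research homepage of Alex Turner (TurnTrout), an AI safety researcher known for work on instrumental convergence, power-seeking behavior, and corrigibility. The page catalogs his research chronologically: low-impact AI and the AUP impact measure, a formal theory of power-seeking tendencies, shard theory, mechanistic interpretability, steering vectors, consistency training against internal model activations, and selected research he has mentored. Several sections include his retrospective reflections on the earlier work.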

Key Points

  • Turner has published influential work on instrumental convergence and why AI systems may develop self-preservation drives
  • Research focuses on power-seeking behavior in AI and formal frameworks for understanding corrigibility
  • Turner has expressed reservations about certain assumptions in mainstream AI alignment approaches
  • Work includes both theoretical analysis and practical implications for building safer AI systems
  • Research scientist on Google DeepMind’s scalable alignment team (as of November 2023) and part of the broader AI safety research community

Cited by 1 page

| Page | Type | Quality |
|---|---|---|
| Power-Seeking AI | Risk | 67.0 |

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 42 KB

# My Research

By Alex Turner

Published on October 27th, 2024

> Table of Contents
>
> 1. [Low-impact AI](https://turntrout.com/research#low-impact-ai)
>    1. [Defining a new impact measure: AUP](https://turntrout.com/research#defining-a-new-impact-measure-aup)
>    2. [Scaling the AUP technique to harder tasks](https://turntrout.com/research#scaling-the-aup-technique-to-harder-tasks)
>    3. [Reflections on impact measures](https://turntrout.com/research#reflections-on-impact-measures)
> 2. [A formal theory of power-seeking tendencies](https://turntrout.com/research#a-formal-theory-of-power-seeking-tendencies)
>    1. [Reflections on the power-seeking theory](https://turntrout.com/research#reflections-on-the-power-seeking-theory)
> 3. [Shard theory](https://turntrout.com/research#shard-theory)
>    1. [Looking back on shard theory](https://turntrout.com/research#looking-back-on-shard-theory)
> 4. [Mechanistic interpretability](https://turntrout.com/research#mechanistic-interpretability)
> 5. [Steering vectors](https://turntrout.com/research#steering-vectors)
>    1. [Reflections on steering vector work](https://turntrout.com/research#reflections-on-steering-vector-work)
> 6. [Consistency training (against internal model activations)](https://turntrout.com/research#consistency-training-against-internal-model-activations)
> 7. [Selected research I’ve mentored](https://turntrout.com/research#selected-research-i-ve-mentored)
>    1. [Unsupervised capability elicitation](https://turntrout.com/research#unsupervised-capability-elicitation)
>    2. [Gradient routing](https://turntrout.com/research#gradient-routing)
>    3. [Distillation robustifies unlearning](https://turntrout.com/research#distillation-robustifies-unlearning)
>    4. [Output supervision can obfuscate the CoT](https://turntrout.com/research#output-supervision-can-obfuscate-the-cot)
> 8. [Footnotes](https://turntrout.com/research#footnote-label)

Over the years, I’ve worked on lots of research problems. Every time, I felt invested in my work. The work felt beautiful. Even though many days have passed since I have daydreamed about instrumental convergence, I’m proud of what I’ve accomplished and discovered.

![A professional photograph of me.](https://assets.turntrout.com/Attachments/Pasted%20image%2020240614164142.avif) While not _technically_ a part of my research, I’ve included a photo of myself anyways.

As of November 2023, I am a research scientist on Google DeepMind’s scalable alignment team in the Bay Area.[1](https://turntrout.com/research#user-content-fn-disclaim) I lead [a MATS mentorship team called “Team Shard.” If you want to break into the alignment field, consider applying to work with me.](https://turntrout.com/team-shard) My [Google Scholar is here.](https://scholar.google.com/citations?user=thAHiVcAAAAJ)

This page is chronological. For my most recent work, [navigate to the end of the page!](https://turntrout.com/research#footn

... (truncated, 42 KB total)
Resource ID: d773c5dd9ea6b3c3 | Stable ID: MWI1YTc4Mz