Shallow review of technical AI safety, 2024
Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: LessWrong
An annually updated landscape review of technical AI safety research agendas from LessWrong; useful as a high-level orientation to the field rather than a deep technical treatment of any single agenda.
Forum Post Details
Metadata
Summary
A 2024 survey of active technical AI safety research agendas, updating the prior year's review. Authors spent approximately one hour per entry reviewing public information to help researchers orient themselves, inform policy discussions, and give funders visibility into funded work. The review notes significant capability advances in 2024 including long contexts, multimodality, reasoning, and agency improvements.
Key Points
- Surveys active technical AI safety research areas using ~1 hour per entry, drawing only on public information for accessibility and breadth.
- Defines AI safety as work preventing competent cognitive systems from having large unintended effects, framing the scope of included agendas.
- Notes significant 2024 capability advances (long contexts, multimodality, reasoning, agency) despite plateauing pretraining scaling.
- Explicitly cautions that research activity and actual progress are not equivalent, urging readers to interpret coverage critically.
- Serves multiple audiences: researchers orienting themselves, policymakers, and funders seeking visibility into the landscape of funded work.
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| AI-Assisted Alignment | Approach | 63.0 |
Cached Content Preview
Shallow review of technical AI safety, 2024
by technicalities, Stag, Stephen McAleese, jordine, Dr. David Mathers · 29th Dec 2024 · AI Alignment Forum · 49 min read
from aisafety.world
The following is a list of live agendas in technical AI safety, updating our post from last year. It is “shallow” in the sense that 1) we are not specialists in almost any of it and that 2) we only spent about an hour on each entry. We also only use public information, so we are bound to be off by some additional factor.
The point is to help anyone look up some of what is happening, or that thing you vaguely remember reading about; to help new researchers orient and know (some of) their options and the standing critiques; to help policy people know who to talk to for the actual information; and ideally to help funders see quickly what has already been funded and how much (but this proves to be hard).
“AI safety” means many things. We’re targeting work that intends to prevent very competent cognitive systems from having large unintended effects on the world.
This time we also made an effort to identify work that doesn’t show up on LW/AF by trawling conferences and arXiv. Our list is not exhaustive, particularly for academic work which is unindexed and hard to discover. If we missed you or got something wrong, please comment, we will edit.
The method section is important but we put it down the bottom anyway.
Here’s a spreadsheet version also.
Editorial
One commenter said we shouldn’t do this review, because the sheer length of it fools people into thinking that there’s been lots of progress. Obviously we disagree that it’s not worth doing, but be warned: the following is an exercise in quantity; activity is not the same as progress; you have to consider whether it actually helps.
A smell of ozone. In the last month there has been a flurry of hopeful or despairing pieces claiming that the next base models are not a big advance, or that we hit a data wall. These often ground out in gossip, but it’s true that the next-gen base models are held up by something, maybe just inference cost. The pretraining runs are bottlenecked on electrical power too. Amazon is at present not getting its nuclear datacentre.
But overall it’s been a big year for capabilities despite no pretraining scaling. I had forgotten long contexts were so recent; million-token windows only arrived in February. Multimodality was February. “Reasoning” was September. “Agency” was October.
I don’t trust benchmarks much, but on GPQA (hard science) they leapt all the way from chance to PhD-level just with post-training.
FrontierMath
... (truncated, 98 KB total)