Shallow review of technical AI safety, 2024
Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: LessWrong
An annually updated landscape review of technical AI safety research agendas from LessWrong; useful as a high-level orientation to the field rather than a deep technical treatment of any single agenda.
Forum Post Details
Metadata
Summary
A 2024 survey of active technical AI safety research agendas, updating the prior year's review. Authors spent approximately one hour per entry reviewing public information to help researchers orient themselves, inform policy discussions, and give funders visibility into funded work. The review notes significant capability advances in 2024 including long contexts, multimodality, reasoning, and agency improvements.
Key Points
- Surveys active technical AI safety research areas using ~1 hour per entry, drawing only on public information for accessibility and breadth.
- Defines AI safety as work preventing competent cognitive systems from having large unintended effects, framing the scope of included agendas.
- Notes significant 2024 capability advances (long contexts, multimodality, reasoning, agency) despite plateauing pretraining scaling.
- Explicitly cautions that research activity and actual progress are not equivalent, urging readers to interpret coverage critically.
- Serves multiple audiences: researchers orienting themselves, policymakers, and funders seeking visibility into the landscape of funded work.
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| AI-Assisted Alignment | Approach | 63.0 |
Cached Content Preview
Shallow review of technical AI safety, 2024
by technicalities, Stag, Stephen McAleese, jordine, Dr. David Mathers · 29th Dec 2024 · AI Alignment Forum · 49 min read
from aisafety.world
The following is a list of live agendas in technical AI safety, updating our post from last year. It is “shallow” in the sense that 1) we are not specialists in almost any of it and that 2) we only spent about an hour on each entry. We also only use public information, so we are bound to be off by some additional factor.
The point is to help anyone look up some of what is happening, or that thing you vaguely remember reading about; to help new researchers orient and know (some of) their options and the standing critiques; to help policy people know who to talk to for the actual information; and ideally to help funders see quickly what has already been funded and how much (but this proves to be hard).
“AI safety” means many things. We’re targeting work that intends to prevent very competent cognitive systems from having large unintended effects on the world.
This time we also made an effort to identify work that doesn’t show up on LW/AF by trawling conferences and arXiv. Our list is not exhaustive, particularly for academic work which is unindexed and hard to discover. If we missed you or got something wrong, please comment, we will edit.
The method section is important but we put it down the bottom anyway.
Here’s a spreadsheet version also.
Editorial
One commenter said we shouldn’t do this review, because the sheer length of it fools people into thinking that there’s been lots of progress. Obviously we disagree that it’s not worth doing, but be warned: the following is an exercise in quantity; activity is not the same as progress; you have to consider whether it actually helps.
A smell of ozone. In the last month there has been a flurry of hopeful or despairing pieces claiming that the next base models are not a big advance, or that we hit a data wall. These often ground out in gossip, but it’s true that the next-gen base models are held up by something, maybe just inference cost. The pretraining runs are bottlenecked on electrical power too. Amazon is at present not getting its nuclear datacentre.
But overall it’s been a big year for capabilities despite no pretraining scaling. I had forgotten long contexts were so recent; million-token windows only arrived in February. Multimodality was February. “Reasoning” was September. “Agency” was October.
I don’t trust benchmarks much, but on GPQA (hard science) they leapt all the way from chance to PhD-level just with post-training.
FrontierMath
... (truncated, 98 KB total)