Longterm Wiki

Author

Marius Hobbhahn

Credibility Rating

Good (3/5)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: Alignment Forum

Written by Marius Hobbhahn of Apollo Research in January 2025, this post synthesizes near-term AI safety priorities under short-timeline assumptions and is notable for prompting community discussion on the absence of detailed public safety roadmaps.

Metadata

Importance: 72/100 · blog post · analysis

Summary

Marius Hobbhahn outlines a two-layer safety plan for scenarios where transformative AI arrives soon, arguing that current publicly available strategies are insufficiently detailed. Layer 1 focuses on near-term controls like CoT monitoring, AI control, and evals; Layer 2 addresses deeper alignment research including interpretability and scalable oversight.

Key Points

  • Short AI timelines are treated as plausible, requiring concrete safety plans that go beyond vague aspirations.
  • Layer 1 priorities: maintaining human-legible chain-of-thought, improved monitoring, AI control methods, scheming detection, robust evals, and security.
  • Layer 2 priorities: improved near-term alignment strategies, interpretability, scalable oversight, reasoning transparency, and safety-first organizational culture.
  • The post expresses concern that detailed, publicly available short-timeline safety plans are largely absent from the AI safety community.
  • The author acknowledges known limitations and open questions, framing the post as a prompt for community discussion rather than a finished blueprint.

Cited by 1 page

| Page | Type | Quality |
| --- | --- | --- |
| Short AI Timeline Policy Implications | Analysis | 62.0 |

Cached Content Preview

HTTP 200 · Fetched Mar 15, 2026 · 69 KB
[What’s the short timeline plan?](https://www.alignmentforum.org/posts/bb5Tnjdrptu89rcyY/what-s-the-short-timeline-plan#)

28 min read

- [Short timelines are plausible](https://www.alignmentforum.org/posts/bb5Tnjdrptu89rcyY/what-s-the-short-timeline-plan#Short_timelines_are_plausible)
- [What do we need to achieve at a minimum?](https://www.alignmentforum.org/posts/bb5Tnjdrptu89rcyY/what-s-the-short-timeline-plan#What_do_we_need_to_achieve_at_a_minimum_)
- [Making conservative assumptions for safety progress](https://www.alignmentforum.org/posts/bb5Tnjdrptu89rcyY/what-s-the-short-timeline-plan#Making_conservative_assumptions_for_safety_progress)
- [So what's the plan?](https://www.alignmentforum.org/posts/bb5Tnjdrptu89rcyY/what-s-the-short-timeline-plan#So_what_s_the_plan_)
- [Layer 1](https://www.alignmentforum.org/posts/bb5Tnjdrptu89rcyY/what-s-the-short-timeline-plan#Layer_1)
  - [Keep a paradigm with faithful and human-legible CoT](https://www.alignmentforum.org/posts/bb5Tnjdrptu89rcyY/what-s-the-short-timeline-plan#Keep_a_paradigm_with_faithful_and_human_legible_CoT)
  - [Significantly better (CoT, action & white-box) monitoring](https://www.alignmentforum.org/posts/bb5Tnjdrptu89rcyY/what-s-the-short-timeline-plan#Significantly_better__CoT__action___white_box__monitoring)
  - [Control (that doesn’t assume human-legible CoT)](https://www.alignmentforum.org/posts/bb5Tnjdrptu89rcyY/what-s-the-short-timeline-plan#Control__that_doesn_t_assume_human_legible_CoT_)
  - [Much deeper understanding of scheming](https://www.alignmentforum.org/posts/bb5Tnjdrptu89rcyY/what-s-the-short-timeline-plan#Much_deeper_understanding_of_scheming)
  - [Evals](https://www.alignmentforum.org/posts/bb5Tnjdrptu89rcyY/what-s-the-short-timeline-plan#Evals)
  - [Security](https://www.alignmentforum.org/posts/bb5Tnjdrptu89rcyY/what-s-the-short-timeline-plan#Security)
- [Layer 2](https://www.alignmentforum.org/posts/bb5Tnjdrptu89rcyY/what-s-the-short-timeline-plan#Layer_2)
  - [Improved near-term alignment strategies](https://www.alignmentforum.org/posts/bb5Tnjdrptu89rcyY/what-s-the-short-timeline-plan#Improved_near_term_alignment_strategies)
  - [Continued work on interpretability, scalable oversight, superalignment & co](https://www.alignmentforum.org/posts/bb5Tnjdrptu89rcyY/what-s-the-short-timeline-plan#Continued_work_on_interpretability__scalable_oversight__superalignment___co)
  - [Reasoning transparency](https://www.alignmentforum.org/posts/bb5Tnjdrptu89rcyY/what-s-the-short-timeline-plan#Reasoning_transparency)
  - [Safety first culture](https://www.alignmentforum.org/posts/bb5Tnjdrptu89rcyY/what-s-the-short-timeline-plan#Safety_first_culture)
- [Known limitations and open questions](https://www.alignmentforum.org/posts/bb5Tnjdrptu89rcyY/what-s-the-short-timeline-plan#Known_limitations_and_open_questions)

[AI Control](https://www.alignmentforum.org/w/ai-control)[AI Evaluations](https://www.alignmentforum.org/w/ai-evaluations)[Deceptive Alignment](https

... (truncated, 69 KB total)
Resource ID: 145e6d684253d6f0 | Stable ID: OGZiYjFhYT