EA Forum - Ought's Theory of Change
Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: EA Forum
This post by Ought (now dissolved into Elicit) articulates the organization's strategic rationale for process-based supervision as a bridge between near-term AI utility and long-term alignment, relevant to debates about scalable oversight and reward hacking.
Forum Post Details
Metadata
Summary
Ought explains its strategic approach to AI safety and beneficial AI through process-based machine learning, in which systems are supervised on their intermediate reasoning steps rather than on final outcomes. This methodology underpins their tool Elicit, an AI research assistant designed to augment human reasoning on complex problems. They argue the approach simultaneously provides near-term value and advances long-term alignment goals by reducing the incentive to game outcomes.
Key Points
- Ought builds process-based ML systems in which intermediate reasoning steps, rather than final outcomes, are supervised, reducing misalignment risk (a minimal illustrative sketch follows this list).
- Their flagship product Elicit is an AI research assistant aimed at scaling open-ended reasoning on complex tasks.
- Process supervision is positioned as both practically useful in the short term and a meaningful contribution to AI alignment in the long term.
- Target domains include AI governance, climate change, and economic development, areas where improved reasoning could have a large positive impact.
- The theory of change links commercial AI tooling directly to alignment research, framing the two as complementary rather than competing goals.
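To make the process-vs-outcome distinction concrete, here is a minimal, hypothetical Python sketch. It is not Ought's or Elicit's code; the `Step`, `Trace`, `outcome_reward`, and `process_reward` names are illustrative assumptions. It contrasts a reward computed only from the final answer with a reward computed from per-step evaluations of a reasoning trace.

```python
# Hypothetical sketch: outcome-based vs process-based supervision of a
# multi-step reasoning trace. Not Ought's or Elicit's implementation.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One intermediate reasoning step produced by a model."""
    description: str
    output: str


@dataclass
class Trace:
    """A full reasoning trace: intermediate steps plus a final answer."""
    steps: List[Step]
    final_answer: str


def outcome_reward(trace: Trace, answer_is_correct: Callable[[str], bool]) -> float:
    """Outcome-based supervision: reward depends only on the final answer,
    so a trace with flawed intermediate reasoning can still receive full
    reward if the answer happens to check out."""
    return 1.0 if answer_is_correct(trace.final_answer) else 0.0


def process_reward(trace: Trace, step_is_sound: Callable[[Step], bool]) -> float:
    """Process-based supervision: each intermediate step is evaluated on its
    own merits (e.g. by a human reviewer), so the reward tracks the quality
    of the reasoning rather than just the end result."""
    if not trace.steps:
        return 0.0
    return sum(step_is_sound(s) for s in trace.steps) / len(trace.steps)


if __name__ == "__main__":
    # Toy example: the final answer happens to look right, but the second
    # reasoning step misreports the evidence it cites.
    trace = Trace(
        steps=[
            Step("find relevant papers", "3 RCTs on intervention X"),
            Step("summarize findings", "all 3 report positive effects"),
        ],
        final_answer="intervention X appears effective",
    )
    # A reviewer judges each step individually (hypothetical labels).
    step_judgments = {"find relevant papers": True, "summarize findings": False}

    print(outcome_reward(trace, lambda ans: "effective" in ans))           # 1.0
    print(process_reward(trace, lambda s: step_judgments[s.description]))  # 0.5
```

The toy numbers show the gap the post is pointing at: an outcome-only signal cannot distinguish a trace whose reasoning is sound from one whose answer merely happens to check out, whereas per-step supervision can.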
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Elicit (AI Research Tool) | Organization | 63.0 |
Cached Content Preview
# [Ought's theory of change](https://forum.effectivealtruism.org/posts/raFAKyw7ofSo9mRQ3/ought-s-theory-of-change)
by [stuhlmueller](https://forum.effectivealtruism.org/users/stuhlmueller?from=post_header), [jungofthewon](https://forum.effectivealtruism.org/users/jungofthewon?from=post_header)
Apr 11 2022 · 3 min read
[AI safety](https://forum.effectivealtruism.org/topics/ai-safety) · [AI alignment](https://forum.effectivealtruism.org/topics/ai-alignment) · [Ought](https://forum.effectivealtruism.org/topics/ought) · [Theory of change](https://forum.effectivealtruism.org/topics/theory-of-change) · [Frontpage](https://forum.effectivealtruism.org/about#Finding_content)
- [In short](https://forum.effectivealtruism.org/posts/raFAKyw7ofSo9mRQ3/ought-s-theory-of-change#In_short)
- [Our mission](https://forum.effectivealtruism.org/posts/raFAKyw7ofSo9mRQ3/ought-s-theory-of-change#Our_mission)
- [The case for process-based ML systems](https://forum.effectivealtruism.org/posts/raFAKyw7ofSo9mRQ3/ought-s-theory-of-change#The_case_for_process_based_ML_systems)
- [How we think about success](https://forum.effectivealtruism.org/posts/raFAKyw7ofSo9mRQ3/ought-s-theory-of-change#How_we_think_about_success)
- [Progress in 2021](https://forum.effectivealtruism.org/posts/raFAKyw7ofSo9mRQ3/ought-s-theory-of-change#Progress_in_2021)
- [Roadmap for 2022+](https://forum.effectivealtruism.org/posts/raFAKyw7ofSo9mRQ3/ought-s-theory-of-change#Roadmap_for_2022_)
- [4 comments](https://forum.effectivealtruism.org/posts/raFAKyw7ofSo9mRQ3/ought-s-theory-of-change#comments)
[Ought](https://ought.org/) is an applied machine learning lab. In this post we summarize our work on [Elicit](https://elicit.org/) and why we think it's important.
We'd love to get feedback on how to make Elicit more useful to the EA community, and on our plans more generally.
This post is based on two recent LessWrong posts:
- [Supervise Process, not Outcomes](https://www.lesswrong.com/posts/pYcFPMBtQveAjcSfH/supervise-process-not-outcomes)
- [Elicit: Language Models as Research Assistants](https://www.lesswrong.com/posts/s5jrfbsGLyEexh4GT/elicit-language-models-as-research-assistants)
## In short
Our mission is to automate and scale open-ended reasoning. To that end, we’re building Elicit, the AI research assistant.
Elicit's architecture is based on [supervising reasoning processes, not outcomes](https://www.lesswrong.com/posts/pYcFPMBtQveAjcSfH/supervise-process-not-outcomes). This is better for supporting open-ended reasoning in the short run and better for alignment in the long run.
[Over the last year](https://www.lesswrong.com/posts/s5jrfbsGLyEexh4GT/elicit-language-models-as-research-assistants#Progress_in_2021), we built Elicit to support broad reviews of empirical literature. The literature review workflow runs on general-purpose infrastructure for executing composit
... (truncated, 24 KB total)