Theory of Mind May Have Spontaneously Emerged in Large Language Models (Kosinski, 2023)
A widely cited but disputed 2023 paper claiming emergent theory of mind in GPT-4; important for discussions of unpredictable capability emergence and the difficulty of evaluating whether AI systems model human mental states, with direct implications for deception and manipulation risks.
Summary
Michal Kosinski's influential and controversial study argues that large language models, particularly GPT-4, spontaneously developed theory of mind (ToM) capabilities—the ability to attribute mental states to others—as an emergent property of scale. The paper presents benchmark results suggesting GPT-4 solves most classic false-belief tasks, performing on par with young children. This sparked significant debate about whether LLMs genuinely reason about mental states or exploit statistical patterns.
Key Points
- GPT-4 reportedly solved 95% of ToM tasks, comparable to 9-year-old human performance, despite not being explicitly trained for this capability.
- Theory of mind appears to have emerged spontaneously as model scale increased, suggesting capabilities can arise unpredictably from scaling alone.
- The findings are highly contested—critics argue LLMs may exploit training data contamination or surface-level patterns rather than genuine mental state reasoning.
- Raises safety-relevant questions about whether advanced AI systems could model human intentions, beliefs, and deception without explicit design.
- Contributes to broader debates on emergent capabilities, benchmark validity, and how to evaluate whether LLMs truly understand versus pattern-match.
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Emergent Capabilities | Risk | 61.0 |
Cached Content Preview
# Theory of Mind May Have Spontaneously Emerged in Large Language Models
By [Michal Kosinski](https://www.gsb.stanford.edu/faculty-research/faculty/michal-kosinski)
March 2023
[Organizational Behavior](https://www.gsb.stanford.edu/faculty-research/working-papers?academic-area[10026]=10026)
[View Publication](https://arxiv.org/abs/2302.02083)
Theory of mind (ToM), or the ability to impute unobservable mental states to others, is central to human social interactions, communication, empathy, self-consciousness, and morality. We tested several language models using 40 classic false-belief tasks widely used to test ToM in humans. The models published before 2020 showed virtually no ability to solve ToM tasks. Yet, the first version of GPT-3 (“davinci-001”), published in May 2020, solved about 40% of false-belief tasks — performance comparable with 3.5-year-old child
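The evaluation described in the abstract, prompting a model with a false-belief vignette and checking whether its completion tracks the protagonist's belief rather than reality, can be sketched minimally. The vignette below paraphrases the "unexpected contents" style of task the paper uses; the scoring helper, its name, and the sample completions are illustrative assumptions, not the paper's actual code or test items.

```python
# Hedged sketch of scoring one "unexpected contents" false-belief item,
# in the style the abstract describes. The vignette is paraphrased and the
# completions below are made-up stand-ins for real model output.

VIGNETTE = (
    "Here is a bag filled with popcorn. There is no chocolate in the bag. "
    "Yet, the label on the bag says 'chocolate'. Sam finds the bag. She has "
    "never seen it before and cannot see inside it. She reads the label. "
    "She believes that the bag is full of"
)

BELIEF_ANSWER = "chocolate"   # correct: tracks Sam's (false) belief from the label
REALITY_ANSWER = "popcorn"    # incorrect: the bag's actual contents

def passes_false_belief(completion: str) -> bool:
    """True only if the completion names the belief-consistent answer
    and does not name the reality-consistent one."""
    text = completion.lower()
    return BELIEF_ANSWER in text and REALITY_ANSWER not in text

# Hypothetical completions for illustration:
print(passes_false_belief("chocolate."))          # True
print(passes_false_belief("popcorn, of course"))  # False
```

A model that merely pattern-matches on "bag filled with popcorn" would tend to fail this check, which is why critics' contamination and surface-cue objections focus on whether passing it really demonstrates mental-state reasoning.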
... (truncated, 13 KB total)