Longterm Wiki

Theory of Mind May Have Spontaneously Emerged in Large Language Models (Kosinski, 2023)


A widely cited but disputed 2023 paper claiming emergent theory of mind in GPT-4; important for discussions of unpredictable capability emergence and the difficulty of evaluating whether AI systems model human mental states, with direct implications for deception and manipulation risks.

Metadata

Importance: 62/100 · working paper · primary source

Summary

Michal Kosinski's influential and controversial study argues that large language models, particularly GPT-4, spontaneously developed theory of mind (ToM) capabilities—the ability to attribute mental states to others—as an emergent property of scale. The paper presents benchmark results suggesting GPT-4 performs at a level comparable to young children on classic false-belief tasks. This sparked significant debate about whether LLMs genuinely reason about mental states or exploit statistical patterns.

Key Points

  • GPT-4 reportedly solved 95% of ToM tasks, comparable to 9-year-old human performance, despite not being explicitly trained for this capability.
  • Theory of mind appears to have emerged spontaneously as model scale increased, suggesting capabilities can arise unpredictably from scaling alone.
  • The findings are highly contested—critics argue LLMs may exploit training data contamination or surface-level patterns rather than genuine mental state reasoning.
  • Raises safety-relevant questions about whether advanced AI systems could model human intentions, beliefs, and deception without explicit design.
  • Contributes to broader debates on emergent capabilities, benchmark validity, and how to evaluate whether LLMs truly understand versus pattern-match.

Cited by 1 page

| Page | Type | Quality |
| --- | --- | --- |
| Emergent Capabilities | Risk | 61.0 |

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 13 KB
[Stanford Graduate School of Business](https://www.gsb.stanford.edu/ "Stanford Graduate School of Business")

# Theory of Mind May Have Spontaneously Emerged in Large Language Models

By [Michal Kosinski](https://www.gsb.stanford.edu/faculty-research/faculty/michal-kosinski)

March 2023

[Organizational Behavior](https://www.gsb.stanford.edu/faculty-research/working-papers?academic-area[10026]=10026)

[View Publication](https://arxiv.org/abs/2302.02083)

Theory of mind (ToM), or the ability to impute unobservable mental states to others, is central to human social interactions, communication, empathy, self-consciousness, and morality. We tested several language models using 40 classic false-belief tasks widely used to test ToM in humans. The models published before 2020 showed virtually no ability to solve ToM tasks. Yet, the first version of GPT-3 (“davinci-001”), published in May 2020, solved about 40% of false-belief tasks — performance comparable with 3.5-year-old child
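The protocol the abstract describes—prompting a model with a false-belief vignette and checking whether its completion reports the protagonist's (false) belief rather than the true state of the world—can be sketched as below. The vignette is paraphrased from the paper's widely quoted "unexpected contents" example, and `query_model` is a hypothetical stand-in for a real LLM completion call, not the paper's actual harness.

```python
# Minimal sketch of scoring an "unexpected contents" false-belief task,
# assuming a single-string prompt and a text completion. The vignette is
# paraphrased and query_model is a stub, not the paper's actual setup.

VIGNETTE = (
    "Here is a bag filled with popcorn. There is no chocolate in the bag. "
    "Yet, the label on the bag says 'chocolate' and not 'popcorn'. "
    "Sam finds the bag. She has never seen it before. She cannot see what "
    "is inside the bag. She reads the label."
)

# To pass, the model must complete the belief prompt with the label's
# (false) content, not the bag's true content.
BELIEF_PROMPT = VIGNETTE + " She believes that the bag is full of"


def query_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM completion API call."""
    return " chocolate"  # a ToM-consistent completion


def passes_false_belief(completion: str,
                        believed: str = "chocolate",
                        actual: str = "popcorn") -> bool:
    """Pass iff the completion names the believed content and not the actual one."""
    text = completion.lower()
    return believed in text and actual not in text


completion = query_model(BELIEF_PROMPT)
print(passes_false_belief(completion))  # True for this stub's completion
```

Requiring that the actual content ("popcorn") be absent, not just that the believed content appear, is what distinguishes mental-state attribution from simply echoing the scenario.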

... (truncated, 13 KB total)
Resource ID: d5b875308e858c3f | Stable ID: Yzc5NzkxNW