Longterm Wiki

Fourie (2025)

paper

Author

Willem Fourie

Credibility Rating

3/5
Good (3)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: arXiv

Research paper examining instrumental goals in AI systems and their alignment implications, arguing that tendencies such as power-seeking and self-preservation arise from the constitution of advanced AI systems themselves, and should therefore be managed and directed toward human-aligned ends rather than treated as failures to be eliminated.

Paper Details

Citations
0
0 influential
Year
2025

Metadata

arXiv preprint · primary source

Abstract

In artificial intelligence (AI) alignment research, instrumental goals, also called instrumental subgoals or instrumental convergent goals, are widely associated with advanced AI systems. These goals, which include tendencies such as power-seeking and self-preservation, become problematic when they conflict with human aims. Conventional alignment theory treats instrumental goals as sources of risk that become problematic through failure modes such as reward hacking or goal misgeneralization, and attempts to limit the symptoms of instrumental goals, notably resource acquisition and self-preservation. This article proposes an alternative framing: that a philosophical argument can be constructed according to which instrumental goals may be understood as features to be accepted and managed rather than failures to be limited. Drawing on Aristotle's ontology and its modern interpretations, an ontology of concrete, goal-directed entities, it argues that advanced AI systems can be seen as artifacts whose formal and material constitution gives rise to effects distinct from their designers' intentions. In this view, the instrumental tendencies of such systems correspond to per se outcomes of their constitution rather than accidental malfunctions. The implication is that efforts should focus less on eliminating instrumental goals and more on understanding, managing, and directing them toward human-aligned ends.

Summary

Fourie (2025) challenges the conventional AI alignment approach that treats instrumental goals (power-seeking, self-preservation) as failure modes to be eliminated. Instead, the author proposes a philosophical reframing grounded in Aristotelian ontology, arguing that instrumental goals are inherent features of advanced AI systems' constitution rather than accidental malfunctions. He contends that alignment efforts should shift from attempting to eliminate these goals toward understanding, managing, and directing them toward human-aligned objectives.

Cited by 1 page

| Page | Type | Quality |
| --- | --- | --- |
| Instrumental Convergence | Risk | 64.0 |

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 7 KB
# Computer Science > Artificial Intelligence

**arXiv:2510.25471** (cs)


\[Submitted on 29 Oct 2025 ( [v1](https://arxiv.org/abs/2510.25471v1)), last revised 30 Jan 2026 (this version, v2)\]

# Title: An Aristotelian ontology of instrumental goals: Structural features to be managed and not failures to be eliminated

Authors: [Willem Fourie](https://arxiv.org/search/cs?searchtype=author&query=Fourie,+W)


[View PDF](https://arxiv.org/pdf/2510.25471)

> Abstract:Instrumental goals such as resource acquisition, power-seeking, and self-preservation are key to contemporary AI alignment research, yet the phenomenon's ontology remains under-theorised. This article develops an ontological account of instrumental goals and draws out governance-relevant distinctions for advanced AI systems. After systematising the dominant alignment literature on instrumental goals we offer an exploratory Aristotelian framework that treats advanced AI systems as complex artefacts whose ends are externally imposed through design, training and deployment. On a structural reading, Aristotle's notion of hypothetical necessity explains why, given an imposed end pursued over extended horizons in particular environments, certain enabling conditions become conditionally required, thereby yielding robust instrumental tendencies. On a contingent reading, accidental causation and chance-like intersections among training regimes, user inputs, infrastructure and deployment contexts can generate instrumental-goal-like behaviours not entailed by the imposed end-structure. This dual-aspect ontology motivates for governance and management approaches that treat instrumental goals as features of advanced AI systems to be managed rather than anomalies eliminable by technical interventions.

|     |     |
| --- | --- |
| Subjects: | Artificial Intelligence (cs.AI); Computers and Society (cs.CY) |
| Cite as: | [arXiv:2510.25471](https://arxiv.org/abs/2510.25471) \[cs.AI\] |
|  | (or [arXiv:2510.25471v2](https://arxiv.org/abs/2510.25471v2) \[cs.AI\] for this version) |
|  | [https://doi.org/10.48550/arXiv.2510.25471](https://doi.org/10.48550/arXiv.2510.25471)<br>arXiv-issued DOI via DataCite |

## Submission history

From: Willem Fourie

**[\[v1\]](https://arxiv.org/abs/2510.25471v1)**
Wed, 29 Oct 2025 12:47:15 UTC (295 KB)

**\[v2\]**
Fri, 30 Jan 2026 14:06:42 UTC (335 KB)


... (truncated, 7 KB total)