Longterm Wiki

A Path Towards Autonomous Machine Intelligence


Influential position paper by Meta's chief AI scientist outlining an alternative to transformer-based LLMs; relevant to AI safety discussions about world models, agency, and whether current paradigms can lead to safe autonomous systems.

Metadata

Importance: 62/100 · conference paper · primary source

Summary

Yann LeCun's position paper proposes a modular cognitive architecture for autonomous AI systems that learn world models and reason about actions hierarchically, contrasting with purely data-driven approaches like large language models. The paper introduces concepts like Joint Embedding Predictive Architecture (JEPA) and energy-based models as foundations for human-level AI. It argues that current deep learning paradigms are insufficient for general intelligence and proposes intrinsic motivation and hierarchical planning as key missing components.
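The JEPA and energy-based ideas in the summary can be illustrated with a toy sketch: an encoder maps two inputs into embedding space, a predictor maps one embedding toward the other, and the "energy" of a pair is the prediction error measured in embedding space rather than in pixel space. Everything below (the linear-plus-tanh encoder, the shared weights, the identity predictor, the dimensions) is invented for illustration; the paper's actual JEPA uses deep networks and additional machinery to prevent representational collapse.

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(z, W):
    # Toy encoder: a linear map plus tanh stands in for a deep network.
    return np.tanh(W @ z)

def energy(x, y, W, V):
    # JEPA-style energy: how badly the predictor V maps the embedding of x
    # onto the embedding of y. The error lives in embedding space, never in
    # input (pixel) space.
    s_x = encoder(x, W)
    s_y = encoder(y, W)
    return float(np.sum((V @ s_x - s_y) ** 2))

W = rng.normal(size=(4, 8))  # shared encoder weights (hypothetical sizes)
V = np.eye(4)                # predictor initialized to the identity

x = rng.normal(size=8)                  # an observed segment
y_near = x + 0.01 * rng.normal(size=8)  # a compatible continuation
y_far = rng.normal(size=8)              # an unrelated segment

print(energy(x, y_near, W, V))  # low energy: embeddings nearly match
print(energy(x, y_far, W, V))   # higher energy: embeddings disagree
```

The point of the sketch is the shape of the computation, not the numbers: a trained energy-based model should assign low energy to compatible pairs and high energy to incompatible ones.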

Key Points

  • Proposes a modular cognitive architecture with world models, perception, memory, actor, and cost modules as building blocks for autonomous AI
  • Introduces Joint Embedding Predictive Architecture (JEPA) as a self-supervised learning framework that avoids generative pixel-level prediction
  • Argues that current LLMs and generative models lack the grounded world models needed for robust reasoning and planning
  • Hierarchical planning and intrinsic motivation (curiosity, discomfort avoidance) are identified as essential for human-like autonomous agents
  • Frames AI safety concerns within the architecture by embedding configurable objectives and guardrails into the cost/critic module
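The module inventory above can be sketched as a minimal perception-action loop. Class names follow the key points, but every signature and data format here is invented for illustration and is not taken from the paper (which also includes a configurator module, omitted here).

```python
# Hypothetical skeleton of the modular architecture described above.

class Perception:
    def encode(self, observation):
        # Map a raw observation to an abstract state representation.
        return {"obs": observation}

class Memory:
    def __init__(self):
        self.episodes = []
    def store(self, state):
        # Keep past states for later use by the world model.
        self.episodes.append(state)

class WorldModel:
    def predict(self, state, action):
        # Predict the next abstract state under a candidate action.
        return {**state, "action": action}

class Cost:
    def intrinsic(self, state):
        # Intrinsic drives and configurable guardrails are scored here;
        # this toy version simply penalizes the "risky" action.
        return 1.0 if state.get("action") == "risky" else 0.0

class Actor:
    def plan(self, state, world_model, cost, actions):
        # Model-predictive control in miniature: simulate each candidate
        # action with the world model, pick the lowest predicted cost.
        return min(actions,
                   key=lambda a: cost.intrinsic(world_model.predict(state, a)))

perception, memory, wm, cost, actor = (
    Perception(), Memory(), WorldModel(), Cost(), Actor())
state = perception.encode("camera frame")
memory.store(state)
best = actor.plan(state, wm, cost, ["risky", "safe"])
print(best)  # -> safe
```

Embedding safety objectives in the cost module, as the last key point notes, means the guardrail is evaluated inside the planning loop rather than bolted on after action selection.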

Cited by 2 pages

Cached Content Preview

HTTP 200 · Fetched Mar 20, 2026 · 98 KB
# A Path Towards Autonomous Machine Intelligence Version 0.9.2, 2022-06-27

Yann LeCun Courant Institute of Mathematical Sciences, New York University [yann@cs.nyu.edu](mailto:yann@cs.nyu.edu) Meta - Fundamental AI Research [yann@fb.com](mailto:yann@fb.com)

June 27, 2022

# Abstract

How could machines learn as efficiently as humans and animals? How could machines learn to reason and plan? How could machines learn representations of percepts and action plans at multiple levels of abstraction, enabling them to reason, predict, and plan at multiple time horizons? This position paper proposes an architecture and training paradigms with which to construct autonomous intelligent agents. It combines concepts such as configurable predictive world model, behavior driven through intrinsic motivation, and hierarchical joint embedding architectures trained with self-supervised learning.

Keywords: Artificial Intelligence, Machine Common Sense, Cognitive Architecture, Deep Learning, Self-Supervised Learning, Energy-Based Model, World Models, Joint Embedding Architecture, Intrinsic Motivation.
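The abstract's "plan at multiple time horizons" can be made concrete with a toy two-level planner: a high level chooses coarse subgoals, and a low level fills in fine-grained steps toward each subgoal. The 1-D world, the evenly spaced subgoals, and the unit step bound are all invented for this sketch, not taken from the paper.

```python
# Illustrative hierarchical planning over two time horizons.

def high_level_plan(start, goal, horizon=3):
    # Coarse plan: evenly spaced subgoals between start and goal.
    step = (goal - start) / horizon
    return [start + step * (i + 1) for i in range(horizon)]

def low_level_plan(pos, subgoal, max_step=1.0):
    # Fine plan: bounded moves until the current subgoal is reached.
    steps = []
    while abs(subgoal - pos) > 1e-9:
        move = max(-max_step, min(max_step, subgoal - pos))
        pos += move
        steps.append(move)
    return pos, steps

pos, trajectory = 0.0, []
for subgoal in high_level_plan(0.0, 6.0, horizon=3):
    pos, steps = low_level_plan(pos, subgoal)
    trajectory.extend(steps)

print(pos)              # -> 6.0
print(len(trajectory))  # -> 6 unit moves
```

The high level reasons over three coarse steps while the low level reasons over six fine ones; in the paper's proposal the two levels would share a learned world model rather than the hand-coded dynamics used here.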

# 1 Prologue

This document is neither a technical nor a scholarly paper in the traditional sense, but a position paper expressing my vision for a path towards intelligent machines that learn more like animals and humans, that can reason and plan, and whose behavior is driven by intrinsic objectives rather than by hard-wired programs, external supervision, or external rewards. Almost all of the ideas described in this paper have been formulated by many authors in various contexts and in various forms. The present piece does not claim priority for any of them but presents a proposal for how to assemble them into a consistent whole. In particular, it pinpoints the challenges ahead and lists a number of avenues that are likely or unlikely to succeed.

The text is written with as little jargon as possible, and using as little mathematical prior knowledge as possible, so as to appeal to readers with a wide variety of backgrounds including neuroscience, cognitive science, and philosophy, in addition to machine learning, robotics, and other fields of engineering. I hope that this piece will help contextualize some of the research in AI whose relevance is sometimes difficult to see.

# 2 Introduction

Animals and humans exhibit learning abilities and understandings of the world that are far beyond the capabilities of current AI and machine learning (ML) systems.

How is it possible for an adolescent to learn to drive a car in about 20 hours of practice, and for children to learn language with what amounts to a small exposure? How is it that most humans will know how to act in many situations they have never encountered? By contrast, to be reliable, current ML systems need to be trained with very large numbers of trials, so that even the rarest combinations of situations will be encountered frequently during training. Still, our best ML systems are still very far from ma

... (truncated, 98 KB total)