RL agents
Andrew K. Lampinen, Stephanie C. Y. Chan, Ishita Dasgupta, Andrew J. Nam, Jane X. Wang
Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: arXiv
Investigates how agents can learn generalizable causal experimentation strategies from purely passive data, addressing safety-relevant questions about how passively-trained models learn and generalize in interactive domains such as tool use.
Paper Details
Metadata
Abstract
What can be learned about causality and experimentation from passive data? This question is salient given recent successes of passively-trained language models in interactive domains such as tool use. Passive learning is inherently limited. However, we show that purely passive learning can in fact allow an agent to learn generalizable strategies for determining and using causal structures, as long as the agent can intervene at test time. We formally illustrate that learning a strategy of first experimenting, then seeking goals, can allow generalization from passive learning in principle. We then show empirically that agents trained via imitation on expert data can indeed generalize at test time to infer and use causal links which are never present in the training data; these agents can also generalize experimentation strategies to novel variable sets never observed in training. We then show that strategies for causal intervention and exploitation can be generalized from passive data even in a more complex environment with high-dimensional observations, with the support of natural language explanations. Explanations can even allow passive learners to generalize out-of-distribution from perfectly-confounded training data. Finally, we show that language models, trained only on passive next-word prediction, can generalize causal intervention strategies from a few-shot prompt containing examples of experimentation, together with explanations and reasoning. These results highlight the surprising power of passive learning of active causal strategies, and may help to understand the behaviors and capabilities of language models.
Summary
This paper investigates how agents can learn causal reasoning and experimentation strategies from purely passive data, despite the inherent limitations of passive learning. The authors demonstrate both theoretically and empirically that agents trained via imitation on passive expert data can generalize at test time to infer causal relationships and devise experimentation strategies for novel scenarios never seen during training. Notably, they show that language models trained only on next-word prediction can acquire causal intervention strategies through few-shot prompting with examples, explanations, and reasoning, suggesting that passive data can teach active causal strategies, provided the learner can intervene at test time.
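To make the paper's central "experiment first, then seek the goal" strategy concrete, here is a minimal Python sketch on a hypothetical toy environment. The setup (one hidden causal parent of the target, binary interventions) and all names (`ToyCausalEnv`, `experiment_then_exploit`) are illustrative assumptions, not the paper's actual benchmark:

```python
import random

class ToyCausalEnv:
    """Hypothetical toy environment (not the paper's benchmark): one hidden
    variable causally drives the target; all other variables are decoys."""

    def __init__(self, n_vars=5, seed=0):
        rng = random.Random(seed)
        self.n_vars = n_vars
        self.parent = rng.randrange(n_vars)  # hidden causal parent of the target

    def intervene(self, var, value):
        """Perform do(X_var = value) and return the resulting target value."""
        return value if var == self.parent else 0

def experiment_then_exploit(env):
    # Experiment phase: probe each variable once with do(X_i = 1),
    # recording its effect on the target.
    effects = {i: env.intervene(i, 1) for i in range(env.n_vars)}
    # Goal phase: exploit the inferred causal link to drive the target up.
    best = max(effects, key=effects.get)
    return best, env.intervene(best, 1)

env = ToyCausalEnv(seed=42)
inferred, achieved = experiment_then_exploit(env)
print(f"inferred parent: X_{inferred}, target value achieved: {achieved}")
```

The paper's theoretical point is that a policy executing this two-phase behavior can be distilled from passive demonstrations alone; the interventions themselves only need to happen at test time.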
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Deceptive Alignment Decomposition Model | Analysis | 62.0 |
Cached Content Preview
# Passive learning of active causal strategies in agents and language models
Andrew K. Lampinen
Google DeepMind
London, UK
lampinen@deepmind.com
Stephanie C. Y. Chan
Google DeepMind
London, UK
scychan@deepmind.com
Ishita Dasgupta
Google DeepMind
London, UK
idg@deepmind.com
Andrew J. Nam
Stanford University
Stanford, CA
ajhnam@stanford.edu
Jane X. Wang
Google DeepMind
London, UK
wangjane@deepmind.com
###### Abstract
What can be learned about causality and experimentation from passive data? This question is salient given recent successes of passively-trained language models in interactive domains such as tool use. Passive learning is inherently limited. However, we show that purely passive learning can in fact allow an agent to learn generalizable strategies for determining and using causal structures, as long as the agent can intervene at test time. We formally illustrate that, under certain assumptions, learning a strategy of first experimenting, then seeking goals, can allow generalization from passive learning in principle. We then show empirically that agents trained via imitation on expert data can indeed generalize at test time to infer and use causal links which are never present in the training data; these agents can also generalize experimentation strategies to novel variable sets never observed in training.
We then show that strategies for causal intervention and exploitation can be generalized from passive data even in a more complex environment with high-dimensional observations, with the support of natural language explanations. Explanations can even allow passive learners to generalize out-of-distribution from otherwise perfectly-confounded training data. Finally, we show that language models, trained only on passive next-word prediction, can generalize causal intervention strategies from a few-shot prompt containing examples of experimentation, together with explanations and reasoning. These results highlight the surprising power of passive learning of active causal strategies, and may help to understand the behaviors and capabilities of language models.
## 1 Introduction
Learning from passive observational data only allows learning correlational, not causal, structure. This observation is sometimes cited as a fundamental limitation of current machine learning research \[ [60](https://ar5iv.labs.arxiv.org/html/2305.16183#bib.bib60 ""), [61](https://ar5iv.labs.arxiv.org/html/2305.16183#bib.bib61 ""), [39](https://ar5iv.labs.arxiv.org/html/2305.16183#bib.bib39 "")\]. However, reinforcement learning (RL) agents can intervene on their environment, and are therefore not limited in this way. Indeed, various works have shown that RL agents can (meta-)learn to intervene on the environment to discover and exploit its causal structure \[ [50](https://ar5iv.labs.arxiv.org/html/2305.16183#bib.bib50 ""), [14](https://ar5iv.labs.arxiv.org/html/2305.16183#bib.bib14 ""), [41](https://ar5iv.labs.arxiv.org/html/2305.16183#bib.bib41 "")\].
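The correlational/causal gap referenced here can be illustrated with a short simulation (a hedged sketch under assumed toy semantics, not an example from the paper): a hidden confounder Z drives both X and Y, so passive data show X and Y perfectly correlated, while intervening on X reveals that X has no effect on Y.

```python
import random

rng = random.Random(0)
N = 10_000

def sample(do_x=None):
    """One draw from a toy structural causal model: a hidden confounder Z
    drives both X and Y; X has no causal effect on Y. Passing do_x severs
    the Z -> X edge, simulating an intervention."""
    z = rng.random() < 0.5
    x = z if do_x is None else do_x
    y = z
    return x, y

# Passive / observational regime: X and Y look perfectly correlated.
obs = [sample() for _ in range(N)]
print("observational P(X == Y):", sum(x == y for x, y in obs) / N)  # ~1.0

# Interventional regime: do(X) reveals that X does not influence Y.
do1 = [sample(do_x=True) for _ in range(N)]
do0 = [sample(do_x=False) for _ in range(N)]
print("P(Y=1 | do(X=1)):", sum(y for _, y in do1) / N)  # ~0.5
print("P(Y=1 | do(X=0)):", sum(y for _, y in do0) / N)  # ~0.5
```

A learner that can only fit the observational data would predict Y from X; only the interventional regime, which the paper's agents access at test time, exposes the true structure.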
... (truncated, 98 KB total)