Jacob Steinhardt

web

jsteinhardt.stat.berkeley.edu

Metadata

1 FactBase fact citing this source

| Entity | Property | Value | As Of |
|---|---|---|---|
| Jacob Steinhardt | Notable For | UC Berkeley professor working on AI safety and robustness; leads the Steinhardt Group; runs AI forecasting contests | — |

Cached Content Preview

HTTP 200 · Fetched Apr 30, 2026 · 6 KB

![Jacob Steinhardt](https://jsteinhardt.stat.berkeley.edu/images/profile.png)

## Jacob Steinhardt

Associate Professor

Department of Statistics

UC Berkeley

About

- [Email](mailto:jsteinhardt@berkeley.edu)
- [Google Scholar](https://scholar.google.com/citations?user=LKv32bgAAAAJ&hl=en)
- [Undergraduate Application](https://docs.google.com/forms/d/e/1FAIpQLSf7Tg1XqkQzMuPE4FezeemDhR2EWnN0ayeZWyKM374PKhjmFA/viewform)

I am an Associate Professor of Statistics and EECS at UC Berkeley, where I’m also part of BAIR and CLIMB. I am also Founder & CEO of [Transluce](https://transluce.org), a non-profit research lab building open, scalable technology for understanding frontier AI systems.

My research focuses on ensuring machine learning systems are understood by and aligned with humans. The basic problem is that ML models are complex systems that often produce unintended consequences. For instance, ML systems tend to exploit errors in the reward function, leading to unintended behavior that [often gets worse as models get bigger](https://arxiv.org/abs/2201.03544). The problem compounds once ML systems interact with each other or with humans, which can lead to [strategic incentives](https://arxiv.org/abs/2306.14670) and [other intrasystem goals](https://arxiv.org/abs/2402.06627).

To tackle this problem, one approach is to understand not just the outputs of neural networks but also their latent activations, which represent the computational process used to generate outputs. By [understanding this process](https://arxiv.org/abs/2406.19501), we can hopefully [modify it](https://arxiv.org/abs/2406.04341) to be more aligned with human intent.

Another approach is to enable humans to better understand complex systems. We have built [several](https://arxiv.org/abs/2409.08466) [systems](https://arxiv.org/abs/2312.02974) that consume large datasets and summarize their properties in natural language. More generally, ML models could help humans with important but difficult tasks such as understanding the long-term consequences of an action, [automatically discovering failures](https://arxiv.org/abs/2306.12105) in an ML or computer system, or [predicting future world events](https://arxiv.org/abs/2402.18563).

I seek students who are technically strong, broad-minded, and want to improve the world through their research. I particularly value creative thinkers and curious empiricists who are excited to chart new approaches to the field.

As a graduate student, I was very fortunate to be advised by [Percy Liang](https://cs.stanford.edu/~pliang/). During my post-doc year, I worked at OpenAI and Open Philanthropy. I like ultimate frisbee, power lifting, and indoor bouldering.

## Current Ph.D. students and post-docs

- [Ruiqi Zhong](https://ruiqi-zhong.github.io/) (co-advised with [Dan Klein](http://people.eecs.berkeley.edu/~klein/))
- [Meena Jagadeesan](https://mjagadeesan.github.io/) (co-advised with [Mike Jordan](https://people.eecs.be

... (truncated, 6 KB total)

Resource ID: 5db98ad6d6ca53d2 | Stable ID: sid_WjMo7EZiWQ