Deepfakes research by the University of Washington
web · grail.cs.washington.edu · grail.cs.washington.edu/projects/AudioToObama/
This early deepfake research paper is foundational for understanding AI-generated synthetic media risks; it predates widespread deepfake awareness and illustrates how academic capability research can have significant dual-use implications for information integrity and democratic processes.
Metadata
Importance: 72/100 · tool page · primary source
Summary
This SIGGRAPH 2017 paper from the University of Washington demonstrates a technique for synthesizing photorealistic video of a person speaking by mapping audio features to mouth shapes using a recurrent neural network, trained on hours of Obama's weekly address footage. The system produces high-quality lip-synced video composited with accurate 3D pose matching, representing an early landmark in what became known as deepfake technology.
Key Points
- Uses an RNN to learn a mapping from raw audio features to mouth shapes, enabling realistic lip-sync video synthesis of a specific individual.
- Trained on many hours of publicly available weekly address footage, showing that a large amount of single-subject data can enable convincing face manipulation.
- Produces photorealistic results by synthesizing mouth texture and compositing it, with 3D pose matching, into a target video clip.
- Represents a foundational academic contribution to deepfake/synthetic media technology with significant dual-use implications for disinformation.
- Includes a side-by-side comparison with the contemporaneous face-reenactment method Face2Face (Thies et al. 2016).
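The first key point, a recurrent network mapping per-frame audio features to mouth shapes, can be illustrated with a small forward pass. This is only a sketch under stated assumptions: the class name `TinyRNN`, the plain tanh recurrent cell, the layer sizes, and the random input frames are all hypothetical and are not taken from the paper. The weights are untrained, so only the data flow is meaningful.

```python
import math
import random


def make_matrix(rows, cols, rng, scale=0.1):
    """Return a rows x cols matrix of small random weights."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class TinyRNN:
    """Simple recurrent net: audio feature frame -> mouth-shape coefficients.

    Hypothetical stand-in. The tanh cell and all sizes are assumptions,
    not the architecture described in the paper.
    """

    def __init__(self, n_audio, n_hidden, n_mouth, seed=0):
        rng = random.Random(seed)
        self.w_in = make_matrix(n_hidden, n_audio, rng)
        self.w_rec = make_matrix(n_hidden, n_hidden, rng)
        self.w_out = make_matrix(n_mouth, n_hidden, rng)
        self.n_hidden = n_hidden

    def forward(self, frames):
        """Map a sequence of audio feature vectors to one mouth shape per frame."""
        h = [0.0] * self.n_hidden
        shapes = []
        for x in frames:
            pre = [a + b for a, b in zip(matvec(self.w_in, x), matvec(self.w_rec, h))]
            h = [math.tanh(p) for p in pre]  # hidden state carries earlier audio context
            shapes.append(matvec(self.w_out, h))
        return shapes


# Synthetic stand-ins for audio features: 5 frames of 13 values each (assumed sizes).
rng = random.Random(1)
audio_frames = [[rng.gauss(0.0, 1.0) for _ in range(13)] for _ in range(5)]
rnn = TinyRNN(n_audio=13, n_hidden=8, n_mouth=4)
mouth_shapes = rnn.forward(audio_frames)
print(len(mouth_shapes), len(mouth_shapes[0]))  # prints "5 4"
```

Because the hidden state is carried from frame to frame, the predicted mouth shape at any instant depends on the preceding audio as well as the current frame, which is the property that makes a recurrent model a natural fit for lip sync.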
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| AI Disinformation | Risk | 54.0 |
Cached Content Preview
HTTP 200 · Fetched Mar 20, 2026 · 4 KB
# Synthesizing Obama: Learning Lip Sync from Audio
SIGGRAPH 2017
[Supasorn Suwajanakorn](https://homes.cs.washington.edu/~supasorn/), [Steven M. Seitz](http://homes.cs.washington.edu/~seitz/), [Ira Kemelmacher-Shlizerman](https://homes.cs.washington.edu/~kemelmi/)

Given audio of President Barack Obama, we synthesize a high quality video of him speaking with accurate lip sync, composited into a target video clip. Trained on many hours of his weekly address footage, a recurrent neural network learns the mapping from raw audio features to mouth shapes. Given the mouth shape at each time instant, we synthesize high quality mouth texture, and composite it with proper 3D pose matching to change what he appears to be saying in a target video to match the input audio track. Our approach produces photorealistic results.
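The final compositing step described above can be pictured as blending a synthesized mouth patch into a target frame. The sketch below shows only a generic 2D feathered alpha blend on grayscale values. It omits 3D pose matching entirely and is not the authors' compositing method. The function names `feather_mask` and `composite`, the frame sizes, and the pixel values are hypothetical.

```python
def feather_mask(height, width, border):
    """Mask that is 1.0 in the patch interior and ramps linearly to 0.0 at the edges."""
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            edge_dist = min(y, x, height - 1 - y, width - 1 - x)
            row.append(min(1.0, edge_dist / border))
        mask.append(row)
    return mask


def composite(target, patch, top, left, border=2):
    """Alpha-blend `patch` into a copy of `target`, patch corner at (top, left)."""
    out = [row[:] for row in target]  # leave the original frame untouched
    mask = feather_mask(len(patch), len(patch[0]), border)
    for y, patch_row in enumerate(patch):
        for x, value in enumerate(patch_row):
            a = mask[y][x]
            out[top + y][left + x] = a * value + (1.0 - a) * target[top + y][left + x]
    return out


# Hypothetical data: an 8x8 grayscale frame and a 6x6 synthesized mouth patch.
target_frame = [[0.2] * 8 for _ in range(8)]
mouth_patch = [[0.9] * 6 for _ in range(6)]
result = composite(target_frame, mouth_patch, top=1, left=1)
```

The feathered mask keeps the patch border equal to the target frame and fades toward the synthesized texture in the interior, which avoids a visible seam in this toy setting.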
## Supplementary Video
[Synthesizing Obama: Learning Lip Sync from Audio](https://www.youtube.com/watch?v=9Yq67CjDqvw) [Supasorn Suwajanakorn](https://www.youtube.com/channel/UC4lctFoe7Bn6BhQfGddE5Tw)

## Publication
[SIGGRAPH 2017 Paper](https://grail.cs.washington.edu/projects/AudioToObama/siggraph17_obama.pdf)
## Training Videos
A list of YouTube videos used for training our recurrent neural network:
[obama\_addresses.txt](https://grail.cs.washington.edu/projects/AudioToObama/obama_addresses.txt)
### Video A - Teaser
Input Audio: [nIxM8rL5GVE](https://www.youtube.com/watch?v=nIxM8rL5GVE&t=10s) (0:10 - 1:16)
Target Video: [3vPdtajOJfw](https://www.youtube.com/watch?v=3vPdtajOJfw)
### Video B - Comparison to Face2Face \[Thies et al. 2016\]
Input Audio: [nIxM8rL5GVE](https://www.youtube.com/watch?v=nIxM8rL5GVE&t=10s) (0:10 - 0:25)
Target Video: [k4OZOTaf3lk](https://www.youtube.com/watch?v=k4OZOTaf3lk)
### Video C - Method Pipeline
Input Audio: [deF-f0OqvQ4](https://youtu.be/deF-f0OqvQ4?t=1m37s) (1:37 - 2:14)
Target Video: [25GOnaY8ZCY](https://www.youtube.com/watch?v=25GOnaY8ZCY)
### Video D - Target Video Retiming
Input Audio: [nIxM8rL5GVE](https://www.youtube.com/watch?v=nIxM8rL5GVE&t=2m2s) (2:02 - 2:23)
Target Video: [25GOnaY8ZCY](https://www.youtube.com/watch?v=25GOnaY8ZCY)
### Video E - Weekly Address Speech (4-Obama)
Input Audio 1: [nIxM8rL5GVE](https://www.youtube.com/watch?v=nIxM8rL5GVE&t=3m53s) (3:53 - 4:20)
Input Audio 2: [WtOhZ--YeFY](https://www.youtube.com/watch?v=WtOhZ--YeFY&t=58s) (0:58 - 1:31)
**Target Videos:**
Top-Left: [k4OZOTaf3lk](https://www.youtube.com/watch?v=k4OZOTaf3lk)
Top-Right: [E3gfMumXCjI](https://www.youtube.com/watch?v=E3gfMumXCjI)
Bottom-Left: [3vPdtajOJfw](https://www.youtube.com/watch?v=3vPdtajOJfw)
Botto
... (truncated, 4 KB total)
Resource ID: 62e052ee54819423 | Stable ID: ZGE4ZjRiMz