AIModels.fyi

How EMO turns audio into a realistic talking head

Turning audio into expressive talking portraits

aimodels-fyi
Feb 28, 2024

Creating realistic talking head videos from a single image and an audio clip has enormous potential across AI applications. While computer graphics and 3D modeling have made major strides, generating fully authentic, expressive facial animation from audio and a picture alone remains an elusive challenge.

However, a newly released paper has, I think, redefined what’s possible in this space. The method, called EMO, demonstrates that AI-based techniques can produce remarkably vivid talking head videos that capture the nuances of human speech and even singing.

In this article, we’ll see how it works and what you can create with it. Let’s begin!
