How EMO turns audio into a realistic talking head
Turning audio into expressive talking portraits
The ability to create realistic synthetic talking head videos from a single image and an audio clip holds enormous potential in the world of AI. While computer graphics and 3D modeling have made major strides, generating fully authentic, expressive human facial animation from audio and a picture alone remains an elusive challenge.
However, a newly released paper has, I think, redefined what’s possible in this space. The model, called EMO, demonstrates that AI-based techniques can produce remarkably vivid talking head videos that capture the nuances of human speech and even singing.
In this article, we’ll see how it works and what you can create with it. Let’s begin!
Subscribe or follow me on Twitter for more detailed breakdowns of technical papers!