AIModels.fyi

Teaching AI to tell visually consistent stories

A new architecture for videos that actually make sense

aimodels-fyi
Dec 24, 2024

We understand stories differently than we understand moments. A moment can be striking or beautiful on its own - a sunset, a dancer's leap, a smile. But stories work by building relationships between moments. Each scene has to flow naturally from the ones before it. Characters need to stay consistent. Actions need to have consequences that persist.

This difference between moments and stories points to one of the hardest problems in artificial intelligence. Current AI systems can generate remarkable individual video clips: faces speaking, people dancing, animals moving. But these systems fail when asked to generate anything longer. The character's face subtly changes between scenes. The movements become jarring and unnatural. The story falls apart.

A lot of us assumed this was simply a matter of scale: that with bigger models and more training data, AI would naturally progress from generating moments to generating stories. But one of the top papers on AIModels.fyi today argues that closing the gap between moments and stories requires fundamental innovations in how AI systems work.
