Getting a consistent character across image generation runs
Maintaining visual coherence of the same character across different generated images
Recent breakthroughs in AI-generated imagery from text-to-image models like DALL-E and Stable Diffusion have unlocked new creative possibilities. A persistent challenge, however, is keeping the same character visually consistent from one generated image to the next.
Researchers from Google, Hebrew University of Jerusalem, Tel Aviv University, and Reichman University proposed an automated solution to enable consistent character generation in a paper titled "The Chosen One: Consistent Characters in Text-to-Image Diffusion Models."
Let's see what they uncovered and how we can use it in our own work!
The Context
Text-to-image models like DALL-E and Stable Diffusion are deep neural networks trained on vast image-caption datasets to generate realistic images from text. A text encoder maps the prompt to embeddings, which then condition the model as it iteratively denoises random image latents into a picture matching the description.
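As a concrete (if minimal) sketch of that workflow, here is what a single generation looks like with the Hugging Face diffusers library. The library choice, the checkpoint name, and the GPU assumption are mine for illustration, not the paper's:

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a public Stable Diffusion checkpoint (assumed here for illustration).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # assumes a CUDA-capable GPU

# The pipeline encodes the prompt into text embeddings, then iteratively
# denoises random image latents conditioned on those embeddings.
image = pipe("a white cat sitting on a windowsill").images[0]
image.save("white_cat.png")
```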
However, these models struggle to maintain consistency across multiple images of the same character, even when given similar text prompts. For instance, asking the model to generate "a white cat" may produce cats with different fur patterns, colors, and poses in each image.
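You can reproduce this drift yourself: sampling the same prompt under a few different seeds starts each run from different initial latents, so the "same" cat comes out different every time. Again, the diffusers setup below is an assumption for illustration:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Same prompt, three different seeds: each run denoises from different
# initial latents, so fur pattern, pose, and framing vary across images.
prompt = "a white cat"
for seed in (0, 1, 2):
    generator = torch.Generator(device="cuda").manual_seed(seed)
    image = pipe(prompt, generator=generator).images[0]
    image.save(f"white_cat_seed_{seed}.png")
```

Compare the three saved images side by side and you'll see three different cats, not three pictures of one cat.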
This inability to preserve identity coherence poses challenges for applications like:
Illustrating stories or textbooks with recurring characters
Building unique brand personalities and mascots
Designing video game assets and virtual characters
Creating advertising campaigns with memorable spokespersons
Without consistency, characters change appearance unpredictably from image to image. Creators often fall back on manual workarounds, like generating the character in many poses up front and then carefully hand-picking the results that happen to match.
The Approach