LLMs can speak in JPEG

By studying compressed images, LLMs can learn to "speak" in pictures

Aug 22, 2024

∙ Paid

Can large language models learn to "speak" image and video file formats? A new study explores an intriguing approach to visual generation using LLMs to model canonical codecs like JPEG and H.264.

In this post, we'll dive deep into how researchers trained 7B-parameter models to generate images and videos by outputting compressed file bytes, and how this a…

Keep reading with a 7-day free trial

Subscribe to AIModels.fyi to keep reading this post and get 7 days of free access to the full post archives.