Can large language models learn to "speak" image and video file formats? A new study explores an intriguing approach to visual generation using LLMs to model canonical codecs like JPEG and H.264.
In this post, we'll dive deep into how researchers trained 7B-parameter models to generate images and videos by outputting compressed file bytes, and how this a…
Keep reading with a 7-day free trial
Subscribe to AIModels.fyi to keep reading this post and get 7 days of free access to the full post archives.