AIModels.fyi

SmolDocling: An Ultra-Compact VLM for Document Understanding

Featuring DocTags for document markup!

aimodels-fyi
Mar 25, 2025

SmolDocling looks like a significant advance in compact document understanding. It is a 256M-parameter vision-language model designed to process documents efficiently while holding up across a range of document understanding tasks. Developed by researchers from IBM Research and Hugging Face, it bridges the gap between large, resource-intensive models and more specialized ensemble approaches. I also like the name because it sounds like “Smol Duckling” and it’s nice to get a cute model name every once in a while.

Ahem…

Architecture and Design

“Figure 1: SmolDocling/SmolVLM architecture. SmolDocling converts images of document pages to DocTags sequences. First, input images are encoded using a vision encoder and reshaped via projection and pooling. Then, the projected embeddings are concatenated with the text embeddings of the user prompt, possibly with interleaving. Finally, the sequence is used by an LLM to autoregressively predict the DocTags sequence.”
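To get a feel for what "encode, pool, then concatenate" does to sequence length, here is a rough back-of-the-envelope sketch. It assumes a single 512×512 input cut into 16×16 patches (the values suggested by the "patch-16/512" encoder name) and a pooling step that merges each 4×4 block of patches into one token. The pooling ratio, the function names, and the 12-token prompt are all assumptions for illustration, not figures from the post.

```python
from typing import Tuple


def visual_token_counts(
    image_size: int = 512,   # assumed square input resolution
    patch_size: int = 16,    # assumed square patch size
    pool_ratio: int = 4,     # ASSUMPTION: pooling merges pool_ratio x pool_ratio patches
) -> Tuple[int, int]:
    """Return (patch tokens from the encoder, tokens left after pooling)."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    patches_per_side = image_size // patch_size
    if patches_per_side % pool_ratio != 0:
        raise ValueError("patch grid must be divisible by pool_ratio")
    pooled_per_side = patches_per_side // pool_ratio
    return patches_per_side ** 2, pooled_per_side ** 2


def llm_input_length(num_visual_tokens: int, num_prompt_tokens: int) -> int:
    """Length of the sequence after concatenating image and prompt embeddings."""
    return num_visual_tokens + num_prompt_tokens


patch_tokens, pooled_tokens = visual_token_counts()
print(patch_tokens, pooled_tokens)  # 1024 64
print(llm_input_length(pooled_tokens, num_prompt_tokens=12))  # 76
```

Under these assumptions, pooling shrinks the visual part of the sequence by a factor of 16 before the language model ever sees it, which matters a lot when the language model itself is tiny.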

SmolDocling is built on the SmolVLM architecture, specifically the SmolVLM-256M variant. It pairs a SigLIP base patch-16/512 visual backbone (93M parameters) with a lightweight language backbone from the SmolLM-2 family (135M parameters). That gives it 5 to 10 times fewer parameters than comparable vision-language models, and up to 27 times fewer than some of the models it outperforms.
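The numbers in that paragraph are easy to sanity-check. The sketch below just does the arithmetic on the rounded figures quoted above. The constant and function names are made up for illustration, and the "implied" sizes are simple multiples of 256M, not parameter counts reported for any particular model.

```python
# Rounded figures quoted in the text
TOTAL_PARAMS = 256_000_000      # SmolDocling overall
VISION_PARAMS = 93_000_000      # SigLIP visual backbone
LANGUAGE_PARAMS = 135_000_000   # SmolLM-2 language backbone

backbone_sum = VISION_PARAMS + LANGUAGE_PARAMS
# Whatever is left is not itemized in the text
unattributed = TOTAL_PARAMS - backbone_sum


def implied_size(factor: int) -> int:
    """Size of a model that is `factor` times larger than SmolDocling."""
    return TOTAL_PARAMS * factor


print(f"backbones: {backbone_sum / 1e6:.0f}M, remainder: {unattributed / 1e6:.0f}M")
for factor in (5, 10, 27):
    print(f"{factor}x -> {implied_size(factor) / 1e9:.2f}B parameters")
```

So "5 to 10 times smaller" puts the comparison models at roughly 1.3B to 2.6B parameters, and "27 times smaller" at roughly 7B.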
