Researchers discover explicit registers eliminate vision transformer attention spikes
When visualizing the inner workings of vision transformers (ViTs), researchers noticed weird spikes of attention on random background patches. Here's how they fixed them.
Transformers have become the model architecture of choice for many vision tasks. Vision Transformers (ViTs) are especially popular. They apply the transformer directly to sequences of image patches. ViTs now match or exceed CNNs on benchmarks like image classification.
However, researchers from Met…
Keep reading with a 7-day free trial
Subscribe to AIModels.fyi to keep reading this post and get 7 days of free access to the full post archives.