Researchers discover in-context learning creates task vectors in LLMs
The researchers show that ICL can be broken down into two distinct steps occurring within the neural network.
In-context learning (ICL) has emerged as an intriguing capability of large language models such as GPT-4 and Llama. With just a few examples or demonstrations, these models can learn new tasks or concepts and apply them to new inputs. For instance, given two examples mapping countries to their capitals, the model can infer and output the capital for a new country it has not seen before.
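To make the country-to-capital example concrete, here is a minimal sketch of how such a few-shot prompt might be assembled before being sent to a model. The function name and the `->` separator are illustrative choices, not part of the paper's method.

```python
def build_icl_prompt(demonstrations, query):
    """Format (input, output) demonstration pairs followed by a new query.

    The model is expected to continue the pattern and complete the
    final line with the answer for the unseen query.
    """
    lines = [f"{x} -> {y}" for x, y in demonstrations]
    lines.append(f"{query} ->")  # leave the answer blank for the model
    return "\n".join(lines)

# Two demonstrations mapping countries to capitals, then a new country.
prompt = build_icl_prompt([("France", "Paris"), ("Japan", "Tokyo")], "Italy")
print(prompt)
```

Fed a prompt like this, a capable LLM typically completes the last line with "Rome", even though the task was never stated explicitly.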
This ability to quickly learn and adapt from limited data makes ICL highly promising. However, exactly how ICL works inside these large neural networks has remained unclear. Uncovering the mechanisms underlying this phenomenon is key to understanding, controlling, and advancing this powerful AI technique.
Now, new research from scientists at Tel Aviv University and Google DeepMind reveals valuable insights into the workings of in-context learning in LLMs. Their findings provide evidence that ICL operates by creating a "task vector" that captures the essence of the examples provided. This post explains the key points from their paper.