"Comparison between the vanilla Transformer (top) and the proposed iTransformer (bottom). Unlike the Transformer, which embeds each time step into a temporal token, iTransformer independently embeds each whole variate series into a variate token, such that multivariate correlations can be depicted by the attention mechanism and series representations are encoded by …
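The tokenization difference the caption describes can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's implementation: the shapes (`T`, `N`, `d_model`), the shared linear embedding `W`, and the raw softmax attention are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

T, N, d_model = 96, 7, 16              # time steps, variates, embedding size (assumed)

series = rng.standard_normal((T, N))   # multivariate series: (time, variates)

# Vanilla Transformer view: one token per time step -> T tokens of size N.
temporal_tokens = series               # shape (T, N); each row is a temporal token

# Inverted (iTransformer-style) view: embed each variate's whole series
# independently with a shared linear map over the time axis -> N variate tokens.
W = rng.standard_normal((T, d_model))  # hypothetical shared embedding matrix
variate_tokens = series.T @ W          # shape (N, d_model)

# Attention over variate tokens now scores pairwise multivariate correlations.
scores = variate_tokens @ variate_tokens.T / np.sqrt(d_model)   # (N, N)
attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
attn /= attn.sum(axis=-1, keepdims=True)                        # rows sum to 1

print(temporal_tokens.shape, variate_tokens.shape, attn.shape)
# (96, 7) (7, 16) (7, 7)
```

The key move is the transpose: attention operates across the `N` variate tokens rather than across the `T` time steps, so the `(N, N)` attention map directly reflects cross-variate relationships.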