Parallax: A Parameterized Local Linear Attention That Keeps Softmax and Adds a Learned Covariance Correction Branch
Source: MarkTechPost

The Transformer’s attention mechanism has barely changed since 2017. Most efficiency work has tried to replace...