Matteo Pagliardini on X: "#DenseFormer, a simple modification that performs—after each transformer block—a weighted average of past representations."
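
To make the idea in the post concrete, here is a minimal sketch in plain Python of a forward pass that, after each block, replaces the running representation with a weighted sum of the input and all block outputs so far. The post only states that a weighted average of past representations is taken after each block; everything else here is an assumption for illustration: the names `dense_forward`, `weighted_average`, and `alphas`, the choice to store raw block outputs in `past`, the toy blocks standing in for real transformer blocks, and the hand-picked weights (which would be learned in practice).

```python
from typing import Callable, List, Sequence

Vector = List[float]
Block = Callable[[Vector], Vector]


def weighted_average(reps: Sequence[Vector], weights: Sequence[float]) -> Vector:
    """Return sum_j weights[j] * reps[j], computed elementwise."""
    assert len(reps) == len(weights)
    dim = len(reps[0])
    return [sum(w * r[k] for w, r in zip(weights, reps)) for k in range(dim)]


def dense_forward(
    x0: Vector,
    blocks: Sequence[Block],
    alphas: Sequence[Sequence[float]],
) -> Vector:
    """Run blocks in order, mixing all past representations after each one.

    alphas[i] holds i + 2 weights: one for the input x0 and one for the
    output of each block run so far (blocks 0..i). Assumed layout, not
    taken from the post.
    """
    past = [x0]      # representations seen so far, starting with the input
    current = x0
    for block, alpha in zip(blocks, alphas):
        past.append(block(current))              # raw output of this block
        current = weighted_average(past, alpha)  # mix with everything before it
    return current


# Toy stand-ins for transformer blocks (hypothetical, illustration only).
def add_one(v: Vector) -> Vector:
    return [a + 1.0 for a in v]


def double(v: Vector) -> Vector:
    return [2.0 * a for a in v]


blocks = [add_one, double]
x0 = [1.0, 2.0]

# All weight on the newest representation: behaves like plain block stacking.
identity_alphas = [[0.0, 1.0], [0.0, 0.0, 1.0]]
plain = dense_forward(x0, blocks, identity_alphas)

# Weights that also reuse earlier representations.
mixed_alphas = [[0.5, 0.5], [0.25, 0.25, 0.5]]
mixed = dense_forward(x0, blocks, mixed_alphas)

print(plain)  # [4.0, 6.0]
print(mixed)  # [2.25, 3.75]
```

With all weight on the most recent output, the sketch reduces to an ordinary stack of blocks, which is why the change can be described as a simple modification: the only extra state is one small weight vector per block.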