About this document
- sl:arxiv_author :
- sl:arxiv_firstAuthor : Nikita Kitaev
- sl:arxiv_num : 2001.04451
- sl:arxiv_published : 2020-01-13T18:38:28Z
- sl:arxiv_summary : Large Transformer models routinely achieve state-of-the-art results on a
number of tasks but training these models can be prohibitively costly,
especially on long sequences. We introduce two techniques to improve the
efficiency of Transformers. For one, we replace dot-product attention by one
that uses locality-sensitive hashing, changing its complexity from O($L^2$) to
O($L\log L$), where $L$ is the length of the sequence. Furthermore, we use
reversible residual layers instead of the standard residuals, which allows
storing activations only once in the training process instead of $N$ times,
where $N$ is the number of layers. The resulting model, the Reformer, performs
on par with Transformer models while being much more memory-efficient and much
faster on long sequences.@en
- sl:arxiv_title : Reformer: The Efficient Transformer@en
- sl:arxiv_updated : 2020-02-18T16:01:18Z
- sl:bookmarkOf : https://arxiv.org/abs/2001.04451
- sl:creationDate : 2020-06-29
- sl:creationTime : 2020-06-29T19:04:03Z
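The abstract above mentions reversible residual layers, which let the Reformer store activations only once rather than once per layer: each layer's inputs can be recomputed exactly from its outputs during the backward pass. A minimal sketch of that coupling, with hypothetical placeholder functions `F` and `G` standing in for the attention and feed-forward sublayers (they are illustrations, not the paper's actual layers):

```python
import numpy as np

# RevNet-style reversible residual block, the idea the Reformer abstract
# refers to. F and G are arbitrary fixed functions here, chosen only so
# the forward/inverse round trip can be demonstrated.

def F(x):
    return np.tanh(x)

def G(x):
    return 0.5 * x

def rev_forward(x1, x2):
    """Forward pass: (x1, x2) -> (y1, y2)."""
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def rev_inverse(y1, y2):
    """Recover the inputs from the outputs, so per-layer
    activations need not be kept in memory."""
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2

x1 = np.array([0.1, -0.4, 2.0])
x2 = np.array([1.5, 0.0, -0.7])
y1, y2 = rev_forward(x1, x2)
r1, r2 = rev_inverse(y1, y2)
```

Because the inverse is exact (up to floating-point error), `(r1, r2)` matches `(x1, x2)`, which is what makes the memory saving described in the abstract possible.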