The Transformer ("Attention is All You Need")
The [Vaswani et al. 2017 paper](https://arxiv.org/abs/1706.03762) introduced a [#seq2seq](/tag/sequence_to_sequence_learning) architecture built entirely from improved self-attention units (the "multi-head self-attention mechanism"), without any RNN. A minimal sketch of what that mechanism computes is given below.
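
As a rough illustration of "multi-head self-attention", the sketch below implements scaled dot-product attention and one multi-head layer in NumPy. The head count, shapes, and random projection weights are illustrative assumptions, not the paper's trained model or hyperparameters.

```python
# Minimal sketch of scaled dot-product attention and a multi-head self-attention
# layer, NumPy only. Weights are random and purely for illustration.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_k)        # (heads, seq, seq)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)          # row-wise softmax
    return weights @ V                                       # (heads, seq, d_k)

def multi_head_self_attention(x, num_heads=4, seed=0):
    """Project x into per-head Q, K, V, attend, concatenate heads, project back."""
    rng = np.random.default_rng(seed)
    seq_len, d_model = x.shape
    d_k = d_model // num_heads
    # Hypothetical random projection matrices (W_q, W_k, W_v, W_o).
    W_q, W_k, W_v, W_o = (
        rng.standard_normal((d_model, d_model)) / np.sqrt(d_model) for _ in range(4)
    )
    def split_heads(t):                                      # (seq, d_model) -> (heads, seq, d_k)
        return t.reshape(seq_len, num_heads, d_k).transpose(1, 0, 2)
    Q, K, V = split_heads(x @ W_q), split_heads(x @ W_k), split_heads(x @ W_v)
    heads = scaled_dot_product_attention(Q, K, V)            # (heads, seq, d_k)
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ W_o                                      # (seq, d_model)

# Example: 10 token embeddings of width 32 attend to one another, no recurrence.
x = np.random.default_rng(1).standard_normal((10, 32))
print(multi_head_self_attention(x).shape)                    # (10, 32)
```

Every position attends to every other position in a single step, which is what lets the architecture drop recurrence entirely.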