Attention mechanism
The [blog post by D. Britz](/doc/? (WildML) gives a good and simple explanation. More details in [Attention? Attention!](/doc/?

A simple Seq2Seq model builds a single context vector out of the encoder's last hidden state. Attention instead creates shortcuts between the context vector and the entire source input, so the context vector has access to the whole input sequence. The decoder can "attend" to different parts of the source sentence at each step of output generation, and the model learns what to attend to based on the input sentence and what it has produced so far.

Benefit: it is possible to interpret what the model is doing by looking at the attention weight matrix.

Cost: we need to calculate an attention value for each combination of input and output word. In that sense "attention" is a bit of a misnomer: the model looks at everything in detail before deciding what to focus on.
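To make the cost and the weight matrix concrete, here is a minimal sketch in plain Python. It assumes dot-product scoring between the decoder state and each encoder state, which is only one of several scoring functions and is not specified in these notes; the names `attend`, `encoder_states`, and `decoder_states` and the toy vectors are illustrative, not taken from any library.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Return (weights, context) for a single decoder step.

    Assumption: scores are plain dot products; real models often use
    a learned (additive or scaled) scoring function instead.
    """
    scores = [
        sum(d * h for d, h in zip(decoder_state, enc))
        for enc in encoder_states
    ]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # The context vector is the weighted sum of ALL encoder states.
    context = [
        sum(w * enc[i] for w, enc in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Toy data (made up): 3 source positions, 2 decoder steps, hidden size 2.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_states = [[2.0, 0.0], [0.0, 2.0]]

# One row per output step, one column per input position.
attention_matrix = []
contexts = []
for dec in decoder_states:
    weights, context = attend(dec, encoder_states)
    attention_matrix.append(weights)
    contexts.append(context)

for row in attention_matrix:
    print([round(w, 3) for w in row])
```

The nested structure shows where the cost comes from: every decoder step scores every encoder position, so the matrix has (output length) x (input length) entries. Each row is also what one would plot to inspect which source positions the model focused on for a given output step.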
6 Documents (Long List