[1806.06259] Evaluation of sentence embeddings in downstream and linguistic probing tasks
> a simple approach using bag-of-words with a recently introduced language model for deep context-dependent word embeddings proved to yield better results in many tasks when compared to sentence encoders trained on entailment datasets
> We also show, however, that we are still far away from a universal encoder that can perform consistently across several downstream tasks.
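As a rough illustration of that bag-of-words baseline, here is a minimal sketch of pooling per-token, context-dependent vectors into a single sentence embedding; the `bow_sentence_embedding` helper and the random placeholder vectors are assumptions for illustration, not the paper's code.

```python
import numpy as np

def bow_sentence_embedding(token_vectors: np.ndarray) -> np.ndarray:
    """Bag-of-words pooling: average per-token vectors into one sentence vector.

    token_vectors has shape (num_tokens, dim), e.g. context-dependent word
    embeddings produced by a deep language model such as ELMo.
    """
    return token_vectors.mean(axis=0)

# Random placeholder vectors stand in for real contextual embeddings.
tokens = ["a", "simple", "bag-of-words", "baseline"]
vectors = np.random.randn(len(tokens), 1024)   # standard ELMo vectors are 1024-dimensional
sentence_vec = bow_sentence_embedding(vectors)
print(sentence_vec.shape)  # (1024,)
```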
ELMo: Deep contextualized word representations (2018)
> models both (1) complex characteristics of word use (e.g., syntax and semantics), and (2) how these uses vary across linguistic contexts (i.e., to model polysemy).
> These word vectors are learned functions of the internal states of a deep bidirectional language model (biLM)
These representations are:
- Contextual: the representation for each word depends on the entire context in which it is used.
- Deep: combine all layers of a deep pre-trained neural network (see the sketch after this list).
- Character based: built purely from characters, so the model can use morphological cues to handle out-of-vocabulary words.
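Since the "deep" point above amounts to a learned, task-specific weighted sum over the biLM's layers, here is a minimal sketch of that combination; the `ScalarMix` class and the placeholder tensors are illustrative assumptions, not the released ELMo implementation.

```python
import torch
import torch.nn as nn

class ScalarMix(nn.Module):
    """Task-specific weighted sum over the layers of a pre-trained biLM,
    in the spirit of ELMo's combination of all layers."""

    def __init__(self, num_layers: int):
        super().__init__()
        self.scalars = nn.Parameter(torch.zeros(num_layers))  # softmax-normalised layer weights
        self.gamma = nn.Parameter(torch.ones(1))               # overall task-specific scale

    def forward(self, layer_outputs: torch.Tensor) -> torch.Tensor:
        # layer_outputs: (num_layers, seq_len, dim) activations from the biLM.
        weights = torch.softmax(self.scalars, dim=0)
        mixed = (weights.view(-1, 1, 1) * layer_outputs).sum(dim=0)
        return self.gamma * mixed

# Placeholder activations standing in for a 3-layer biLM over a 5-token sentence.
layer_activations = torch.randn(3, 5, 1024)
contextual_vectors = ScalarMix(num_layers=3)(layer_activations)
print(contextual_vectors.shape)  # torch.Size([5, 1024])
```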
Embed, encode, attend, predict: The new deep learning formula for state-of-the-art NLP models | Blog | Explosion AI
> A four-step strategy for deep learning with text
> Word embeddings let you treat individual words as related units of meaning, rather than entirely distinct IDs. However, most NLP problems require understanding of longer spans of text, not just individual words. There's now a simple and flexible solution that is achieving excellent performance on a wide range of problems.
> After embedding the text into a sequence of vectors, bidirectional RNNs are used to encode the vectors into a sentence matrix. The rows of this matrix can be understood as token vectors — they are sensitive to the sentential context of the token.
> The final piece of the puzzle is called an attention mechanism. This lets you reduce the sentence matrix down to a sentence vector, ready for prediction.
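The four steps quoted above map directly onto a small network. Below is a minimal sketch of the embed, encode, attend, predict formula for text classification; all module names and hyperparameters are illustrative assumptions, not Explosion AI's code.

```python
import torch
import torch.nn as nn

class EmbedEncodeAttendPredict(nn.Module):
    """Minimal embed -> encode -> attend -> predict text classifier."""

    def __init__(self, vocab_size: int, embed_dim: int, hidden_dim: int, num_classes: int):
        super().__init__()
        # Embed: map token IDs to vectors.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Encode: a bidirectional RNN turns the vectors into a sentence matrix.
        self.encode = nn.LSTM(embed_dim, hidden_dim, bidirectional=True, batch_first=True)
        # Attend: learned scoring reduces the sentence matrix to a sentence vector.
        self.attend = nn.Linear(2 * hidden_dim, 1)
        # Predict: classify from the sentence vector.
        self.predict = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        vectors = self.embed(token_ids)                     # (batch, seq, embed_dim)
        matrix, _ = self.encode(vectors)                    # (batch, seq, 2*hidden_dim)
        weights = torch.softmax(self.attend(matrix), dim=1) # (batch, seq, 1)
        sentence_vec = (weights * matrix).sum(dim=1)        # (batch, 2*hidden_dim)
        return self.predict(sentence_vec)

model = EmbedEncodeAttendPredict(vocab_size=10000, embed_dim=100, hidden_dim=128, num_classes=2)
logits = model(torch.randint(0, 10000, (4, 12)))  # 4 sentences, 12 tokens each
print(logits.shape)  # torch.Size([4, 2])
```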