> Deep contextualized word representations
> each word is assigned a representation which is a function of the entire sentence to which it belongs. The embeddings are computed from the internal states of a two-layer bidirectional Language Model, hence the name “ELMo”: Embeddings from Language Models.
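As a concrete illustration, here is a minimal sketch of computing these contextual embeddings with AllenNLP's `Elmo` module. The option/weight URLs are the ones AllenNLP published for the standard pretrained biLM and may have moved since; treat them as placeholders:

```python
import torch
from allennlp.modules.elmo import Elmo, batch_to_ids

# Pretrained biLM files as published by AllenNLP (URLs may have moved).
options_file = ("https://allennlp.s3.amazonaws.com/elmo/2x4096_512_2048cnn_2xhighway/"
                "elmo_2x4096_512_2048cnn_2xhighway_options.json")
weight_file = ("https://allennlp.s3.amazonaws.com/elmo/2x4096_512_2048cnn_2xhighway/"
               "elmo_2x4096_512_2048cnn_2xhighway_weights.hdf5")

# One learned weighted combination of the biLM layers, no dropout.
elmo = Elmo(options_file, weight_file, num_output_representations=1, dropout=0.0)

# The same surface form "bank" gets a different vector in each context.
sentences = [
    ["I", "deposited", "cash", "at", "the", "bank"],
    ["We", "sat", "on", "the", "bank", "of", "the", "river"],
]
character_ids = batch_to_ids(sentences)          # character-level inputs
output = elmo(character_ids)
embeddings = output["elmo_representations"][0]   # (batch, seq_len, 1024)

bank_money = embeddings[0, 5]   # "bank" in the financial sense
bank_river = embeddings[1, 4]   # "bank" in the river sense
print(torch.cosine_similarity(bank_money, bank_river, dim=0))  # < 1: context-dependent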
[1806.06259] Evaluation of sentence embeddings in downstream and linguistic probing tasks
> a simple approach using bag-of-words with a recently introduced language model for deep context-dependent word embeddings proved to yield better results in many tasks when compared to sentence encoders trained on entailment datasets
> We also show, however, that we are still far away from a universal encoder that can perform consistently across several downstream tasks.
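The "bag-of-words with contextual word embeddings" baseline amounts to pooling the per-token ELMo vectors into one fixed-size sentence vector. A minimal sketch, reusing the `output` dict from the `Elmo` call above; mean-pooling is one simple reading of "bag-of-words" and the function name is illustrative:

```python
import torch

def mean_pool_sentence_embedding(elmo_output: dict) -> torch.Tensor:
    """Average the ELMo vectors of real tokens (padding masked out)
    to get one fixed-size vector per sentence."""
    reps = elmo_output["elmo_representations"][0]      # (batch, seq_len, 1024)
    mask = elmo_output["mask"].unsqueeze(-1).float()   # (batch, seq_len, 1)
    summed = (reps * mask).sum(dim=1)                  # (batch, 1024)
    n_tokens = mask.sum(dim=1).clamp(min=1.0)          # (batch, 1)
    return summed / n_tokens

sentence_vectors = mean_pool_sentence_embedding(output)  # (batch, 1024)
```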
ELMo: Deep contextualized word representations (2018)
> models both (1) complex characteristics of word use (e.g., syntax and semantics), and (2) how these uses vary across linguistic contexts (i.e., to model polysemy).
> These word vectors are learned functions of the internal states of a deep bidirectional language model (biLM)
These representations are:
- Contextual: The representation for each word depends on the entire context in which it is used.
- Deep: they combine all layers of a deep pre-trained neural network, not just the top one (see the layer-mixing sketch after this list).
- Character-based: representations are built from characters rather than a fixed vocabulary, so the network can use morphological cues to form representations for out-of-vocabulary tokens.
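The "Deep" point is formalized in the paper (Equation 1) as a task-specific scalar mix of the biLM layers, ELMo_k = γ · Σ_j s_j · h_{k,j}, where the weights s_j are softmax-normalized and γ scales the whole vector. AllenNLP ships an equivalent `ScalarMix` module; the sketch below just restates that combination:

```python
import torch

class ScalarMix(torch.nn.Module):
    """Softmax-weighted sum of biLM layers, scaled by a task-specific gamma
    (Equation 1 of the ELMo paper)."""
    def __init__(self, num_layers: int = 3):  # char-CNN layer + 2 biLSTM layers
        super().__init__()
        self.scalar_parameters = torch.nn.Parameter(torch.zeros(num_layers))
        self.gamma = torch.nn.Parameter(torch.ones(()))

    def forward(self, layers: list) -> torch.Tensor:
        # layers: one (batch, seq_len, dim) tensor per biLM layer
        weights = torch.softmax(self.scalar_parameters, dim=0)
        return self.gamma * sum(w * h for w, h in zip(weights, layers))
```

The weights are learned per downstream task, so different tasks can emphasize different layers (lower layers tend to capture syntax, higher layers semantics).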