Word embeddings in 2017: Trends and future directions
- Subword-level embeddings: several methods
  - Word embeddings have been augmented with subword-level information for many applications such as named entity recognition, POS tagging, ..., language modeling.
  - Most of these models employ a CNN or a BiLSTM that takes the characters of a word as input and outputs a character-based word representation.
  - For incorporating character information into pre-trained embeddings, however, **character n-gram features** have been shown to be more powerful. [#FastText]
  - Subword units based on **byte-pair encoding** have been found to be particularly useful for machine translation, where they have replaced words as the standard input units.
- Out-of-vocabulary (OOV) words
- Polysemy: multi-sense embeddings
  - [Towards a Seamless Integration of Word Senses into Downstream NLP Applications](/doc/?uri=https%3A%2F%2Farxiv.org%2Fabs%2F1710.06632)
  - ...
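The character n-gram idea behind FastText can be sketched in a few lines: each word is wrapped in boundary markers and decomposed into overlapping n-grams, and the word's vector is the sum of its n-gram vectors. The function name, parameters, and marker choice below are illustrative, not the library's actual API:

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Extract character n-grams from a word, FastText-style (a sketch).

    The word is wrapped in boundary markers '<' and '>' so that prefixes
    and suffixes are distinguished from word-internal n-grams.
    """
    token = f"<{word}>"
    ngrams = set()
    for n in range(n_min, n_max + 1):
        for i in range(len(token) - n + 1):
            ngrams.add(token[i:i + n])
    return ngrams

# Because a word's vector is built from its n-gram vectors, an OOV word
# still gets a representation from the n-grams it shares with known words.
print(sorted(char_ngrams("where", 3, 3)))  # → ['<wh', 'ere', 'her', 're>', 'whe']
```

Note that the boundary markers make `her` (inside "where") distinct from the standalone word `<her>`.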
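Byte-pair encoding learns its subword units by repeatedly merging the most frequent pair of adjacent symbols in a word-frequency corpus. A minimal sketch of the merge-learning loop (the function name, the tuple-of-symbols corpus format, and the `</w>` end-of-word marker are assumptions for illustration):

```python
from collections import Counter

def learn_bpe(corpus, num_merges):
    """Learn BPE merge operations from a word-frequency corpus (a sketch).

    corpus: dict mapping a word, given as a tuple of symbols, to its
    frequency. Each step merges the most frequent adjacent symbol pair.
    """
    corpus = dict(corpus)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in corpus.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word, replacing occurrences of the best pair
        # with a single merged symbol.
        merged = {}
        for symbols, freq in corpus.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            merged[tuple(out)] = freq
        corpus = merged
    return merges, corpus

# Toy corpus: "low" occurs 5 times, "lower" twice; "</w>" marks word ends.
corpus = {("l", "o", "w", "</w>"): 5, ("l", "o", "w", "e", "r", "</w>"): 2}
merges, segmented = learn_bpe(corpus, 2)
```

For machine translation, the learned merges are applied to segment any input text into subword units, so rare and unseen words decompose into known pieces instead of becoming OOV tokens.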