Word embeddings in 2017: Trends and future directions - Subword-level embeddings: several methods:
> Word embeddings have been augmented with subword-level information for many applications such as named entity recognition, POS, ..., Language Modeling.
> Most of these models employ a CNN or a BiLSTM that takes as input the characters of a word and outputs a character-based word representation.
> For incorporating character information into pre-trained embeddings, however, **character n-grams features** have been shown to be more powerful. [#FastText]
> Subword units based on **byte-pair encoding** have been found to be particularly useful for machine translation where they have replaced words as the standard input units
- Out-of-vocabulary (OOV) words
- Polysemy. Multi-sense embeddings
- [Towards a Seamless Integration of Word Senses into Downstream NLP Applications](http://aclweb.org/anthology/P17-1170)
An overview of word embeddings and their connection to distributional semantic models - AYLIEN (2016) > While on the surface DSMs and word embedding models use varying algorithms to learn word representations – the former count, the latter predict – both types of model fundamentally act on the same underlying statistics of the data, i.e. the co-occurrence counts between words...
> These results are in contrast to the general consensus that word embeddings are superior to traditional methods. Rather, they indicate that it typically makes no difference whatsoever whether word embeddings or distributional methods are used. What really matters is that your hyperparameters are tuned and that you utilize the appropriate pre-processing and post-processing steps.