FastText
Classifier built on top of a sentence2vec model. Main idea: the morphological structure of a word carries important information about its meaning, which traditional [word embeddings](/tag/word_embedding) do not take into account. This is especially significant for morphologically rich languages (e.g. German, Turkish), in which a single word can have a large number of morphological forms, each of which may occur rarely, making it hard to train good word embeddings. FastText addresses this by treating each word as the aggregation of its subwords: it uses character n-grams as features, which also avoids the OOV (out-of-vocabulary) problem. Concretely, FastText represents a word as the sum of its n-gram representations, trained with a skip-gram model. Embeddings learned with FastText (trained on Wikipedia) are available in [many languages](https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md).
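A minimal sketch of the subword idea in Python. The `char_ngrams` helper and the random embedding table are illustrative assumptions, not FastText's actual API; in real FastText the n-gram vectors are learned with a skip-gram objective rather than drawn at random.

```python
import numpy as np

def char_ngrams(word, n_min=3, n_max=6):
    """Extract character n-grams from a word, using '<' and '>'
    as boundary markers (as in the FastText paper)."""
    w = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(w) - n + 1):
            grams.append(w[i:i + n])
    grams.append(w)  # the full word (with boundaries) is also a feature
    return grams

# Hypothetical n-gram embedding table; real FastText learns these
# vectors jointly with a skip-gram objective.
rng = np.random.default_rng(0)
dim = 5
table = {}

def vec(gram):
    if gram not in table:
        table[gram] = rng.normal(size=dim)
    return table[gram]

def word_vector(word):
    """A word's vector is the sum of its n-gram vectors, so even an
    out-of-vocabulary word still gets a representation."""
    return sum((vec(g) for g in char_ngrams(word)), np.zeros(dim))

print(char_ngrams("where", 3, 3))
# → ['<wh', 'whe', 'her', 'ere', 're>', '<where>']
```

Because any unseen word can still be decomposed into known character n-grams, `word_vector` never fails on out-of-vocabulary input.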