Under the hood: Multilingual embeddings | Engineering Blog | Facebook Code(About) With this technique, embeddings for every language exist in the same vector space and maintain the property that words with similar meanings, regardless of language, lie close together.
> To train these multilingual word embeddings, we first trained separate embeddings for each language using fastText and a combination of data from Facebook and Wikipedia. We then used dictionaries to project each of these embedding spaces into a common space (English). The dictionaries are automatically induced from parallel data — meaning data sets that consist of a pair of sentences in two different languages that have the same meaning — which we use for training translation systems.
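The projection step described above can be sketched as a least-squares fit: given a bilingual dictionary pairing source-language vectors with their English translations, learn a linear map that sends the source space into the English space. The embeddings and the map below are toy random data, not the actual fastText vectors, and the least-squares formulation is one common way to learn such a projection:

```python
import numpy as np

# Toy stand-ins for fastText embeddings: 5 dictionary word pairs,
# 4-dim source-language vectors and their English counterparts.
rng = np.random.default_rng(0)
src = rng.normal(size=(5, 4))      # source-language embeddings
W_true = rng.normal(size=(4, 4))   # hidden "true" mapping (toy setup)
tgt = src @ W_true                 # English-space embeddings

# Learn the projection W over the dictionary pairs:
#   min_W || src @ W - tgt ||^2
W, *_ = np.linalg.lstsq(src, tgt, rcond=None)

# Every source word can now be projected into the common (English) space.
projected = src @ W
print(np.allclose(projected, tgt, atol=1e-6))  # → True for this exact toy map
```

In practice the dictionary is noisy, so the fit is approximate rather than exact; some follow-up work also constrains W to be orthogonal to preserve distances between words.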
Semantic hashing using tags and topic modeling (2013)(About) Proposes Semantic Hashing using Tags and Topic Modeling (SHTTM), which incorporates both tag information and similarity information from probabilistic topic modeling. [Comments about the paper](https://sutheeblog.wordpress.com/2016/10/28/paper-reading-semantic-hashing-using-tags-and-topic-modeling-sigir13/). [Code on Github](https://github.com/zhuoxiongzhao/code-for-SHTTM)
Poincaré Embeddings for Learning Hierarchical Representations(About) > While complex symbolic datasets often exhibit a latent hierarchical structure, state-of-the-art methods typically learn embeddings in Euclidean vector spaces, which do not account for this property. We introduce a new approach for learning hierarchical representations of symbolic data by embedding them into hyperbolic space.
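The key object in the paper is the distance function of the Poincaré ball, where points live inside the unit ball and distances blow up near the boundary; this is what gives the space room for tree-like hierarchies. A minimal sketch of that distance (the standard formula, with toy example points):

```python
import numpy as np

def poincare_distance(u, v):
    """Hyperbolic distance between two points in the Poincaré ball (||x|| < 1):
    d(u, v) = arccosh(1 + 2 ||u - v||^2 / ((1 - ||u||^2)(1 - ||v||^2)))"""
    sq_dist = np.sum((u - v) ** 2)
    denom = (1 - np.sum(u ** 2)) * (1 - np.sum(v ** 2))
    return np.arccosh(1 + 2 * sq_dist / denom)

# Near the origin the geometry is almost Euclidean...
near_a = np.array([0.1, 0.0])
near_b = np.array([0.0, 0.1])
# ...but the same Euclidean gap near the boundary is hyperbolically huge,
# which is why general concepts sit near the center and specific ones
# near the rim in the learned embeddings.
far_a = np.array([0.95, 0.0])
far_b = np.array([0.0, 0.95])
print(poincare_distance(near_a, near_b), poincare_distance(far_a, far_b))
```

The second distance is far larger than the first despite the identical Euclidean separation pattern, illustrating the exponential growth of volume toward the boundary.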
Bag of Tricks for Efficient Text Classification (arxiv) 2016(About) A simple and efficient baseline for text classification.
**Our word features can be averaged** together to form good sentence representations.
Our experiments show that fastText is often on par with deep learning classifiers in terms of accuracy, and many orders of magnitude faster for training and evaluation. We can train fastText on more than one billion words in less than ten minutes using a standard multicore CPU, and classify half a million sentences among 312K classes in less than a minute.
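The core of the trick above can be sketched in a few lines: average the word vectors of a sentence and feed the result to a linear softmax classifier. The vocabulary, embedding table, and classifier weights below are toy random placeholders, not trained fastText parameters (fastText also adds n-gram features and a hierarchical softmax for large label sets, which this sketch omits):

```python
import numpy as np

rng = np.random.default_rng(1)
vocab = {"great": 0, "terrible": 1, "movie": 2, "plot": 3}  # toy vocabulary
E = rng.normal(size=(len(vocab), 8))                        # toy embedding table
W = rng.normal(size=(8, 2))                                 # toy classifier weights

def sentence_vector(tokens):
    # fastText's sentence representation: the average of its word vectors.
    idx = [vocab[t] for t in tokens if t in vocab]
    return E[idx].mean(axis=0)

def predict(tokens):
    # Linear classifier + softmax on top of the averaged representation.
    logits = sentence_vector(tokens) @ W
    e = np.exp(logits - logits.max())
    return e / e.sum()

print(predict(["great", "movie"]))  # class probabilities over 2 toy labels
```

Because both the averaging and the linear layer are cheap, training reduces to updating one embedding row per word and one weight matrix per step, which is what makes billion-word training times measured in minutes plausible.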