Word Mover's Embedding: From Word2Vec to Document Embedding (2018). Unsupervised embeddings of variable-length sentences built from pre-trained word embeddings; works better on short texts.
(Builds on the Word Mover's Distance but, using ideas borrowed from kernel approximation methods, produces a representation of each sentence instead of just a distance between two sentences.)
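WME turns a distance into features via random features: sample R random short "documents" ω (sequences of random word vectors) and map a text x to the vector of exp(-γ·dist(x, ω)) values, whose inner products approximate the induced kernel. The sketch below follows that recipe but, as an assumption to stay dependency-light, replaces the exact WMD (an optimal-transport problem) with the cheaper Relaxed WMD lower bound, gives every word uniform weight, and draws random words uniformly from [-1, 1]^dim. `rwmd`, `wme_features`, `R`, `d_max` and `gamma` are hypothetical names and illustrative values, not the paper's.

```python
import numpy as np

def rwmd(X, Y):
    """Relaxed WMD (a lower bound on WMD) between two bags of word vectors.

    X: (n, dim), Y: (m, dim); every word carries uniform weight (assumption).
    """
    # Pairwise Euclidean distances between the words of X and the words of Y.
    C = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
    # Each word sends all its mass to its nearest word in the other text;
    # the larger of the two one-sided relaxations is the tighter bound.
    return max(C.min(axis=1).mean(), C.min(axis=0).mean())

def wme_features(docs, dim, R=128, d_max=6, gamma=1.0, seed=0):
    """Random-feature map phi(x) = [exp(-gamma * dist(x, omega_r))]_r / sqrt(R)."""
    rng = np.random.default_rng(seed)
    # R random "documents", each with 1..d_max random word vectors
    # (assumption: drawn uniformly from [-1, 1]^dim).
    omegas = [rng.uniform(-1.0, 1.0, size=(int(rng.integers(1, d_max + 1)), dim))
              for _ in range(R)]
    feats = np.array([[np.exp(-gamma * rwmd(doc, om)) for om in omegas]
                      for doc in docs])
    return feats / np.sqrt(R)
```

The same `seed` must be used for every text so that all of them are compared against the same random documents; the resulting fixed-length vectors can then go into any linear model.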
[1810.00438] Zero-training Sentence Embedding via Orthogonal Basis (2018). A **training-free approach for building sentence representations**, "Geometric Embedding" (GEM), based on the **geometric structure** of the word embedding space.
> we build an orthogonal basis of the subspace spanned by a word and its surrounding context in a sentence. **We model the semantic meaning of a word in a sentence** based on two aspects. One is its relatedness to the word vector subspace already spanned by its contextual words. The other is the word’s novel semantic meaning which shall be introduced as a new basis vector perpendicular to this existing subspace
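A simplified illustration of the two aspects in the quote, assuming a symmetric context window of hypothetical size `window`: an orthonormal basis of the context words' span (via `np.linalg.qr`) gives the part of the word vector already inside the existing subspace (relatedness), and the normalized residual is the new, perpendicular direction (novelty). `novelty_scores` and the novelty-weighted average in `toy_sentence_embedding` are hypothetical; the paper's actual weighting scheme differs and is not reproduced here.

```python
import numpy as np

def novelty_scores(W, window=2):
    """W: (n_words, dim) word vectors of one sentence, in order.

    Returns, for each word, |residual| / |w| where the residual is the
    component of w perpendicular to the span of its context words.
    """
    n = W.shape[0]
    scores = np.empty(n)
    for i in range(n):
        ctx = [j for j in range(max(0, i - window), min(n, i + window + 1)) if j != i]
        if not ctx:
            scores[i] = 1.0  # no context: the whole vector is "new"
            continue
        # Orthonormal basis (dim, k) of the subspace spanned by the context words.
        Q, _ = np.linalg.qr(W[ctx].T)
        w = W[i]
        residual = w - Q @ (Q.T @ w)  # part of w orthogonal to the context subspace
        scores[i] = np.linalg.norm(residual) / np.linalg.norm(w)
    return scores

def toy_sentence_embedding(W, window=2):
    """Hypothetical weighting: average word vectors weighted by novelty."""
    s = novelty_scores(W, window)
    return s @ W / s.sum()
```

A word that is a linear combination of its neighbours gets a novelty near 0, while a word pointing in a fresh direction gets a novelty near 1.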
[OpenReview](/doc/?uri=https%3A%2F%2Fopenreview.net%2Fforum%3Fid%3DrJedbn0ctQ) ; [Related to this paper](/doc/?uri=https%3A%2F%2Farxiv.org%2Fabs%2F1704.05358)
[1704.05358] Representing Sentences as Low-Rank Subspaces (2017) > We observe a simple geometry of sentences -- the word representations of a given sentence roughly lie in a low-rank subspace (roughly, rank 4). Motivated by this observation, we represent a sentence by the low-rank subspace spanned by its word vectors.
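A sentence of N words is a 300 × N matrix (if 300 is the dimension of the word embedding space). Keep the left singular vectors of its k largest singular values (k is a hyperparameter, e.g. 4): they span a k-dimensional subspace that represents the sentence.
Similarity between sentences: computed from the principal angles between their subspaces (the subspace analogue of cosine similarity between vectors)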
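A minimal NumPy sketch of both steps, assuming the columns of `W` are the word vectors. The aggregate score used here (mean squared cosine of the principal angles) is one common way to combine principal angles into a single number and is an assumption of this sketch, not necessarily the paper's exact definition; the function names are hypothetical.

```python
import numpy as np

def sentence_subspace(W, k=4):
    """W: (dim, N) matrix whose columns are the N word vectors of a sentence.

    Returns a (dim, k) orthonormal basis: the left singular vectors belonging
    to the k largest singular values (fewer columns if N < k).
    """
    U, _, _ = np.linalg.svd(W, full_matrices=False)  # singular values come sorted descending
    return U[:, :k]

def subspace_similarity(U1, U2):
    """Mean squared cosine of the principal angles between two subspaces."""
    # The singular values of U1^T U2 are the cosines of the principal angles.
    cosines = np.linalg.svd(U1.T @ U2, compute_uv=False)
    return float(np.mean(cosines ** 2))
```

Identical subspaces score 1 and mutually orthogonal subspaces score 0, mirroring the range of a squared cosine between two vectors.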
[1806.06259] Evaluation of sentence embeddings in downstream and linguistic probing tasks: a simple bag-of-words approach using a recently introduced language model for deep context-dependent word embeddings proved to yield better results in many tasks than sentence encoders trained on entailment datasets
> We also show, however, that we are still far away from a universal encoder that can perform consistently across several downstream tasks.
The Current Best of Universal Word Embeddings and Sentence Embeddings (2018). Word embeddings SOTA: [ELMo](/tag/elmo)
Sentence embeddings: While unsupervised representation learning of sentences had been the norm for quite some time, with simple baselines like averaging word embeddings, a few novel unsupervised and supervised approaches, as well as multi-task learning schemes, have emerged in late
[1803.11175] Universal Sentence Encoder (2018). Models for encoding sentences into embedding vectors that specifically target transfer learning to other NLP tasks.
> With transfer learning via sentence embeddings, we observe surprisingly good performance with minimal amounts of supervised training data for a transfer task
Mixes unsupervised tasks on a large corpus with the supervised SNLI task; one of the two encoder variants leverages the [#Transformer](/tag/attention_is_all_you_need) architecture (the other is a deep averaging network)
A Simple but Tough-to-Beat Baseline for Sentence Embeddings (2017) > Use word embeddings computed using one of the popular methods on unlabeled corpus like Wikipedia, represent the sentence by a weighted average of the word vectors, and then modify them a bit using PCA/SVD
See also [youtube: Sanjeev Arora on "A theoretical approach to semantic representations"](https://www.youtube.com/watch?v=KR46z_V0BVw)
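The two steps in the quote can be sketched as follows, following the paper's smooth inverse frequency (SIF) recipe: weight each word vector by a/(a + p(w)), average, then subtract each sentence vector's projection on the first singular vector of the sentence matrix. Assumptions of this sketch: `vectors` maps words to NumPy arrays, `word_prob` maps words to unigram probabilities (words missing from it get p = 0, i.e. weight 1), every sentence contains at least one known word, and the function names are hypothetical.

```python
import numpy as np

def sif_weighted_average(sentences, vectors, word_prob, a=1e-3):
    """sentences: list of token lists. Returns an (n_sentences, dim) matrix."""
    rows = []
    for tokens in sentences:
        known = [t for t in tokens if t in vectors]  # assumes at least one known word
        # Smooth inverse frequency: frequent words get small weights.
        weights = np.array([a / (a + word_prob.get(t, 0.0)) for t in known])
        vecs = np.array([vectors[t] for t in known])
        rows.append(weights @ vecs / len(known))
    return np.vstack(rows)

def remove_first_component(X):
    """Subtract each row's projection onto the first singular vector of X."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    u = Vt[0]  # the "common discourse" direction shared by all sentences
    return X - np.outer(X @ u, u)
```

The component removal is done once over the whole set of sentences, so the result for one sentence depends on which other sentences are in the batch.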
Bag of Tricks for Efficient Text Classification (arXiv, 2016). A simple and efficient baseline for text classification.
**Our word features can be averaged** together to form good sentence representations.
Our experiments show that fastText is often on par with deep learning classifiers in terms of accuracy, and many orders of magnitude faster for training and evaluation. We can train fastText on more than one billion words in less than ten minutes using a standard multicore CPU, and classify half a million sentences among 312K classes in less than a minute.
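The "averaged word features" idea corresponds to a very small forward pass: look up embeddings for the words (and hashed word bigrams), average them into a text representation, and apply a linear classifier with a softmax. This is a minimal sketch under stated assumptions: a plain softmax rather than the hierarchical softmax fastText offers for large label sets, bigrams hashed with `zlib.crc32` into a hypothetical number of buckets, untrained matrices `E` and `W`, and at least one feature per text; `features` and `predict_proba` are hypothetical names, not the fastText library's API.

```python
import zlib
import numpy as np

def features(tokens, vocab, n_buckets):
    """Unigram ids from vocab plus hashed bigram ids placed after the vocabulary."""
    ids = [vocab[t] for t in tokens if t in vocab]
    for left, right in zip(tokens, tokens[1:]):
        # Hashing trick: bigrams share n_buckets extra embedding rows.
        ids.append(len(vocab) + zlib.crc32(f"{left} {right}".encode()) % n_buckets)
    return ids

def predict_proba(tokens, vocab, E, W, n_buckets):
    """E: (len(vocab) + n_buckets, dim) embeddings; W: (n_classes, dim) classifier."""
    ids = features(tokens, vocab, n_buckets)  # assumes ids is non-empty
    h = E[ids].mean(axis=0)  # averaged word/bigram features = text representation
    logits = W @ h
    p = np.exp(logits - logits.max())  # numerically stable softmax
    return p / p.sum()
```

Because the text representation is just an average followed by one matrix product, evaluation cost is linear in the number of tokens, which is where most of the speed reported above comes from.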