[1704.05358] Representing Sentences as Low-Rank Subspaces (2017)(About) > We observe a simple geometry of sentences -- the word representations of a given sentence roughly lie in a low-rank subspace (roughly, rank 4). Motivated by this observation, we represent a sentence by the low-rank subspace spanned by its word vectors.
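A sentence of N words is a (300, N) matrix whose columns are the word vectors (assuming 300-dim word embeddings). Take its SVD and keep the left singular vectors belonging to the k largest singular values (k is a hyperparam, e.g. 4) -> they span a subspace of dim k, which is the sentence representation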
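A minimal NumPy sketch of this step, assuming the word vectors are already stacked as columns of a (d, N) array. The function name `sentence_subspace` and the random stand-in embeddings are hypothetical, for illustration only, and this is not the paper's reference code.

```python
import numpy as np


def sentence_subspace(word_vectors: np.ndarray, rank: int = 4) -> np.ndarray:
    """Return an orthonormal basis (d, k) for the top-`rank` subspace.

    word_vectors: array of shape (d, N), one column per word.
    k is capped at min(d, N), so a 3-word sentence gives at most 3 directions.
    """
    # Thin SVD; NumPy returns singular values in descending order,
    # so the first columns of U are the dominant directions.
    U, _, _ = np.linalg.svd(word_vectors, full_matrices=False)
    k = min(rank, U.shape[1])
    return U[:, :k]


# Stand-in for a 10-word sentence with 300-dim embeddings (random, not real vectors).
rng = np.random.default_rng(0)
sentence = rng.standard_normal((300, 10))
basis = sentence_subspace(sentence, rank=4)
print(basis.shape)  # (300, 4)
```

The returned columns are orthonormal, which is convenient when comparing two subspaces later.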
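Similarity between sentences: based on the principal angles between the two subspaces (reminiscent of cosine similarity: for two 1-dim subspaces it reduces to the absolute value of the cosine between the two vectors)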
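A hedged sketch of a principal-angle similarity, assuming each subspace is given by a matrix with orthonormal columns. The cosines of the principal angles are the singular values of `U1.T @ U2`. Aggregating them as the root of the sum of squares is an assumption of this sketch, and the names `principal_cosines` and `subspace_similarity` are hypothetical.

```python
import numpy as np


def principal_cosines(U1: np.ndarray, U2: np.ndarray) -> np.ndarray:
    """Cosines of the principal angles between span(U1) and span(U2).

    U1, U2: (d, k1) and (d, k2) arrays with orthonormal columns.
    """
    s = np.linalg.svd(U1.T @ U2, compute_uv=False)
    # Clip to [0, 1] to absorb floating-point overshoot.
    return np.clip(s, 0.0, 1.0)


def subspace_similarity(U1: np.ndarray, U2: np.ndarray) -> float:
    """Root of the sum of squared principal cosines (unnormalized)."""
    return float(np.sqrt(np.sum(principal_cosines(U1, U2) ** 2)))


# Two random 4-dim subspaces of a 300-dim space (stand-ins, not real sentences).
rng = np.random.default_rng(0)
A, _ = np.linalg.qr(rng.standard_normal((300, 4)))
B, _ = np.linalg.qr(rng.standard_normal((300, 4)))

sim_same = subspace_similarity(A, A)  # identical subspaces: sqrt(4) = 2, up to rounding
sim_diff = subspace_similarity(A, B)  # random subspaces: much smaller
print(sim_same, sim_diff)
```

With this choice the score ranges from 0 (orthogonal subspaces) to sqrt(k) (identical k-dim subspaces); dividing by sqrt(k) would rescale it to [0, 1] if that is more convenient.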