SIF embeddings
"Smoothed Inverse Frequency": a linear representation of a sentence which is better than the simple average of the embeddings of its words 2 ideas: - assign to each word a weighting that depends on the frequency of the word it the corpus (reminiscent of TF-IDF) - some denoising (removing the component from the top singular direction) Todo (?): check implementation as a [sklearn Vectorizer](