Gensim tutorial: Similarity Queries(About) > "The thing to note here is that document no. 2 would never be returned by a standard boolean fulltext search, because it does not share any common words with the query string"
An Intuitive Understanding of Word Embeddings: From Count Vectors to Word2Vec(About) Types of word embeddings:
- Frequency based Embedding
- Count Vector
- TF-IDF Vector
- Co-Occurrence Vector
    - Co-occurrence matrix (with a fixed context window): a V x V matrix (V = vocab size), often reduced to V x N with N a small subset of V.
    - PCA or SVD: reduce dimensionality by keeping only the k most important eigenvalues / singular values.
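A minimal sketch of the co-occurrence idea above, using only the standard library. The corpus, window size, and function name are made up for illustration:

```python
# Hypothetical sketch: build a co-occurrence matrix with a fixed
# context window (dict-of-dicts instead of a dense V x V array).
from collections import defaultdict

def cooccurrence_matrix(sentences, window=1):
    """Return the sorted vocab and a nested-dict co-occurrence matrix."""
    counts = defaultdict(lambda: defaultdict(int))
    for tokens in sentences:
        for i, word in enumerate(tokens):
            # Count every neighbour within `window` positions of `word`
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][tokens[j]] += 1
    vocab = sorted(counts)
    return vocab, counts

corpus = [["he", "is", "lazy"], ["she", "is", "smart"]]
vocab, m = cooccurrence_matrix(corpus, window=1)
print(vocab)            # ['he', 'is', 'lazy', 'she', 'smart']
print(m["is"]["lazy"])  # 1
```

Note the matrix is symmetric: `m[a][b] == m[b][a]`. A dense version of this V x V matrix is what PCA/SVD would then compress.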
- Prediction based Embedding
    - CBOW (Continuous Bag Of Words): one hidden layer and one output layer. Predicts the probability of a word given its context.
    - Skip-gram: predicts the probability of the context given a word.
Sample code using gensim