Word2Bits - Quantized Word Vectors (2018)(About) We show that high-quality quantized word vectors using 1-2 bits per parameter can be learned by introducing a quantization function into Word2Vec. We furthermore show that training with the quantization function acts as a regularizer.
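To make "1-2 bits per parameter" concrete, here is a minimal sketch of what a quantization function can look like. The function names, output levels, and thresholds below are illustrative assumptions, not details taken from the abstract above; the only point is that 1 bit allows two distinct values per parameter and 2 bits allow four.

```python
def quantize_1bit(x, level=1 / 3):
    """Map a real-valued parameter to one of two values: +level or -level.

    The default level is an assumed, illustrative choice.
    """
    return level if x >= 0 else -level


def quantize_2bit(x):
    """Map a real-valued parameter to one of four evenly spaced values.

    Thresholds and levels are assumed for illustration.
    """
    if x > 0.5:
        return 0.75
    if x >= 0:
        return 0.25
    if x >= -0.5:
        return -0.25
    return -0.75


# A made-up full-precision word vector.
vector = [0.82, -0.07, 0.31, -1.4]

one_bit = [quantize_1bit(v) for v in vector]   # each entry needs 1 bit to store
two_bit = [quantize_2bit(v) for v in vector]   # each entry needs 2 bits to store
print(one_bit)
print(two_bit)
```

Because each quantized entry takes one of only two or four values, it can be stored as a 1- or 2-bit code instead of a 32-bit float.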
An overview of word embeddings and their connection to distributional semantic models - AYLIEN (2016)(About) > While on the surface DSMs and word embedding models use varying algorithms to learn word representations – the former count, the latter predict – both types of model fundamentally act on the same underlying statistics of the data, i.e. the co-occurrence counts between words...
> These results are in contrast to the general consensus that word embeddings are superior to traditional methods. Rather, they indicate that it typically makes no difference whatsoever whether word embeddings or distributional methods are used. What really matters is that your hyperparameters are tuned and that you utilize the appropriate pre-processing and post-processing steps.
Distributed Representations of Sentences and Documents (arxiv 2014)(About) Paragraph Vector: an unsupervised algorithm that learns fixed-length feature representations from variable-length pieces of text, such as sentences, paragraphs, and documents. It represents each document by a dense vector that is trained to predict words in the document. This overcomes two weaknesses of the [Bag Of Words](/tag/bag_of_words) model, which loses word order and ignores word semantics.
An Intuitive Understanding of Word Embeddings: From Count Vectors to Word2Vec(About) Types of word embeddings:
- Frequency-based Embedding
  - Count Vector
  - TF-IDF Vector
  - Co-Occurrence Vector
    - Co-occurrence matrix (with a fixed context window): a V × V matrix, or a V × N matrix where N is the size of a subset of the vocabulary V
    - PCA or SVD: keeping the components for the k most important eigenvalues
- Prediction-based Embedding
  - CBOW (Continuous Bag Of Words): one hidden layer and one output layer. Predicts the probability of a word given its context
  - Skip-gram: predicts the probability of the context given a word
Sample code using gensim
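Independently of the gensim code in the linked article, the fixed context window behind both families of embeddings can be sketched in plain Python. The helper name `context_pairs`, the window size, and the toy sentence are assumptions made for illustration. Counting the (word, context) pairs gives the entries of a co-occurrence matrix; the same pairs are the kind of training examples a skip-gram model uses (predict the context word from the center word).

```python
from collections import Counter


def context_pairs(tokens, window=2):
    """Yield (center, context) pairs for every word within `window` positions."""
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, tokens[j]


# Toy corpus (made up for this sketch).
tokens = "the cat sat on the mat".split()

# Prediction-based view: each pair is one skip-gram style training example.
pairs = list(context_pairs(tokens, window=1))

# Frequency-based view: counting the pairs fills a sparse co-occurrence matrix.
cooc = Counter(pairs)

print(pairs[:3])
print(cooc[("the", "cat")], cooc[("on", "the")])
```

For CBOW the direction is reversed: the context words collected around one position are used together to predict the center word.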
Text Classification With Word2Vec - DS lore(About) > Overall, we won’t be throwing away our SVMs any time soon in favor of word2vec but it has its place in text classification.
> 1. SVM’s are pretty great at text classification tasks
> 2. Models based on simple averaging of word-vectors can be surprisingly good too (given how much information is lost in taking the average)
> 3. but they only seem to have a clear advantage when there is ridiculously little labeled training data
> Update 2017: actually, the best way to utilise the pretrained embeddings would probably be this [using keras](https://blog.keras.io/using-pre-trained-word-embeddings-in-a-keras-model.html)
Sample code to benchmark a few text categorization models to test whether word embeddings like word2vec can improve text classification accuracy.
Sample code (based on scikit-learn) includes an embedding vectorizer that is given an embedding dataset and vectorizes texts by taking the mean of all the vectors corresponding to individual words.
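The mean-of-vectors idea described above can be sketched as follows. This is not the code from the linked post: the class name, the toy vectors, and the choice to map texts with no known words to a zero vector are all assumptions. It uses plain Python lists and only mimics the scikit-learn `fit`/`transform` convention.

```python
class MeanEmbeddingVectorizer:
    """Turn a tokenized text into the mean of its words' embedding vectors."""

    def __init__(self, word_vectors):
        # word_vectors: dict mapping a word to a list of floats (assumed format).
        self.word_vectors = word_vectors
        self.dim = len(next(iter(word_vectors.values())))

    def fit(self, X, y=None):
        # Nothing to learn: the embeddings are given up front.
        return self

    def transform(self, X):
        rows = []
        for tokens in X:
            known = [self.word_vectors[t] for t in tokens if t in self.word_vectors]
            if known:
                # Average each dimension over the words found in the vocabulary.
                rows.append([sum(col) / len(known) for col in zip(*known)])
            else:
                # Assumed fallback: a zero vector when no word is known.
                rows.append([0.0] * self.dim)
        return rows


# Toy 2-dimensional embeddings (made up for this sketch).
toy_vectors = {"good": [1.0, 0.0], "movie": [0.0, 1.0]}
vectorizer = MeanEmbeddingVectorizer(toy_vectors)
features = vectorizer.fit([]).transform([["good", "movie"], ["unknown"]])
print(features)
```

The resulting fixed-length rows can then be fed to any classifier; the averaging step is where word order and most of the per-word information are lost.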
Efficient Estimation of Word Representations in Vector Space (2013)(About) We propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task, and the results are compared to the previously best performing techniques based on different types of neural networks. We observe large improvements in accuracy at much lower computational cost, i.e. it takes less than a day to learn high quality word vectors from a 1.6 billion words data set. Furthermore, we show that these vectors provide state-of-the-art performance on our test set for measuring syntactic and semantic word similarities.