[1602.01137] A Dual Embedding Space Model for Document Ranking (2016)(About) Investigates neural word embeddings as a source of evidence in document ranking.
Presented in [this Stanford course on IR](/doc/?uri=https%3A%2F%2Fweb.stanford.edu%2Fclass%2Fcs276%2Fhandouts%2Flecture20-distributed-representations.pdf) by Chris Manning (starting slide 44)
They train a word2vec model but, instead of discarding the output matrix as is standard, retain both the input (IN) and output (OUT) projections.
> During ranking we map the query words into the input space and the document words into the output space, and compute a query-document relevance score by aggregating the cosine similarities across all the query-document word pairs.
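The scoring described above can be sketched as follows. This is a minimal illustrative version, assuming a DESM-style IN-OUT score where each query word's IN vector is compared against the cosine-normalized centroid of the document's OUT vectors and the similarities are averaged; the embedding dictionaries and function name here are hypothetical placeholders, not the authors' code.

```python
import numpy as np

def desm_score(query_words, doc_words, embed_in, embed_out):
    """Sketch of an IN-OUT embedding relevance score: query words live in
    the input (IN) space, document words in the output (OUT) space, and the
    score is the mean cosine similarity of each query vector to the
    normalized centroid of the document's OUT vectors."""
    # Unit-normalize each document word's OUT vector, then take the centroid.
    doc_vecs = np.array([embed_out[w] for w in doc_words], dtype=float)
    doc_vecs = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    centroid = doc_vecs.mean(axis=0)
    centroid = centroid / np.linalg.norm(centroid)
    # Average cosine similarity between each IN query vector and the centroid.
    sims = [embed_in[q] @ centroid / np.linalg.norm(embed_in[q])
            for q in query_words]
    return float(np.mean(sims))
```

Because the score aggregates soft (cosine) matches rather than exact term matches, semantically related documents score well even without query-term overlap, which is also why the approach can over-generate loosely related candidates.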
> However, when ranking a larger set of candidate documents, we find the embeddings-based approach is prone to false positives.
Information Retrieval as Statistical Translation (Adam Berger, John Lafferty, 1999)(About) > "**Turn the search problem around to predict the input**"
> We propose a new probabilistic approach to information retrieval based upon the ideas and methods of statistical machine translation. The central ingredient in this approach is **a statistical model of how a user might distill or "translate" a given document into a query**. To assess the relevance of a document to a user's query, **we estimate the probability that the query would have been generated as a translation of the document**, and factor in the user's general preferences in the form of a prior distribution over documents. We propose a simple, well motivated model of the document-to-query translation process, and describe an algorithm for learning the parameters of this model in an unsupervised manner from a collection of documents.