A Tri-Partite Neural Document Language Model for Semantic Information Retrieval (2018 - ESWC conference) from the abstract: Previous work in information retrieval has shown that using evidence, such as concepts and relations, from external knowledge sources could enhance the retrieval performance... This paper presents a new tri-partite neural document language framework that leverages explicit knowledge to jointly constrain word, concept, and document learning representations to tackle a number of issues including polysemy and granularity mismatch.
Evaluating the Impact of Word Embeddings on Similarity Scoring in Practical Information Retrieval (2017) > Transferring the success of word embeddings to Information Retrieval (IR) tasks is currently an active research topic. While embedding-based retrieval models could tackle the vocabulary mismatch problem by making use of the embedding’s inherent similarity between distinct words, most of them struggle to compete with the prevalent strong baselines such as TF-IDF and BM25.
Considering a practical ad-hoc IR task composed of two steps, matching and scoring, the paper compares the performance of several techniques that leverage word embeddings in the retrieval model to compute the similarity between the query and the documents (namely word centroid similarity, paragraph vectors, Word Mover’s distance, and a novel inverse document frequency (IDF) re-weighted word centroid similarity).
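Of the techniques listed, Word Mover’s distance is the most expensive, since the exact version requires an optimal-transport solver. A common shortcut is its "relaxed" lower bound, where each word simply travels to its nearest counterpart in the other document. The sketch below is a minimal NumPy illustration of that relaxed bound; the hand-set 2-d "embeddings" and function names are my own toy assumptions, not from the paper.

```python
import numpy as np

# Toy 2-d word vectors (assumed for illustration, not learned embeddings).
vectors = {
    "cat":    np.array([1.0, 0.0]),
    "kitten": np.array([0.9, 0.1]),
    "car":    np.array([0.0, 1.0]),
    "engine": np.array([0.1, 0.9]),
}

def rwmd(doc_a, doc_b, vectors):
    """Relaxed Word Mover's distance lower bound: each word in one document
    moves to its nearest word in the other; symmetrised by taking the max
    of the two directions, as in the standard relaxation."""
    def one_way(src, dst):
        total = 0.0
        for w in src:
            # Cheapest move for word w: distance to its closest word in dst.
            total += min(np.linalg.norm(vectors[w] - vectors[v]) for v in dst)
        return total / len(src)
    return max(one_way(doc_a, doc_b), one_way(doc_b, doc_a))

near = rwmd(["cat", "kitten"], ["cat"], vectors)   # topically close pair
far = rwmd(["cat", "kitten"], ["car", "engine"], vectors)  # unrelated pair
```

Because each word is matched greedily rather than via a full transport plan, this runs in O(|A|·|B|) vector distances and is often used to prune candidates before computing the exact distance.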
> We confirm that word embeddings can be successfully employed in a practical information retrieval setting. The proposed cosine similarity of IDF re-weighted, aggregated word vectors is competitive to the TF-IDF baseline.
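The competitive method from the quote, IDF re-weighted word centroid similarity, is simple to state: represent query and document as the IDF-weighted mean of their word vectors, then rank by cosine similarity. Here is a minimal self-contained sketch under my own assumptions; the tiny corpus, hand-set 3-d vectors, and helper names are illustrative, not the paper's code.

```python
import math
import numpy as np

def idf_weights(docs):
    """Smoothed IDF per vocabulary word from a list of token lists."""
    n = len(docs)
    df = {}
    for doc in docs:
        for w in set(doc):
            df[w] = df.get(w, 0) + 1
    return {w: math.log(n / d) + 1.0 for w, d in df.items()}

def centroid(tokens, vectors, idf):
    """IDF-weighted mean of the word vectors for in-vocabulary tokens."""
    vecs = [idf.get(w, 1.0) * vectors[w] for w in tokens if w in vectors]
    return np.mean(vecs, axis=0)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-d "embeddings" (assumed, not learned) so the sketch is runnable:
# one animal-themed cluster, one finance-themed cluster.
vectors = {
    "cat":    np.array([1.0, 0.1, 0.0]),
    "dog":    np.array([0.9, 0.2, 0.1]),
    "pet":    np.array([0.95, 0.0, 0.05]),
    "stock":  np.array([0.0, 1.0, 0.1]),
    "market": np.array([0.1, 0.9, 0.2]),
    "trade":  np.array([0.05, 0.95, 0.0]),
}

docs = [["cat", "dog", "pet"], ["stock", "market", "trade"]]
idf = idf_weights(docs)
query = ["pet", "cat"]
# Rank documents by cosine similarity of IDF re-weighted centroids.
scores = [cosine(centroid(query, vectors, idf), centroid(d, vectors, idf))
          for d in docs]
```

The IDF weighting downplays words that appear in many documents, which is what lets this aggregated-vector scoring compete with the TF-IDF baseline in the paper's evaluation.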
Vectorland: Brief Notes from Using Text Embeddings for Search > the elegance is in the learning model, but the magic is in the structure of the information we model
> The source-target training pairs dictate **what notion of "relatedness"** will be modeled in the embedding space
> is Eminem more similar to Rihanna or rap?