Semanlink - Embeddings in Information Retrieval

> Transferring the success of word embeddings to Information Retrieval (IR) tasks is currently an active research topic. While embedding-based retrieval models could tackle the vocabulary mismatch problem by making use of the embedding’s inherent similarity between distinct words, most of them struggle to compete with the prevalent strong baselines such as TF-IDF and BM25.
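A toy illustration of the vocabulary mismatch problem mentioned above, offered as a sketch only: the three-dimensional vectors in `EMB` are invented for the example (not taken from any trained model), and the hypothetical `one_hot` helper stands in for a pure term-matching representation. Two distinct words get zero similarity under term matching but a high cosine under the embeddings.

```python
import math

# Hypothetical 3-d "embeddings", invented for illustration only; a real
# system would use vectors learned by a model such as word2vec.
EMB = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "banana": [0.0, 0.2, 0.95],
}


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def one_hot(word, vocab):
    """Term-matching view: every distinct word is its own orthogonal dimension."""
    return [1.0 if w == word else 0.0 for w in vocab]


vocab = sorted(EMB)
# Term matching: "car" and "automobile" share no dimension.
print(cosine(one_hot("car", vocab), one_hot("automobile", vocab)))  # 0.0
# Embeddings: the two synonyms point in almost the same direction.
print(cosine(EMB["car"], EMB["automobile"]))
# An unrelated word stays far away.
print(cosine(EMB["car"], EMB["banana"]))
```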

Considering a practical ad hoc IR task composed of two steps, matching and scoring, the paper compares the performance of several techniques that leverage word embeddings in retrieval models to compute the similarity between the query and the documents: word centroid similarity, paragraph vectors, Word Mover’s Distance, and a novel inverse document frequency (IDF) re-weighted word centroid similarity.
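The two-step setting can be sketched as follows, with word centroid similarity (WCS) as the scoring function: the query and each document are represented by the average of their word vectors and compared by cosine. Assumptions: the vectors in `EMB` are made-up toy values, and the matching step is taken to be a Boolean OR (a document is a candidate if it shares at least one term with the query); the paper's exact matching operation may differ. The helper names `match`, `centroid` and `wcs_rank` are hypothetical.

```python
import math

# Hypothetical toy embeddings, invented for illustration only.
EMB = {
    "neural": [0.8, 0.2, 0.1],
    "network": [0.7, 0.3, 0.0],
    "deep": [0.75, 0.25, 0.1],
    "learning": [0.6, 0.4, 0.2],
    "cooking": [0.0, 0.1, 0.9],
    "recipe": [0.1, 0.0, 0.95],
}


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(tokens):
    """Unweighted mean of the known word vectors, or None if none are known."""
    vecs = [EMB[t] for t in tokens if t in EMB]
    if not vecs:
        return None
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(len(vecs[0]))]


def match(query, docs):
    """Matching step (assumed Boolean OR): keep docs sharing any query term."""
    q = set(query)
    return [doc_id for doc_id, tokens in docs.items() if q & set(tokens)]


def wcs_rank(query, docs):
    """Scoring step: rank matched docs by cosine of query/document centroids."""
    qc = centroid(query)
    scored = []
    for doc_id in match(query, docs):
        dc = centroid(docs[doc_id])
        if qc is not None and dc is not None:
            scored.append((cosine(qc, dc), doc_id))
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored]


docs = {
    "d1": ["deep", "learning", "network"],
    "d2": ["cooking", "recipe", "network"],
    "d3": ["cooking", "recipe"],
}
print(wcs_rank(["neural", "network"], docs))  # ['d1', 'd2']
```

In this sketch, documents that share no term with the query (here `d3`) never reach the scoring step, so the embeddings only re-rank the matched candidates.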

> We confirm that word embeddings can be successfully employed in a practical information retrieval setting. The proposed cosine similarity of IDF re-weighted, aggregated word vectors is competitive with the TF-IDF baseline.
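A minimal sketch of the IDF re-weighted variant, assuming idf(t) = ln(N / df(t)) and a plain tf(t) × idf(t) weight on each word vector before summing; the paper's exact weighting and normalisation may differ. Because cosine similarity ignores vector length, summing the weighted vectors or averaging them yields the same score. The toy vectors, documents and helper names (`idf_table`, `idf_weighted_sum`, `rank`) are hypothetical. The effect to notice: "the" occurs in every document, so its IDF is 0 and it no longer pulls all aggregates toward the same point.

```python
import math
from collections import Counter

# Hypothetical toy embeddings, invented for illustration only.
EMB = {
    "the": [0.5, 0.5, 0.5],
    "neural": [0.9, 0.1, 0.0],
    "network": [0.8, 0.2, 0.1],
    "cooking": [0.0, 0.1, 0.9],
    "recipe": [0.1, 0.0, 0.95],
}
DIM = 3


def idf_table(docs):
    """idf(t) = ln(N / df(t)) over the document collection (assumed formula)."""
    n = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    return {t: math.log(n / c) for t, c in df.items()}


def idf_weighted_sum(tokens, idf):
    """Sum of word vectors, each scaled by tf(t) * idf(t).

    Terms with no embedding or no IDF value (unseen in the collection)
    are skipped; this is a simplifying assumption of the sketch.
    """
    vec = [0.0] * DIM
    for term, tf in Counter(tokens).items():
        if term not in EMB or term not in idf:
            continue
        weight = tf * idf[term]
        for i, x in enumerate(EMB[term]):
            vec[i] += weight * x
    return vec


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rank(query, docs):
    """Score every document by cosine of the IDF-weighted aggregates."""
    idf = idf_table(docs)
    q = idf_weighted_sum(query, idf)
    scores = {d: cosine(q, idf_weighted_sum(t, idf)) for d, t in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)


docs = {
    "d1": ["the", "neural", "network"],
    "d2": ["the", "cooking", "recipe"],
    "d3": ["the", "recipe", "network"],
}
idf = idf_table(docs)
print(idf_weighted_sum(["the"], idf))  # [0.0, 0.0, 0.0]
print(rank(["the", "neural"], docs))  # ['d1', 'd3', 'd2']
```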

2018-01-28

Distributed Word Representations for Information Retrieval

2017-10-01

Vectorland: Brief Notes from Using Text Embeddings for Search

2017-09-18

Using Text Embeddings for Information Retrieval

2017-09-18

Properties

sl:creationDate : 2018-01-28
sl:creationTime : 2018-01-28T17:20:14Z
rdf:type : sl:Tag
skos:altLabel : Embeddings in IR@en
skos:prefLabel : Embeddings in Information Retrieval@en