Semanlink - [2206.02743] A Neural Corpus Indexer for Document Retrieval

Tags:

About This Document

sl:arxiv_author :
- Zheng Liu
- Hao Allen Sun
- Qi Chen
- Yuqing Xia
- Hao Sun
- Chengmin Chi
- Qi Zhang
- Yujing Wang
- Yingyan Hou
- Weiwei Deng
- Mao Yang
- Haonan Wang
- Xing Xie
- Shibin Wu
- Guoshuai Zhao
- Ziming Miao
sl:arxiv_firstAuthor : Yujing Wang
sl:arxiv_num : 2206.02743
sl:arxiv_published : 2022-06-06T16:56:52Z
sl:arxiv_summary : Current state-of-the-art document retrieval solutions mainly follow an index-retrieve paradigm, where the index is hard to be directly optimized for the final retrieval target. In this paper, we aim to show that an end-to-end deep neural network unifying training and indexing stages can significantly improve the recall performance of traditional methods. To this end, we propose Neural Corpus Indexer (NCI), a sequence-to-sequence network that generates relevant document identifiers directly for a designated query. To optimize the recall performance of NCI, we invent a prefix-aware weight-adaptive decoder architecture, and leverage tailored techniques including query generation, semantic document identifiers, and consistency-based regularization. Empirical studies demonstrated the superiority of NCI on two commonly used academic benchmarks, achieving +17.6% and +16.8% relative enhancement for Recall@1 on NQ320k dataset and R-Precision on TriviaQA dataset, respectively, compared to the best baseline method.@en
sl:arxiv_title : A Neural Corpus Indexer for Document Retrieval@en
sl:arxiv_updated : 2022-10-14T03:03:52Z
sl:bookmarkOf : https://arxiv.org/abs/2206.02743
sl:creationDate : 2023-01-18
sl:creationTime : 2023-01-18T22:52:58Z

File info

Bookmark of: https://arxiv.org/abs/2206.02743

Documents with similar tags (experimental)

[2001.09522] TaxoExpan: Self-supervised Taxonomy Expansion with Position-Enhanced Graph Neural Network

Tags:

how to add a set of new concepts to an existing taxonomy.

[Tweet](https://twitter.com/mickeyjs6/status/1253772146142216194?s=20) [GitHub](https://github.com/mickeystroller/TaxoExpan)

> we study the taxonomy expansion task: given an
existing taxonomy and a set of new emerging concepts, we aim
to automatically expand the taxonomy to incorporate these new
concepts (without changing the existing relations in the given taxonomy).

> To the best of our knowledge, this is the first study on **how to
expand an existing directed acyclic graph (as we model a taxonomy
as a DAG) using self-supervised learning**.

Self-supervised framework, the existing taxonomy being used as training data: it learns a model to predict whether a query concept is the direct hyponym of an anchor concept.

> 2 techniques:
>
> 1. a **position-enhanced graph neural network that encodes the local structure of an anchor concept** in the existing taxonomy,
> 2. a noise-robust training objective that enables the learned model to be insensitive to the label noise in the self-supervision data.

Regarding 1: uses [GNN](/tag/graph_neural_networks.html) to model the "ego network" of concepts (potential “siblings”
and “grand parents” of the query concept).

> Regular
GNNs fail to distinguish nodes with different relative positions to
the query (i.e., some nodes are grand parents of the query while
the others are siblings of the query). To address this limitation, we
present a simple but effective enhancement to inject such position
information into GNNs using position embedding. We show that
such embedding can be easily integrated with existing GNN architectures
(e.g., [GCN](/tag/graph_convolutional_networks) and GAT) and significantly boosts the
prediction performance

Regarding point 2: uses InfoNCE loss, cf. [Contrastive Predictive Coding](/doc/?uri=https%3A%2F%2Farxiv.org%2Fabs%2F1807.03748)

> Instead of predicting
whether each individual ⟨query concept, anchor concept⟩ pair
is positive or not, we first group all pairs sharing the same query
concept into a single training instance and learn a model to select
the positive pair among other negative ones from the group.

(Hum, ça me rappelle quelque chose)

> assume each concept (in existing taxonomy + set of new concepts) has an initial embedding
vector learned from some text associated with this concept.

To keep things tractable, only attempts to find a single parent node of each new concept.

2020-04-25 About