Semanlink - [2309.06131] Annotating Data for Fine-Tuning a Neural Ranker? Current Active Learning Strategies are not Better than Random Selection

[2309.06131] Annotating Data for Fine-Tuning a Neural Ranker? Current Active Learning Strategies are not Better than Random Selection

Tags:

About This Document

sl:arxiv_author :
sl:arxiv_firstAuthor : Sophia Althammer
sl:arxiv_num : 2309.06131
sl:arxiv_published : 2023-09-12T11:17:42Z
sl:arxiv_summary : Search methods based on Pretrained Language Models (PLM) have demonstrated great effectiveness gains compared to statistical and early neural ranking models. However, fine-tuning PLM-based rankers requires a great amount of annotated training data. Annotating data involves a large manual effort and thus is expensive, especially in domain specific tasks. In this paper we investigate fine-tuning PLM-based rankers under limited training data and budget. We investigate two scenarios: fine-tuning a ranker from scratch, and domain adaptation starting with a ranker already fine-tuned on general data, and continuing fine-tuning on a target dataset. We observe a great variability in effectiveness when fine-tuning on different randomly selected subsets of training data. This suggests that it is possible to achieve effectiveness gains by actively selecting a subset of the training data that has the most positive effect on the rankers. This way, it would be possible to fine-tune effective PLM rankers at a reduced annotation budget. To investigate this, we adapt existing Active Learning (AL) strategies to the task of fine-tuning PLM rankers and investigate their effectiveness, also considering annotation and computational costs. Our extensive analysis shows that AL strategies do not significantly outperform random selection of training subsets in terms of effectiveness. We further find that gains provided by AL strategies come at the expense of more assessments (thus higher annotation costs) and AL strategies underperform random selection when comparing effectiveness given a fixed annotation cost. Our results highlight that "optimal" subsets of training data that provide high effectiveness at low annotation cost do exist, but current mainstream AL strategies applied to PLM rankers are not capable of identifying them.@en
sl:arxiv_title : Annotating Data for Fine-Tuning a Neural Ranker? Current Active Learning Strategies are not Better than Random Selection@en
sl:arxiv_updated : 2023-09-12T11:17:42Z
sl:bookmarkOf : https://arxiv.org/abs/2309.06131
sl:creationDate : 2023-09-14
sl:creationTime : 2023-09-14T00:47:05Z

File info

Bookmark of: https://arxiv.org/abs/2309.06131

Linked From

Omar Khattab on X: "This isn't the main point of this great new paper by @sophiaalthammer et al. But it's incredible how ColBERT at 1000 training queries is better than DPR trained at *50,000* queries!"

Tags:

2023-09-14 About

Documents with similar tags (experimental)

[2310.03025] Retrieval meets Long Context Large Language Models

Tags:

2023-10-07 About

SparseEmbed: Learning Sparse Lexical Representations with Contextual Embeddings for Retrieval

Tags:

2023-07-26 About

[2306.07536] TART: A plug-and-play Transformer module for task-agnostic reasoning

Tags:

2023-06-15 About

[2104.07186] COIL: Revisit Exact Lexical Match in Information Retrieval with Contextualized Inverted List

Tags:

2023-03-08 About

[2302.08091] Do We Still Need Clinical Language Models?

Tags:

2023-02-17 About

[2302.04870] Offsite-Tuning: Transfer Learning without Full Model

Tags:

2023-02-11 About

[2206.02743] A Neural Corpus Indexer for Document Retrieval

Tags:

2023-01-18 About

[2205.12410] AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning

Tags:

2022-12-16 About

[2008.07267] A Survey of Active Learning for Text Classification using Deep Neural Networks

Tags:

2022-09-06 About

[2209.00099] Efficient Methods for Natural Language Processing: A Survey

Tags:

2022-09-04 About

[1807.00745] Training a Neural Network in a Low-Resource Setting on Automatically Annotated Noisy Data

Tags:

2022-07-18 About

[2207.06300] Re2G: Retrieve, Rerank, Generate

Tags:

2022-07-14 About

[2205.00820] Entity-aware Transformers for Entity Search

Tags:

2022-07-12 About

[2204.08173] TABi: Type-Aware Bi-Encoders for Open-Domain Entity Retrieval

Tags:

2022-05-11 About

[2012.12624] Learning Dense Representations of Phrases at Scale

Tags:

2022-05-11 About

[2205.03983] Building Machine Translation Systems for the Next Thousand Languages

Tags:

2022-05-10 About

[2202.10054] Fine-Tuning can Distort Pretrained Features and Underperform Out-of-Distribution

Tags:

2022-05-01 About

[2204.08491] Active Learning Helps Pretrained Models Learn the Intended Task

Tags:

2022-04-20 About

[1910.06294] Training Compact Models for Low Resource Entity Tagging using Pre-trained Language Models

Tags:

2022-03-31 About

[2004.05119] Beyond Fine-tuning: Few-Sample Sentence Embedding Transfer

Tags:

2022-03-31 About

[2108.13934] Robust Retrieval Augmented Generation for Zero-shot Slot Filling

Tags:

2022-01-19 About

[2005.11401] Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

Tags:

2022-01-19 About

[2010.02666] Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation

Tags:

2021-12-16 About

[1911.02655] Towards Domain Adaptation from Limited Data for Question Answering Using Deep Neural Networks

Tags:

2021-11-19 About

[2108.13854] Contrastive Domain Adaptation for Question Answering using Limited Text Corpora

Tags:

2021-11-19 About

[1706.03610] Neural Domain Adaptation for Biomedical Question Answering

Tags:

2021-11-19 About

[2109.08133] Phrase Retrieval Learns Passage Retrieval, Too

Tags:

2021-09-30 About

[2010.12309] A Survey on Recent Approaches for Natural Language Processing in Low-Resource Scenarios

Tags:

2021-07-06 About

[2101.00345] Modeling Fine-Grained Entity Types with Box Embeddings

Tags:

2021-06-22 About

[2106.04098] Ultra-Fine Entity Typing with Weak Supervision from a Masked Language Model

Tags:

2021-06-16 About

[2009.12030] AutoETER: Automated Entity Type Representation for Knowledge Graph Embedding

Tags:

2021-05-17 About

[1911.09419] Learning Hierarchy-Aware Knowledge Graph Embeddings for Link Prediction

Tags:

2021-05-17 About

[1911.03681] E-BERT: Efficient-Yet-Effective Entity Embeddings for BERT

Tags:

> way of **injecting factual knowledge about entities into the pretrained BERT model**.

(Feeding entity vectors into BERT as if they were wordpiece vectors, without additional encoder pretraining.)

>
> **We align [Wikipedia2Vec](tag:wikipedia2vec) entity vectors (Yamada et al., 2016) with BERT's native wordpiece vector space and use the aligned entity vectors as if they were wordpiece vectors**. The resulting entity-enhanced version of BERT (called E-BERT) is similar in spirit to [ERNIE](tag:ernie) (Zhang et al., 2019) and [KnowBert](tag:knowbert) (Peters et al., 2019), but it **requires no expensive further pretraining of the BERT encoder**.
>
> Our vector space alignment strategy is inspired by
cross-lingual word vector alignment
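
The alignment idea can be illustrated with a minimal sketch. It assumes the usual cross-lingual recipe: fit a linear map on words that have a vector in both spaces, then apply that map to entity vectors. The toy dimensions, the data, the gradient-descent solver and the names `fit_linear_map` and `apply_map` are all hypothetical; the note above does not spell out the paper's exact procedure.

```python
import random

def apply_map(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

def fit_linear_map(src, tgt, lr=0.05, epochs=2000):
    """Fit W minimizing the mean squared error ||W x - y||^2 by gradient descent."""
    d_src, d_tgt, n = len(src[0]), len(tgt[0]), len(src)
    W = [[0.0] * d_src for _ in range(d_tgt)]
    for _ in range(epochs):
        grad = [[0.0] * d_src for _ in range(d_tgt)]
        for x, y in zip(src, tgt):
            pred = apply_map(W, x)
            for i in range(d_tgt):
                err = pred[i] - y[i]
                for j in range(d_src):
                    grad[i][j] += 2.0 * err * x[j] / n
        for i in range(d_tgt):
            for j in range(d_src):
                W[i][j] -= lr * grad[i][j]
    return W

# Toy data (hypothetical): 2-d source-space word vectors and their 3-d
# target-space counterparts, related by a known linear map.
rng = random.Random(0)
TRUE_W = [[2.0, 0.0], [0.0, -1.0], [1.0, 1.0]]
shared_word_vecs = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(20)]
wordpiece_vecs = [apply_map(TRUE_W, x) for x in shared_word_vecs]

# Learn the map on the shared words only, then reuse it for an entity vector.
W = fit_linear_map(shared_word_vecs, wordpiece_vecs)
entity_vec = [0.3, -0.7]
aligned_entity_vec = apply_map(W, entity_vec)
```

The point of the design is that only `W` is learned; the encoder itself is left untouched, which is what makes the approach cheap compared with further pretraining.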

Related work on Entity-enhanced BERT:

> ([ERNIE](doc:2019/08/_1905_07129_ernie_enhanced_la) and [KnowBert](doc:2020/05/1909_04164_knowledge_enhanced)) are based on the design principle
that BERT be adapted to entity vectors. They introduce
new encoder layers to feed pretrained entity
vectors into the Transformer, and they require additional
pretraining to integrate the new parameters.
In contrast, E-BERT’s design principle is that entity
vectors be adapted to BERT.
>
> Two other knowledge-enhanced MLMs are [KEPLER](doc:2020/11/1911_06136_kepler_a_unified_)
(Wang et al., 2019c) and K-Adapter (Wang
et al., 2020)... Their factual knowledge
does not stem from entity vectors – instead, they
are trained in a multi-task setting on relation classification
and knowledge base completion.

Not to be confused with [[2009.02835] E-BERT: A Phrase and Product Knowledge Enhanced Language Model for E-commerce](doc:2020/12/2009_02835_e_bert_a_phrase_a)

2021-01-12 About

[1911.06136] KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation

Tags:

2020-11-03 About

[2010.03496] Inductive Entity Representations from Text via Link Prediction

Tags:

BLP "BERT for Link Prediction". Central idea: **training an entity encoder with a
link prediction objective** (using the textual descriptions of entities when computing entity representations - hence not failing with entities unknown in training)

> a method for **learning representations
of entities**, that uses a **pre-trained Transformer** based
architecture as an entity encoder, and
**link prediction training on a knowledge graph
with textual entity descriptions**.

> using entity descriptions,
an entity encoder is trained for link prediction in
a knowledge graph. The encoder can then be used
without fine-tuning to obtain features for entity classification
and information retrieval

Cites [Xie et al](doc:2020/10/representation_learning_of_know) and [Kepler](doc:2020/11/1911_06136_kepler_a_unified_). They claim that their objective, which targets link prediction exclusively, performs better than Kepler's more complex objective, which combines language modeling and link prediction.
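
A minimal sketch of why a description-based encoder copes with unseen entities. Everything here is a stand-in: `encode` averages hand-written word vectors instead of running a pre-trained Transformer, and the TransE-style score is just one possible link prediction scoring function chosen for illustration.

```python
# Hypothetical 2-d word vectors standing in for a pre-trained text encoder.
WORD_VECTORS = {
    "capital": [1.0, 0.0],
    "city": [1.0, 0.0],
    "country": [0.0, 1.0],
}

def encode(description):
    """Entity representation = average of the known word vectors in its description."""
    known = [WORD_VECTORS[w] for w in description.lower().split() if w in WORD_VECTORS]
    if not known:
        return [0.0, 0.0]
    return [sum(col) / len(known) for col in zip(*known)]

def transe_score(head, relation, tail):
    """TransE-style score: higher (closer to 0) means the triple is more plausible."""
    return -sum(abs(h + r - t) for h, r, t in zip(head, relation, tail))

# Hypothetical relation embedding for "capital_of".
CAPITAL_OF = [-1.0, 1.0]

paris = encode("capital city")
france = encode("country")
lyon = encode("city")

# An entity never seen in training still gets a representation from its text.
unseen = encode("a new capital")

good = transe_score(paris, CAPITAL_OF, france)
bad = transe_score(paris, CAPITAL_OF, lyon)
```

Because the representation is computed from text rather than looked up in an embedding table, no retraining is needed when a new entity with a description appears.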

2020-11-03 About

Tags:

2020-07-02 About

[2001.09522] TaxoExpan: Self-supervised Taxonomy Expansion with Position-Enhanced Graph Neural Network

Tags:

How to add a set of new concepts to an existing taxonomy.

[Tweet](https://twitter.com/mickeyjs6/status/1253772146142216194?s=20) [GitHub](https://github.com/mickeystroller/TaxoExpan)

> we study the taxonomy expansion task: given an
existing taxonomy and a set of new emerging concepts, we aim
to automatically expand the taxonomy to incorporate these new
concepts (without changing the existing relations in the given taxonomy).

> To the best of our knowledge, this is the first study on **how to
expand an existing directed acyclic graph (as we model a taxonomy
as a DAG) using self-supervised learning**.

Self-supervised framework, the existing taxonomy being used as training data: it learns a model to predict whether a query concept is the direct hyponym of an anchor concept.
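
A sketch of how the existing taxonomy can serve as its own training data. It assumes a single-parent taxonomy stored as a child-to-parent dict and uniform negative sampling; `make_training_instances` and its parameters are hypothetical, not the authors' code.

```python
import random

def make_training_instances(parent_of, num_negatives=2, seed=0):
    """Turn each existing child -> parent edge into one self-supervised instance.

    The child plays the "query concept", its true parent is the positive
    "anchor concept", and randomly drawn other concepts are negative anchors.
    """
    rng = random.Random(seed)
    concepts = sorted(set(parent_of) | set(parent_of.values()))
    instances = []
    for query, true_parent in sorted(parent_of.items()):
        candidates = [c for c in concepts if c not in (query, true_parent)]
        negatives = rng.sample(candidates, min(num_negatives, len(candidates)))
        instances.append(
            {"query": query, "positive": true_parent, "negatives": negatives}
        )
    return instances

# Toy taxonomy (hypothetical): child -> parent.
taxonomy = {
    "dog": "mammal",
    "cat": "mammal",
    "mammal": "animal",
    "sparrow": "bird",
    "bird": "animal",
}
instances = make_training_instances(taxonomy)
```

Random negatives may occasionally be valid ancestors of the query (for example `animal` for `dog`), which is one source of the label noise that the noise-robust objective is meant to tolerate.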

> 2 techniques:
>
> 1. a **position-enhanced graph neural network that encodes the local structure of an anchor concept** in the existing taxonomy,
> 2. a noise-robust training objective that enables the learned model to be insensitive to the label noise in the self-supervision data.

Regarding point 1: uses a [GNN](/tag/graph_neural_networks.html) to model the "ego network" of concepts (potential "siblings" and "grandparents" of the query concept).

> Regular
GNNs fail to distinguish nodes with different relative positions to
the query (i.e., some nodes are grand parents of the query while
the others are siblings of the query). To address this limitation, we
present a simple but effective enhancement to inject such position
information into GNNs using position embedding. We show that
such embedding can be easily integrated with existing GNN architectures
(e.g., [GCN](/tag/graph_convolutional_networks) and GAT) and significantly boosts the
prediction performance
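
A toy illustration of the limitation described in the quote. It is not the paper's architecture: the position embeddings are fixed one-hot vectors rather than learned, the single ReLU layer uses hand-picked weights, and mean pooling stands in for the GNN readout. All names are hypothetical.

```python
# One-hot "position embeddings" for the role a node plays relative to the query.
POSITION_EMBEDDING = {
    "grandparent": [1.0, 0.0, 0.0],
    "parent": [0.0, 1.0, 0.0],
    "sibling": [0.0, 0.0, 1.0],
}
NO_POSITION = [0.0, 0.0, 0.0]

# Hand-picked weights of one layer acting on [features (2) + position (3)].
LAYER_WEIGHTS = [
    [1.0, 0.0, 0.0, -1.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, -1.0],
]

def relu_layer(row):
    """Linear layer followed by ReLU."""
    return [max(0.0, sum(w * x for w, x in zip(weights, row))) for weights in LAYER_WEIGHTS]

def encode_ego_network(nodes, use_position=True):
    """nodes: list of (feature_vector, role). Returns the mean of the node hidden states."""
    hidden = []
    for features, role in nodes:
        position = POSITION_EMBEDDING[role] if use_position else NO_POSITION
        hidden.append(relu_layer(list(features) + position))
    return [sum(col) / len(hidden) for col in zip(*hidden)]

a, b = [1.0, 0.0], [0.0, 1.0]
net1 = [(a, "parent"), (b, "sibling")]  # a is the parent, b a sibling
net2 = [(a, "sibling"), (b, "parent")]  # same nodes, roles swapped

plain1 = encode_ego_network(net1, use_position=False)
plain2 = encode_ego_network(net2, use_position=False)
pos1 = encode_ego_network(net1)
pos2 = encode_ego_network(net2)
```

Without position information the two ego networks collapse to the same representation; with it, swapping the roles changes the output.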

Regarding point 2: uses InfoNCE loss, cf. [Contrastive Predictive Coding](/doc/?uri=https%3A%2F%2Farxiv.org%2Fabs%2F1807.03748)

> Instead of predicting
whether each individual ⟨query concept, anchor concept⟩ pair
is positive or not, we first group all pairs sharing the same query
concept into a single training instance and learn a model to select
the positive pair among other negative ones from the group.

(Hmm, this reminds me of something.)
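
The grouped objective in the quote above can be written as a softmax cross-entropy over one group of candidate anchors. A minimal sketch, assuming the model has already produced one matching score per ⟨query, anchor⟩ pair; `info_nce_loss` is a hypothetical name.

```python
import math

def info_nce_loss(scores, positive_index):
    """InfoNCE-style loss for one query: -log softmax(scores)[positive_index].

    `scores` holds the matching score of every candidate anchor in the group
    (one positive, the rest negatives). Uses the log-sum-exp trick for stability.
    """
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[positive_index]

# One training instance: the true parent is at index 0, followed by three negatives.
uninformed = info_nce_loss([0.0, 0.0, 0.0, 0.0], positive_index=0)
confident = info_nce_loss([5.0, 0.0, 0.0, 0.0], positive_index=0)
```

With identical scores the loss equals the log of the group size, and it shrinks toward zero as the positive pair is scored above the negatives.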

> assume each concept (in existing taxonomy + set of new concepts) has an initial embedding
vector learned from some text associated with this concept.

To keep things tractable, only attempts to find a single parent node of each new concept.

2020-04-25 About

[2004.06842] Layered Graph Embedding for Entity Recommendation using Wikipedia in the Yahoo! Knowledge Graph

Tags:

2020-04-17 About

[1909.03193] KG-BERT: BERT for Knowledge Graph Completion

Tags:

2020-03-22 About

[2003.05473] Investigating Entity Knowledge in BERT with Simple Neural End-To-End Entity Linking (CoNLL 2019)

Tags:

Trains BERT-base-uncased on English Wikipedia, then fine-tunes and evaluates it on an entity linking (EL) benchmark (EL is implemented as token classification over the entity vocabulary).

> BERT+Entity is a straightforward extension on top
of BERT, i.e. we initialize BERT with the publicly
available weights from the BERT-base-uncased
model and add an output classification layer on
top of the architecture. Given a contextualized token,
the classifier computes the probability of an
entity link for each entry in the entity vocabulary.

Can BERT’s architecture learn all entity
linking steps jointly? To answer:

> an extreme
simplification of the **entity linking setup that
works surprisingly well**: simply cast it as **a
per token classification over the entire entity
vocabulary** (over 700K classes in our case).
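
A sketch of what "per token classification over the entity vocabulary" amounts to: one linear layer and a softmax on top of each contextualized token vector. The 2-d token vectors, the three-entry vocabulary, the `<none>` class for unlinked tokens and the weights are all hypothetical; in the paper the vocabulary has over 700K entries and the token vectors come from BERT.

```python
import math

def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def link_tokens(token_vectors, classifier, entity_vocab):
    """For each token vector, return the most probable entity and its probability."""
    results = []
    for hidden in token_vectors:
        logits = [sum(w * x for w, x in zip(row, hidden)) for row in classifier]
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        results.append((entity_vocab[best], probs[best]))
    return results

ENTITY_VOCAB = ["<none>", "Paris", "Paris_Hilton"]
# One weight row per entity in the vocabulary (hand-picked for the example).
CLASSIFIER = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]]

# Stand-ins for contextualized token vectors of a three-token sentence.
tokens = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
links = link_tokens(tokens, CLASSIFIER, ENTITY_VOCAB)
```

Mention detection, candidate generation and disambiguation are all folded into this single prediction per token, which is why no separate pipeline stage is needed.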

> the model
is the first that performs entity linking without any
pipeline or any heuristics, compared to all prior
approaches. We found that with our approach we
can learn additional entity knowledge in BERT that
helps in entity linking. **However, we also found
that almost none of the downstream tasks really
required entity knowledge**.

### Related work

- > [Durrett and Klein (2014)](/doc/2020/01/a_joint_model_for_entity_analys) were the first to propose
jointly modelling Mention detection, Candidate generation and Entity disambiguation in a graphical
model and could show that each of those steps are
interdependent and benefit from a joint objective

This paper uses neural techniques instead of CRF.

- > [Yamada](/showprop.do?pptyuri=http%3A%2F%2Fwww.semanlink.net%2F2001%2F00%2Fsemanlink-schema%23arxiv_author&pptyval=Ikuya%2BYamada) (2016, 2017) was the first to
investigate neural text representations and entity
linking, but their approach is limited to ED.

cf. [#Wikipedia2Vec](tag:wikipedia2vec). Compare with [newer work by Yamada](doc:2020/09/1909_01259_neural_attentive_b)

2020-01-09 About

[1912.03927] Large deviations for the perceptron model and consequences for active learning

Tags:

2019-12-11 About

[1905.07129] ERNIE: Enhanced Language Representation with Informative Entities

Tags:

2019-08-05 About