About This Document
- sl:arxiv_author :
- sl:arxiv_firstAuthor : Man Luo
- sl:arxiv_num : 2305.14128
- sl:arxiv_published : 2023-05-23T14:55:25Z
- sl:arxiv_summary : In-context learning (ICL), teaching a large language model (LLM) to perform a
task with few-shot demonstrations rather than adjusting the model parameters,
has emerged as a strong paradigm for using LLMs. While early studies primarily
used a fixed or random set of demonstrations for all test queries, recent
research suggests that retrieving semantically similar demonstrations to the
input from a pool of available demonstrations results in better performance.
This work expands the applicability of retrieval-based ICL approaches by
demonstrating that even simple word-overlap similarity measures such as BM25
outperform randomly selected demonstrations. Furthermore, we extend the success
of retrieval-based ICL to instruction-finetuned LLMs as well as
Chain-of-Thought (CoT) prompting. For instruction-finetuned LLMs, we find that
although a model has already seen the training data at training time,
retrieving demonstrations from the training data at test time yields better
results compared to using no demonstrations or random demonstrations. Last but
not least, we train a task-specific demonstration retriever that outperforms
off-the-shelf retrievers.@en
- sl:arxiv_title : Dr.ICL: Demonstration-Retrieved In-context Learning@en
- sl:arxiv_updated : 2023-05-23T14:55:25Z
- sl:bookmarkOf : https://arxiv.org/abs/2305.14128
- sl:creationDate : 2023-07-14
- sl:creationTime : 2023-07-14T12:25:23Z
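The abstract reports that even a simple word-overlap measure such as BM25 can retrieve demonstrations that outperform random selection. A minimal sketch of that idea, using a stdlib-only BM25 scorer over a toy demonstration pool (the pool, query, and parameter values are illustrative, not the paper's implementation):

```python
import math
from collections import Counter

def bm25_rank(query, pool, k1=1.5, b=0.75):
    """Rank candidate demonstrations by BM25 similarity to the query.

    Returns pool indices sorted from most to least similar.
    Tokenization is naive whitespace splitting, for illustration only.
    """
    docs = [d.lower().split() for d in pool]
    q_terms = query.lower().split()
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency of each term across the pool.
    df = Counter(t for d in docs for t in set(d))

    def score(doc):
        tf = Counter(doc)
        s = 0.0
        for t in q_terms:
            if t not in tf:
                continue
            idf = math.log((n - df[t] + 0.5) / (df[t] + 0.5) + 1)
            s += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(doc) / avgdl)
            )
        return s

    return sorted(range(n), key=lambda i: score(docs[i]), reverse=True)

# Hypothetical demonstration pool (input -> output pairs as strings).
pool = [
    "Translate 'bonjour' to English. -> hello",
    "What is 2 + 2? -> 4",
    "Translate 'gracias' to English. -> thank you",
]
ranked = bm25_rank("Translate 'danke' to English.", pool)
# The top-ranked demonstrations would be prepended to the prompt as
# few-shot examples; here the translation demos rank above the math one.
few_shot = [pool[i] for i in ranked[:2]]
```

In the retrieval-based ICL setting the paper describes, the top-k retrieved demonstrations replace a fixed or random few-shot set before the test query is appended to the prompt.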