About This Document
- sl:arxiv_author :
- sl:arxiv_firstAuthor : Eric Zelikman
- sl:arxiv_num : 2203.14465
- sl:arxiv_published : 2022-03-28T03:12:15Z
- sl:arxiv_summary : Generating step-by-step "chain-of-thought" rationales improves language model
performance on complex reasoning tasks like mathematics or commonsense
question-answering. However, inducing language model rationale generation
currently requires either constructing massive rationale datasets or
sacrificing accuracy by using only few-shot inference. We propose a technique
to iteratively leverage a small number of rationale examples and a large
dataset without rationales, to bootstrap the ability to perform successively
more complex reasoning. This technique, the "Self-Taught Reasoner" (STaR),
relies on a simple loop: generate rationales to answer many questions, prompted
with a few rationale examples; if the generated answers are wrong, try again to
generate a rationale given the correct answer; fine-tune on all the rationales
that ultimately yielded correct answers; repeat. We show that STaR
significantly improves performance on multiple datasets compared to a model
fine-tuned to directly predict final answers, and performs comparably to
fine-tuning a 30× larger state-of-the-art language model on
CommonsenseQA. Thus, STaR lets a model improve itself by learning from its own
generated reasoning.@en
- sl:arxiv_title : STaR: Bootstrapping Reasoning With Reasoning@en
- sl:arxiv_updated : 2022-05-20T13:52:54Z
- sl:bookmarkOf : https://arxiv.org/abs/2203.14465
- sl:creationDate : 2023-02-07
- sl:creationTime : 2023-02-07T16:40:38Z
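The loop described in the abstract (generate rationales, rationalize failures with the correct answer as a hint, fine-tune on successes, repeat) can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: `ToyModel` and its `generate`/`finetune` methods are hypothetical stand-ins for a real language model and fine-tuning procedure.

```python
class ToyModel:
    """Toy stand-in for a language model: answers from a learned lookup table."""

    def __init__(self):
        self.known = {}  # question -> (rationale, answer) pairs it has "learned"

    def generate(self, question, hint=None):
        # With a hint (the rationalization step), this toy model always
        # recovers the correct answer; otherwise it only answers questions
        # it has already been fine-tuned on.
        if hint is not None:
            return f"rationale for {question}", hint
        if question in self.known:
            return self.known[question]
        return "no rationale", None

    def finetune(self, examples):
        # "Fine-tuning" the toy model just memorizes successful rationales.
        for question, rationale, answer in examples:
            self.known[question] = (rationale, answer)
        return self


def star_loop(model, dataset, iterations=2):
    """One STaR cycle per iteration: generate, rationalize failures, fine-tune."""
    for _ in range(iterations):
        training_set = []
        for question, answer in dataset:
            rationale, predicted = model.generate(question)
            if predicted != answer:
                # Rationalization: retry with the correct answer given as a hint.
                rationale, predicted = model.generate(question, hint=answer)
            if predicted == answer:
                # Keep only rationales that ultimately yielded correct answers.
                training_set.append((question, rationale, answer))
        model = model.finetune(training_set)
    return model


data = [("2+2", "4"), ("capital of France", "Paris")]
trained = star_loop(ToyModel(), data)
print(trained.generate("2+2")[1])  # the loop has taught the toy model this answer
```

The key mechanic the sketch preserves is that failed generations are not discarded: they are retried with the answer as a hint, and only rationales that led to correct answers enter the fine-tuning set.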