About This Document
- sl:arxiv_author :
- sl:arxiv_firstAuthor : Shivam Garg
- sl:arxiv_num : 2208.01066
- sl:arxiv_published : 2022-08-01T18:01:40Z
- sl:arxiv_summary : In-context learning refers to the ability of a model to condition on a prompt
sequence consisting of in-context examples (input-output pairs corresponding to
some task) along with a new query input, and generate the corresponding output.
Crucially, in-context learning happens only at inference time without any
parameter updates to the model. While large language models such as GPT-3
exhibit some ability to perform in-context learning, it is unclear what the
relationship is between tasks on which this succeeds and what is present in the
training data. To make progress towards understanding in-context learning, we
consider the well-defined problem of training a model to in-context learn a
function class (e.g., linear functions): that is, given data derived from some
functions in the class, can we train a model to in-context learn "most"
functions from this class? We show empirically that standard Transformers can
be trained from scratch to perform in-context learning of linear functions --
that is, the trained model is able to learn unseen linear functions from
in-context examples with performance comparable to the optimal least squares
estimator. In fact, in-context learning is possible even under two forms of
distribution shift: (i) between the training data of the model and
inference-time prompts, and (ii) between the in-context examples and the query
input during inference. We also show that we can train Transformers to
in-context learn more complex function classes -- namely sparse linear
functions, two-layer neural networks, and decision trees -- with performance
that matches or exceeds task-specific learning algorithms. Our code and models
are available at https://github.com/dtsip/in-context-learning.
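The setup the abstract describes can be sketched in a few lines: draw a random linear function, present the model with a prompt of (input, output) pairs plus a query, and compare against the least-squares baseline the paper uses. This is a minimal illustration of the data-generation and baseline side only (no Transformer); the Gaussian sampling and dimensions here are assumptions, not the paper's exact configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_examples = 20, 40  # input dimension and number of in-context examples (assumed values)

# A random linear task f(x) = w . x, drawn from an assumed Gaussian prior
w = rng.standard_normal(d)

# In-context examples (input-output pairs) plus a fresh query input
X = rng.standard_normal((n_examples, d))
y = X @ w
x_query = rng.standard_normal(d)

# Least-squares baseline: the estimator the trained Transformer is compared to
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
pred = x_query @ w_hat

# With noiseless labels and n_examples >= d, least squares recovers w,
# so its query prediction is exact; the paper reports the trained
# Transformer approaching this performance.
```

A Transformer trained on such prompts would receive the sequence (x_1, y_1, ..., x_n, y_n, x_query) and be scored on how close its output is to `x_query @ w`.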
- sl:arxiv_title : What Can Transformers Learn In-Context? A Case Study of Simple Function Classes
- sl:arxiv_updated : 2022-08-01T18:01:40Z
- sl:bookmarkOf : https://arxiv.org/abs/2208.01066
- sl:creationDate : 2022-09-17
- sl:creationTime : 2022-09-17T13:45:05Z