About This Document
- sl:arxiv_author :
- sl:arxiv_firstAuthor : Sanjeev Arora
- sl:arxiv_num : 2307.15936
- sl:arxiv_published : 2023-07-29T09:22:54Z
- sl:arxiv_summary : A major driver of AI products today is the fact that new skills emerge in
language models when their parameter set and training corpora are scaled up.
This phenomenon is poorly understood, and a mechanistic explanation via
mathematical analysis of gradient-based training seems difficult. The current
paper takes a different approach, analysing emergence using the famous (and
empirical) Scaling Laws of LLMs and a simple statistical framework.
Contributions include: (a) A statistical framework that relates cross-entropy
loss of LLMs to competence on the basic skills that underlie language tasks.
(b) Mathematical analysis showing that the Scaling Laws imply a strong form of
inductive bias that allows the pre-trained model to learn very efficiently. We
informally call this {\em slingshot generalization} since naively viewed it
appears to give competence levels at skills that violate usual generalization
theory. (c) A key example of slingshot generalization, that competence at
executing tasks involving $k$-tuples of skills emerges essentially at the same
scaling and same rate as competence on the elementary skills themselves.@en
- sl:arxiv_title : A Theory for Emergence of Complex Skills in Language Models@en
- sl:arxiv_updated : 2023-11-06T00:36:24Z
- sl:bookmarkOf : https://arxiv.org/abs/2307.15936
- sl:creationDate : 2024-02-24
- sl:creationTime : 2024-02-24T00:11:29Z
- sl:relatedDoc : http://www.semanlink.net/doc/2024/02/new_theory_suggests_chatbots_ca