Emergence
http://www.semanlink.net/tag/emergence
Documents tagged with Emergence

[2307.15936] A Theory for Emergence of Complex Skills in Language Models
http://www.semanlink.net/doc/2024/02/2307_15936_a_theory_for_emerg
[New Theory Suggests Chatbots Can Understand Text | Quanta Magazine](doc:2024/02/new_theory_suggests_chatbots_ca)
2024-02-24T00:11:29Z
New Theory Suggests Chatbots Can Understand Text | Quanta Magazine
http://www.semanlink.net/doc/2024/02/new_theory_suggests_chatbots_ca
Article on (i) a theory of the emergence of complex skills in LLMs and (ii) the SKILL-MIX eval -- shows that LLMs are able to use skill combinations not seen during training. ([Arora](tag:sanjeev_arora))
> “Stochastic parrots” generate text only by combining information they have already seen, not through any understanding of their own. Are ChatGPT, Bard and other large chatbots simply parroting their training data? The answer is probably no.
[[2307.15936] A Theory for Emergence of Complex Skills in Language Models](doc:2024/02/2307_15936_a_theory_for_emerg)
2024-02-11T09:12:09Z
Lenka Zdeborova on X: "Emergence in LLMs is a mystery. Emergence in physics is linked to phase transitions. We identify a phase transition between semantic and positional learning in a toy model of dot-product attention"
http://www.semanlink.net/doc/2024/02/lenka_zdeborova_sur_x_emerge
2024-02-07T22:19:57Z
Physics of AI - YouTube
http://www.semanlink.net/doc/2023/03/physics_of_ai_youtube
- Intelligence has emerged: why? how?
- Let's study this with *controlled experiments* and *toy models*
- Clean and clear insights that peer slightly behind the magic curtain
Emergence; the rise of scale. Phase transition as a function of training compute (number of parameters, dataset size).
(12:25) Transformers: the jump compared to traditional NNs: instead of operating on a single input X, they operate on a set of inputs (cf. training to say whether an image contains 2 elements of the same kind). **The aptitude to compare elements in the input sequence enables analogies** - the essence of reasoning. (A classical NN treats one high-dimensional vector; transformers handle sets of vectors. The attention layer replaces the learned filters W of a classical NN with the other elements in the sequence. The attention module compares part of the input against another part of the input. "Relative machines rather than absolute". Sets are a very powerful level of abstraction.)
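The "compare part of the input against another part of the input" idea above can be sketched as single-head dot-product self-attention with identity projections (a toy version, omitting the learned Q/K/V matrices):

```python
import numpy as np

def dot_product_attention(X):
    """Toy self-attention: each row of X is scored against every other
    row of X, so the layer filters the input with *other parts of the
    same input* rather than with fixed learned weights W."""
    scores = X @ X.T / np.sqrt(X.shape[1])          # pairwise similarities
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)   # softmax over the set
    return weights @ X                              # convex mix of the set

X = np.array([[1.0, 0.0],   # a *set* of 3 input vectors,
              [0.9, 0.1],   # not one flat high-dimensional vector
              [0.0, 1.0]])
out = dot_product_attention(X)
```

Because the "filters" are the other set elements, permuting the rows of `X` permutes the output the same way; the layer is relative, not absolute.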
emergence
> "The truth is that nobody has a clue what is going on"
>
> "**Something unknown is doing we don't know what**" (A. Eddington)
Being inspired by physics' methodology: controlled experiments and toy mathematical models
2023-03-28T23:46:52Z
Andrew Lampinen on Twitter: "What is emergence, and why is it of recent interest in AI... A thread: 1/" / Twitter
http://www.semanlink.net/doc/2023/02/andrew_lampinen_sur_twitter_
2023-02-26T01:41:58Z
Characterizing Emergent Phenomena in Large Language Models – Google AI Blog
http://www.semanlink.net/doc/2023/01/characterizing_emergent_phenome
[Tweet](https://twitter.com/_jasonwei/status/1618331876623523844?s=20&t=sMbTCnu16Od8vGBmo0x6ig)
> unpredictable phenomenon that we refer to as emergent abilities of large language models. We consider an ability to be emergent if it is not present in smaller models but is present in larger models. Thus, emergent abilities cannot be predicted simply by extrapolating the performance of smaller models. The existence of such emergence implies that additional scaling could further expand the range of capabilities of language models.
2023-01-26T09:28:43Z
Matthew Honnibal on Twitter: "Some of the things ChatGPT can do are emergent behaviours... Other things it can do have been specifically taught to it. Is there some speculative list somewhere about what tasks were supervised?" / Twitter
http://www.semanlink.net/doc/2023/01/matthew_honnibal_sur_twitter_
2023-01-14T16:40:29Z
Some remarks on Large Language Models
http://www.semanlink.net/doc/2023/01/some_remarks_on_large_language_
> There turned out to be a phase shift somewhere between 60B parameters and 175B parameters, that made language models super impressive.
> **The performance of current-day language models is not obtained by language modeling**
>
> - [Traditional] LMs are not [grounded](tag:grounded_language_learning)
>
> **3 conceptual steps between GPT-3 and chatGPT: Instructions, code, RLHF.** The last one is, I think, the least interesting despite getting the most attention
>
> Instruction tuning: For example, the human annotators would write something like "please summarize this text", followed by some text they got, followed by a summary they produced of this text. -> Some symbols ("summarize", "translate", "formal") are used in a consistent way together with the concept/task they denote. And they always appear in the beginning of the text. -> the act of producing a summary grounded to the human concept of "summary"
>
> Code: programming-language code data, and specifically data that contains both natural language instructions or descriptions (in the form of code comments) and the corresponding programming-language code. This produced another very direct form of grounding: the human language describes concepts (or intents), which are then realized in the form of the corresponding programs.
>
> "[RL with Human Feedback](tag:reinforcement_learning_from_human_feedback)". This is a fancy way of saying that the model now observes two humans in a conversation, one playing the role of a user, and another playing the role of "the AI", demonstrating how the AI should respond in different situations. This clearly helps the model learn how dialogs work, and how to keep track of information across dialog states (something that is very hard to learn from just "found" data).
2023-01-03T09:15:16Z
Unveiling Transformers with LEGO - YouTube
http://www.semanlink.net/doc/2022/06/unveiling_transformers_with_leg
> To me, what's good about transformers is that they have relative filters. I mean **a standard NN tests an input against a fixed filter w, but here we test part of x against another part of x**. (#[Self-Attention](tag:self_attention))
>
> This potentially allows reasoning to emerge: the network can associate concepts that it encounters, compare them, make analogies
> LEGO: Learning Equality and Group Operations. It's a very **basic reasoning task**, where a sentence is made of clauses defining variables as a function of some other variable, and the goal is to **resolve the value of the variables**.
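The LEGO task described above can be sketched with a toy symbolic resolver (the clause syntax here is an assumed simplification, not the paper's exact encoding): each clause defines a ±1-valued variable either directly or as a signed copy of another variable, and resolution iterates to a fixed point since clauses may appear out of order.

```python
def resolve(sentence):
    """Resolve variable values in a LEGO-style sentence.
    Clauses look like 'a=+1', 'b=-a', 'c=+b'; values are +1 / -1.
    Assumes the sentence is solvable (each variable eventually
    traces back to a literal +1 or -1)."""
    clauses = [c.strip() for c in sentence.split(";")]
    values = {}
    while len(values) < len(clauses):        # sweep until fixed point
        for clause in clauses:
            var, rhs = (s.strip() for s in clause.split("="))
            sign = 1 if rhs[0] == "+" else -1
            ref = rhs[1:].strip()
            if ref == "1":                   # literal: a=+1 or a=-1
                values[var] = sign
            elif ref in values:              # reference: b=-a
                values[var] = sign * values[ref]
    return values

resolve("a=+1; b=-a; c=+b")  # {'a': 1, 'b': -1, 'c': -1}
```

The point of the benchmark is that a transformer must learn this chaining behaviour from examples, whereas the symbolic resolver makes the target computation explicit.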
2022-06-30T14:21:53Z
From Random Grammars to Learning Language - Département de Physique de l'Ecole Normale supérieure
http://www.semanlink.net/doc/2020/09/from_random_grammars_to_learnin
2020-09-17T23:46:39Z
Le langage, une émergence explosive
http://www.semanlink.net/doc/2020/09/le_langage_une_emergence_explo
The physicist Eric DeGiuli recently proposed a statistical model of learning that renews generative linguistics
[Papier](doc:2020/09/from_random_grammars_to_learnin)
2020-09-17T23:42:25Z