Semanlink - RL from Human Feedback AND ChatGPT: training

Shayne Longpre on Twitter: "A 🧵 on @OpenAI LLM "Alignment" (e.g. #ChatGPT)..."

2023-02-27

Prompting, Instruction Finetuning, and RLHF (CS224N)

2023-02-16

> There turned out to be a phase shift somewhere between 60B parameters and 175B parameters, that made language models super impressive.

> **The performance of present-day language models is not obtained by language modeling**
>
>    - [Traditional] LMs are not [grounded](tag:grounded_language_learning)
> 
> **3 conceptual steps between GPT-3 and ChatGPT: instructions, code, RLHF.** The last one is, I think, the least interesting despite getting the most attention.
>
> Instruction tuning: for example, the human annotators would write something like "please summarize this text", followed by some text, followed by a summary they produced of that text. -> Some symbols ("summarize", "translate", "formal") are used consistently together with the concept or task they denote, and they always appear at the beginning of the text. -> The act of producing a summary is grounded to the human concept of "summary".
>
> Code: programming language data, and specifically data that contains both natural language instructions or descriptions (in the form of code comments) and the corresponding programming language code. This produced another very direct form of grounding: the human language describes concepts (or intents), which are then realized in the form of the corresponding programs.
>
> "[RL with Human Feedback](tag:reinforcement_learning_from_human_feedback)". This is a fancy way of saying that the model now observes two humans in a conversation, one playing the role of a user, and another playing the role of "the AI", demonstrating how the AI should respond in different situations. This clearly helps the model learn how dialogs work, and how to keep track of information across dialog states (something that is very hard to learn from just "found" data).
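
To make the three kinds of training text described in these notes more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the names `InstructionExample`, `to_training_text`, `code_example`, and `dialog_example`, the blank-line separators, and the `User`/`AI` role labels are all hypothetical and do not reflect the actual data format used for any OpenAI model.

```python
from dataclasses import dataclass


@dataclass
class InstructionExample:
    """One annotator-written example: instruction, input text, desired output."""
    instruction: str
    source_text: str
    target: str


def to_training_text(ex: InstructionExample) -> str:
    # The instruction always comes first, so a symbol such as "summarize"
    # appears consistently at the beginning of the text (assumed layout).
    return f"{ex.instruction}\n\n{ex.source_text}\n\n{ex.target}"


def code_example(comment: str, code: str) -> str:
    # A natural-language description (as a code comment) paired with
    # the program that realizes it.
    return f"# {comment}\n{code}"


def dialog_example(turns: list[tuple[str, str]]) -> str:
    # A demonstration dialog: each turn is (role, text), with one human
    # playing the user and another playing "the AI".
    return "\n".join(f"{role}: {text}" for role, text in turns)


summary = InstructionExample(
    instruction="Please summarize this text",
    source_text="Large language models are trained on very large text corpora.",
    target="LLMs learn from huge amounts of text.",
)
print(to_training_text(summary))
print(code_example("return the square of x", "def square(x):\n    return x * x"))
print(dialog_example([("User", "What is 2 + 2?"), ("AI", "2 + 2 is 4.")]))
```

In all three cases the model still sees plain text; what changes is that the text pairs a stated intent with its realization.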

2023-01-03

Tanishq Mathew Abraham on Twitter: "Are you wondering how large language models like ChatGPT and InstructGPT actually work? One of the secret ingredients is RLHF... Let's dive into how RLHF works in 8 tweets!"

2022-12-28