ChatGPT training: instruction tuning and RL from human feedback
Common descendants