ChatGPT: training; Instruction tuning; RL from Human Feedback and RL from Human Feedback
Common descendants