ChatGPT: training; RL from Human Feedback AND cs224n
Common descendants