RL from Human Feedback AND cs224n
Common descendants