RL from Human Feedback AND Fine-tuning
Common descendants