RL from Human Feedback AND Fine-tuning
Common descendants