Alternative way AND RL from Human Feedback
Common descendants
1 Document
2023-05-18 About