Sanjeev Arora on X: "It's better to use just 5% of the instruction-tuning data (suitably selected) instead of the full dataset."