Zeyuan Allen-Zhu on X: "surprisingly, when pre-training good data (e.g., Wiki) together with "junk" (e.g., Common Crawl), LLM's capacity on good data may decrease by 20x!"