Zeyuan Allen-Zhu on X: "surprisingly, when pre-training good data (e.g., Wiki) together with "junks" (e.g., Common Crawl), LLM's capacity on good data may decrease by 20x times!"