Zeyuan Allen-Zhu on X: "surprisingly, when pre-training good data (e.g., Wiki) together with "junk" (e.g., Common Crawl), LLM's capacity on good data may decrease by 20x!"