About This Document
- sl:arxiv_author :
- sl:arxiv_firstAuthor : Andrea Schioppa
- sl:arxiv_num : 2305.11778
- sl:arxiv_published : 2023-05-19T16:14:07Z
- sl:arxiv_summary : The recent rapid progress in pre-training Large Language Models has relied on using self-supervised language modeling objectives like next token prediction or span corruption. On the other hand, Machine Translation Systems are mostly trained using cross-lingual supervision that requires aligned data between source and target languages. We demonstrate that pre-training Large Language Models on a mixture of a self-supervised Language Modeling objective and the supervised Machine Translation objective, therefore including cross-lingual parallel data during pre-training, yields models with better in-context learning abilities. As pre-training is a very resource-intensive process and a grid search on the best mixing ratio between the two objectives is prohibitively expensive, we propose a simple yet effective strategy to learn it during pre-training.@en
- sl:arxiv_title : Cross-Lingual Supervision improves Large Language Models Pre-training@en
- sl:arxiv_updated : 2023-05-19T16:14:07Z
- sl:bookmarkOf : https://arxiv.org/abs/2305.11778
- sl:creationDate : 2023-05-22
- sl:creationTime : 2023-05-22T08:13:33Z