About This Document
- sl:arxiv_author :
- sl:arxiv_firstAuthor : Xinyi Wang
- sl:arxiv_num : 2203.09435
- sl:arxiv_published : 2022-03-17T16:48:22Z
- sl:arxiv_summary : The performance of multilingual pretrained models is highly dependent on the
availability of monolingual or parallel text present in a target language.
Thus, the majority of the world's languages cannot benefit from recent progress
in NLP as they have no or limited textual data. To expand possibilities of
using NLP technology in these under-represented languages, we systematically
study strategies that relax the reliance on conventional language resources
through the use of bilingual lexicons, an alternative resource with much better
language coverage. We analyze different strategies to synthesize textual or
labeled data using lexicons, and how this data can be combined with monolingual
or parallel text when available. For 19 under-represented languages across 3
tasks, our methods lead to consistent improvements of up to 5 and 15 points
with and without extra monolingual text respectively. Overall, our study
highlights how NLP methods can be adapted to thousands more languages that are
under-served by current technology.@en
- sl:arxiv_title : Expanding Pretrained Models to Thousands More Languages via Lexicon-based Adaptation@en
- sl:arxiv_updated : 2022-04-06T12:47:10Z
- sl:bookmarkOf : https://arxiv.org/abs/2203.09435
- sl:creationDate : 2022-09-08
- sl:creationTime : 2022-09-08T11:17:10Z
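As a rough illustration of the data-synthesis strategy the abstract describes, here is a minimal sketch of word-by-word substitution through a bilingual lexicon. It is not the paper's code; the tab-separated lexicon format and the `load_lexicon` / `synthesize` helpers are assumptions made for illustration only.

```python
# Illustrative sketch of lexicon-based data synthesis (not the paper's code):
# translate a high-resource sentence word by word through a bilingual lexicon,
# keeping out-of-lexicon words unchanged. Token-level labels (e.g. POS or NER
# tags) attached to source tokens can be carried over to the substituted
# tokens, which is how labeled data can be synthesized without any
# target-language text.

from typing import Dict, List

def load_lexicon(path: str) -> Dict[str, List[str]]:
    """Read a hypothetical tab-separated lexicon: src_word<TAB>tgt_word per
    line. A source word may map to several target translations."""
    lexicon: Dict[str, List[str]] = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split("\t")
            if len(parts) != 2:
                continue  # skip malformed lines
            src, tgt = parts
            lexicon.setdefault(src.lower(), []).append(tgt)
    return lexicon

def synthesize(sentence: str, lexicon: Dict[str, List[str]]) -> str:
    """Word-by-word substitution; unknown words pass through unchanged."""
    out = []
    for token in sentence.split():
        translations = lexicon.get(token.lower())
        out.append(translations[0] if translations else token)
    return " ".join(out)

if __name__ == "__main__":
    # Toy English-to-target lexicon; real broad-coverage lexicons span
    # thousands of languages with much larger vocabularies.
    toy = {"the": ["la"], "cat": ["gato"], "sleeps": ["duerme"]}
    print(synthesize("the cat sleeps", toy))  # -> "la gato duerme"
```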