<?xml version='1.0' encoding='UTF-8'  ?><rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/" xmlns:dc="http://purl.org/dc/elements/1.1/">	<channel rdf:about="http://www.semanlink.net/tag/nlp_low_resource_scenarios">		<title>Low-Resource NLP</title>		<link>http://www.semanlink.net/tag/nlp_low_resource_scenarios</link>		<description>Documents tagged with Low-Resource NLP</description>		<items>			<rdf:Seq>							<rdf:li resource="http://www.semanlink.net/doc/2024/10/meta_ai_research_topic_no_lan"/>				<rdf:li resource="http://www.semanlink.net/doc/2023/09/2309_06131_annotating_data_fo"/>				<rdf:li resource="http://www.semanlink.net/doc/2023/08/do_large_language_models_work_o"/>				<rdf:li resource="http://www.semanlink.net/doc/2023/02/comparing_africa_centric_models"/>				<rdf:li resource="http://www.semanlink.net/doc/2023/02/towards_a_tagalog_nlp_pipeline"/>				<rdf:li resource="http://www.semanlink.net/doc/2023/01/colin_leong_sur_twitter_this"/>				<rdf:li resource="http://www.semanlink.net/doc/2022/09/how_to_train_an_mt5_model_for_t"/>				<rdf:li resource="http://www.semanlink.net/doc/2022/09/2203_09435_expanding_pretrain"/>				<rdf:li resource="http://www.semanlink.net/doc/2022/09/2209_00099_efficient_methods_"/>				<rdf:li resource="http://www.semanlink.net/doc/2022/08/allennlp_sur_twitter_dataset"/>				<rdf:li resource="http://www.semanlink.net/doc/2022/07/1807_00745_training_a_neural_"/>				<rdf:li resource="http://www.semanlink.net/doc/2022/07/dealing_with_data_scarcity_in_n"/>				<rdf:li resource="http://www.semanlink.net/doc/2022/07/no_language_left_behind"/>				<rdf:li resource="http://www.semanlink.net/doc/2022/06/acl_2022_highlights"/>				<rdf:li resource="http://www.semanlink.net/doc/2022/05/isaac_r_caswell_sur_twitter_"/>				<rdf:li resource="http://www.semanlink.net/doc/2022/05/2205_03983_building_machine_t"/>				<rdf:li 
resource="http://www.semanlink.net/doc/2022/03/1910_06294_training_compact_m"/>				<rdf:li resource="http://www.semanlink.net/doc/2022/03/2004_05119_beyond_fine_tuning"/>				<rdf:li resource="http://www.semanlink.net/doc/2022/03/domain_adaptation_of_word_embed"/>				<rdf:li resource="http://www.semanlink.net/doc/2021/12/sebastian_ruder_sur_twitter_"/>				<rdf:li resource="http://www.semanlink.net/doc/2021/12/2112_01488_colbertv2_effecti"/>				<rdf:li resource="http://www.semanlink.net/doc/2021/11/1911_02655"/>				<rdf:li resource="http://www.semanlink.net/doc/2021/11/2108_13854_contrastive_domain_1"/>				<rdf:li resource="http://www.semanlink.net/doc/2021/11/1706_03610_neural_domain_adap"/>				<rdf:li resource="http://www.semanlink.net/doc/2021/10/nils_reimers_sur_twitter_neu"/>				<rdf:li resource="http://www.semanlink.net/doc/2021/10/neubig_lowresource_nlp_bootcamp"/>				<rdf:li resource="http://www.semanlink.net/doc/2021/10/valar_nmt_vastly_lacking_resou"/>				<rdf:li resource="http://www.semanlink.net/doc/2021/10/bigscience_research_workshop_su"/>				<rdf:li resource="http://www.semanlink.net/doc/2021/10/linguistic_diversity"/>				<rdf:li resource="http://www.semanlink.net/doc/2021/10/2004_09095_the_state_and_fate"/>				<rdf:li resource="http://www.semanlink.net/doc/2021/09/2109_04513_filling_the_gaps_i"/>				<rdf:li resource="http://www.semanlink.net/doc/2021/09/koren_lazar_sur_twitter_m"/>				<rdf:li resource="http://www.semanlink.net/doc/2021/08/the_4_biggest_open_problems_in_"/>				<rdf:li resource="http://www.semanlink.net/doc/2021/08/2010_02353_participatory_rese"/>				<rdf:li resource="http://www.semanlink.net/doc/2021/07/cc_100_monolingual_datasets_fr"/>				<rdf:li resource="http://www.semanlink.net/doc/2021/07/davlan_david_adelani_hugging"/>				<rdf:li resource="http://www.semanlink.net/doc/2021/07/2107_00676_a_primer_on_pretra"/>				<rdf:li resource="http://www.semanlink.net/doc/2021/07/2010_12309_a_survey_on_recent"/>				<rdf:li 
resource="http://www.semanlink.net/doc/2021/07/2006_07264_low_resource_langu"/>				<rdf:li resource="http://www.semanlink.net/doc/2021/07/practical_natural_language_proc"/>				<rdf:li resource="http://www.semanlink.net/doc/2020/08/why_you_should_do_nlp_beyond_en"/>			</rdf:Seq>		</items>	</channel>		<item rdf:about="http://www.semanlink.net/doc/2024/10/meta_ai_research_topic_no_lan">		<title>Meta AI Research Topic - No Language Left Behind</title>		<link>http://www.semanlink.net/doc/2024/10/meta_ai_research_topic_no_lan</link>		<dc:date>2024-10-07T14:32:42Z</dc:date>	</item>	<item rdf:about="http://www.semanlink.net/doc/2023/09/2309_06131_annotating_data_fo">		<title>[2309.06131&#93; Annotating Data for Fine-Tuning a Neural Ranker? Current Active Learning Strategies are not Better than Random Selection</title>		<link>http://www.semanlink.net/doc/2023/09/2309_06131_annotating_data_fo</link>		<description>Compares Sentence Transformers, cross-encoders and ColBERT in a low-resource setting.

&gt; &quot;optimal&quot; subsets of training data that provide high effectiveness at low annotation cost do exist, but current mainstream AL strategies applied to PLM rankers are not capable of identifying them.		</description>		<dc:date>2023-09-14T00:47:05Z</dc:date>	</item>	<item rdf:about="http://www.semanlink.net/doc/2023/08/do_large_language_models_work_o">		<title>Do large language models work on Tagalog?</title>		<link>http://www.semanlink.net/doc/2023/08/do_large_language_models_work_o</link>		<description>How well do LLMs work on Tagalog data in structured prediction tasks?
&gt; tl;dr: you might get more bang for your buck training a supervised model!		</description>		<dc:date>2023-08-07T09:16:16Z</dc:date>	</item>	<item rdf:about="http://www.semanlink.net/doc/2023/02/comparing_africa_centric_models">		<title>Comparing Africa-centric Models to OpenAI&apos;s GPT3.5 - Lelapa</title>		<link>http://www.semanlink.net/doc/2023/02/comparing_africa_centric_models</link>		<dc:date>2023-02-10T21:13:07Z</dc:date>	</item>	<item rdf:about="http://www.semanlink.net/doc/2023/02/towards_a_tagalog_nlp_pipeline">		<title>Towards a Tagalog NLP pipeline</title>		<link>http://www.semanlink.net/doc/2023/02/towards_a_tagalog_nlp_pipeline</link>		<dc:date>2023-02-04T16:41:56Z</dc:date>	</item>	<item rdf:about="http://www.semanlink.net/doc/2023/01/colin_leong_sur_twitter_this">		<title>Colin Leong on Twitter: &quot;This book is about the only &quot;dataset&quot; I ever found for Hani. My first ever foray into the field, I found an electronic copy and munged it into a Hani/English parallel corpus, and trained a JoeyNMT model with the help of @MasakhaneNLP and @KreutzerJulia in particular.&quot;</title>		<link>http://www.semanlink.net/doc/2023/01/colin_leong_sur_twitter_this</link>		<description>[joeynmt/joeynmt: Minimalist NMT for educational purposes&#93;(doc:2023/01/joeynmt_joeynmt_minimalist_nmt)		</description>		<dc:date>2023-01-05T13:34:03Z</dc:date>	</item>	<item rdf:about="http://www.semanlink.net/doc/2022/09/how_to_train_an_mt5_model_for_t">		<title>How to Train an mT5 Model for Translation With Simple Transformers | by Thilina Rajapakse | Towards Data Science</title>		<link>http://www.semanlink.net/doc/2022/09/how_to_train_an_mt5_model_for_t</link>		<dc:date>2022-09-25T15:02:31Z</dc:date>	</item>	<item rdf:about="http://www.semanlink.net/doc/2022/09/2203_09435_expanding_pretrain">		<title>[2203.09435&#93; Expanding Pretrained Models to Thousands More Languages via Lexicon-based Adaptation</title>		
<link>http://www.semanlink.net/doc/2022/09/2203_09435_expanding_pretrain</link>		<dc:date>2022-09-08T11:17:10Z</dc:date>	</item>	<item rdf:about="http://www.semanlink.net/doc/2022/09/2209_00099_efficient_methods_">		<title>[2209.00099&#93; Efficient Methods for Natural Language Processing: A Survey</title>		<link>http://www.semanlink.net/doc/2022/09/2209_00099_efficient_methods_</link>		<description>&gt; We thus structure this survey by following the typical
NLP model pipeline and present the existing
methods that aim to make the respective stage
more efficient.		</description>		<dc:date>2022-09-04T11:26:48Z</dc:date>	</item>	<item rdf:about="http://www.semanlink.net/doc/2022/08/allennlp_sur_twitter_dataset">		<title>AllenNLP on Twitter: &quot;Dataset: training data for @MetaAI &apos;s No Language Left Behind NLLB-200 models!...&quot;</title>		<link>http://www.semanlink.net/doc/2022/08/allennlp_sur_twitter_dataset</link>		<description>[No Language Left Behind&#93;(doc:2022/07/no_language_left_behind)		</description>		<dc:date>2022-08-25T21:26:55Z</dc:date>	</item>	<item rdf:about="http://www.semanlink.net/doc/2022/07/1807_00745_training_a_neural_">		<title>[1807.00745&#93; Training a Neural Network in a Low-Resource Setting on Automatically Annotated Noisy Data</title>		<link>http://www.semanlink.net/doc/2022/07/1807_00745_training_a_neural_</link>		<description>Automatically created labels can degrade
a classifier’s performance.

&gt; approach to training
a neural network with **a combination of a small
amount of clean data and a larger set of automatically
annotated, noisy instances**
&gt;
&gt; We model the
noise explicitly using a **noise layer** that is added
to the network architecture. This allows us to directly
optimize the network weights using standard
techniques. After training, the noise layer
is not needed anymore, removing any added complexity.
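
A minimal pure-Python sketch of the noise-layer idea. It assumes the layer is a row-stochastic transition matrix, where `transition[i][j]` is read as P(noisy label j | true label i). The matrix is applied to the base model's softmax output for the automatically annotated instances only. The two-class matrix values and all function names are hypothetical illustrations, not the paper's code, and the sketch does not show how the paper estimates or trains the matrix.

```python
import math


def softmax(logits):
    """Turn raw class scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def apply_noise_layer(clean_probs, transition):
    """Map clean-label probabilities to noisy-label probabilities.

    transition[i][j] is assumed to be P(noisy label j | true label i);
    every row sums to 1, so the result is again a distribution.
    """
    n = len(clean_probs)
    return [sum(clean_probs[i] * transition[i][j] for i in range(n))
            for j in range(n)]


def nll(probs, label):
    """Negative log-likelihood of the observed label."""
    return -math.log(probs[label])


# Hypothetical 2-class noise model: the automatic annotator mislabels
# class 0 as class 1 in 20% of cases and class 1 as class 0 in 10%.
transition = [[0.8, 0.2],
              [0.1, 0.9]]

clean = softmax([2.0, 0.0])  # base model output for one sentence

# Clean (hand-labelled) instance: loss on the base model's prediction.
loss_clean = nll(clean, 0)

# Noisy (automatically labelled) instance: loss after the noise layer.
noisy = apply_noise_layer(clean, transition)
loss_noisy = nll(noisy, 1)

# At prediction time the noise layer is dropped and `clean` is used directly.
```

Because the noise layer only sits on the path of the noisy instances, the base network's weights are still optimized with an ordinary likelihood loss.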

[related blog post&#93;(https://www.roxanne-euproject.org/news/blog/making-natural-language-processing-work-for-little-training-data)		</description>		<dc:date>2022-07-18T11:39:48Z</dc:date>	</item>	<item rdf:about="http://www.semanlink.net/doc/2022/07/dealing_with_data_scarcity_in_n">		<title>Dealing with Data Scarcity in Natural Language Processing | by Yves Peirsman | NLPTown | Medium (2019)</title>		<link>http://www.semanlink.net/doc/2022/07/dealing_with_data_scarcity_in_n</link>		<description>&gt; Snorkel’s process is as follows. First, a developer writes
labelling functions and evaluates them on a small set of
labelled training data. Snorkel allows us to evaluate the
accuracy and coverage of all our labelling functions, and
their overlaps and conflicts with each other. Next, it trains
a generative label model over these labelling functions
that learns how best to combine them. Finally, this label
model outputs probabilistic labels that we can use to train
an end model.		</description>		<dc:date>2022-07-18T11:06:41Z</dc:date>	</item>	<item rdf:about="http://www.semanlink.net/doc/2022/07/no_language_left_behind">		<title>No Language Left Behind</title>		<link>http://www.semanlink.net/doc/2022/07/no_language_left_behind</link>		<description>[tweet&#93;(https://twitter.com/vedanujg/status/1544925973635690497?s=20&amp;t=ZunLNurhmN7aHDmnzPO5yQ)		</description>		<dc:date>2022-07-06T20:57:57Z</dc:date>	</item>	<item rdf:about="http://www.semanlink.net/doc/2022/06/acl_2022_highlights">		<title>ACL 2022 Highlights</title>		<link>http://www.semanlink.net/doc/2022/06/acl_2022_highlights</link>		<dc:date>2022-06-07T17:58:34Z</dc:date>	</item>	<item rdf:about="http://www.semanlink.net/doc/2022/05/isaac_r_caswell_sur_twitter_">		<title>Isaac R Caswell on Twitter: &quot;How many languages can we support with Machine Translation?...&quot;</title>		<link>http://www.semanlink.net/doc/2022/05/isaac_r_caswell_sur_twitter_</link>		<description>&gt; We train a translation model on 1000+ languages, using it to launch 24 new languages on Google Translate without any parallel data for these languages...		
</description>		<dc:date>2022-05-18T16:12:44Z</dc:date>	</item>	<item rdf:about="http://www.semanlink.net/doc/2022/05/2205_03983_building_machine_t">		<title>[2205.03983&#93; Building Machine Translation Systems for the Next Thousand Languages</title>		<link>http://www.semanlink.net/doc/2022/05/2205_03983_building_machine_t</link>		<dc:date>2022-05-10T08:00:10Z</dc:date>	</item>	<item rdf:about="http://www.semanlink.net/doc/2022/03/1910_06294_training_compact_m">		<title>[1910.06294&#93; Training Compact Models for Low Resource Entity Tagging using Pre-trained Language Models</title>		<link>http://www.semanlink.net/doc/2022/03/1910_06294_training_compact_m</link>		<dc:date>2022-03-31T21:06:23Z</dc:date>	</item>	<item rdf:about="http://www.semanlink.net/doc/2022/03/2004_05119_beyond_fine_tuning">		<title>[2004.05119&#93; Beyond Fine-tuning: Few-Sample Sentence Embedding Transfer</title>		<link>http://www.semanlink.net/doc/2022/03/2004_05119_beyond_fine_tuning</link>		<description>&gt; Fine-tuning (FT) pre-trained sentence embedding models on small datasets has been shown to have limitations. In this paper we show that concatenating the embeddings from the pre-trained model with those from a simple sentence embedding model trained only on the target data, can improve over the performance of FT for few-sample tasks		</description>		<dc:date>2022-03-31T21:04:02Z</dc:date>	</item>	<item rdf:about="http://www.semanlink.net/doc/2022/03/domain_adaptation_of_word_embed">		<title>Domain adaptation of word embeddings through the exploitation of in-domain corpora and knowledge bases (PhD Thesis 2021)</title>		<link>http://www.semanlink.net/doc/2022/03/domain_adaptation_of_word_embed</link>		<description>PhD thesis by Hicham El Boukkouri, Université Paris-Saclay

[Github&#93;(https://github.com/helboukkouri/phd-code)

### Goal

Given a target specialized domain, improve the quality of general-domain
word representations using in-domain corpora and/or knowledge bases

### Contributions

#### a method for specializing general-domain embeddings in a [Low-Resource&#93;(tag:nlp_low_resource_scenarios) context.

&gt; - train static representations on the task corpus, 
&gt; - resume the
pre-training of general-domain contextual embeddings on the same task corpus,
&gt; - finally, combine both static and contextual representations into
one final model
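
A minimal sketch of the final combination step. It assumes the static and contextual representations are combined by simple per-token concatenation, and the thesis may combine them differently. The inputs are hypothetical stand-ins. `static_table` plays the role of static embeddings (e.g. word2vec) trained on the task corpus. `contextual_vectors` plays the role of the outputs of a BERT-like model whose pre-training was resumed on that same corpus.

```python
def combine_representations(tokens, static_table, contextual_vectors, static_dim):
    """Build one vector per token by concatenating static and contextual parts.

    static_table maps a token to its static vector (a list of floats);
    contextual_vectors holds one contextual vector per token, in order.
    Tokens missing from static_table get an all-zero static part.
    """
    if len(tokens) != len(contextual_vectors):
        raise ValueError("need exactly one contextual vector per token")
    combined = []
    for token, contextual in zip(tokens, contextual_vectors):
        static = static_table.get(token, [0.0] * static_dim)
        combined.append(list(static) + list(contextual))
    return combined


# Hypothetical toy inputs: 2-d static vectors, 1-d contextual vectors.
static_table = {"aspirin": [1.0, 2.0]}
contextual = [[0.5], [0.25]]
vectors = combine_representations(["aspirin", "xyz"], static_table, contextual, 2)
# vectors == [[1.0, 2.0, 0.5], [0.0, 0.0, 0.25]]
```

Concatenation keeps both sources intact and leaves it to the downstream layers to learn how much weight each part deserves.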

#### we tackle the issue of using a general-domain vocabulary in a specialized domain

#### Evaluation of re-training vs training from scratch on specialized corpora using a specialized vocabulary

Training from scratch is better, but not by much: re-training from a general model
is still appropriate, as it is less expensive and leads to comparable, although
slightly lower, performance.

#### Regarding subword-based tokenization systems
&gt; we argue that they are inconvenient in practice -&gt; CharacterBERT, a variant
of BERT that uses ELMo’s character-based system instead of WordPieces. More convenient to use, and more robust to misspellings.

#### Ways to specialize general-domain representations using knowledge bases

A strong baseline: a simple
method relying on graph embeddings and concatenation, using only the is_a relation.

&gt; both static and contextual embeddings may effectively be
specialized using this simple approach
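
A minimal sketch of specializing a general-domain word vector with knowledge-base structure, using only is_a edges, followed by concatenation. It swaps the learned graph embeddings for a plain multi-hot vector over a concept and all of its is_a ancestors. A real setup would use dense graph embeddings learned from the KB instead. The toy hierarchy, the concept index and all function names are hypothetical.

```python
def ancestors(concept, is_a):
    """Collect every concept reachable by following is_a edges upward.

    is_a maps each concept to a list of its direct parents (assumed acyclic).
    """
    seen = set()
    stack = list(is_a.get(concept, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, []))
    return seen


def kb_vector(concept, is_a, concept_index):
    """Multi-hot vector over KB concepts: the concept itself plus its ancestors."""
    vec = [0.0] * len(concept_index)
    for c in ancestors(concept, is_a) | {concept}:
        vec[concept_index[c]] = 1.0
    return vec


def specialize(word_vector, concept, is_a, concept_index):
    """Concatenate a general-domain word vector with its KB-derived vector."""
    return list(word_vector) + kb_vector(concept, is_a, concept_index)


# Hypothetical toy hierarchy built from is_a edges only.
is_a = {"aspirin": ["nsaid"], "ibuprofen": ["nsaid"], "nsaid": ["drug"]}
concept_index = {"drug": 0, "nsaid": 1, "aspirin": 2, "ibuprofen": 3}
vec = specialize([0.3, -0.1], "aspirin", is_a, concept_index)
# vec == [0.3, -0.1, 1.0, 1.0, 1.0, 0.0]
```

Concepts that share ancestors (here aspirin and ibuprofen) end up with overlapping KB parts, which is the kind of signal the concatenation adds to the word vector.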

#### Knowledge Injection Modules (KIM) that inject the knowledge representations directly within the BERT-like models&apos; architecture

### Notes

&gt; our experiments focused on a single
setting (i.e. the medical domain and the English language)

&gt; meta-embeddings, an approach that consists in combining
different sets of representations for achieving improved performance		</description>		<dc:date>2022-03-23T16:32:44Z</dc:date>	</item>	<item rdf:about="http://www.semanlink.net/doc/2021/12/sebastian_ruder_sur_twitter_">		<title>Sebastian Ruder on Twitter: &quot;Modular and Parameter-Efficient Fine-Tuning for NLP Models&quot;</title>		<link>http://www.semanlink.net/doc/2021/12/sebastian_ruder_sur_twitter_</link>		<dc:date>2021-12-17T11:45:32Z</dc:date>	</item>	<item rdf:about="http://www.semanlink.net/doc/2021/12/2112_01488_colbertv2_effecti">		<title>[2112.01488&#93; ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction</title>		<link>http://www.semanlink.net/doc/2021/12/2112_01488_colbertv2_effecti</link>		<dc:date>2021-12-05T10:33:54Z</dc:date>	</item>	<item rdf:about="http://www.semanlink.net/doc/2021/11/1911_02655">		<title>[1911.02655&#93; Towards Domain Adaptation from Limited Data for Question Answering Using Deep Neural Networks</title>		<link>http://www.semanlink.net/doc/2021/11/1911_02655</link>		<description>domain adaptation for enabling QA systems to answer questions posed against
documents in new specialized domains

&gt; In experiments on question answering in the **automobile manual domain** we demonstrate that **standard DNN transfer learning techniques work surprisingly well** in adapting DNN models to a new domain **using limited amounts of annotated training data** in the new domain.

&gt; **unsupervised
domain adaption techniques to a base model could
provide some improvement in the absence of in-domain labeled
training data**, but there may be **no advantage to
these methods once standard transfer learning methods are
able to use even limited amounts of annotated training data**
in a new domain.		</description>		<dc:date>2021-11-19T00:31:23Z</dc:date>	</item>	<item rdf:about="http://www.semanlink.net/doc/2021/11/2108_13854_contrastive_domain_1">		<title>[2108.13854&#93; Contrastive Domain Adaptation for Question Answering using Limited Text Corpora</title>		<link>http://www.semanlink.net/doc/2021/11/2108_13854_contrastive_domain_1</link>		<description>&gt; a framework for answering
out-of-domain questions in QA settings
with limited text corpora

&gt; combines techniques from question generation and domain-invariant learning to answer out-of-domain questions in settings with limited text corpora. Here, we train a QA system on both source data and generated data from the target domain with a contrastive adaptation loss that is incorporated in the training objective.		</description>		<dc:date>2021-11-19T00:18:40Z</dc:date>	</item>	<item rdf:about="http://www.semanlink.net/doc/2021/11/1706_03610_neural_domain_adap">		<title>[1706.03610&#93; Neural Domain Adaptation for Biomedical Question Answering</title>		<link>http://www.semanlink.net/doc/2021/11/1706_03610_neural_domain_adap</link>		<description>Datasets are generally too small to train a DL system for QA from scratch.

&gt; we adapt a neural QA system trained on a large open-domain dataset (SQuAD) to a biomedical dataset (BioASQ) by employing various transfer learning techniques. Our network architecture is based on a state-of-the-art QA system, extended with biomedical word embeddings and a novel mechanism to answer list questions. In contrast to existing biomedical QA systems, our system does not rely on domain-specific ontologies, parsers or entity taggers, which are expensive to create.		</description>		<dc:date>2021-11-19T00:09:38Z</dc:date>	</item>	<item rdf:about="http://www.semanlink.net/doc/2021/10/nils_reimers_sur_twitter_neu">		<title>Nils Reimers on Twitter: &quot;Neural Search for Low Resource Scenarios...&quot;</title>		<link>http://www.semanlink.net/doc/2021/10/nils_reimers_sur_twitter_neu</link>		<description>1. Is low resource actually realistic?
    - No
    - Important research questions:
        - how to learn unsupervised
        - how to exploit structure (ex. title and body)
        - how to learn a concept from a single sentence
2. How good are our benchmarks? 
3. Domain Adaptation for Dense Embeddings
    - first unsupervised training, then supervised
    - TSDAE &gt; ICT &gt; MLM
    - unclear how to adapt an existing model to a new domain
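
TSDAE (the first objective in the ranking above) trains an encoder on deliberately corrupted sentences. Tokens are deleted from the input, and a decoder must reconstruct the original sentence from the encoder's sentence vector alone. The sketch below shows one way to build such (noisy input, target) training pairs. It assumes whitespace tokenization and a deletion ratio of 0.6, the default in sentence-transformers' TSDAE data pipeline. As a simplification, it removes an exact, rounded fraction of the tokens instead of dropping each token independently. Function names are hypothetical.

```python
import random


def delete_noise(tokens, del_ratio=0.6, rng=None):
    """Delete a fraction del_ratio of the tokens, keeping their order.

    Simplification: an exact (rounded) fraction is removed rather than
    dropping each token independently; at least one token is always kept.
    """
    if not tokens:
        return []
    if rng is None:
        rng = random.Random()
    n_keep = max(1, round(len(tokens) * (1.0 - del_ratio)))
    kept_positions = sorted(rng.sample(range(len(tokens)), n_keep))
    return [tokens[i] for i in kept_positions]


def make_tsdae_pair(sentence, del_ratio=0.6, rng=None):
    """Return (encoder input, decoder target) for one unlabeled sentence."""
    tokens = sentence.split()
    noisy = " ".join(delete_noise(tokens, del_ratio, rng))
    return noisy, sentence


rng = random.Random(0)  # fixed seed so the corruption is reproducible
noisy, target = make_tsdae_pair("the cat sat on the mat near the old door", rng=rng)
# The encoder sees `noisy`; the decoder only gets the encoder's sentence
# vector and must regenerate `target` token by token.
```

Only unlabeled in-domain sentences are needed to build these pairs, which is why this objective fits the "first unsupervised training, then supervised" recipe.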


&gt; TSDAE differs in that the decoder in MLM has access to full-length
word embeddings for every single token. The TSDAE decoder only
has access to the sentence vector produced by the encoder.		</description>		<dc:date>2021-10-27T01:48:22Z</dc:date>	</item>	<item rdf:about="http://www.semanlink.net/doc/2021/10/neubig_lowresource_nlp_bootcamp">		<title>neubig/lowresource-nlp-bootcamp-2020: The website for the CMU Language Technologies Institute low resource NLP bootcamp 2020</title>		<link>http://www.semanlink.net/doc/2021/10/neubig_lowresource_nlp_bootcamp</link>		<description>8 lectures (plus exercises) focused on NLP in data-scarce languages		</description>		<dc:date>2021-10-16T14:54:17Z</dc:date>	</item>	<item rdf:about="http://www.semanlink.net/doc/2021/10/valar_nmt_vastly_lacking_resou">		<title>VaLaR NMT: Vastly Lacking Resources Neural Machine Translation (2019)</title>		<link>http://www.semanlink.net/doc/2021/10/valar_nmt_vastly_lacking_resou</link>		<description>&gt; We focus on extremely low-resource setting, where we are **limited to less than 10k parallel data and no mono-lingual corpora**... we create a character-decoder-based seq2seq NMT model as a baseline and compare its performance on various levels of data scarcity. Then, we explore the performance benefit of transfer learning by training a model on a different language... Lastly, we use **language models and a noisy dictionary to augment our training data**. 
Utilizing both transfer learning and data augmentation, we see a 1.5 BLEU score improvement over the baseline		</description>		<dc:date>2021-10-14T15:46:04Z</dc:date>	</item>	<item rdf:about="http://www.semanlink.net/doc/2021/10/bigscience_research_workshop_su">		<title>BigScience Research Workshop on Twitter: &quot;Come help us improve language resource visibility over the next week...&quot;</title>		<link>http://www.semanlink.net/doc/2021/10/bigscience_research_workshop_su</link>		<dc:date>2021-10-07T12:05:24Z</dc:date>	</item>	<item rdf:about="http://www.semanlink.net/doc/2021/10/linguistic_diversity">		<title>Linguistic Diversity</title>		<link>http://www.semanlink.net/doc/2021/10/linguistic_diversity</link>		<description>&gt; We create a consistent data model to complement the existing ACL Anthology Corpus with data from later years and of non-ACL conferences. We do this by augmenting the corpus using Semantic Scholar’s API and scraping ACL Anthology itself. This is a consolidated dataset for 11 conferences with different attributes. Stay tuned :)

[[2004.09095&#93; The State and Fate of Linguistic Diversity and Inclusion in the NLP World&#93;(doc:2021/10/2004_09095_the_state_and_fate)		</description>		<dc:date>2021-10-03T12:39:09Z</dc:date>	</item>	<item rdf:about="http://www.semanlink.net/doc/2021/10/2004_09095_the_state_and_fate">		<title>[2004.09095&#93; The State and Fate of Linguistic Diversity and Inclusion in the NLP World</title>		<link>http://www.semanlink.net/doc/2021/10/2004_09095_the_state_and_fate</link>		<dc:date>2021-10-03T11:50:06Z</dc:date>	</item>	<item rdf:about="http://www.semanlink.net/doc/2021/09/2109_04513_filling_the_gaps_i">		<title>[2109.04513&#93; Filling the Gaps in Ancient Akkadian Texts: A Masked Language Modelling Approach</title>		<link>http://www.semanlink.net/doc/2021/09/2109_04513_filling_the_gaps_i</link>		<description>[tweet&#93;(doc:2021/09/koren_lazar_sur_twitter_m)

&gt; Akkadian language, the lingua franca of the time. 

&gt; despite data scarcity (1M tokens) we can achieve state of the art performance on missing tokens prediction (89% hit@5) using a greedy decoding scheme and **pretraining on data from other languages and different time periods**.		</description>		<dc:date>2021-09-23T10:56:10Z</dc:date>	</item>	<item rdf:about="http://www.semanlink.net/doc/2021/09/koren_lazar_sur_twitter_m">		<title>Koren Lazar on Twitter: &quot;...Modern pre-trained language models are applicable even in extreme low-resource settings as the case of the ancient Akkadian language.&quot;</title>		<link>http://www.semanlink.net/doc/2021/09/koren_lazar_sur_twitter_m</link>		<description>[[2109.04513&#93; Filling the Gaps in Ancient Akkadian Texts: A Masked Language Modelling Approach&#93;(doc:2021/09/2109_04513_filling_the_gaps_i)		</description>		<dc:date>2021-09-23T10:42:17Z</dc:date>	</item>	<item rdf:about="http://www.semanlink.net/doc/2021/08/the_4_biggest_open_problems_in_">		<title>The 4 Biggest Open Problems in NLP (2019)</title>		<link>http://www.semanlink.net/doc/2021/08/the_4_biggest_open_problems_in_</link>		<dc:date>2021-08-26T15:23:03Z</dc:date>	</item>	<item rdf:about="http://www.semanlink.net/doc/2021/08/2010_02353_participatory_rese">		<title>[2010.02353&#93; Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages</title>		<link>http://www.semanlink.net/doc/2021/08/2010_02353_participatory_rese</link>		<description>about machine translation using parallel corpora only		</description>		<dc:date>2021-08-25T17:01:12Z</dc:date>	</item>	<item rdf:about="http://www.semanlink.net/doc/2021/07/cc_100_monolingual_datasets_fr">		<title>CC-100: Monolingual Datasets from Web Crawl Data</title>		<link>http://www.semanlink.net/doc/2021/07/cc_100_monolingual_datasets_fr</link>		<description>Attempt to recreate the dataset used for training XLM-R ([[1911.02116&#93; Unsupervised Cross-lingual Representation Learning at Scale&#93;(doc:2021/07/1911_02116_unsupervised_cross))		</description>		<dc:date>2021-07-29T00:20:28Z</dc:date>	</item>	<item rdf:about="http://www.semanlink.net/doc/2021/07/davlan_david_adelani_hugging">		<title>Davlan (David Adelani) @Huggingface</title>		<link>http://www.semanlink.net/doc/2021/07/davlan_david_adelani_hugging</link>		<description>Includes an [xlm-roberta-base-finetuned-hausa&#93;(https://huggingface.co/Davlan/xlm-roberta-base-finetuned-hausa) model (using data from [CC-100: Monolingual Datasets from Web Crawl Data&#93;(doc:2021/07/cc_100_monolingual_datasets_fr))		</description>		<dc:date>2021-07-29T00:01:52Z</dc:date>	</item>	<item rdf:about="http://www.semanlink.net/doc/2021/07/2107_00676_a_primer_on_pretra">		<title>[2107.00676&#93; A Primer on Pretrained Multilingual Language Models</title>		<link>http://www.semanlink.net/doc/2021/07/2107_00676_a_primer_on_pretra</link>		<description>&gt; MLLMs are useful for bilingual tasks, particularly
in low resource scenarios.
&gt;
&gt; The surprisingly good performance of
MLLMs in crosslingual transfer as well as
bilingual tasks motivates the hypothesis that
MLLMs are learning universal patterns. However,
our survey of the studies in this space indicates that
there is no consensus yet.		</description>		<dc:date>2021-07-13T13:33:29Z</dc:date>	</item>	<item rdf:about="http://www.semanlink.net/doc/2021/07/2010_12309_a_survey_on_recent">		<title>[2010.12309&#93; A Survey on Recent Approaches for Natural Language Processing in Low-Resource Scenarios</title>		<link>http://www.semanlink.net/doc/2021/07/2010_12309_a_survey_on_recent</link>		<description>Low-resource scenarios: low-resource languages, but also non-standard domains and tasks.

one key goal of this survey is to highlight the underlying assumptions

[Blog post&#93;(https://towardsdatascience.com/a-visual-guide-to-low-resource-nlp-d7b4c7b1a4bc)		</description>		<dc:date>2021-07-06T13:08:01Z</dc:date>	</item>	<item rdf:about="http://www.semanlink.net/doc/2021/07/2006_07264_low_resource_langu">		<title>[2006.07264&#93; Low-resource Languages: A Review of Past Work and Future Challenges</title>		<link>http://www.semanlink.net/doc/2021/07/2006_07264_low_resource_langu</link>		<description>Meh.		</description>		<dc:date>2021-07-06T13:07:39Z</dc:date>	</item>	<item rdf:about="http://www.semanlink.net/doc/2021/07/practical_natural_language_proc">		<title>Practical Natural Language Processing for Low-Resource Languages</title>		<link>http://www.semanlink.net/doc/2021/07/practical_natural_language_proc</link>		<dc:date>2021-07-06T12:51:20Z</dc:date>	</item>	<item rdf:about="http://www.semanlink.net/doc/2020/08/why_you_should_do_nlp_beyond_en">		<title>Why You Should Do NLP Beyond English</title>		<link>http://www.semanlink.net/doc/2020/08/why_you_should_do_nlp_beyond_en</link>		<description>&gt; Only a few hundred languages
are represented on the web and speakers of minority languages are severely
limited in the information available to them.		</description>		<dc:date>2020-08-01T18:50:35Z</dc:date>	</item></rdf:RDF>