@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix sl:    <http://www.semanlink.net/2001/00/semanlink-schema#> .
@prefix skos:  <http://www.w3.org/2004/02/skos/core#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tag:   <http://www.semanlink.net/tag/> .
@prefix foaf:  <http://xmlns.com/foaf/0.1/> .
@prefix dc:    <http://purl.org/dc/elements/1.1/> .

<http://www.semanlink.net/doc/2022/05/2205_03983_building_machine_t>
        dc:title         "[2205.03983] Building Machine Translation Systems for the Next Thousand Languages" ;
        sl:creationDate  "2022-05-10" ;
        sl:tag           tag:machine_translation , tag:nlp_low_resource_scenarios , tag:low_resource_languages , tag:arxiv_doc .

<http://www.semanlink.net/doc/2023/01/colin_leong_sur_twitter_this>
        dc:title         "Colin Leong on Twitter: \"This book is about the only \"dataset\" I ever found for Hani. My first ever foray into the field, I found an electronic copy and munged it into a Hani/English parallel corpus, and trained a JoeyNMT model with the help of @MasakhaneNLP and @KreutzerJulia in particular.\" / Twitter" ;
        sl:comment       "[joeynmt/joeynmt: Minimalist NMT for educational purposes](doc:2023/01/joeynmt_joeynmt_minimalist_nmt)" ;
        sl:creationDate  "2023-01-05" ;
        sl:tag           tag:tweet , tag:neural_machine_translation , tag:nlp_low_resource_scenarios .

tag:named_entity_recognition
        a               sl:Tag ;
        skos:prefLabel  "Named Entity Recognition" .

tag:synthetic_data_nlp
        a               sl:Tag ;
        skos:prefLabel  "Synthetic data (NLP)" .

<http://www.semanlink.net/doc/2021/11/1911_02655>
        dc:title         "[1911.02655] Towards Domain Adaptation from Limited Data for Question Answering Using Deep Neural Networks" ;
        sl:comment       "domain adaptation for enabling QA systems to answer questions posed against\r\ndocuments in new specialized domains\r\n\r\n> In experiments on question answering in the **automobile manual domain** we demonstrate that **standard DNN transfer learning techniques work surprisingly well** in adapting DNN models to a new domain **using limited amounts of annotated training data** in the new domain.\r\n\r\n> **unsupervised\r\ndomain adaption techniques to a base model could\r\nprovide some improvement in the absence of in-domain labeled\r\ntraining data**, but there may be **no advantage to\r\nthese methods once standard transfer learning methods are\r\nable to use even limited amounts of annotated training data**\r\nin a new domain." ;
        sl:creationDate  "2021-11-19" ;
        sl:tag           tag:nlp_automotive , tag:nlp_microsoft , tag:nlp_low_resource_scenarios , tag:extractive_question_answering , tag:domain_adaptation_nlp , tag:arxiv_doc .

tag:neural_ranking_models
        a               sl:Tag ;
        skos:prefLabel  "Neural Ranking Models" .

tag:domain_adaptation_nlp
        a               sl:Tag ;
        skos:prefLabel  "Domain adaptation (NLP)" .

<http://www.semanlink.net/doc/2021/12/sebastian_ruder_sur_twitter_>
        dc:title         "Sebastian Ruder on Twitter: \"Modular and Parameter-Efficient Fine-Tuning for NLP Models\"" ;
        sl:creationDate  "2021-12-17" ;
        sl:tag           tag:tweet , tag:sebastian_ruder , tag:nlp_low_resource_scenarios .

<http://www.semanlink.net/doc/2022/03/domain_adaptation_of_word_embed>
        dc:title         "Domain adaptation of word embeddings through the exploitation of in-domain corpora and knowledge bases (PhD Thesis 2021)" ;
        sl:comment       "PhD thesis by Hicham El Boukkouri, Université Paris-Saclay\r\n\r\n[Github](https://github.com/helboukkouri/phd-code)\r\n\r\n### Goal\r\n\r\nGiven a target specialized domain, improve the quality of general-domain\r\nword representations using in-domain corpora and/or knowledge bases\r\n\r\n### Contributions\r\n\r\n#### a method for specializing general-domain embeddings in a [Low-Resource](tag:nlp_low_resource_scenarios) context.\r\n\r\n> - train static representations on the task corpus, \r\n> - resume the\r\npre-training of general-domain contextual embeddings on the same task corpus,\r\n> - finally, combine both static and contextual representations into\r\none final model\r\n\r\n#### we tackle the issue of using a general-domain vocabulary in a specialized domain\r\n\r\n#### Evaluation of re-training vs training from scratch on specialized corpora using a specialized vocabulary\r\n\r\nTraining from scratch is better, but not by much: re-training from a general model\r\nis still appropriate as it is less expensive and leads to comparable, although\r\nslightly lower, performance\r\n\r\n#### Regarding subword-based tokenization systems\r\n> we argue that they are inconvenient in practice -> CharacterBERT, a variant\r\nof BERT that uses ELMo’s character-based system instead of WordPieces. More convenient to use, and more robust to misspellings\r\n\r\n#### Ways to specialize general-domain representations using knowledge bases\r\n\r\na strong baseline using a simple\r\nmethod relying on graph embeddings and concatenation, using only the is_a relation\r\n\r\n> both static and contextual embeddings may effectively be\r\nspecialized using this simple approach\r\n\r\n#### Knowledge Injection Modules (KIM) that\r\ninject the knowledge representations directly within the BERT-like models' architecture\r\n\r\n### Notes\r\n\r\n> our experiments focused on a single\r\nsetting (i.e. the medical domain and the English language)\r\n\r\n> meta-embeddings, an approach that consists in combining\r\ndifferent sets of representations for achieving improved performance" ;
        sl:creationDate  "2022-03-23" ;
        sl:tag           tag:phd_thesis , tag:olivier_ferret , tag:nlp_low_resource_scenarios , tag:nlp_using_knowledge_graphs , tag:domain_adaptation_nlp , tag:adapter_modules_finetuning .

<http://www.semanlink.net/doc/2022/06/acl_2022_highlights>
        dc:title         "ACL 2022 Highlights" ;
        sl:creationDate  "2022-06-07" ;
        sl:tag           tag:sebastian_ruder , tag:nlp_4_africa , tag:nlp_low_resource_scenarios , tag:acl_2022 .

<http://www.semanlink.net/doc/2021/07/2010_12309_a_survey_on_recent>
        dc:title         "[2010.12309] A Survey on Recent Approaches for Natural Language Processing in Low-Resource Scenarios" ;
        sl:comment       "Low-resource scenarios: low-resource languages, but also non-standard domains and tasks.\r\n\r\nOne key goal of this survey is to highlight the underlying assumptions\r\n\r\n[Blog post](https://towardsdatascience.com/a-visual-guide-to-low-resource-nlp-d7b4c7b1a4bc)" ;
        sl:creationDate  "2021-07-06" ;
        sl:tag           tag:survey , tag:nlp_low_resource_scenarios , tag:low_resource_languages , tag:bosch , tag:arxiv_doc .

<http://www.semanlink.net/doc/2021/10/neubig_lowresource_nlp_bootcamp>
        dc:title         "neubig/lowresource-nlp-bootcamp-2020: The website for the CMU Language Technologies Institute low resource NLP bootcamp 2020" ;
        sl:comment       "8 lectures (plus exercises) focused on NLP in data-scarce languages" ;
        sl:creationDate  "2021-10-16" ;
        sl:tag           tag:nlp_low_resource_scenarios , tag:low_resource_languages .

tag:bosch  a            sl:Tag ;
        skos:prefLabel  "Bosch" .

tag:automatically_annotated_data
        a               sl:Tag ;
        skos:prefLabel  "Automatically Annotated Data" .

tag:nlp_using_knowledge_graphs
        a               sl:Tag ;
        skos:prefLabel  "Knowledge Graphs in NLP" .

tag:low_resource_learning
        a               sl:Tag ;
        skos:prefLabel  "Low-resource Learning" .

tag:provoc  a           sl:Tag ;
        skos:prefLabel  "Provoc" .

tag:acl_2020  a         sl:Tag ;
        skos:prefLabel  "ACL 2020" .

tag:active_learning  a  sl:Tag ;
        skos:prefLabel  "Active learning" .

tag:question_answering
        a               sl:Tag ;
        skos:prefLabel  "Question Answering" .

tag:neural_machine_translation
        a               sl:Tag ;
        skos:prefLabel  "Neural machine translation" .

<http://www.semanlink.net/doc/2022/09/2209_00099_efficient_methods_>
        dc:title         "[2209.00099] Efficient Methods for Natural Language Processing: A Survey" ;
        sl:comment       "> We thus structure this survey by following the typical\r\nNLP model pipeline and present the existing\r\nmethods that aim to make the respective stage\r\nmore efficient." ;
        sl:creationDate  "2022-09-04" ;
        sl:tag           tag:survey , tag:nlp_low_resource_scenarios , tag:language_model_fine_tuning , tag:language_models_size , tag:arxiv_doc .

tag:labeling_data  a    sl:Tag ;
        skos:prefLabel  "Labeling data" .

tag:nlp_microsoft  a    sl:Tag ;
        skos:prefLabel  "NLP@Microsoft" .

tag:nlp_automotive  a   sl:Tag ;
        skos:prefLabel  "NLP+Automotive" .

tag:entity_tagging  a   sl:Tag ;
        skos:prefLabel  "Entity Tagging" .

<http://www.semanlink.net/doc/2022/03/2004_05119_beyond_fine_tuning>
        dc:title         "[2004.05119] Beyond Fine-tuning: Few-Sample Sentence Embedding Transfer" ;
        sl:comment       "> Fine-tuning (FT) pre-trained sentence embedding models on small datasets has been shown to have limitations. In this paper we show that concatenating the embeddings from the pre-trained model with those from a simple sentence embedding model trained only on the target data, can improve over the performance of FT for few-sample tasks" ;
        sl:creationDate  "2022-03-31" ;
        sl:tag           tag:sbert_fine_tuning , tag:nlp_microsoft , tag:nlp_amazon , tag:nlp_low_resource_scenarios , tag:arxiv_doc , tag:acl_2020 .

tag:language_models_size
        a               sl:Tag ;
        skos:prefLabel  "Language Models: size" .

tag:biomedical_nlp  a   sl:Tag ;
        skos:prefLabel  "Biomedical NLP" .

tag:low_resource_languages
        a               sl:Tag ;
        skos:broader    tag:nlp_low_resource_scenarios ;
        skos:prefLabel  "Low-Resource Languages" .

tag:fine_tuning  a      sl:Tag ;
        skos:prefLabel  "Fine-tuning" .

<http://www.semanlink.net/doc/2023/09/2309_06131_annotating_data_fo>
        dc:title         "[2309.06131] Annotating Data for Fine-Tuning a Neural Ranker? Current Active Learning Strategies are not Better than Random Selection" ;
        sl:comment       "Compares Sentence Transformers, cross-encoders and ColBERT in the low-resource setting\r\n\r\n> \"optimal\" subsets of training data that provide high effectiveness at low annotation cost do exist, but current mainstream AL strategies applied to PLM rankers are not capable of identifying them." ;
        sl:creationDate  "2023-09-14" ;
        sl:tag           tag:fine_tuning , tag:active_learning , tag:arxiv_doc , tag:labeling_data , tag:sigir_2023 , tag:discute_avec_raphael , tag:nlp_low_resource_scenarios , tag:provoc , tag:neural_ranking_models .

tag:nlp_amazon  a       sl:Tag ;
        skos:prefLabel  "NLP@Amazon" .

tag:arxiv_doc  a        sl:Tag ;
        skos:prefLabel  "Arxiv Doc" .

tag:adapter_modules_finetuning
        a               sl:Tag ;
        skos:prefLabel  "Adapter modules (LM finetuning)" .

tag:sbert_fine_tuning
        a               sl:Tag ;
        skos:prefLabel  "sBERT fine-tuning" .

tag:olivier_ferret  a   sl:Tag ;
        skos:prefLabel  "Olivier Ferret" .

tag:phd_thesis  a       sl:Tag ;
        skos:prefLabel  "PhD Thesis" .

tag:nlp_low_resource_scenarios
        a                 sl:Tag ;
        rdfs:isDefinedBy  tag:nlp_low_resource_scenarios.n3 ;
        skos:broader      tag:nlp_problem ;
        skos:prefLabel    "Low-Resource NLP" ;
        skos:related      tag:low_resource_learning ;
        foaf:page         tag:nlp_low_resource_scenarios.html .

tag:nlp_4_africa  a     sl:Tag ;
        skos:prefLabel  "NLP 4 Africa" .

tag:pre_trained_language_models
        a               sl:Tag ;
        skos:prefLabel  "Pre-Trained Language Models" .

tag:benchmark  a        sl:Tag ;
        skos:prefLabel  "Benchmark" .

<http://www.semanlink.net/doc/2022/07/1807_00745_training_a_neural_>
        dc:title         "[1807.00745] Training a Neural Network in a Low-Resource Setting on Automatically Annotated Noisy Data" ;
        sl:comment       "Automatically created labels can deteriorate\r\na classifier’s performance\r\n\r\n> approach to training\r\na neural network with **a combination of a small\r\namount of clean data and a larger set of automatically\r\nannotated, noisy instances**\r\n>\r\n> We model the\r\nnoise explicitly using a **noise layer** that is added\r\nto the network architecture. This allows us to directly\r\noptimize the network weights using standard\r\ntechniques. After training, the noise layer\r\nis not needed anymore, removing any added complexity.\r\n\r\n[related blog post](https://www.roxanne-euproject.org/news/blog/making-natural-language-processing-work-for-little-training-data)" ;
        sl:creationDate  "2022-07-18" ;
        sl:tag           tag:named_entity_recognition , tag:nlp_low_resource_scenarios , tag:automatically_annotated_data , tag:arxiv_doc .

tag:nils_reimers  a     sl:Tag ;
        skos:prefLabel  "Nils Reimers" .

tag:snorkel  a          sl:Tag ;
        skos:prefLabel  "Snorkel" .

tag:moshe_wasserblat  a  sl:Tag ;
        skos:prefLabel  "Moshe Wasserblat" .

tag:extractive_question_answering
        a               sl:Tag ;
        skos:prefLabel  "Extractive Question Answering" .

tag:sigir_2023  a       sl:Tag ;
        skos:prefLabel  "SIGIR 2023" .

<http://www.semanlink.net/doc/2022/07/dealing_with_data_scarcity_in_n>
        dc:title         "Dealing with Data Scarcity in Natural Language Processing | by Yves Peirsman | NLPTown | Medium (2019)" ;
        sl:comment       "> Snorkel’s process is as follows. First, a developer writes\r\nlabelling functions and evaluates them on a small set of\r\nlabelled training data. Snorkel allows us to evaluate the\r\naccuracy and coverage of all our labelling functions, and\r\ntheir overlaps and conflicts with each other. Next, it trains\r\na generative label model over these labelling functions\r\nthat learns how best to combine them. Finally, this label\r\nmodel outputs probabilistic labels that we can use to train\r\nan end model." ;
        sl:creationDate  "2022-07-18" ;
        sl:tag           tag:yves_peirsman , tag:snorkel , tag:nlp_low_resource_scenarios .

tag:language_model_fine_tuning
        a               sl:Tag ;
        skos:prefLabel  "LM Fine-tuning" .

tag:llm  a              sl:Tag ;
        skos:prefLabel  "LLM" .

tag:yves_peirsman  a    sl:Tag ;
        skos:prefLabel  "Yves Peirsman" .

<http://www.semanlink.net/doc/2021/11/1706_03610_neural_domain_adap>
        dc:title         "[1706.03610] Neural Domain Adaptation for Biomedical Question Answering" ;
        sl:comment       "Datasets are generally too small to train a DL system for QA from scratch.\r\n\r\n> we adapt a neural QA system trained on a large open-domain dataset (SQuAD) to a biomedical dataset (BioASQ) by employing various transfer learning techniques. Our network architecture is based on a state-of-the-art QA system, extended with biomedical word embeddings and a novel mechanism to answer list questions. In contrast to existing biomedical QA systems, our system does not rely on domain-specific ontologies, parsers or entity taggers, which are expensive to create." ;
        sl:creationDate  "2021-11-19" ;
        sl:tag           tag:question_answering , tag:nlp_low_resource_scenarios , tag:domain_adaptation_nlp , tag:biomedical_nlp , tag:arxiv_doc .

tag:machine_translation
        a               sl:Tag ;
        skos:prefLabel  "Machine translation" .

tag:acl_2022  a         sl:Tag ;
        skos:prefLabel  "ACL 2022" .

tag:sebastian_ruder  a  sl:Tag ;
        skos:prefLabel  "Sebastian Ruder" .

tag:survey  a           sl:Tag ;
        skos:prefLabel  "Survey / Review" .

tag:nlp_problem  a      sl:Tag ;
        skos:prefLabel  "NLP tasks / problems" .

<http://www.semanlink.net/doc/2021/10/nils_reimers_sur_twitter_neu>
        dc:title         "Nils Reimers on Twitter: \"Neural Search for Low Resource Scenarios...\"" ;
        sl:comment       "1. Is low resource actually realistic?\r\n    - No\r\n    - Important research questions:\r\n        - how to learn unsupervised\r\n        - how to exploit structure (ex. title and body)\r\n        - how to learn a concept from a single sentence\r\n2. How good are our benchmarks? \r\n3. Domain-Adaptation for Dense Embeddings\r\n    - first unsupervised training, then supervised\r\n    - TSDAE > ICT > MLM\r\n    - unclear how to adapt an existing model to a new model\r\n\r\n\r\n> TSDAE differs in that the decoder in MLM has access to full-length\r\nword embeddings for every single token. The TSDAE decoder only\r\nhas access to the sentence vector produced by the encoder." ;
        sl:creationDate  "2021-10-27" ;
        sl:tag           tag:tweet , tag:nils_reimers , tag:neural_models_for_information_retrieval , tag:nlp_low_resource_scenarios , tag:domain_adaptation_nlp , tag:benchmark .

<http://www.semanlink.net/doc/2023/08/do_large_language_models_work_o>
        dc:title         "Do large language models work on Tagalog?" ;
        sl:comment       "How well do LLMs work on Tagalog data in structured prediction tasks?\r\n> tl;dr: you might get more bang for your buck training a supervised model!" ;
        sl:creationDate  "2023-08-07" ;
        sl:tag           tag:nlp_low_resource_scenarios , tag:low_resource_languages , tag:llm .

<http://www.semanlink.net/doc/2022/03/1910_06294_training_compact_m>
        dc:title         "[1910.06294] Training Compact Models for Low Resource Entity Tagging using Pre-trained Language Models" ;
        sl:creationDate  "2022-03-31" ;
        sl:tag           tag:pre_trained_language_models , tag:moshe_wasserblat , tag:nlp_low_resource_scenarios , tag:entity_tagging , tag:arxiv_doc .

tag:discute_avec_raphael
        a               sl:Tag ;
        skos:prefLabel  "Discussed with Raphaël" .

tag:tweet  a            sl:Tag ;
        skos:prefLabel  "Tweet" .

tag:domain_specific_corpora
        a               sl:Tag ;
        skos:broader    tag:nlp_low_resource_scenarios ;
        skos:prefLabel  "Domain specific corpora" .

<http://www.semanlink.net/doc/2021/11/2108_13854_contrastive_domain_1>
        dc:title         "[2108.13854] Contrastive Domain Adaptation for Question Answering using Limited Text Corpora" ;
        sl:comment       "> a framework for answering\r\nout-of-domain questions in QA settings\r\nwith limited text corpora\r\n\r\n> combines techniques from question generation and domain-invariant learning to answer out-of-domain questions in settings with limited text corpora. Here, we train a QA system on both source data and generated data from the target domain with a contrastive adaptation loss that is incorporated in the training objective." ;
        sl:creationDate  "2021-11-19" ;
        sl:tag           tag:synthetic_data_nlp , tag:question_answering , tag:nlp_low_resource_scenarios , tag:domain_adaptation_nlp , tag:arxiv_doc .

tag:neural_models_for_information_retrieval
        a               sl:Tag ;
        skos:prefLabel  "Neural Search" .
