Les Bouts de bois de Dieu (2021-10-10)

raphaelsty/RetrieverReader: Fast API QA (2021-10-04)

Detecting Duplicate Questions (2019) (2021-10-14)

[2004.09095] The State and Fate of Linguistic Diversity and Inclusion in the NLP World (2021-10-03)
Pratik Joshi, Sebastin Santy, Amar Budhiraja, Kalika Bali, Monojit Choudhury
Language technologies contribute to promoting multilingualism and linguistic diversity around the world. However, only a very small number of the over 7000 languages of the world are represented in the rapidly evolving language technologies and applications. In this paper we look at the relation between the types of languages, resources, and their representation in NLP conferences to understand the trajectory that different languages have followed over time. Our quantitative investigation underlines the disparity between languages, especially in terms of their resources, and calls into question the "language agnostic" status of current models and systems. Through this paper, we attempt to convince the ACL community to prioritise the resolution of the predicaments highlighted here, so that no language is left behind.

Linguistic Diversity (2021-10-03)
> We create a consistent data model to complement the existing ACL Anthology Corpus with data from later years and of non-ACL conferences. We do this by augmenting the corpus using Semantic Scholar’s API and scraping ACL Anthology itself. This is a consolidated dataset for 11 conferences with different attributes. Stay tuned :)

[[2004.09095] The State and Fate of Linguistic Diversity and Inclusion in the NLP World](doc:2021/10/2004_09095_the_state_and_fate)

Patrick Artus : « L’économie de spéculation est inefficace » (2021-10-03)
The weak returns on traditional assets push investors toward speculative assets, to the detriment of the productive economy.

[2110.06176] Mention Memory: incorporating textual knowledge into Transformers through entity mention attention (2021-10-13)
Refers to:
- [[2002.10640] Differentiable Reasoning over a Virtual Knowledge Base](doc:2020/07/2002_10640_differentiable_rea)
- [[2004.07202] Entities as Experts: Sparse Memory Access with Entity Supervision](doc:2020/07/2004_07202_entities_as_expert)
- [[2002.08909] REALM: Retrieval-Augmented Language Model Pre-Training](doc:2020/12/2002_08909_realm_retrieval_a)

Google AI Blog: Exploring Transfer Learning with T5: the Text-To-Text Transfer Transformer (2020) (2021-10-13)
> With T5, we propose reframing all NLP tasks into a unified text-to-text format where the input and output are always text strings, in contrast to BERT-style models that can only output either a class label or a span of the input. Our text-to-text framework allows us to use the same model, loss function, and hyperparameters on any NLP task, including machine translation, document summarization, question answering, and classification tasks.
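A minimal sketch of the text-to-text idea quoted above, assuming the public `t5-small` checkpoint and the standard Hugging Face `transformers` API; the example prompts are T5's documented task prefixes, not taken from the blog post:

```python
# Minimal sketch: every task is "text in, text out" with the same model.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

tasks = [
    "translate English to German: The house is wonderful.",        # translation
    "summarize: The quick brown fox jumped over the lazy dog ...",  # summarization
    "cola sentence: The course is jumping well.",                   # acceptability classification
]

for prompt in tasks:
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_length=40)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```

The same model, loss, and decoding loop serve translation, summarization, and classification; only the text prefix changes.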
Zexuan Zhong sur Twitter : "...Does this really mean dense models are better? No. Our #EMNLP2021 paper shows dense retrievers even fail to answer simple entity-centric questions" (2021-10-07)

[2110.08207] Multitask Prompted Training Enables Zero-Shot Task Generalization (2021-10-18)
Victor Sanh, Albert Webson, Colin Raffel, Stephen H. Bach, Lintang Sutawika, Zaid Alyafeai, Antoine Chaffin, Arnaud Stiegler, Teven Le Scao, Arun Raja, Manan Dey, M Saiful Bari, Canwen Xu, Urmish Thakker, Shanya Sharma Sharma, Eliza Szczechla, Taewoon Kim, Gunjan Chhablani, Nihal Nayak, Debajyoti Datta, Jonathan Chang, Mike Tian-Jian Jiang, Han Wang, Matteo Manica, Sheng Shen, Zheng Xin Yong, Harshit Pandey, Rachel Bawden, Thomas Wang, Trishala Neeraj, Jos Rozen, Abheesht Sharma, Andrea Santilli, Thibault Fevry, Jason Alan Fries, Ryan Teehan, Tali Bers, Stella Biderman, Leo Gao, Thomas Wolf, Alexander M. Rush
[Tweet](https://twitter.com/BigscienceW/status/1450084548872744961?s=20)
Large language models have recently been shown to attain reasonable zero-shot generalization on a diverse set of tasks. It has been hypothesized that this is a consequence of implicit multitask learning in language model training. Can zero-shot generalization instead be directly induced by explicit multitask learning? To test this question at scale, we develop a system for easily mapping general natural language tasks into a human-readable prompted form. We convert a large set of supervised datasets, each with multiple prompts using varying natural language. These prompted datasets allow for benchmarking the ability of a model to perform completely unseen tasks specified in natural language. We fine-tune a pretrained encoder-decoder model on this multitask mixture covering a wide variety of tasks. The model attains strong zero-shot performance on several standard datasets, often outperforming models 16x its size. Further, our approach attains strong performance on a subset of tasks from the BIG-Bench benchmark, outperforming models 6x its size. All prompts and trained models are available at github.com/bigscience-workshop/promptsource/.

[2109.04711] Pre-train or Annotate? Domain Adaptation with a Constrained Budget (2021-10-14)

L’intelligence artificielle, génie de la biologie moléculaire (2021-10-20)

BigScience Research Workshop sur Twitter : "Come help us improve language resource visibility over the next week..." (2021-10-07)

Project NERA: State Attorneys General claim Google is planning to turn the internet into a "walled garden" - MSPoweruser (2021-10-25)

Modeling AI on the Language of Brain Circuits and Architecture | Wu Tsai Neurosciences Institute (2021-10-20)
> Using my language metaphor, I would say that AI researchers tend to use letters and jump directly to articles without writing the words and sentences in between.
>
> In the brain, a variety of architectures coexist and work together to generate general intelligence, whereas most AI systems rely on a single type of circuit architecture.

Au royaume des champignons (2021-10-22)

[1712.05972] Train Once, Test Anywhere: Zero-Shot Learning for Text Classification (2021-10-16)
Pushpankar Kumar Pushp, Muktabh Mayank Srivastava
> The model learns to predict whether a given sentence is related to a tag or not; unlike other classifiers that learn to classify the sentence as one of the possible classes.
Input: concatenation of the embedding of the text and the embedding of the tag; output: related / not related (binary classifier).
> We can say that this technique learns the concept of relatedness between a sentence and a word that can be extended beyond datasets. That said, the levels of accuracy leave a lot of scope for future work.
Zero-shot learners are models capable of predicting unseen classes. In this work, we propose a zero-shot learning approach for text categorization. Our method involves training a model on a large corpus of sentences to learn the relationship between a sentence and the embedding of the sentence's tags. Learning such a relationship makes the model generalize to unseen sentences, tags, and even new datasets provided they can be put into the same embedding space. The model learns to predict whether a given sentence is related to a tag or not, unlike other classifiers that learn to classify the sentence as one of the possible classes. We propose three different neural networks for the task and report their accuracy on the test set of the dataset used for training them as well as on two other standard datasets for which no retraining was done. We show that our models generalize well across new unseen classes in both cases. Although the models do not achieve the accuracy level of the state-of-the-art supervised models, it is evidently a step forward towards general intelligence in natural language processing.
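The relatedness formulation above (sentence embedding plus tag embedding in, related / not related out) can be sketched as a small binary classifier; the PyTorch module below is an illustrative approximation with made-up dimensions, not the authors' exact architecture:

```python
import torch
import torch.nn as nn

class RelatednessClassifier(nn.Module):
    """Takes a sentence embedding and a tag embedding and predicts
    whether the sentence is related to the tag (binary output)."""
    def __init__(self, emb_dim=300, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * emb_dim, hidden),  # concatenated [sentence; tag]
            nn.ReLU(),
            nn.Linear(hidden, 1),            # logit for "related"
        )

    def forward(self, sent_emb, tag_emb):
        return self.net(torch.cat([sent_emb, tag_emb], dim=-1)).squeeze(-1)

# Toy usage: random embeddings stand in for real sentence/tag vectors.
model = RelatednessClassifier()
sent, tag = torch.randn(4, 300), torch.randn(4, 300)
prob_related = torch.sigmoid(model(sent, tag))
```

Because an unseen tag only needs to be embedded, the classifier can score new labels (and new datasets sharing the same embedding space) without retraining.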
L’origine des chevaux domestiques enfin établie | CNRS (2021-10-20)
> The modern horse was domesticated around 2,200 BCE, in the northern Caucasus.
>
> This horse then spread across Asia at the same time as chariots, the spoked wheel and the Indo-Iranian languages. By contrast, the migrations of Indo-European steppe populations into Europe during the third millennium BCE could not have relied on the horse, since its domestication and spread came later.

[2106.13474] Adapt-and-Distill: Developing Small, Fast and Effective Pretrained Language Models for Domains (2021-10-21)
Yunzhi Yao, Shaohan Huang, Wenhui Wang, Li Dong, Furu Wei
> adapting the off-the-shelf general pretrained models and performing task-agnostic knowledge distillation in target domains
> Our findings suggest that domain-specific vocabulary and general-domain language model play vital roles in domain adaptation of a pretrained model
Large pre-trained models have achieved great success in many natural language processing tasks. However, when they are applied in specific domains, these models suffer from domain shift and bring challenges in fine-tuning and online serving for latency and capacity constraints. In this paper, we present a general approach to developing small, fast and effective pre-trained models for specific domains. This is achieved by adapting the off-the-shelf general pre-trained models and performing task-agnostic knowledge distillation in target domains. Specifically, we propose domain-specific vocabulary expansion in the adaptation stage and employ corpus-level occurrence probability to choose the size of the incremental vocabulary automatically. Then we systematically explore different strategies to compress the large pre-trained models for specific domains. We conduct our experiments in the biomedical and computer science domains. The experimental results demonstrate that our approach achieves better performance over the BERT BASE model in domain-specific tasks while being 3.3x smaller and 5.1x faster than BERT BASE. The code and pre-trained models are available at https://aka.ms/adalm.
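The adaptation stage's vocabulary expansion can be sketched with the standard `transformers` tokenizer API; the domain terms below are invented examples, and this is only an illustration of the idea, not the paper's AdaLM code:

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Hypothetical domain terms mined from an in-domain corpus
# (the paper picks the increment size from corpus-level occurrence probabilities).
domain_terms = ["angiogenesis", "immunotherapy", "transcriptomic"]
num_added = tokenizer.add_tokens(domain_terms)

# Grow the embedding matrix so the new tokens get trainable vectors,
# then continue masked-language-model pre-training on the domain corpus.
model.resize_token_embeddings(len(tokenizer))
print(f"added {num_added} domain tokens; new vocab size: {len(tokenizer)}")
```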
Peter Bloem sur Twitter : "Clever idea. When you use augmentation, why throw away the information of which instances are augmentations of each other?" (2021-10-20)

Incremental Clustering - an overview | ScienceDirect Topics (2021-10-17)

AlphaFold 2 is here: what’s behind the structure prediction miracle | Oxford Protein Informatics Group (2021-10-20)
> to recap: AlphaFold 2 finds similar sequences to the input, extracts the information using a special neural network architecture (a transformer), and then passes that information to another neural network that produces a structure.

FastAPI (2021-10-04)

Application of Self-Organizing Maps in Text Clustering: A Review | IntechOpen (2012) (2021-10-17)

How to extract Highlighted Parts from PDF files - Stack Overflow (2021-10-21)

Les nouvelles frontières du vivant | CNRS Le journal (2021-10-18)

MasakhaNER: Named Entity Recognition for African Languages | MIT Press (2021-10-14)

fasterthanlime 🌌 sur Twitter : google has a secret deal with facebook called "Jedi Blue" (2021-10-25)

Seth Stafford sur Twitter : "Here’s a nice paper (ICLR spotlight) on how to apply masking in LM training..." (2021-10-16)
> You can read this paper two ways:
> 1. As a practical speed-up technique for training large LMs.
> 2. Theoretical validation that Transformers are powerful because they ‘learn PMI’.

[2104.12016] Learning Passage Impacts for Inverted Indexes (2021-10-08)
Antonio Mallia, Omar Khattab, Torsten Suel, Nicola Tonellotto
Mentioned in [Building Scalable, Explainable, and Adaptive NLP Models with Retrieval | SAIL Blog](doc:2021/10/building_scalable_explainable_)
Neural information retrieval systems typically use a cascading pipeline, in which a first-stage model retrieves a candidate set of documents and one or more subsequent stages re-rank this set using contextualized language models such as BERT. In this paper, we propose DeepImpact, a new document term-weighting scheme suitable for efficient retrieval using a standard inverted index. Compared to existing methods, DeepImpact improves impact-score modeling and tackles the vocabulary-mismatch problem. In particular, DeepImpact leverages DocT5Query to enrich the document collection and, using a contextualized language model, directly estimates the semantic importance of tokens in a document, producing a single-value representation for each token in each document. Our experiments show that DeepImpact significantly outperforms prior first-stage retrieval approaches by up to 17% on effectiveness metrics w.r.t. DocT5Query, and, when deployed in a re-ranking scenario, can reach the same effectiveness of state-of-the-art approaches with up to 5.1x speedup in efficiency.
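On the retrieval side, DeepImpact's scoring reduces to summing learned per-term impact scores stored in an ordinary inverted index. Below is a toy illustration with invented scores; in the paper the single-value scores come from a contextualized encoder over DocT5Query-expanded documents:

```python
from collections import defaultdict

# Toy impact index: term -> list of (doc_id, learned impact score).
# In DeepImpact these per-token, per-document scores are produced by a
# contextualized language model and then stored in the inverted index.
impact_index = defaultdict(list)
impact_index["coffee"] = [(0, 2.7), (2, 1.1)]
impact_index["caffeine"] = [(0, 1.9), (1, 3.4)]

def score(query_terms):
    scores = defaultdict(float)
    for term in set(query_terms):              # each query term contributes once
        for doc_id, impact in impact_index.get(term, []):
            scores[doc_id] += impact            # additive scoring over matching terms
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(score(["coffee", "caffeine"]))  # doc 0 ranks first: 2.7 + 1.9
```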
Building Scalable, Explainable, and Adaptive NLP Models with Retrieval | SAIL Blog (2021-10-07)
> The black-box nature of large language models like T5 and GPT-3 makes them inefficient to train and deploy, opaque in their knowledge representations and in backing their claims with provenance, and static in facing a constantly evolving world and diverse downstream contexts. **This post explores retrieval-based NLP, where models retrieve information pertinent to solving their tasks from a plugged-in text corpus**.
>
> Retrieval-based NLP methods view tasks as “open-book” exams: knowledge is encoded explicitly in the form of a text corpus like Wikipedia, the medical literature, or a software’s API documentation. When solving a language task, **the model learns to search for pertinent passages** and to then use the retrieved information for crafting knowledgeable responses. In doing so, **retrieval helps decouple the capacity that language models have for understanding text from how they store knowledge**.

Sahajtomar/french_semantic · Hugging Face (2021-10-14)

VaLaR NMT: Vastly Lacking Resources Neural Machine Translation (2019) (2021-10-14)
> We focus on an extremely low-resource setting, where we are **limited to less than 10k parallel data and no mono-lingual corpora**... we create a character-decoder-based seq2seq NMT model as a baseline and compare its performance on various levels of data scarcity. Then, we explore the performance benefit of transfer learning by training a model on a different language... Lastly, we use **language models and a noisy dictionary to augment our training data**. Utilizing both transfer learning and data augmentation, we see a 1.5 BLEU score improvement over the baseline.

Sphinx (2021-10-15)

« Pandora Papers » : plongée mondiale dans les secrets de la finance offshore (2021-10-03)

[2010.07245] Text Classification Using Label Names Only: A Language Model Self-Training Approach (2021-10-16)
Yu Meng, Yunyi Zhang, Jiaxin Huang, Chenyan Xiong, Heng Ji, Chao Zhang, Jiawei Han
> In this paper, we explore the potential of only **using the label name of each class** to train classification models on unlabeled data, **without using any labeled documents**. We use pre-trained neural language models both as general linguistic knowledge sources for category understanding and as representation learning models for document classification. Our method
> 1. associates semantically related words with the label names,
> 2. finds category-indicative words and trains the model to predict their implied categories, and
> 3. generalizes the model via self-training.
Current text classification methods typically require a good number of human-labeled documents as training data, which can be costly and difficult to obtain in real applications. Humans can perform classification without seeing any labeled examples but only based on a small set of words describing the categories to be classified. In this paper, we explore the potential of only using the label name of each class to train classification models on unlabeled data, without using any labeled documents. We use pre-trained neural language models both as general linguistic knowledge sources for category understanding and as representation learning models for document classification. Our method (1) associates semantically related words with the label names, (2) finds category-indicative words and trains the model to predict their implied categories, and (3) generalizes the model via self-training. We show that our model achieves around 90% accuracy on four benchmark datasets including topic and sentiment classification without using any labeled documents but learning from unlabeled data supervised by at most 3 words (1 in most cases) per class as the label name.
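Step 1 above (associating semantically related words with a label name) can be loosely approximated with a masked-LM fill-in, as in this sketch; it only illustrates the intuition and is not the paper's LOTClass implementation, and the model name and example sentence are arbitrary choices:

```python
from transformers import pipeline

# Rough idea: mask a position where a label name (e.g. "sports") could occur
# and collect the masked LM's top predictions as a small category vocabulary.
fill = pipeline("fill-mask", model="bert-base-uncased")

context = "the latest [MASK] news about the championship game"
predictions = fill(context, top_k=10)
category_vocab = [p["token_str"] for p in predictions]
print(category_vocab)  # words the LM finds plausible in place of the label name
```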
Enterprise Knowledge Graph Foundation (2021-10-02)

Sentence Embeddings and Transformers | Pinecone (2021-10-23)

Your Money and Your Life - by Edward Snowden - Continuing Ed — with Edward Snowden (2021-10-09)
> Central Bank Digital Currencies will ransom our future

« Le massacre de la Saint-Barthélemy s’est joué entre voisins » (2021-10-24)

Omer Levy sur Twitter : "What if I told you that fine-tuning T5-Large (0.8B params) on a couple hundred examples could outperform GPT-3 (175B params) on a bunch of tasks?" (2021-10-13)

Open Range (2003 film) (2021-10-08)
Directed by and starring Kevin Costner, and a good gunfight.

Selective Classification Can Magnify Disparities Across Groups | SAIL Blog (2021-10-16)
> Selective classification, where models can abstain when they are unsure about a prediction, routinely improves average accuracy. Worryingly, we show that selective classification can also hurt accuracy on certain subgroups of the data.
[twitter](https://twitter.com/ErikJones313/status/1448681482176790532)

Kelechi sur Twitter : "Excited to present AfriBERTa, a multilingual LM pretrained from scratch on 11 African languages with a joint corpus of less than 1GB." (2021-10-11)

James Briggs sur Twitter : "*free* course on vector similarity search and Faiss..." (2021-10-16)

Année de la biologie 2021-2022 (2021-10-18)

Isotropy in the Contextual Embedding Space: Clusters and Manifolds | OpenReview (2021-10-26)

neubig/lowresource-nlp-bootcamp-2020: The website for the CMU Language Technologies Institute low resource NLP bootcamp 2020 (2021-10-16)
8 lectures (plus exercises) focused on NLP in data-scarce languages
[1908.11860] Adapt or Get Left Behind: Domain Adaptation through BERT Language Model Finetuning for Aspect-Target Sentiment Classification (2021-10-21)
Alexander Rietzler, Sebastian Stabinger, Paul Opitz, Stefan Engl
Aspect-Target Sentiment Classification (ATSC) is a subtask of Aspect-Based Sentiment Analysis (ABSA), which has many applications e.g. in e-commerce, where data and insights from reviews can be leveraged to create value for businesses and customers. Recently, deep transfer-learning methods have been applied successfully to a myriad of Natural Language Processing (NLP) tasks, including ATSC. Building on top of the prominent BERT language model, we approach ATSC using a two-step procedure: self-supervised domain-specific BERT language model finetuning, followed by supervised task-specific finetuning. Our findings on how to best exploit domain-specific language model finetuning enable us to produce new state-of-the-art performance on the SemEval 2014 Task 4 restaurants dataset. In addition, to explore the real-world robustness of our models, we perform cross-domain evaluation. We show that a cross-domain adapted BERT language model performs significantly better than strong baseline models like vanilla BERT-base and XLNet-base. Finally, we conduct a case study to interpret model prediction errors.

Next-Gen Sentence Embeddings with Multiple Negatives Ranking Loss | Pinecone (2021-10-27)
> the world of sentence embeddings was ignited with the introduction of SBERT in 2019. Since then, many more sentence transformers have been introduced. These models quickly made the original SBERT obsolete. How did these newer sentence transformers manage to outperform SBERT so quickly? The answer is multiple negatives ranking (MNR) loss.
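A minimal PyTorch sketch of MNR loss as the article describes it: each (anchor, positive) pair in a batch treats every other positive as an in-batch negative, and a cross-entropy over the scaled cosine-similarity matrix pushes the matching pair to the top. The scale factor and embedding size are illustrative defaults, roughly following the sentence-transformers implementation:

```python
import torch
import torch.nn.functional as F

def mnr_loss(anchor_emb, positive_emb, scale=20.0):
    """Multiple negatives ranking loss: for each anchor, its paired positive
    is the correct "class" and every other positive in the batch is a negative."""
    a = F.normalize(anchor_emb, dim=-1)
    p = F.normalize(positive_emb, dim=-1)
    scores = a @ p.T * scale                       # scaled cosine-similarity matrix
    labels = torch.arange(scores.size(0), device=scores.device)
    return F.cross_entropy(scores, labels)         # diagonal entries are the targets

# Toy usage with random "sentence embeddings" for a batch of 8 pairs.
anchors, positives = torch.randn(8, 384), torch.randn(8, 384)
print(mnr_loss(anchors, positives))
```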