2022-09-15 2022-09-15T08:13:45Z Intro to Transformers - Google Slides Nils Reimers - YouTube 2022-09-02 2022-09-02T14:33:27Z [2008.09093] PARADE: Passage Representation Aggregation for Document Reranking 2021-06-10T17:46:31Z Andrew Yates Sean MacAvaney Canjia Li Yingfei Sun Pretrained transformer models, such as BERT and T5, have shown to be highly effective at ad-hoc passage and document ranking. Due to inherent sequence length limits of these models, they need to be run over a document's passages, rather than processing the entire document sequence at once. Although several approaches for aggregating passage-level signals have been proposed, there has yet to be an extensive comparison of these techniques. In this work, we explore strategies for aggregating relevance signals from a document's passages into a final ranking score. We find that passage representation aggregation techniques can significantly improve over techniques proposed in prior work, such as taking the maximum passage score. We call this new approach PARADE. In particular, PARADE can significantly improve results on collections with broad information needs where relevance signals can be spread throughout the document (such as TREC Robust04 and GOV2). Meanwhile, less complex aggregation techniques may work better on collections with an information need that can often be pinpointed to a single passage (such as TREC DL and TREC Genomics). We also conduct efficiency analyses, and highlight several strategies for improving transformer-based aggregation. recommended by [Nils Reimers](tag:nils_reimers) 2022-09-21T23:10:09Z 2020-08-20T17:32:30Z PARADE: Passage Representation Aggregation for Document Reranking 2022-09-21 Ben He 2008.09093 Canjia Li Shaohan Huang The poor performance of the original BERT for sentence semantic similarity has been widely discussed in previous works. We find that unsatisfactory performance is mainly due to the static token embeddings biases and the ineffective BERT layers, rather than the high cosine similarity of the sentence embeddings. To this end, we propose a prompt based sentence embeddings method which can reduce token embeddings biases and make the original BERT layers more effective. By reformulating the sentence embeddings task as the fill-in-the-blanks problem, our method significantly improves the performance of original BERT. We discuss two prompt representing methods and three prompt searching methods for prompt based sentence embeddings. Moreover, we propose a novel unsupervised training objective by the technology of template denoising, which substantially shortens the performance gap between the supervised and unsupervised setting. For experiments, we evaluate our method on both non fine-tuned and fine-tuned settings. Even a non fine-tuned method can outperform the fine-tuned methods like unsupervised ConSERT on STS tasks. Our fine-tuned method outperforms the state-of-the-art method SimCSE in both unsupervised and supervised settings. Compared to SimCSE, we achieve 2.29 and 2.58 points improvements on BERT and RoBERTa respectively under the unsupervised setting. 
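To make the fill-in-the-blanks idea concrete, here is a minimal sketch (not the paper's exact implementation): wrap the sentence in a cloze template and use the hidden state at the `[MASK]` position as the sentence embedding. The template wording and the `bert-base-uncased` checkpoint are illustrative assumptions, not the tuned choices from the paper.

```python
# Minimal sketch of prompt-based sentence embeddings (PromptBERT-style).
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def prompt_embedding(sentence: str) -> torch.Tensor:
    # Wrap the sentence in a fill-in-the-blank template (illustrative wording).
    template = f'This sentence : "{sentence}" means {tokenizer.mask_token}.'
    inputs = tokenizer(template, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state          # (1, seq_len, hidden)
    # Use the hidden state at the [MASK] position as the sentence embedding.
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    return hidden[0, mask_pos].squeeze(0)

e1 = prompt_embedding("A man is playing a guitar.")
e2 = prompt_embedding("Someone plays an instrument.")
print(float(torch.nn.functional.cosine_similarity(e1, e2, dim=0)))
```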
PromptBERT improving BERT sentence embeddings with prompts - Ethan Kim 2022-09-16 2022-09-16T10:31:11Z Ting Jiang 2022-09-16T10:06:59Z Haizhen Huang [2201.04337] PromptBERT: Improving BERT Sentence Embeddings with Prompts Qi Zhang Fuzhen Zhuang 2022-09-16 [PromptBERT improving BERT sentence embeddings with prompts - Ethan Kim](doc:2022/09/promptbert_improving_bert_sente) Furu Wei Liangjie Zhang Deqing Wang 2022-01-12T06:54:21Z Zihan Zhang Ting Jiang 2022-01-12T06:54:21Z PromptBERT: Improving BERT Sentence Embeddings with Prompts 2201.04337 Perspectives on knowledge acquisition & mobilization with neural net - Hugo Larochelle - CoLLAs 2022 - YouTube > my thoughts on the state of progress in designing AI systems with neural networks. I’ll frame a perspective that views our success as relying on two separate and equally critical steps, that I refer to as **neural knowledge acquisition** and neural knowledge **mobilization** 2022-09-17T10:10:02Z 2022-09-17 [2209.01975] Selective Annotation Makes Language Models Better Few-Shot Learners Jungo Kasai Hongjin Su Mari Ostendorf Chen Henry Wu Tianlu Wang Jiayi Xin Hongjin Su 2022-09-07T13:20:58Z Noah A. Smith Tao Yu > This work examines the implications of in-context learning for the creation of datasets for new natural language tasks. > > Departing from recent in-context learning methods, we formulate an annotation-efficient, two-step framework: selective annotation that chooses a pool of examples to annotate from unlabeled data in advance, followed by prompt retrieval that retrieves task examples from the annotated pool at test time. an unsupervised, graph-based selective annotation method, vote-k, to select diverse, representative examples to annotate Selective Annotation Makes Language Models Better Few-Shot Learners 2022-09-05T14:01:15Z Rui Zhang 2209.01975 Weijia Shi Luke Zettlemoyer Many recent approaches to natural language tasks are built on the remarkable abilities of large language models. Large language models can perform in-context learning, where they learn a new task from a few task demonstrations, without any parameter updates. This work examines the implications of in-context learning for the creation of datasets for new natural language tasks. Departing from recent in-context learning methods, we formulate an annotation-efficient, two-step framework: selective annotation that chooses a pool of examples to annotate from unlabeled data in advance, followed by prompt retrieval that retrieves task examples from the annotated pool at test time. Based on this framework, we propose an unsupervised, graph-based selective annotation method, vote-k, to select diverse, representative examples to annotate. Extensive experiments on 10 datasets (covering classification, commonsense reasoning, dialogue, and text/code generation) demonstrate that our selective annotation method improves the task performance by a large margin. On average, vote-k achieves a 12.9%/11.4% relative gain under an annotation budget of 18/100, as compared to randomly selecting examples to annotate. Compared to state-of-the-art supervised finetuning approaches, it yields similar performance with 10-100x less annotation cost across 10 tasks. We further analyze the effectiveness of our framework in various scenarios: language models with varying sizes, alternative selective annotation methods, and cases where there is a test data domain shift. We hope that our studies will serve as a basis for data annotations as large language models are increasingly applied to new tasks. 
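As an illustration of the second step of this framework (prompt retrieval), here is a minimal sketch: embed the annotated pool once with a generic sentence-transformers encoder, then at test time retrieve the most similar examples and assemble them into an in-context prompt. The encoder, the toy pool, and the prompt format are assumptions; the vote-k graph-based selection step itself is not shown.

```python
# Minimal sketch of prompt retrieval over a small annotated pool.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")   # generic encoder, not the paper's retriever

annotated_pool = [                                   # toy placeholder pool
    {"input": "The movie was a delight.", "label": "positive"},
    {"input": "I want my money back.", "label": "negative"},
]
pool_embeddings = encoder.encode(
    [ex["input"] for ex in annotated_pool],
    convert_to_tensor=True, normalize_embeddings=True,
)

def build_prompt(query: str, k: int = 2) -> str:
    # Retrieve the k most similar annotated examples and prepend them as demonstrations.
    q_emb = encoder.encode(query, convert_to_tensor=True, normalize_embeddings=True)
    hits = util.semantic_search(q_emb, pool_embeddings, top_k=k)[0]
    demos = [annotated_pool[h["corpus_id"]] for h in hits]
    lines = [f"Review: {d['input']}\nSentiment: {d['label']}" for d in demos]
    lines.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(lines)

print(build_prompt("Absolutely loved it."))
```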
Our code is available at https://github.com/HKUNLP/icl-selective-annotation. 2022-09-07 2022-09-05T14:01:15Z « Le Voyage de l’humanité, aux origines de la richesse et des inégalités » : une histoire de l’abondance contemporaine 2022-09-24T11:43:01Z 2022-09-24 Dense retrievers encode documents into fixed dimensional embeddings. However, storing all the document embeddings within an index produces bulky indexes which are expensive to serve. Recently, BPR (Yamada et al., 2021) and JPQ (Zhan et al., 2021a) have been proposed which train the model to produce binary document vectors, which reduce the index 32x and more. The authors showed these binary embedding models significantly outperform more traditional index compression techniques like Product Quantization (PQ). Previous work evaluated these approaches just in-domain, i.e. the methods were evaluated on tasks for which training data is available. In practice, retrieval models are often used in an out-of-domain setting, where they have been trained on a publicly available dataset, like MS MARCO, but are then used for some custom dataset for which no training data is available. In this work, we show that binary embedding models like BPR and JPQ can perform significantly worse than baselines once there is a domain-shift involved. We propose a modification to the training procedure of BPR and JPQ and combine it with a corpus specific generative procedure which allow the adaptation of BPR and JPQ to any corpus without requiring labeled training data. Our domain-adapted strategy known as GPL is model agnostic, achieves an improvement by up-to 19.3 and 11.6 points in nDCG@10 across the BEIR benchmark in comparison to BPR and JPQ while maintaining its 32x memory efficiency. JPQ+GPL even outperforms our upper baseline: uncompressed TAS-B model on average by 2.0 points. 2205.11498 2022-09-26 2022-05-23T17:53:44Z Domain Adaptation for Memory-Efficient Dense Retrieval Nils Reimers Nandan Thakur [2205.11498] Domain Adaptation for Memory-Efficient Dense Retrieval Refers to [Binary Passage Retriever (BPR)](doc:2021/06/2106_00882_efficient_passage_) 2022-09-26T17:46:39Z Nandan Thakur Jimmy Lin 2022-05-23T17:53:44Z Brij B. Gupta 2022-09-06 Pengzhen Ren Active learning (AL) attempts to maximize the performance gain of the model by marking the fewest samples. Deep learning (DL) is greedy for data and requires a large amount of data supply to optimize massive parameters, so that the model learns how to extract high-quality features. In recent years, due to the rapid development of internet technology, we are in an era of information torrents and we have massive amounts of data. In this way, DL has aroused strong interest of researchers and has been rapidly developed. Compared with DL, researchers have relatively low interest in AL. This is mainly because before the rise of DL, traditional machine learning requires relatively few labeled samples. Therefore, early AL is difficult to reflect the value it deserves. Although DL has made breakthroughs in various fields, most of this success is due to the publicity of the large number of existing annotation datasets. However, the acquisition of a large number of high-quality annotated datasets consumes a lot of manpower, which is not allowed in some fields that require high expertise, especially in the fields of speech recognition, information extraction, medical images, etc. Therefore, AL has gradually received due attention. 
A natural idea is whether AL can be used to reduce the cost of sample annotations, while retaining the powerful learning capabilities of DL. Therefore, deep active learning (DAL) has emerged. Although the related research has been quite abundant, it lacks a comprehensive survey of DAL. This article is to fill this gap, we provide a formal classification method for the existing work, and a comprehensive and systematic overview. In addition, we also analyzed and summarized the development of DAL from the perspective of application. Finally, we discussed the confusion and problems in DAL, and gave some possible development directions for DAL. 2020-08-30T04:28:31Z A Survey of Deep Active Learning 2009.00236 2022-09-06T18:40:19Z 2021-12-05T22:20:32Z Zhihui Li Po-Yao Huang Yun Xiao Xin Wang [2009.00236] A Survey of Deep Active Learning Xiaojun Chang Xiaojiang Chen Pengzhen Ren Modeling DNA Sequences with PyTorch | by Erin Wilson | Sep, 2022 | Towards Data Science 2022-09-18T09:44:07Z 2022-09-18 2022-09-17T13:45:05Z Percy Liang Shivam Garg [2208.01066] What Can Transformers Learn In-Context? A Case Study of Simple Function Classes In-context learning refers to the ability of a model to condition on a prompt sequence consisting of in-context examples (input-output pairs corresponding to some task) along with a new query input, and generate the corresponding output. Crucially, in-context learning happens only at inference time without any parameter updates to the model. While large language models such as GPT-3 exhibit some ability to perform in-context learning, it is unclear what the relationship is between tasks on which this succeeds and what is present in the training data. To make progress towards understanding in-context learning, we consider the well-defined problem of training a model to in-context learn a function class (e.g., linear functions): that is, given data derived from some functions in the class, can we train a model to in-context learn "most" functions from this class? We show empirically that standard Transformers can be trained from scratch to perform in-context learning of linear functions -- that is, the trained model is able to learn unseen linear functions from in-context examples with performance comparable to the optimal least squares estimator. In fact, in-context learning is possible even under two forms of distribution shift: (i) between the training data of the model and inference-time prompts, and (ii) between the in-context examples and the query input during inference. We also show that we can train Transformers to in-context learn more complex function classes -- namely sparse linear functions, two-layer neural networks, and decision trees -- with performance that matches or exceeds task-specific learning algorithms. Our code and models are available at https://github.com/dtsip/in-context-learning . 2022-09-17 Shivam Garg 2022-08-01T18:01:40Z Dimitris Tsipras What Can Transformers Learn In-Context? A Case Study of Simple Function Classes 2208.01066 Gregory Valiant the NN learns *how to learn* linear regression, decision trees, 2-layer ReLU nets! 2022-08-01T18:01:40Z OpenAI sur Twitter : "We've trained a neural net called Whisper that approaches human-level robustness and accuracy on English speech recognition..." 2022-09-25T13:13:47Z 2022-09-25 Sense and Sensibility 2022-09-30 2022-09-30T23:16:51Z drama film directed by Ang Lee and based on Jane Austen's 1811 novel of the same name. 
With Emma Thompson and Hugh Grant Ethan Perez Eli Tran-Johnson 2022-07-16T15:24:01Z Amanda Askell Tom Brown Saurav Kadavath Nelson Elhage 2022-09-15T00:11:02Z Ben Mann 2022-09-15 Anna Chen Nova DasSarma Jack Clark Jared Kaplan Catherine Olsson Stanislav Fort Josh Jacobson Zac Hatfield Dodds Scott Johnston Danny Hernandez Liane Lovitt Sheer El-Showk Language Models (Mostly) Know What They Know > we show that language models can evaluate whether what they say is true, and predict ahead of time whether they'll be able to answer questions correctly. Saurav Kadavath Nicholas Joseph 2022-07-11T22:59:39Z Kamal Ndousse Sam McCandlish Andy Jones We study whether language models can evaluate the validity of their own claims and predict which questions they will be able to answer correctly. We first show that larger models are well-calibrated on diverse multiple choice and true/false questions when they are provided in the right format. Thus we can approach self-evaluation on open-ended sampling tasks by asking models to first propose answers, and then to evaluate the probability "P(True)" that their answers are correct. We find encouraging performance, calibration, and scaling for P(True) on a diverse array of tasks. Performance at self-evaluation further improves when we allow models to consider many of their own samples before predicting the validity of one specific possibility. Next, we investigate whether models can be trained to predict "P(IK)", the probability that "I know" the answer to a question, without reference to any particular proposed answer. Models perform well at predicting P(IK) and partially generalize across tasks, though they struggle with calibration of P(IK) on new tasks. The predicted P(IK) probabilities also increase appropriately in the presence of relevant source materials in the context, and in the presence of hints towards the solution of mathematical word problems. We hope these observations lay the groundwork for training more honest models, and for investigating how honesty generalizes to cases where models are trained on objectives other than the imitation of human writing. Shauna Kravec Tom Henighan Jackson Kernion 2207.05221 Dawn Drain Tom Conerly Yuntao Bai Chris Olah Sam Ringer Dario Amodei Nicholas Schiefer Deep Ganguli Tristan Hume Sam Bowman [2207.05221] Language Models (Mostly) Know What They Know 2022-09-16T09:49:38Z 2022-09-16 Prompt Tuning BERT🎯:CommonLit Readability | Kaggle > Prompt-tuning is a simple yet effective mechanism for learning “soft prompts” to condition frozen language models to perform specific downstream tasks. Soft prompts are learned through backpropagation and can be tuned to incorporate signal from any number of labeled examples. Finally, we show that conditioning a frozen model with soft prompts confers benefits in robustness to domain transfer, as compared to full model tuning. [tweet](https://twitter.com/_akhaliq/status/1573109469646561280?s=20&t=RTpK9dh90az0zT1Xg2ohpQ) Unso Eun Seo Jo Moshe Wasserblat Nils Reimers Efficient Few-Shot Learning Without Prompts [2209.11055] Efficient Few-Shot Learning Without Prompts Lewis Tunstall Recent few-shot methods, such as parameter-efficient fine-tuning (PEFT) and pattern exploiting training (PET), have achieved impressive results in label-scarce settings. However, they are difficult to employ since they are subject to high variability from manually crafted prompts, and typically require billion-parameter language models to achieve high accuracy. 
To address these shortcomings, we propose SetFit (Sentence Transformer Fine-tuning), an efficient and prompt-free framework for few-shot fine-tuning of Sentence Transformers (ST). SetFit works by first fine-tuning a pretrained ST on a small number of text pairs, in a contrastive Siamese manner. The resulting model is then used to generate rich text embeddings, which are used to train a classification head. This simple framework requires no prompts or verbalizers, and achieves high accuracy with orders of magnitude less parameters than existing techniques. Our experiments show that SetFit obtains comparable results with PEFT and PET techniques, while being an order of magnitude faster to train. We also show that SetFit can be applied in multilingual settings by simply switching the ST body. Our code is available at https://github.com/huggingface/setfit and our datasets at https://huggingface.co/setfit . Luke Bates Lewis Tunstall 2022-09-22T14:48:11Z Oren Pereg 2022-09-22T14:48:11Z 2022-09-23 Daniel Korat 2022-09-23T10:26:46Z 2209.11055 2022-09-29 EDF ou l’histoire d’une débâcle française 2022-09-29T21:18:03Z [[1405.5893] Computerization of African languages-French dictionaries](doc:2021/06/1405_5893_computerization_of_) 2022-09-17 2022-09-17T17:23:22Z Dictionnaires langue africaine-français mis en ligne par l'Université de Nantes - Universal Models - [Massive Multi-task learning (NLP)](tag:massive_multi_task_learning_nlp) - [Beyond the Transformer](tag:alternative_to_transformers) - [Prompting](tag:prompted_models) - Efficient Methods - Benchmarking - Conditional Image Generation - ML for Science - Program Synthesis - Bias - Retrieval Augmentation ([Retrieval augmented LM](tag:retrieval_augmented_lm)) - [Token-free Models](tag:token_free_models) - [Temporal Adaptation](tag:lm_temporal_adaptation) - The Importance of Data - Meta-learning ML and NLP Research Highlights of 2021 2022-09-07T13:33:12Z 2022-09-07 2022-09-15 2022-09-15T11:20:19Z Michael Galkin sur Twitter : "GraphGPS is now accepted at #NeurIPS2022..." 2008.07267 Andreas Niekler 2022-09-06 Christopher Schröder Christopher Schröder 2022-09-06T18:43:54Z Natural language processing (NLP) and neural networks (NNs) have both undergone significant changes in recent years. For active learning (AL) purposes, NNs are, however, less commonly used -- despite their current popularity. By using the superior text classification performance of NNs for AL, we can either increase a model's performance using the same amount of data or reduce the data and therefore the required annotation efforts while keeping the same performance. We review AL for text classification using deep neural networks (DNNs) and elaborate on two main causes which used to hinder the adoption: (a) the inability of NNs to provide reliable uncertainty estimates, on which the most commonly used query strategies rely, and (b) the challenge of training DNNs on small data. To investigate the former, we construct a taxonomy of query strategies, which distinguishes between data-based, model-based, and prediction-based instance selection, and investigate the prevalence of these classes in recent research. Moreover, we review recent NN-based advances in NLP like word embeddings or language models in the context of (D)NNs, survey the current state-of-the-art at the intersection of AL, text classification, and DNNs and relate recent advances in NLP to AL. 
Finally, we analyze recent work in AL for text classification, connect the respective query strategies to the taxonomy, and outline commonalities and shortcomings. As a result, we highlight gaps in current research and present open research questions. 2020-08-17T12:53:20Z > investigates (D)NN-based AL for text classification and inspects the factors obstructing its adoption > > - (a) the inability of NNs to provide reliable uncertainty estimates, on which the most commonly used query strategies rely, and > - (b) the challenge of training DNNs on small data. includes a taxonomy of query strategies **AL and DNN, Contrasting Paradigms**: > DNNs are known to excel particularly at large-scale datasets; indeed, having large amounts of data available is often a strict requirement for them to perform well at all. AL on the other hand tries to minimize the labeled data. A Survey of Active Learning for Text Classification using Deep Neural Networks [2008.07267] A Survey of Active Learning for Text Classification using Deep Neural Networks 2020-08-17T12:53:20Z Graham Neubig 2022-03-17T16:48:22Z [2203.09435] Expanding Pretrained Models to Thousands More Languages via Lexicon-based Adaptation Xinyi Wang The performance of multilingual pretrained models is highly dependent on the availability of monolingual or parallel text present in a target language. Thus, the majority of the world's languages cannot benefit from recent progress in NLP as they have no or limited textual data. To expand possibilities of using NLP technology in these under-represented languages, we systematically study strategies that relax the reliance on conventional language resources through the use of bilingual lexicons, an alternative resource with much better language coverage. We analyze different strategies to synthesize textual or labeled data using lexicons, and how this data can be combined with monolingual or parallel text when available. For 19 under-represented languages across 3 tasks, our methods lead to consistent improvements of up to 5 and 15 points with and without extra monolingual text respectively. Overall, our study highlights how NLP methods can be adapted to thousands more languages that are under-served by current technology. 2203.09435 2022-09-08T11:17:10Z Sebastian Ruder Xinyi Wang 2022-04-06T12:47:10Z 2022-09-08 Expanding Pretrained Models to Thousands More Languages via Lexicon-based Adaptation 2022-09-06 2022-09-06T23:16:54Z Ramsri Goutham Golla sur Twitter : "how you can extract keywords from any text or document using only sentence transformer vector embeddings?" 2022-09-05T22:07:47Z 2022-09-05 Massimo sur Twitter : "This ultra-small generator is highly portable and works even with shallow, slow moving water..." 2022-09-14T21:17:26Z 2022-09-14 Anthropic sur Twitter : "Neural networks often pack many unrelated concepts into a single neuron – a puzzling phenomenon known as 'polysemanticity' which makes interpretability much more challenging..." neural networks can store more than 𝑛 features in "superposition" if the features are sparse! [Tweet by C. Olah](https://twitter.com/ch402/status/1570096782390226944) Manuel R. Ciosici Michael Hassid Peter Milder 2022-08-31T20:32:35Z Noam Slonim Betty van Aken Ji-Ung Lee Edwin Simpson 2209.00099 Roy Schwartz Efficient Methods for Natural Language Processing: A Survey Colin Raffel 2022-09-04 2022-08-31T20:32:35Z Marcos Treviso Pedro H. 
Martins Getting the most out of limited resources allows advances in natural language processing (NLP) research and practice while being conservative with resources. Those resources may be data, time, storage, or energy. Recent work in NLP has yielded interesting results from scaling; however, using only scale to improve results means that resource consumption also scales. That relationship motivates research into efficient methods that require less resources to achieve similar results. This survey relates and synthesises methods and findings in those efficiencies in NLP, aiming to guide new researchers in the field and inspire the development of new methods. [2209.00099] Efficient Methods for Natural Language Processing: A Survey Sara Hooker Leon Derczynski 2022-09-04T11:26:48Z Niranjan Balasubramanian André F. T. Martins > We thus structure this survey by following the typical NLP model pipeline and present the existing methods that aim to make the respective stage more efficient. Tianchu Ji Kenneth Heafield Qingqing Cao Marcos Treviso 2022-09-02T14:24:14Z 2022-09-02 Nils Reimers - slides & recordings of my invited talks Ban Kawas Ranit Aharonov Marina Danilevsky A Survey of the State of Explainable AI for Natural Language Processing Kun Qian 2022-09-08 Prithviraj Sen 2020-10-01T22:33:21Z 2022-09-08T09:30:14Z Yannis Katsis 2010.00711 2020-10-01T22:33:21Z Recent years have seen important advances in the quality of state-of-the-art models, but this has come at the expense of models becoming less interpretable. This survey presents an overview of the current state of Explainable AI (XAI), considered within the domain of Natural Language Processing (NLP). We discuss the main categorization of explanations, as well as the various ways explanations can be arrived at and visualized. We detail the operations and explainability techniques currently available for generating explanations for NLP model predictions, to serve as a resource for model developers in the community. Finally, we point out the current gaps and encourage directions for future work in this important research area. [2010.00711] A Survey of the State of Explainable AI for Natural Language Processing Marina Danilevsky Fine-tuned pre-trained language models (LMs) have achieved enormous success in many natural language processing (NLP) tasks, but they still require excessive labeled data in the fine-tuning stage. We study the problem of fine-tuning pre-trained LMs using only weak supervision, without any labeled data. This problem is challenging because the high capacity of LMs makes them prone to overfitting the noisy labels generated by weak supervision. To address this problem, we develop a contrastive self-training framework, COSINE, to enable fine-tuning LMs with weak supervision. Underpinned by contrastive regularization and confidence-based reweighting, this contrastive self-training framework can gradually improve model fitting while effectively suppressing error propagation. Experiments on sequence, token, and sentence pair classification tasks show that our model outperforms the strongest baseline by large margins on 7 benchmarks in 6 tasks, and achieves competitive performance with fully-supervised fine-tuning methods. 
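To illustrate the self-training idea behind the COSINE entry above (without its contrastive regularization or confidence reweighting), here is a minimal sketch: train on weak labels, pseudo-label unlabeled texts, and keep only high-confidence predictions for the next round. The bag-of-words classifier, the threshold, and the round count are placeholder choices, not the paper's setup.

```python
# Minimal self-training loop with confidence-based filtering (illustrative, not COSINE itself).
import numpy as np
from scipy import sparse
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def self_train(weak_texts, weak_labels, unlabeled_texts, rounds=3, threshold=0.9):
    vectorizer = TfidfVectorizer()
    X_labeled = vectorizer.fit_transform(weak_texts)
    y_labeled = np.asarray(weak_labels)
    X_unlabeled = vectorizer.transform(unlabeled_texts)

    clf = LogisticRegression(max_iter=1000)
    for _ in range(rounds):
        clf.fit(X_labeled, y_labeled)
        probs = clf.predict_proba(X_unlabeled)
        confident = probs.max(axis=1) >= threshold          # keep only high-confidence pseudo-labels
        if not confident.any():
            break
        pseudo_labels = clf.classes_[probs.argmax(axis=1)]
        X_labeled = sparse.vstack([X_labeled, X_unlabeled[confident]])
        y_labeled = np.concatenate([y_labeled, pseudo_labels[confident]])
        X_unlabeled = X_unlabeled[~confident]               # drop the examples just absorbed
    return clf
```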
2021-03-31T02:25:55Z Wendi Ren 2022-09-02T11:02:48Z Chao Zhang Yue Yu [2010.07835] Fine-Tuning Pre-trained Language Model with Weak Supervision: A Contrastive-Regularized Self-Training Approach 2022-09-02 Fine-Tuning Pre-trained Language Model with Weak Supervision: A Contrastive-Regularized Self-Training Approach Yue Yu Tuo Zhao Haoming Jiang Fine-tune models with weak supervision only (+ unlabeled data), label denoising via contrastive self-training Simiao Zuo 2020-10-15T15:55:08Z 2010.07835 > The use of Active Learning (AL) with deep pre-trained models has so far received little consideration. > > We study the potential of (i) various AL strategies; (ii) in conjunction with BERT, (iii) within a highly challenging – yet common – real-world scenario of class imbalance and scarce labeled data. focused on binary classification > AL can boost BERT performance, especially in the most realistic scenario in which the initial set of labeled examples is created using keyword-based queries, resulting in a biased sample of the minority class. [Github](https://github.com/IBM/low-resource-text-classification-framework) 2022-09-02T16:08:49Z 2022-09-02 Active Learning for BERT: An Empirical Study - ACL Anthology 2022-09-02 Conclusion: - The knowledge in models gets outdated quickly - BERT thinks Barack Obama is the current US president - Search has a strong focus on recent events - Not reflected in any benchmark so far - Dense models especially sensitive - Issue affects all models (Cross-Encoder, Sparse Emb., doc2query) - How can we efficiently update our models to new domains / new languages? - Current methods are extremely data inefficient - How can we update our model from a single example? Domain Adaptation for Dense Retrieval Models - Nils Reimers - ICML 2022 Workshop on Knowledge Retrieval and Language Models 2022-09-02T14:37:21Z merve sur Twitter : "@huggingface transformers includes a new pipeline called Document Question Answering. This is a pipeline you can use to extract information from PDFs!... 2022-09-20T19:01:33Z 2022-09-20 [other tweet](https://twitter.com/osanseviero/status/1572332963378958338?s=20&t=Ipu3j81b5g7_sxHvh6AXuw) François Fleuret sur Twitter : " how tensors are represented in memory, and how it makes some operations super fast..." 2022-09-30 2022-09-30T15:18:14Z 2022-09-02T08:20:00Z A framework for designing document processing solutions 2022-09-02 2022-09-02T10:25:44Z Philip Vollet sur Twitter : "Extracting information from PDFs or scanned documents is still a challenge! Use the @huggingface LayoutLMv3 model and Prodigy..." [A framework for designing document processing solutions](doc:2022/09/a_framework_for_designing_docum) 2022-09-02 The Power of Natural Language Processing 2022-09-12 2022-09-12T13:31:40Z 2022-09-09 2022-09-09T12:11:13Z Build Python Web Application using Flask and Docker 2022-09-30 2022-09-30T00:17:32Z Role-Playing Paper-Reading Seminars 2022-09-02T14:45:03Z 2022-09-02 Nearest Neighbor Indexes for Similarity Search | Pinecone How to Train an mT5 Model for Translation With Simple Transformers | by Thilina Rajapakse | Towards Data Science 2022-09-25T15:02:31Z 2022-09-25 SCAI - Sorbonne Center for Artificial Intelligence 2022-09-15 2022-09-15T13:34:47Z > Drawing on the captivating film archives of the volcanologists Katia and Maurice Krafft, Werner Herzog poetically celebrates the lives, brutally cut short in 1991, of two researchers and image-makers with a unique body of work.  
2022-09-28T23:23:55Z Au coeur des volcans - Requiem pour Katia et Maurice Krafft 2022-09-28 Burkina Faso : à Djibo, une vie sous blocus djihadiste 2022-09-29 2022-09-29T18:58:55Z > DocQuery: Document Query Engine Powered by NLP Ankur Goyal sur Twitter : "DocQuery, a new #opensource query engine for analyzing documents using large language models (LLMs)..." 2022-09-01T23:25:59Z 2022-09-01 Accurate estimates of uncertainty are important for many difficult or sensitive prediction tasks in natural language processing (NLP). Though large-scale pre-trained models have vastly improved the accuracy of applied machine learning models throughout the field, there still are many instances in which they fail. The ability to precisely quantify uncertainty while handling the challenging scenarios that modern models can face when deployed in the real world is critical for reliable, consequential-decision making. This tutorial is intended for both academic researchers and industry practitioners alike, and provides a comprehensive introduction to uncertainty estimation for NLP problems---from fundamentals in probability calibration, Bayesian inference, and confidence set (or interval) construction, to applied topics in modern out-of-distribution detection and selective inference. 2022-09-07 2022-09-07T18:48:16Z Uncertainty Estimation for Natural Language Processing – Google Research Recent Advances in Language Model Fine-tuning (Feb 2021) 2022-09-02 2022-09-02T17:36:39Z Thomas Wolf sur Twitter : "you can divide the size of any model in 🤗 transformers: model.int8()" 2022-09-26T17:42:53Z 2022-09-26 2022-09-30T13:28:23Z 2022-09-30 > Flipped Learning, an alternative method of meta-training which trains the LM to generate the task instruction given the input instance and label. Guess the Instruction! Making Language Models Stronger Zero-Shot Learners | OpenReview [2104.09224] Multi-Modal Fusion Transformer for End-to-End Autonomous Driving Aditya Prakash Multi-Modal Fusion Transformer for End-to-End Autonomous Driving 2022-09-16 > Our key idea is to exploit the self-attention mechanism of transformers to incorporate the global context for image and LiDAR modalities given their complementary nature. Andreas Geiger Aditya Prakash 2021-04-19T11:48:13Z Kashyap Chitta How should representations from complementary sensors be integrated for autonomous driving? Geometry-based sensor fusion has shown great promise for perception tasks such as object detection and motion forecasting. However, for the actual driving task, the global context of the 3D scene is key, e.g. a change in traffic light state can affect the behavior of a vehicle geometrically distant from that traffic light. Geometry alone may therefore be insufficient for effectively fusing representations in end-to-end driving models. In this work, we demonstrate that imitation learning policies based on existing sensor fusion methods under-perform in the presence of a high density of dynamic agents and complex scenarios, which require global contextual reasoning, such as handling traffic oncoming from multiple directions at uncontrolled intersections. Therefore, we propose TransFuser, a novel Multi-Modal Fusion Transformer, to integrate image and LiDAR representations using attention. We experimentally validate the efficacy of our approach in urban settings involving complex scenarios using the CARLA urban driving simulator. Our approach achieves state-of-the-art driving performance while reducing collisions by 76% compared to geometry-based fusion. 
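The document-QA bookmarks earlier in this section (the transformers "Document Question Answering" pipeline tweet and DocQuery) boil down to asking natural-language questions against a scanned page. Here is a minimal usage sketch, assuming a recent transformers version that ships the document-question-answering pipeline and an OCR backend (pytesseract plus Pillow) installed; the checkpoint name and file path are examples, not requirements.

```python
# Minimal sketch: question answering over a page image with the transformers pipeline.
from transformers import pipeline

doc_qa = pipeline(
    "document-question-answering",
    model="impira/layoutlm-document-qa",   # example checkpoint, other doc-QA models work too
)

answers = doc_qa(
    image="scanned_invoice.png",           # placeholder path to a page image
    question="What is the invoice number?",
)
print(answers)                             # list of answer candidates with scores
```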
2022-09-16T19:03:51Z 2104.09224 2021-04-19T11:48:13Z Self-Attention - Transformer Network | Coursera 2022-09-11T01:06:21Z 2022-09-11 Yoav Goldberg Elad Ben Zaken Elad Ben Zaken 2021-06-18T16:09:21Z 2022-09-01 We introduce BitFit, a sparse-finetuning method where only the bias-terms of the model (or a subset of them) are being modified. We show that with small-to-medium training data, applying BitFit on pre-trained BERT models is competitive with (and sometimes better than) fine-tuning the entire model. For larger data, the method is competitive with other sparse fine-tuning methods. Besides their practical utility, these findings are relevant for the question of understanding the commonly-used process of finetuning: they support the hypothesis that finetuning is mainly about exposing knowledge induced by language-modeling training, rather than learning new task-specific linguistic knowledge. 2022-03-19T09:52:20Z 2022-09-01T17:20:28Z BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models [2106.10199] BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models 2106.10199 > BitFit, a sparse-finetuning method where only the bias-terms of the model (or a subset of them) are being modified. We show that **with small-to-medium training data, applying BitFit on pre-trained BERT models is competitive with (and sometimes better than) fine-tuning the entire model.** > these findings support the hypothesis that finetuning is mainly about exposing knowledge induced by language-modeling training, rather than learning new task-specific linguistic knowledge > The focus on modifying a small group of parameters eases deployment, as the vast majority of the parameters of the model are shared between various NLP tasks Shauli Ravfogel Continuous Learning in NMT using Bilingual Dictionaries 2022-09-17 2022-09-17T17:11:11Z 2022-09-30T00:01:22Z 2022-09-30 L’étoile Absinthe - JACQUES STEPHEN ALEXIS Google AI Blog: TensorStore for High-Performance, Scalable Array Storage 2022-09-23T02:24:43Z 2022-09-23 Use Case: 3D Brain Mapping 2022-09-07T12:57:21Z 2022-09-07 Glyphosate : une étude industrielle sur la neurotoxicité de l’herbicide soustraite aux autorités européennes 2022-09-07T08:25:09Z 2022-09-07 Extractive Question Answering application. • Raphael Sourty Active Learning with AutoNLP and Prodigy 2022-09-06 2022-09-06T18:07:58Z François-Xavier Fauvelle et Anne Lafont : « L’Afrique est présente dans tous les passés » 2022-09-11 2022-09-11T11:16:12Z 2022-09-06T18:33:24Z 2022-09-06 Active Learning: A Survey (C. Aggarwal 2014) Saeid Nahavandi 2022-09-08T09:46:24Z Vladimir Makarenkov A Review of Uncertainty Quantification in Deep Learning: Techniques, Applications and Challenges 2022-09-08 Farhad Pourpanah Dana Rezazadegan Abbas Khosravi Sadiq Hussain U Rajendra Acharya Xiaochun Cao Moloud Abdar 2020-11-12T06:41:05Z Paul Fieguth Mohammad Ghavamzadeh Moloud Abdar 2011.06225 Li Liu [2011.06225] A Review of Uncertainty Quantification in Deep Learning: Techniques, Applications and Challenges Uncertainty quantification (UQ) plays a pivotal role in reduction of uncertainties during both optimization and decision making processes. It can be applied to solve a variety of real-world applications in science and engineering. Bayesian approximation and ensemble learning techniques are two most widely-used UQ methods in the literature. 
In this regard, researchers have proposed different UQ methods and examined their performance in a variety of applications such as computer vision (e.g., self-driving cars and object detection), image processing (e.g., image restoration), medical image analysis (e.g., medical image classification and segmentation), natural language processing (e.g., text classification, social media texts and recidivism risk-scoring), bioinformatics, etc. This study reviews recent advances in UQ methods used in deep learning. Moreover, we also investigate the application of these methods in reinforcement learning (RL). Then, we outline a few important applications of UQ methods. Finally, we briefly highlight the fundamental research challenges faced by UQ methods and discuss the future research directions in this field. 2021-01-06T01:58:12Z 2022-09-04 2022-09-04T13:16:48Z axa-group/Parsr: Transforms PDF, Documents and Images into Enriched Structured Data
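To make the BitFit entry above concrete, here is a minimal sketch: freeze every parameter of a pretrained model except the bias terms (and, here, the classification head), then fine-tune as usual. The checkpoint name and learning rate are illustrative assumptions; only a small fraction of the parameters remains trainable, which is what eases deployment when many tasks share one backbone.

```python
# Minimal BitFit-style setup: train only bias terms (plus the task head).
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

trainable = []
for name, param in model.named_parameters():
    if name.endswith(".bias") or name.startswith("classifier."):
        param.requires_grad = True          # bias terms and task head stay trainable
        trainable.append(name)
    else:
        param.requires_grad = False         # everything else is frozen

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
total = sum(1 for _ in model.named_parameters())
print(f"training {len(trainable)} parameter tensors out of {total}")
# ...run the usual fine-tuning loop with `optimizer`; only biases and the head get updated.
```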