2021-04-19 · [2011.05864] On the Sentence Embeddings from Pre-trained Language Models
Bohan Li, Hao Zhou, Junxian He, Mingxuan Wang, Yiming Yang, Lei Li (arXiv, 2020-11-02)

Pre-trained contextual representations like BERT have achieved great success in natural language processing. However, the sentence embeddings from the pre-trained language models without fine-tuning have been found to poorly capture semantic meaning of sentences. In this paper, we argue that the semantic information in the BERT embeddings is not fully exploited. We first reveal the connection between the masked language model pre-training objective and the semantic similarity task theoretically, and then analyze the BERT sentence embeddings empirically. We find that BERT always induces a non-smooth anisotropic semantic space of sentences, which harms its performance of semantic similarity. To address this issue, we propose to transform the anisotropic sentence embedding distribution to a smooth and isotropic Gaussian distribution through normalizing flows that are learned with an unsupervised objective. Experimental results show that our proposed BERT-flow method obtains significant performance gains over the state-of-the-art sentence embeddings on a variety of semantic textual similarity tasks. The code is available at https://github.com/bohanli/BERT-flow.

> the sentence embeddings from the pre-trained language models without fine-tuning have been found to poorly capture semantic meaning of sentences
>
> We find that **BERT always induces a non-smooth anisotropic semantic space of sentences**, which harms its performance of semantic similarity. To address this issue, we propose to transform the anisotropic sentence embedding distribution to a smooth and isotropic Gaussian distribution through normalizing flows that are learned with an unsupervised objective

[GitHub](https://github.com/bohanli/BERT-flow)

> To address these issues, we propose to transform the BERT sentence embedding distribution into a smooth and isotropic Gaussian distribution through normalizing flows (Dinh et al., 2015), which is an invertible function parameterized by neural networks. ... During training, only the flow network is optimized while the BERT parameters remain unchanged

> When combined with external supervision from natural language inference tasks (Bowman et al., 2015; Williams et al., 2018), our method outperforms the [Sentence-BERT](tag:sbert) embeddings
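In short: sentence vectors are pooled from a frozen BERT, and an invertible flow is fit on top of them by maximizing their likelihood under a standard Gaussian base distribution; the flow's output space is then used for cosine similarity. Below is a minimal, hypothetical PyTorch sketch of that recipe. The model name, plain mean pooling, and the single affine coupling layer are simplifying assumptions (the paper's flow is deeper); this is not the released BERT-flow code.

```python
# Minimal sketch of the BERT-flow idea: fit an invertible flow on top of frozen
# BERT sentence embeddings so that the mapped vectors follow an isotropic Gaussian.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class AffineCoupling(nn.Module):
    """One RealNVP-style coupling layer: invertible, with a cheap log-det Jacobian."""
    def __init__(self, dim, hidden=256):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.half)),
        )

    def forward(self, x):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        log_s, t = self.net(x1).chunk(2, dim=-1)
        log_s = torch.tanh(log_s)               # keep scales well-behaved
        z2 = x2 * torch.exp(log_s) + t
        return torch.cat([x1, z2], dim=-1), log_s.sum(dim=-1)  # z, log|det J|

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased").eval()
for p in bert.parameters():
    p.requires_grad = False                     # only the flow is trained

flow = AffineCoupling(dim=768)
opt = torch.optim.Adam(flow.parameters(), lr=1e-3)
base = torch.distributions.Normal(0.0, 1.0)     # isotropic Gaussian target

# In practice this would be a large unlabeled corpus; a toy batch keeps the sketch short.
sentences = ["a man is playing a guitar", "someone plays guitar", "the sky is blue"]
batch = tok(sentences, padding=True, return_tensors="pt")
with torch.no_grad():
    emb = bert(**batch).last_hidden_state.mean(dim=1)   # pooled sentence embeddings

for step in range(100):                         # unsupervised maximum likelihood
    z, logdet = flow(emb)
    nll = -(base.log_prob(z).sum(dim=-1) + logdet).mean()
    opt.zero_grad(); nll.backward(); opt.step()

z, _ = flow(emb)                                # use z instead of emb for cosine similarity
```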
2021-04-18 · SimCSE: Simple Contrastive Learning of Sentence Embeddings
Tianyu Gao et al. (by one of the authors of [KEPLER](doc:2020/11/1911_06136_kepler_a_unified_))

A contrastive sentence embedding framework, which can be used to produce sentence embeddings from either unlabeled or labeled data:

> 1. an unsupervised approach, which takes an input sentence and predicts itself in a contrastive objective, with only standard dropout used as noise
> 2. [a supervised approach:] we draw inspiration from the recent success of learning sentence embeddings from natural language inference (NLI) datasets and incorporate annotated pairs from NLI datasets into contrastive learning by using “entailment” pairs as positives and “contradiction” pairs as hard negatives

Cites [[2011.05864] On the Sentence Embeddings from Pre-trained Language Models](doc:2021/04/2011_05864_on_the_sentence_em) (on the question of the anisotropic semantic space of BERT's sentence embeddings).
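The unsupervised variant is easy to prototype: encode the same batch twice so the two passes see different dropout masks, then apply an in-batch InfoNCE loss where each sentence's second view is its positive. A minimal, hypothetical sketch follows; the model name, [CLS] pooling, and the temperature are illustrative assumptions, not the released SimCSE code.

```python
# Minimal sketch of unsupervised SimCSE: two forward passes with different dropout
# masks give two "views" of each sentence; in-batch InfoNCE pulls the views together.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.train()                                   # keep dropout active: it is the only "noise"

def encode(batch):
    out = model(**batch).last_hidden_state[:, 0]    # [CLS] pooling (simplified)
    return F.normalize(out, dim=-1)

sentences = ["a man is playing a guitar", "the sky is blue", "kids play in the park"]
batch = tok(sentences, padding=True, return_tensors="pt")

temperature = 0.05
opt = torch.optim.AdamW(model.parameters(), lr=3e-5)

z1 = encode(batch)                              # first dropout mask
z2 = encode(batch)                              # second dropout mask, same sentences
logits = z1 @ z2.T / temperature                # similarity of every pair in the batch
labels = torch.arange(len(sentences))           # positives sit on the diagonal
loss = F.cross_entropy(logits, labels)
opt.zero_grad(); loss.backward(); opt.step()
```

For the supervised variant in point 2, the second view would instead be the encoding of each premise's entailment hypothesis, with contradiction hypotheses appended as additional hard-negative columns in the same softmax.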
2021-04-11 · [1902.00751] Parameter-Efficient Transfer Learning for NLP
Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin de Laroussilhe, Mona Attariyan, Sylvain Gelly, Andrea Gesmundo (arXiv, 2019-02-02, updated 2019-06-13)

Fine-tuning large pre-trained models is an effective transfer mechanism in NLP. However, in the presence of many downstream tasks, fine-tuning is parameter inefficient: an entire new model is required for every task. As an alternative, we propose transfer with adapter modules. Adapter modules yield a compact and extensible model; they add only a few trainable parameters per task, and new tasks can be added without revisiting previous ones. The parameters of the original network remain fixed, yielding a high degree of parameter sharing. To demonstrate the adapters' effectiveness, we transfer the recently proposed BERT Transformer model to 26 diverse text classification tasks, including the GLUE benchmark. Adapters attain near state-of-the-art performance, whilst adding only a few parameters per task. On GLUE, we attain within 0.4% of the performance of full fine-tuning, adding only 3.6% parameters per task. By contrast, fine-tuning trains 100% of the parameters per task.

**Adapter tuning for NLP**. A strategy for tuning a large text model on several downstream tasks that permits training on tasks sequentially and adds only a small number of additional parameters per task. New modules are inserted between layers of a pre-trained network; the parameters of the original network are frozen and can therefore be shared by many tasks (a minimal sketch follows the exBERT entry below). [GitHub](https://github.com/google-research/adapter-bert)

2021-04-11 · exBERT: Extending Pre-trained Models with Domain-specific Vocabulary Under Constrained Training Resources - ACL Anthology

Focus on the embedding of domain-specific vocabulary.

> exBERT adds a new domain-specific vocabulary and the corresponding embedding layer, as well as a small extension module to the original unmodified model

> a pretraining method allowing low-cost embedding of domain-specific vocabulary in the context of an existing large pre-trained model such as BERT

> exBERT... explicitly incorporates the new domain’s vocabulary, while being able to **reuse the original pre-trained model’s weights as is** to reduce required computation and training data. Specifically, exBERT extends BERT by augmenting its embeddings for the original vocabulary with new embeddings for the domain-specific vocabulary via **a learned small “extension” module**. **The output of the original and extension modules are combined via a trainable weighted sum operation**

In a way similar to the concept developed in [[1902.00751] Parameter-Efficient Transfer Learning for NLP](doc:2021/04/1902_00751_parameter_efficien), but not in the fine-tuning paradigm.
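For reference, the adapter idea cited just above comes down to a small bottleneck MLP with a residual connection, inserted into a network whose pre-trained weights stay frozen, so that only a few parameters are trained per task. Below is a minimal, hypothetical PyTorch sketch; the bottleneck size and the stand-in "frozen layer" are illustrative assumptions, not the adapter-bert implementation.

```python
# Minimal sketch of an adapter module: a small bottleneck with a residual
# connection; the surrounding pre-trained weights stay frozen.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, hidden_size=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)   # project down
        self.up = nn.Linear(bottleneck, hidden_size)     # project back up
        self.act = nn.GELU()

    def forward(self, h):
        return h + self.up(self.act(self.down(h)))       # residual keeps init near identity

class AdaptedLayer(nn.Module):
    """Wrap a frozen pre-trained sub-layer and train only the adapter on the new task."""
    def __init__(self, pretrained_layer, hidden_size=768):
        super().__init__()
        self.layer = pretrained_layer
        for p in self.layer.parameters():
            p.requires_grad = False                      # original network stays fixed
        self.adapter = Adapter(hidden_size)

    def forward(self, h):
        return self.adapter(self.layer(h))

# Usage: pretend the pre-trained sub-layer is a plain Linear for illustration.
frozen = nn.Linear(768, 768)
layer = AdaptedLayer(frozen)
trainable = [p for p in layer.parameters() if p.requires_grad]
print(sum(p.numel() for p in trainable))                 # only the adapter's ~100k parameters
```

In the real setting the frozen sub-layers are those of the pre-trained Transformer, so only the small adapter (and typically layer-norm) parameters receive gradients for each new task.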
2021-04-10 · [1910.02227] Making sense of sensory input
Richard Evans, Jose Hernandez-Orallo, Johannes Welbl, Pushmeet Kohli, Marek Sergot (arXiv, 2019-10-05, updated 2020-07-14)

This paper attempts to answer a central question in unsupervised learning: what does it mean to "make sense" of a sensory sequence? In our formalization, making sense involves constructing a symbolic causal theory that both explains the sensory sequence and also satisfies a set of unity conditions. The unity conditions insist that the constituents of the causal theory -- objects, properties, and laws -- must be integrated into a coherent whole. On our account, making sense of sensory input is a type of program synthesis, but it is unsupervised program synthesis. Our second contribution is a computer implementation, the Apperception Engine, that was designed to satisfy the above requirements. Our system is able to produce interpretable human-readable causal theories from very small amounts of data, because of the strong inductive bias provided by the unity conditions. A causal theory produced by our system is able to predict future sensor readings, as well as retrodict earlier readings, and impute (fill in the blanks of) missing sensory readings, in any combination. We tested the engine in a diverse variety of domains, including cellular automata, rhythms and simple nursery tunes, multi-modal binding problems, occlusion tasks, and sequence induction intelligence tests. In each domain, we test our engine's ability to predict future sensor values, retrodict earlier sensor values, and impute missing sensory data. The engine performs well in all these domains, significantly out-performing neural net baselines. We note in particular that in the sequence induction intelligence tests, our system achieved human-level performance. This is notable because our system is not a bespoke system designed specifically to solve intelligence tests, but a general-purpose system that was designed to make sense of any sensory sequence.

> what does it mean to “make sense” of a sensory sequence? Our answer is that making sense means constructing a symbolic theory containing a set of objects that persist over time, with attributes that change over time, according to general laws. This theory must both explain the sensory input, and satisfy unity conditions [the constituents of our theory – objects, properties, and atoms – must be integrated into a coherent whole]

2021-04-17 · Nils Reimers sur Twitter : "New models for Neural Information Retrieval..."

2021-04-12 · Projet:Requêter le Wiktionnaire — Wiktionnaire

- <https://www.dictionnairedesfrancophones.org/sparql>
- [AFU](https://fr.wiktionary.org/wiki/assistance_au_freinage_d’urgence)

2021-04-04 · Cory Doctorow: The zombie economy and digital arm-breakers

2021-04-09 · Amazon reconnue coupable d’avoir gardé une partie des pourboires de ses livreurs aux Etats-Unis

The multinational will have to pay 50 million euros after being charged by the US Federal Trade Commission (FTC) for having operated this scheme for more than two and a half years.

2021-04-08 · How many data points is a prompt worth?

> 1. Write a prompt that a pre-trained LM can complete to give the answer to your problem, GPT-style.
> 2. Use backpropagation on fine-tuning data to learn the correct completions. The model can then draw information from both your task description and the supervised data!

([src](https://twitter.com/huggingface/status/1379805752509005825?s=20))
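Those two steps can be prototyped directly with a masked language model: phrase the task as a cloze-style prompt, read the MLM logits at the mask position, and backpropagate a cross-entropy loss over a small set of verbalizer tokens. A minimal, hypothetical sketch follows; the prompt wording, the "great"/"terrible" verbalizer, and the model are illustrative assumptions, not the setup of the paper behind the tweet.

```python
# Minimal sketch of prompt-based fine-tuning with an MLM head: the prompt turns the
# task into a cloze question, and supervised data tunes the model's completions.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
opt = torch.optim.AdamW(model.parameters(), lr=3e-5)

verbalizer = [tok.convert_tokens_to_ids("great"),       # label 1 (positive)
              tok.convert_tokens_to_ids("terrible")]    # label 0 (negative)

def prompt(review: str) -> str:
    return f"{review} All in all, it was {tok.mask_token}."

examples = [("A moving and well acted film.", 1),
            ("Two hours I will never get back.", 0)]

for text, label in examples:                            # tiny fine-tuning loop
    batch = tok(prompt(text), return_tensors="pt")
    logits = model(**batch).logits                      # [1, seq_len, vocab]
    mask_pos = (batch["input_ids"][0] == tok.mask_token_id).nonzero()[0, 0]
    label_logits = logits[0, mask_pos, verbalizer]      # scores for "great"/"terrible"
    target = torch.tensor(0 if label == 1 else 1)       # index into the verbalizer list
    loss = F.cross_entropy(label_logits.unsqueeze(0), target.unsqueeze(0))
    opt.zero_grad(); loss.backward(); opt.step()
```

The same loop also gives a zero-shot baseline if the optimizer step is skipped and the two verbalizer logits are simply compared.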
2021-04-06 · thunlp/OpenMatch: An Open-Source Package for Information Retrieval.

2021-04-11 · [2012.02558] Pre-trained language models as knowledge bases for Automotive Complaint Analysis
V. D. Viellieber, M. Aßenmacher (arXiv, 2020-12-04)

Recently it has been shown that large pre-trained language models like BERT (Devlin et al., 2018) are able to store commonsense factual knowledge captured in its pre-training corpus (Petroni et al., 2019). In our work we further evaluate this ability with respect to an application from industry, creating a set of probes specifically designed to reveal technical quality issues captured as described incidents out of unstructured customer feedback in the automotive industry. After probing the out-of-the-box versions of the pre-trained models with fill-in-the-mask tasks, we dynamically provide them with more knowledge via continual pre-training on the Office of Defects Investigation (ODI) Complaints data set. In our experiments the models exhibit comparable performance on queries about domain-specific topics to when queried on general factual knowledge, as Petroni et al. (2019) have done. For most of the evaluated architectures the correct token is predicted with a Precision@1 (P@1) of above 60%, while for P@5 and P@10 values of well above 80% and up to 90%, respectively, are reached. These results show the potential of using language models as a knowledge base for structured analysis of customer feedback.

2021-04-15 · 100 000 morts du Covid-19 en France : ferons-nous en sorte que le monde d’après ne permette plus une telle tragédie ?

2021-04-11 · [2007.15779] Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing
Yu Gu, Robert Tinn, Hao Cheng, Michael Lucas, Naoto Usuyama, Xiaodong Liu, Tristan Naumann, Jianfeng Gao, Hoifung Poon (arXiv, 2020-07-31, updated 2021-02-11)

Pretraining large neural language models, such as BERT, has led to impressive gains on many natural language processing (NLP) tasks. However, most pretraining efforts focus on general domain corpora, such as newswire and Web. A prevailing assumption is that even domain-specific pretraining can benefit by starting from general-domain language models. In this paper, we challenge this assumption by showing that for domains with abundant unlabeled text, such as biomedicine, pretraining language models from scratch results in substantial gains over continual pretraining of general-domain language models. To facilitate this investigation, we compile a comprehensive biomedical NLP benchmark from publicly-available datasets. Our experiments show that domain-specific pretraining serves as a solid foundation for a wide range of biomedical NLP tasks, leading to new state-of-the-art results across the board. Further, in conducting a thorough evaluation of modeling choices, both for pretraining and task-specific fine-tuning, we discover that some common practices are unnecessary with BERT models, such as using complex tagging schemes in named entity recognition (NER). To help accelerate research in biomedical NLP, we have released our state-of-the-art pretrained and task-specific models for the community, and created a leaderboard featuring our BLURB benchmark (short for Biomedical Language Understanding & Reasoning Benchmark) at https://aka.ms/BLURB.

> A prevailing assumption is that even domain-specific pretraining can benefit by starting from general-domain language models. In this paper, we challenge this assumption by showing that for domains with abundant unlabeled text, such as biomedicine, pretraining language models from scratch results in substantial gains over continual pretraining of general-domain language models

2021-04-12 · [2007.12603] IR-BERT: Leveraging BERT for Semantic Search in Background Linking for News Articles
Anup Anand Deshmukh, Udhav Sethi (arXiv, 2020-07-24)

This work describes our two approaches for the background linking task of TREC 2020 News Track. The main objective of this task is to recommend a list of relevant articles that the reader should refer to in order to understand the context and gain background information of the query article. Our first approach focuses on building an effective search query by combining weighted keywords extracted from the query document and uses BM25 for retrieval. The second approach leverages the capability of SBERT (Nils Reimers et al.) to learn contextual representations of the query in order to perform semantic search over the corpus. We empirically show that employing a language model benefits our approach in understanding the context as well as the background of the query article. The proposed approaches are evaluated on the TREC 2018 Washington Post dataset and our best model outperforms the TREC median as well as the highest scoring model of 2018 in terms of the nDCG@5 metric. We further propose a diversity measure to evaluate the effectiveness of the various approaches in retrieving a diverse set of documents. This would potentially motivate researchers to work on introducing diversity in their recommended list. We have open sourced our implementation on Github and plan to submit our runs for the background linking task in TREC 2020.

(A minimal SBERT-style semantic-search sketch appears at the end of this section.)

2021-04-04 · Zinder (Camille Lefebvre | Langarchiv)

2021-04-03 · Henry Every, le pirate le plus recherché du XVIIe siècle, refait surface en Nouvelle-Angleterre

2021-04-10 · Au Brésil, le naufrage de l’opération anticorruption « Lava Jato »

2021-04-09 · Le massif de Lovo, un trésor d'art rupestre à préserver | CNRS Le journal

2021-04-15 · Event camera

2021-04-04 · CNRS- Zinder 1900-2019 - Pauline Rousseau

In 2018 the CNRS gathered and presented in Zinder a series of archival photographs taken in that town around 1900.

2021-04-01 · Éloge des éliminatoires – Une balle dans le pied
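As referenced in the IR-BERT entry above, its SBERT-based approach comes down to encoding the query article and the candidate articles with a sentence encoder and ranking candidates by cosine similarity. Below is a minimal, hypothetical sketch using the sentence-transformers library; the model name and the toy corpus are illustrative assumptions, not the paper's TREC setup.

```python
# Minimal sketch of SBERT-style semantic search for background linking: embed the
# query article and candidate articles, then rank candidates by cosine similarity.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")     # any SBERT-style encoder

corpus = [
    "Central bank raises interest rates amid inflation concerns.",
    "New study links sleep quality to memory consolidation.",
    "Lawmakers debate the budget for the coming fiscal year.",
]
query_article = "Prices keep climbing as the central bank weighs another rate hike."

corpus_emb = model.encode(corpus, convert_to_tensor=True)
query_emb = model.encode([query_article], convert_to_tensor=True)

hits = util.semantic_search(query_emb, corpus_emb, top_k=2)[0]
for hit in hits:                                    # candidate background articles
    print(round(hit["score"], 3), corpus[hit["corpus_id"]])
```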