Generalization through Memorization: Nearest Neighbor Language Models, by Urvashi Khandelwal, Omer Levy, Dan Jurafsky, Luke Zettlemoyer, Mike Lewis (arXiv 1911.00172, submitted 2019-11-01, revised 2020-02-15; bookmarked 2019-12-20)
Abstract: We introduce $k$NN-LMs, which extend a pre-trained neural language model (LM) by linearly interpolating it with a $k$-nearest neighbors ($k$NN) model. The nearest neighbors are computed according to distance in the pre-trained LM embedding space, and can be drawn from any text collection, including the original LM training data. Applying this augmentation to a strong Wikitext-103 LM, with neighbors drawn from the original training set, our $k$NN-LM achieves a new state-of-the-art perplexity of 15.79, a 2.9 point improvement, with no additional training. We also show that this approach has implications for efficiently scaling up to larger training sets and allows for effective domain adaptation by simply varying the nearest neighbor datastore, again without further training. Qualitatively, the model is particularly helpful in predicting rare patterns, such as factual knowledge. Together, these results strongly suggest that learning similarity between sequences of text is easier than predicting the next word, and that nearest neighbor search is an effective approach for language modeling in the long tail.
Note: extend LMs with nearest neighbor search in embedding space (see the interpolation sketch after this group of entries)
> kNN-LM, an approach that extends a pre-trained LM by linearly interpolating its next word distribution with a k-nearest neighbors (kNN) model
> This approach allows rare patterns to be memorized explicitly, rather than implicitly in model parameters
> The kNN-LM involves augmenting such a pre-trained LM with a nearest neighbors retrieval mechanism, without any additional training (the representations learned by the LM remain unchanged). This can be done with a single forward pass over a text collection (potentially including the original LM training set), where the resulting context-target pairs are stored in a key-value datastore that is queried during inference

Détection d'intention : application industrielle d'un projet de recherche (Intent detection: an industrial application of a research project) (2019-12-14)

Journée commune AFIA - ARIA - 2 décembre 2019 (joint AFIA-ARIA day, December 2, 2019) (2019-12-01)

(((ل()(ل() 'yoav)))) on Twitter: "what do you think should be an interesting and important achievement of 2020 for NLP?" (2019-12-15)

Correlation clustering - Wikipedia (2019-12-11)
Note: a method for clustering a set of objects into the optimal number of clusters without specifying that number in advance (see the pivot heuristic sketch after this group of entries)

Extraction de relation via la validation de relation (Relation extraction via relation validation) (2019-12-03)
> relation validation seems to matter more than the extraction itself! yet very few people work on it :(

Highlights from CoNLL and EMNLP 2019 (2019-12-07)

Defiant Mark Zuckerberg defends Facebook policy to allow false ads | Technology | The Guardian (2019-12-03)

Introducing the Annotated Text Plugin for Elasticsearch: Search for Things (not Strings) | Elastic Blog (2019-12-14)

(((ل()(ل() 'yoav)))) on Twitter: "is there a convincingly successful application of graph convolutions in NLP you can point me to?" (2019-12-19)
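A minimal sketch of the kNN-LM interpolation described in the first entry above: the datastore holds (key, value) pairs where keys are context vectors produced by the frozen LM and values are the observed next tokens; the kNN distribution is a softmax over negative distances, and the interpolation weight is tuned on validation data in the paper. Function and variable names (`knn_lm_probs`, `lam`, etc.) are illustrative, not from the paper's code.

```python
import numpy as np

def knn_lm_probs(query, lm_probs, keys, values, vocab_size, k=8, lam=0.25):
    """Interpolate a frozen LM's next-token distribution with a kNN distribution.

    query    : (d,) context vector from the frozen LM at the current position
    lm_probs : (V,) next-token distribution from the LM
    keys     : (N, d) stored context vectors (one forward pass over the datastore text)
    values   : (N,) next-token ids observed after each stored context
    """
    # squared L2 distance in the LM embedding space
    dists = np.sum((keys - query) ** 2, axis=1)
    nn = np.argsort(dists)[:k]

    # kNN distribution: softmax over negative distances, mass aggregated per token id
    weights = np.exp(-dists[nn])
    weights /= weights.sum()
    knn_probs = np.zeros(vocab_size)
    np.add.at(knn_probs, values[nn], weights)

    # final distribution: linear interpolation of the two distributions
    return lam * knn_probs + (1.0 - lam) * lm_probs

# toy usage with random data
d, V, N = 8, 50, 1000
rng = np.random.default_rng(0)
keys = rng.normal(size=(N, d))
values = rng.integers(0, V, size=N)
lm_probs = np.full(V, 1.0 / V)
query = rng.normal(size=d)
probs = knn_lm_probs(query, lm_probs, keys, values, V)
print(probs.sum())  # ~1.0, still a valid distribution
```

In the paper the search is done with FAISS over millions of keys; the brute-force distance computation here only stands in for that retrieval step.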
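The correlation-clustering note above can be made concrete with the classic randomized pivot heuristic (KwikCluster, Ailon et al.): the number of clusters is never specified, it emerges from the +/- similarity judgments. A minimal sketch; the input encoding (a set of "similar" pairs, all other pairs implicitly "dissimilar") is my own choice for illustration.

```python
import random

def kwik_cluster(items, positive_pairs, seed=0):
    """Greedy pivot heuristic for correlation clustering.

    items          : list of object ids
    positive_pairs : set of frozensets {a, b} judged "similar"
    Returns a list of clusters; their number is determined by the data,
    not given in advance.
    """
    rng = random.Random(seed)
    remaining = set(items)
    clusters = []
    while remaining:
        pivot = rng.choice(sorted(remaining))
        cluster = {pivot} | {x for x in remaining
                             if frozenset((pivot, x)) in positive_pairs}
        clusters.append(cluster)
        remaining -= cluster
    return clusters

# e.g. 5 objects, with similarity only within {a, b, c} and within {d, e}
print(kwik_cluster(list("abcde"),
                   {frozenset("ab"), frozenset("bc"), frozenset("ac"), frozenset("de")}))
# -> two clusters, found without specifying k
```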
Natural Language Processing – Current Applications and Future Possibilities (2019-12-07)

Variable Selection Methods for Model-based Clustering, by Michael Fop, Thomas Brendan Murphy (arXiv 1707.00306, submitted 2017-07-02, revised 2018-06-04; bookmarked 2019-12-11)
Abstract: Model-based clustering is a popular approach for clustering multivariate data which has seen applications in numerous fields. Nowadays, high-dimensional data are more and more common and the model-based clustering approach has adapted to deal with the increasing dimensionality. In particular, the development of variable selection techniques has received a lot of attention and research effort in recent years. Even for small-size problems, variable selection has been advocated to facilitate the interpretation of the clustering results. This review provides a summary of the methods developed for variable selection in model-based clustering. Existing R packages implementing the different methods are indicated and illustrated on two data analysis examples.

Yoshua Bengio, Revered Architect of AI, Has Some Ideas About What to Build Next - IEEE Spectrum (2019-12-18)

Devant l'imminence de l'interdiction du chlorpyrifos en Europe, les fabricants contre-attaquent (With the EU ban on chlorpyrifos imminent, manufacturers fight back) (2019-12-03)

mozilla/DeepSpeech: A TensorFlow implementation of Baidu's DeepSpeech architecture (2019-12-10)
Note: open-source speech-to-text engine

La nouvelle scène de la cuisine brésilienne (The new Brazilian culinary scene) (2019-12-06)

Can You Hear Me Now? Improved Voice Assistant Keyword Spotting with Alibaba (2019-12-06)

TabFact: A Large-scale Dataset for Table-based Fact Verification, by Wenhu Chen, Hongmin Wang, Jianshu Chen, Yunkai Zhang, Hong Wang, Shiyang Li, Xiyou Zhou, William Yang Wang (arXiv 1909.02164, submitted 2019-09-05; bookmarked 2019-12-01, modified 2019-12-31)
Note: fact verification given semi-structured data as evidence (a table-linearization sketch for the Table-BERT baseline follows this entry)
Abstract: The problem of verifying whether a textual hypothesis holds based on the given evidence, also known as fact verification, plays an important role in the study of natural language understanding and semantic representation. However, existing studies are mainly restricted to dealing with unstructured evidence (e.g., natural language sentences and documents, news, etc.), while verification under structured evidence, such as tables, graphs, and databases, remains under-explored. This paper specifically aims to study fact verification given semi-structured data as evidence. To this end, we construct a large-scale dataset called TabFact with 16k Wikipedia tables as the evidence for 118k human-annotated natural language statements, which are labeled as either ENTAILED or REFUTED. TabFact is challenging since it involves both soft linguistic reasoning and hard symbolic reasoning. To address these reasoning challenges, we design two different models: Table-BERT and Latent Program Algorithm (LPA). Table-BERT leverages the state-of-the-art pre-trained language model to encode the linearized tables and statements into continuous vectors for verification. LPA parses statements into programs and executes them against the tables to obtain the returned binary value for verification. Both methods achieve similar accuracy but still lag far behind human performance. We also perform a comprehensive analysis to demonstrate great future opportunities. The data and code of the dataset are provided at https://github.com/wenhuchen/Table-Fact-Checking.
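The TabFact entry above says Table-BERT encodes a linearized table together with the statement. A small sketch of one plausible "natural language" linearization packed as a BERT sentence pair; the template wording, the toy table, and the `bert-base-uncased` checkpoint are assumptions for illustration, not necessarily what the paper's code does.

```python
from transformers import AutoTokenizer

def linearize_table(header, rows):
    """Flatten a table into a sentence-like string, one plausible Table-BERT-style template."""
    cells = []
    for i, row in enumerate(rows, start=1):
        cells.append("row " + str(i) + " is: " +
                     " ; ".join(f"{col} is {val}" for col, val in zip(header, row)) + ".")
    return " ".join(cells)

header = ["player", "team", "points"]
rows = [["john doe", "tigers", "31"], ["jane roe", "lions", "18"]]
statement = "john doe scored more than 30 points."

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# statement and linearized table go in as a sentence pair; a classifier head on top of
# the [CLS] representation would then predict ENTAILED vs REFUTED
enc = tokenizer(statement, linearize_table(header, rows),
                truncation=True, return_tensors="pt")
print(tokenizer.decode(enc["input_ids"][0]))
```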
Winograd Schema Challenge - Wikipedia (2019-12-18)
Example: "The city councilmen refused the demonstrators a permit because they [feared/advocated] violence."

NLP at Scale for Maintenance and Supply Chain Management (2019-12-07)
> **The topic of natural language dialog between people and machines is probably going to be analytics**, and the mechanism to make that happen is natural language processing. **Graph databases make this possible because they have a very natural fit with language processing**.

How to Turn Off Smart TV Snooping Features - Consumer Reports (2019-12-29)

Les chatbots sont morts, vive les médias 100% messagerie ! (Chatbots are dead, long live messaging-only media!) (2019-12-15)

Large deviations for the perceptron model and consequences for active learning, by Hugo Cui, Luca Saglietti, Lenka Zdeborová (arXiv 1912.03927, submitted 2019-12-09; bookmarked 2019-12-11)
Note: the task of choosing the subset of samples to be labeled from a fixed finite pool of samples (see the pool-based sampling sketch after this group of entries)
Abstract: Active learning is a branch of machine learning that deals with problems where unlabeled data is abundant yet obtaining labels is expensive. The learning algorithm has the possibility of querying a limited number of samples to obtain the corresponding labels, subsequently used for supervised learning. In this work, we consider the task of choosing the subset of samples to be labeled from a fixed finite pool of samples. We assume the pool of samples to be a random matrix and the ground truth labels to be generated by a single-layer teacher random neural network. We employ replica methods to analyze the large deviations for the accuracy achieved after supervised learning on a subset of the original pool. These large deviations then provide optimal achievable performance boundaries for any active learning algorithm. We show that the optimal learning performance can be efficiently approached by simple message-passing active learning algorithms. We also provide a comparison with the performance of some other popular active learning strategies.

Large Memory Layers with Product Keys (poster) (2019-12-11)

CONCEPTUAL GROUNDING FOR TEXT REPRESENTATION LEARNING (2019-12-03)
- Text grounding
- Enhancing text representation with knowledge resources
- Learning Multi-Modal Word Representation Grounded in Visual Context

Les réseaux sémantiques comme outil de travail quotidien (Jean Rohmer) (Semantic networks as an everyday working tool) (2019-12-17)

hardmaru on Twitter: Legendre Memory Units (2019-12-11)
> Using the orthogonality of Legendre polynomials, Legendre Memory Units (LMU) can efficiently handle temporal dependencies spanning 100k timesteps, converge rapidly and use fewer internal state variables compared to LSTMs.
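The LMU note above can be unpacked with the linear memory cell from Voelker et al. (2019): a state m(t) obeying theta * dm/dt = A m + B u, where A and B are fixed matrices derived from Legendre polynomials (no learning in this part). A minimal numpy sketch; the window length `theta`, state size `d`, and the simple Euler step are illustrative choices, not the paper's exact discretization.

```python
import numpy as np

def lmu_matrices(d):
    """Fixed (A, B) of the LMU memory cell, as given in the LMU paper."""
    A = np.zeros((d, d))
    B = np.zeros(d)
    for i in range(d):
        B[i] = (2 * i + 1) * (-1.0) ** i
        for j in range(d):
            A[i, j] = (2 * i + 1) * (-1.0 if i < j else (-1.0) ** (i - j + 1))
    return A, B

def run_memory(u, d=16, theta=100.0, dt=1.0):
    """Feed a scalar signal u[t] through the memory; m_t approximates the last
    `theta` steps of u as coefficients of a Legendre-polynomial expansion."""
    A, B = lmu_matrices(d)
    m = np.zeros(d)
    states = []
    for x in u:
        m = m + (dt / theta) * (A @ m + B * x)   # Euler step of theta*dm/dt = A m + B u
        states.append(m.copy())
    return np.array(states)

states = run_memory(np.sin(np.linspace(0, 10, 500)))
print(states.shape)  # (500, 16): a compressed rolling window of the input
```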
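Separately, the active-learning bookmark above (arXiv 1912.03927) studies how well any strategy can do when choosing which pool samples to label. As a concrete example of the "popular active learning strategies" it compares against, here is a generic pool-based uncertainty-sampling loop with scikit-learn; it is not the message-passing algorithm from the paper, and the toy teacher setup only mimics the paper's setting.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_pool = rng.normal(size=(1000, 20))            # fixed finite pool of unlabeled samples
w_teacher = rng.normal(size=20)                 # single-layer "teacher" generating labels
y_pool = (X_pool @ w_teacher > 0).astype(int)   # labels, revealed only when queried

# small seed set containing both classes
labeled = list(np.where(y_pool == 0)[0][:5]) + list(np.where(y_pool == 1)[0][:5])
budget = 100

while len(labeled) < budget:
    clf = LogisticRegression().fit(X_pool[labeled], y_pool[labeled])
    probs = clf.predict_proba(X_pool)[:, 1]
    # uncertainty sampling: query the unlabeled point closest to the decision boundary
    candidates = [i for i in range(len(X_pool)) if i not in labeled]
    labeled.append(max(candidates, key=lambda i: -abs(probs[i] - 0.5)))

# accuracy over the whole pool (labeled + unlabeled), just for illustration
print("accuracy with", budget, "labels:", clf.score(X_pool, y_pool))
```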
12 NLP Examples: How Natural Language Processing is Used (2019-12-07)

Meta Reinforcement Learning (2019-12-07)

La "Vénus" de Tursac | Musée archéologie nationale (The "Venus" of Tursac | National Archaeology Museum) (2019-12-04)

Deep Learning for Symbolic Mathematics, by Guillaume Lample, François Charton (arXiv 1912.01412, submitted 2019-12-02; bookmarked 2019-12-09)
Note: see the prefix-notation sketch after this group of entries
Abstract: Neural networks have a reputation for being better at solving statistical or approximate problems than at performing calculations or working with symbolic data. In this paper, we show that they can be surprisingly good at more elaborated tasks in mathematics, such as symbolic integration and solving differential equations. We propose a syntax for representing mathematical problems, and methods for generating large datasets that can be used to train sequence-to-sequence models. We achieve results that outperform commercial Computer Algebra Systems such as Matlab or Mathematica.

Subspace clustering - Towards Data Science (2019-12-11)

Your Classifier is Secretly an Energy Based Model and You Should Treat it Like One, by Will Grathwohl, Kuan-Chieh Wang, Jörn-Henrik Jacobsen, David Duvenaud, Mohammad Norouzi, Kevin Swersky (arXiv 1912.03263, submitted 2019-12-06; bookmarked 2019-12-09)
Note: see the log-sum-exp sketch after this group of entries
Abstract: We propose to reinterpret a standard discriminative classifier of p(y|x) as an energy based model for the joint distribution p(x,y). In this setting, the standard class probabilities can be easily computed as well as unnormalized values of p(x) and p(x|y). Within this framework, standard discriminative architectures may be used and the model can also be trained on unlabeled data. We demonstrate that energy based training of the joint distribution improves calibration, robustness, and out-of-distribution detection while also enabling our models to generate samples rivaling the quality of recent GAN approaches. We improve upon recently proposed techniques for scaling up the training of energy based models and present an approach which adds little overhead compared to standard classification training. Our approach is the first to achieve performance rivaling the state-of-the-art in both generative and discriminative learning within one hybrid model.

Machine Learning on Graphs @ NeurIPS 2019 - ML Review - Medium (2019-12-21)

Custom Named Entity Recognition Using spaCy - Towards Data Science (2019-12-31)

NER algo benchmark: spaCy, Flair, m-BERT and camemBERT on anonymizing French commercial legal cases (2019-12-17)
Second post; [First part: Why we switched from Spacy to Flair to anonymize French case law](doc:2021/02/why_we_switched_from_spacy_to_f)
> It has been the most striking aspect of this project: each effort we put into **annotation quality** translated into a score improvement, even the smallest ones.
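The Lample & Charton entry above hinges on a syntax for feeding expressions to a seq2seq model; expressions are serialized as prefix (Polish) notation token sequences over their trees. A small sketch of that idea on nested tuples; the operator names and tree encoding are illustrative, not the paper's exact vocabulary.

```python
# expression trees as nested tuples: (operator, operand, ...) or a leaf string
def to_prefix(tree):
    """Serialize an expression tree to a prefix (Polish) token sequence."""
    if isinstance(tree, tuple):
        op, *args = tree
        tokens = [op]
        for a in args:
            tokens += to_prefix(a)
        return tokens
    return [tree]

# 3*x**2 + cos(2*x)
expr = ("add", ("mul", "3", ("pow", "x", "2")), ("cos", ("mul", "2", "x")))
print(to_prefix(expr))
# ['add', 'mul', '3', 'pow', 'x', '2', 'cos', 'mul', '2', 'x']
```

The resulting token lists are what the sequence-to-sequence model reads and writes; prefix notation removes the need for parentheses and is unambiguous to decode back into a tree.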
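The JEM entry above rests on one identity: with logits f(x), set E(x, y) = -f(x)[y]; then p(y|x) is the usual softmax, and log p(x) is, up to an unknown normalizing constant, the logsumexp of the logits. A minimal PyTorch sketch of just that reinterpretation; the tiny classifier is a placeholder, and the paper's SGLD sampling and training machinery is omitted.

```python
import torch
import torch.nn as nn

classifier = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
x = torch.randn(4, 784)          # a batch of inputs
logits = classifier(x)           # f(x), shape (batch, num_classes)

# standard discriminative view: p(y|x) = softmax(f(x))
p_y_given_x = torch.softmax(logits, dim=-1)

# energy-based view: E(x, y) = -f(x)[y], so the unnormalized log-density of x alone is
# log p~(x) = logsumexp_y f(x)[y]
log_p_tilde_x = torch.logsumexp(logits, dim=-1)

print(p_y_given_x.shape, log_p_tilde_x.shape)  # (4, 10) (4,)
```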
Introducing the New Snorkel (2019) (2019-12-07)

EDUCE: Explaining model Decisions through Unsupervised Concepts Extraction, by Diane Bouchacourt, Ludovic Denoyer (arXiv 1905.11852, submitted 2019-05-28, revised 2019-09-27; bookmarked 2019-12-05)
Abstract: Providing explanations along with predictions is crucial in some text processing tasks. Therefore, we propose a new self-interpretable model that performs output prediction and simultaneously provides an explanation in terms of the presence of particular concepts in the input. To do so, our model's prediction relies solely on a low-dimensional binary representation of the input, where each feature denotes the presence or absence of concepts. The presence of a concept is decided from an excerpt, i.e. a small sequence of consecutive words in the text. Relevant concepts for the prediction task at hand are automatically defined by our model, avoiding the need for concept-level annotations. To ease interpretability, we enforce that for each concept, the corresponding excerpts share similar semantics and can be differentiated from each other. We experimentally demonstrate the relevance of our approach on text classification and multi-sentiment analysis tasks.
Note: presented in these [slides](/doc/2019/12/unsupervised_learning_with_text)

Unsupervised Learning with Text (AFIA 2019) (2019-12-14)
Note: includes a presentation of [Educe](/doc/2019/12/_1905_11852_educe_explaining_)

Named Entity Recognition with Pytorch Transformers – Pierre-Yves Vandenbussche (2019-12-11)
> How to have a SotA identification of Disease and Chemical entities in 10 lines of code!
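The last entry claims strong disease/chemical NER in about ten lines of code. The blog post predates today's APIs, but the same effect can be sketched with the Hugging Face token-classification pipeline; the `dslim/bert-base-NER` checkpoint used here is a general-purpose English NER model standing in for whatever biomedical model the post used, so the code illustrates the workflow rather than reproducing the post.

```python
from transformers import pipeline

# general-purpose English NER checkpoint; a biomedical model fine-tuned on a corpus
# such as BC5CDR would be substituted here to get Disease/Chemical entities
ner = pipeline("token-classification",
               model="dslim/bert-base-NER",
               aggregation_strategy="simple")

text = "Angela Merkel met researchers from Pfizer in Berlin."
for ent in ner(text):
    print(ent["entity_group"], ent["word"], round(float(ent["score"]), 3))
```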