Transport: China prepares its "hydrogen society" - Le Parisien (2019-06-16)

Word Embeddings: 6 Years Later (2019-06-03)

Relearn CSS layout: Every Layout (2019-06-17)

Transfer Learning in Natural Language Processing - Google Slides (2019-06-04)

A Structural Probe for Finding Syntax in Word Representations (2019-06-10)
Certain neural networks (e.g., BERT) build internal geometric representations of syntax trees. (A mysterious "squared distance" effect, explained [here](http://127.0.0.1:8080/semanlink/doc/2019/06/language_trees_and_geometry_i).) [Related blog post](https://nlp.stanford.edu/~johnhew/structural-probe.html)

[1906.02715] Visualizing and Measuring the Geometry of BERT (arXiv 1906.02715, submitted 2019-06-06) - Andy Coenen, Emily Reif, Ann Yuan, Been Kim, Adam Pearce, Fernanda Viégas, Martin Wattenberg
Abstract: Transformer architectures show significant promise for natural language processing. Given that a single pretrained model can be fine-tuned to perform well on many different tasks, these networks appear to extract generally useful linguistic features. A natural question is how such networks represent this information internally. This paper describes qualitative and quantitative investigations of one particularly effective model, BERT. At a high level, linguistic features seem to be represented in separate semantic and syntactic subspaces. We find evidence of a fine-grained geometric representation of word senses. We also present empirical descriptions of syntactic representations in both attention matrices and individual word embeddings, as well as a mathematical argument to explain the geometry of these representations.
Note (2019-06-07):
> At a high level, linguistic features seem to be represented in separate semantic and syntactic subspaces. We find evidence of a fine-grained geometric representation of word senses. We also present empirical descriptions of syntactic representations in both attention matrices and individual word embeddings, as well as a mathematical argument to explain the geometry of these representations.

Language, trees, and geometry in neural networks (2019-06-09)
Notes about [this paper](/doc/2019/06/_1906_02715_visualizing_and_me)
> Exactly how neural nets represent linguistic information remains mysterious. But we're starting to see enticing clues...
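The structural-probe entries above hinge on one computation: learn a linear map B such that squared L2 distances between projected contextual embeddings approximate distances between words in the dependency parse tree. A minimal PyTorch sketch of that distance computation (toy tensors and shapes chosen for illustration, not the authors' released code):

```python
import torch

def probe_squared_distances(H, B):
    """Squared distances d(i, j)^2 = ||B (h_i - h_j)||^2 for all word pairs.

    H: (seq_len, dim) contextual embeddings for one sentence (e.g., from BERT).
    B: (rank, dim) learned linear probe.
    Returns a (seq_len, seq_len) matrix of squared distances, which the probe
    is trained to match against tree distances in the dependency parse.
    """
    P = H @ B.T                              # project: (seq_len, rank)
    diff = P.unsqueeze(1) - P.unsqueeze(0)   # all pairwise differences: (seq_len, seq_len, rank)
    return (diff ** 2).sum(-1)

# Toy example: random "embeddings" and probe, just to show the shapes involved.
torch.manual_seed(0)
H = torch.randn(7, 768)    # 7 tokens, BERT-base hidden size
B = torch.randn(64, 768)   # rank-64 probe
D = probe_squared_distances(H, B)
print(D.shape)             # torch.Size([7, 7])
```

In the paper, B is trained so that D approximates pairwise tree distances; the "squared distance" effect discussed in the blog post is that squared (rather than plain) Euclidean distance is what lines up with tree distance.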
Speech to Text Demo - Watson (2019-06-11)

Audio classification using transfer learning approach - mc.ai (2019-06-29)

[1905.10070] Label-aware Document Representation via Hybrid Attention for Extreme Multi-Label Text Classification (arXiv 1905.10070, submitted 2019-05-24) - Boli Chen, Xin Huang, Lin Xiao, Liping Jing
Abstract: Extreme multi-label text classification (XMTC) aims at tagging a document with the most relevant labels from an extremely large-scale label set. It is a challenging problem, especially for the tail labels, because there are only a few training documents to build a classifier. This paper is motivated to better explore the semantic relationship between each document and extreme labels by taking advantage of both document content and label correlation. Our objective is to establish an explicit label-aware representation for each document with a hybrid attention deep neural network model (LAHA). LAHA consists of three parts. The first part adopts a multi-label self-attention mechanism to detect the contribution of each word to labels. The second part exploits the label structure and document content to determine the semantic connection between words and labels in a same latent space. An adaptive fusion strategy is designed in the third part to obtain the final label-aware document representation so that the essence of the previous two parts can be sufficiently integrated. Extensive experiments have been conducted on six benchmark datasets by comparing with the state-of-the-art methods. The results show the superiority of our proposed LAHA method, especially on the tail labels.
Note (2019-06-22):
> This paper is motivated to better explore the semantic **relationship between each document and extreme labels by taking advantage of both document content and label correlation**. Our objective is to establish an explicit **label-aware representation for each document**.
> LAHA consists of three parts.
> 1. The first part adopts a multi-label self-attention mechanism **to detect the contribution of each word to labels**.
> 2. The second part exploits the label structure and document content **to determine the semantic connection between words and labels in a same latent space**.
> 3. An adaptive fusion strategy is designed in the third part to obtain the final label-aware document representation.
[Github](https://github.com/HX-idiot/Hybrid_Attention_XML) // TODO compare with [this](doc:2020/08/2003_11644_multi_label_text_c)
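As a rough illustration of the "label-aware" idea in the LAHA entry above (not the authors' implementation; shapes and names are made up), attention between label embeddings and word embeddings yields one document vector per label:

```python
import torch
import torch.nn.functional as F

def label_aware_doc_repr(word_emb, label_emb):
    """One document representation per label via label-word attention.

    word_emb:  (seq_len, d)    word representations of one document.
    label_emb: (num_labels, d) label embeddings (e.g., derived from label structure).
    Returns (num_labels, d): for each label, a weighted sum of the words
    that matter most for that label.
    """
    scores = label_emb @ word_emb.T    # (num_labels, seq_len) word-label affinities
    attn = F.softmax(scores, dim=-1)   # attention of each label over the words
    return attn @ word_emb             # (num_labels, d)

words = torch.randn(50, 300)     # toy document with 50 words
labels = torch.randn(1000, 300)  # toy label set
doc_per_label = label_aware_doc_repr(words, labels)
print(doc_per_label.shape)       # torch.Size([1000, 300])
```

LAHA combines this kind of label-word attention with multi-label self-attention and an adaptive fusion step; the sketch only shows the attention part.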
Towards Reproducible Research with PyTorch Hub | PyTorch (2019-06-11)

Philosopher and academician Michel Serres has died (2019-06-02)

Voice Dictation - Online Speech Recognition using Chrome (2019-06-11)

Shaping Linked Data apps | Ruben Verborgh (2019-06-17)

Surveillance, the highest stage of capitalism? On The Age of Surveillance Capitalism, by Shoshana Zuboff (2019-06-14)

[1812.05944] A Tutorial on Distance Metric Learning: Mathematical Foundations, Algorithms and Experiments (arXiv 1812.05944, submitted 2018-12-14) - Juan Luis Suárez, Salvador García, Francisco Herrera
Abstract: Distance metric learning is a branch of machine learning that aims to learn distances from the data. Distance metric learning can be useful to improve similarity learning algorithms, and also has applications in dimensionality reduction. This paper describes the distance metric learning problem and analyzes its main mathematical foundations. In addition, it also discusses some of the most popular distance metric learning techniques used in classification, showing their goals and the required information to understand and use them. Furthermore, some experiments to evaluate the performance of the different algorithms are also provided. Finally, this paper discusses several possibilities of future work in this topic.
Note (2019-06-18): distance metric learning, a branch of machine learning that aims to learn distances from the data.

In Lagos, the king of the Fulani is also the boss of the dockers (2019-06-29)

An Embarrassingly Simple Approach for Transfer Learning from Pretrained Language Models (NAACL 2019) (2019-06-08)
[Github](https://github.com/alexandra-chron/siatl)

An Embarrassingly Simple Approach for Transfer Learning from Pretrained Language Models (NAACL 2019) (Slides) (2019-06-08)

Transferable Neural Projection Representations (2019) (2019-06-06)
Forget word embeddings?
> Neural word representations occupy huge memory making it hard to deploy on-device and often do not generalize to unknown words due to vocabulary pruning. In this paper, we propose a skip-gram based architecture coupled with Locality-Sensitive Hashing (LSH) projections to learn efficient dynamically computable representations. Our model does not need to store lookup tables as representations are computed on-the-fly and require low memory footprint. The representations can be trained in an unsupervised fashion and can be easily transferred to other NLP tasks. For qualitative evaluation, we analyze the nearest neighbors of the word representations and discover semantically similar words even with misspellings. For quantitative evaluation, we plug our transferable projections into a simple LSTM and run it on multiple NLP tasks and show how our transferable projections achieve better performance compared to prior work.

One Shot Learning with Siamese Networks using Keras (2019-06-28)
The network is learning a **similarity function**, which takes two images as input and expresses how similar they are.
> Assume that we want to build a face recognition system for a small organization with only 10 employees...
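A minimal tf.keras sketch of the similarity-function idea from the one-shot-learning post above: two images pass through a shared encoder, and a small head turns the distance between their embeddings into a match probability. Layer sizes are illustrative, not the post's exact architecture:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_encoder(input_shape=(105, 105, 1)):
    """Shared encoder: both images go through the same weights."""
    inp = layers.Input(shape=input_shape)
    x = layers.Conv2D(64, 10, activation="relu")(inp)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(128, 7, activation="relu")(x)
    x = layers.MaxPooling2D()(x)
    x = layers.Flatten()(x)
    x = layers.Dense(256, activation="sigmoid")(x)
    return Model(inp, x)

encoder = build_encoder()
img_a = layers.Input(shape=(105, 105, 1))
img_b = layers.Input(shape=(105, 105, 1))
emb_a, emb_b = encoder(img_a), encoder(img_b)

# Similarity head: |emb_a - emb_b| -> probability that the two images match.
l1_distance = layers.Lambda(lambda t: tf.abs(t[0] - t[1]))([emb_a, emb_b])
out = layers.Dense(1, activation="sigmoid")(l1_distance)

siamese = Model([img_a, img_b], out)
siamese.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
siamese.summary()
```

Because the model scores pairs rather than classes, a new employee only needs one reference photo at inference time, which is the one-shot setting the post describes.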
[1810.04882] Towards Understanding Linear Word Analogies (arXiv 1810.04882, submitted 2018-10-11; bookmarked 2019-06-24) - Kawin Ethayarajh, David Duvenaud, Graeme Hirst
Abstract: A surprising property of word vectors is that word analogies can often be solved with vector arithmetic. However, it is unclear why arithmetic operators correspond to non-linear embedding models such as skip-gram with negative sampling (SGNS). We provide a formal explanation of this phenomenon without making the strong assumptions that past theories have made about the vector space and word distribution. Our theory has several implications. Past work has conjectured that linear substructures exist in vector spaces because relations can be represented as ratios; we prove that this holds for SGNS. We provide novel justification for the addition of SGNS word vectors by showing that it automatically down-weights the more frequent word, as weighting schemes do ad hoc. Lastly, we offer an information theoretic interpretation of Euclidean distance in vector spaces, justifying its use in capturing word dissimilarity.
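The "vector arithmetic" the abstract refers to is the classic 3CosAdd query: the answer to "man is to king as woman is to ?" is the vocabulary word whose vector is most cosine-similar to king - man + woman. A toy numpy illustration (made-up 3-dimensional vectors, just to show the computation; real embeddings would come from SGNS, GloVe, etc.):

```python
import numpy as np

# Toy embeddings; in practice these are learned (SGNS, GloVe, ...).
vocab = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
    "apple": np.array([0.5, 0.5, 0.5]),
}

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def analogy(a, b, c):
    """Solve 'a is to b as c is to ?' with 3CosAdd: argmax cos(x, b - a + c)."""
    target = vocab[b] - vocab[a] + vocab[c]
    candidates = {w: cosine(vec, target) for w, vec in vocab.items()
                  if w not in {a, b, c}}          # exclude the query words, as is standard
    return max(candidates, key=candidates.get)

print(analogy("man", "king", "woman"))   # -> "queen" (with these toy vectors)
```

The paper above, and the "When and Why does King - Man + Woman = Queen?" blog post bookmarked below, explain when and why this arithmetic works for SGNS embeddings.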
Trip Report: AKBC 2019 (1st Conference on Automated Knowledge Base Construction) (2019-06-03)

[1810.10531] A mathematical theory of semantic development in deep neural networks (arXiv 1810.10531, submitted 2018-10-23) - Andrew M. Saxe, James L. McClelland, Surya Ganguli
Abstract: An extensive body of empirical research has revealed remarkable regularities in the acquisition, organization, deployment, and neural representation of human semantic knowledge, thereby raising a fundamental conceptual question: what are the theoretical principles governing the ability of neural networks to acquire, organize, and deploy abstract knowledge by integrating across many individual experiences? We address this question by mathematically analyzing the nonlinear dynamics of learning in deep linear networks. We find exact solutions to this learning dynamics that yield a conceptual explanation for the prevalence of many disparate phenomena in semantic cognition, including the hierarchical differentiation of concepts through rapid developmental transitions, the ubiquity of semantic illusions between such transitions, the emergence of item typicality and category coherence as factors controlling the speed of semantic processing, changing patterns of inductive projection over development, and the conservation of semantic similarity in neural representations across species. Thus, surprisingly, our simple neural model qualitatively recapitulates many diverse regularities underlying semantic development, while providing analytic insight into how the statistical structure of an environment can interact with nonlinear deep learning dynamics to give rise to these regularities.
Note (2019-06-29):
> a fundamental conceptual question: what are the theoretical principles governing the ability of neural networks to acquire, organize, and deploy abstract knowledge by integrating across many individual experiences?

Weak Supervision: The New Programming Paradigm for Machine Learning (2019-06-28)
> a broad, high-level overview of recent weak supervision approaches, where noisier or higher-level supervision is used as a more expedient and flexible way to get supervision signal, in particular from **subject matter experts** (SMEs).
Broad definition of weak supervision as being comprised of **one or more noisy conditional distributions over unlabeled data**. Key practical motivation: what if an SME could spend an afternoon specifying a set of heuristics or other resources that, if handled properly, could effectively replace thousands of training labels? Contains a good comparison of the settings in active, semi-supervised and transfer learning (and links to surveys about them).

[1906.04341] What Does BERT Look At? An Analysis of BERT's Attention (arXiv 1906.04341, submitted 2019-06-11; bookmarked 2019-06-21) - Kevin Clark, Urvashi Khandelwal, Omer Levy, Christopher D. Manning
Abstract: Large pre-trained neural networks such as BERT have had great recent success in NLP, motivating a growing body of research investigating what aspects of language they are able to learn from unlabeled data. Most recent analysis has focused on model outputs (e.g., language model surprisal) or internal vector representations (e.g., probing classifiers). Complementary to these works, we propose methods for analyzing the attention mechanisms of pre-trained models and apply them to BERT. BERT's attention heads exhibit patterns such as attending to delimiter tokens, specific positional offsets, or broadly attending over the whole sentence, with heads in the same layer often exhibiting similar behaviors. We further show that certain attention heads correspond well to linguistic notions of syntax and coreference. For example, we find heads that attend to the direct objects of verbs, determiners of nouns, objects of prepositions, and coreferent mentions with remarkably high accuracy. Lastly, we propose an attention-based probing classifier and use it to further demonstrate that substantial syntactic information is captured in BERT's attention.

[Jeremy Howard's answer](https://forums.fast.ai/t/nlp-challenge-project/44153) (2019-06-06)
"I made a bet that a Naive Bayes classifier would work as well on humor recognition as a neural net with fine-tuned Bert embeddings. I won"

Visual and conceptual grounding for text representation learning (2019-06-06)

Papers - ACL 2019 (2019-06-12)

The Last Picture Show (La Dernière Séance) (2019-06-17)

20% Accuracy Bump in Text Classification with ME-ULMFiT (2019-06-23)

When and Why does King - Man + Woman = Queen? (ACL 2019) | Kawin Ethayarajh (2019-06-24)
[paper](doc/2019/06/_1810_04882_towards_understand) ; [blog post](/doc/2019/06/when_and_why_does_king_man_)

Kawin Ethayarajh on Twitter: "When and why does king - man + woman = queen?" (2019-06-24)

Hamiltonian Neural Networks (2019-06-11)
> Even though neural networks enjoy widespread use, they still struggle to learn the basic laws of physics. How might we endow them with better inductive biases? In this paper, we draw inspiration from Hamiltonian mechanics to train models that learn and respect exact conservation laws in an unsupervised manner.
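The Hamiltonian Neural Networks idea above is compact enough to sketch: instead of predicting time derivatives directly, a network predicts a scalar H(q, p), and the dynamics are read off from its symplectic gradient, so the learned H is conserved along trajectories. A minimal PyTorch sketch under those assumptions (toy 1-D system, hypothetical layer sizes; not the authors' code):

```python
import torch
import torch.nn as nn

class HNN(nn.Module):
    """Predict a scalar Hamiltonian H(q, p); dynamics come from its gradient."""
    def __init__(self, hidden=200):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def time_derivatives(self, x):
        # x = (q, p); Hamilton's equations: dq/dt = dH/dp, dp/dt = -dH/dq.
        x = x.requires_grad_(True)
        H = self.net(x).sum()
        dH = torch.autograd.grad(H, x, create_graph=True)[0]
        dHdq, dHdp = dH[:, 0:1], dH[:, 1:2]
        return torch.cat([dHdp, -dHdq], dim=-1)

model = HNN()
x = torch.randn(32, 2)          # batch of (q, p) states
true_dxdt = torch.randn(32, 2)  # in practice: derivatives observed from trajectories
loss = ((model.time_derivatives(x) - true_dxdt) ** 2).mean()
loss.backward()                 # trains H so its symplectic gradient matches the data
print(loss.item())
```

The conservation law is baked in by construction: any trajectory that follows the predicted derivatives keeps the learned H constant, which is the "respect exact conservation laws" claim in the quoted abstract.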
Dinosaur asteroid hit 'worst possible place' - BBC News (2019-06-20)

Experiments in Graph-based Semi-Supervised Learning Methods for Class-Instance Acquisition (2019-06-18)

The Terrible Truth About Amazon Alexa and Privacy (2019-06-21)

Lessons Learned from Applying Deep Learning for NLP Without Big Data (2019-06-29)

The Quiet Semi-Supervised Revolution - Towards Data Science (2019-06-01)

[1906.08237] XLNet: Generalized Autoregressive Pretraining for Language Understanding (arXiv 1906.08237, submitted 2019-06-19) - Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le
Abstract: With the capability of modeling bidirectional contexts, denoising autoencoding based pretraining like BERT achieves better performance than pretraining approaches based on autoregressive language modeling. However, relying on corrupting the input with masks, BERT neglects dependency between the masked positions and suffers from a pretrain-finetune discrepancy. In light of these pros and cons, we propose XLNet, a generalized autoregressive pretraining method that (1) enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and (2) overcomes the limitations of BERT thanks to its autoregressive formulation. Furthermore, XLNet integrates ideas from Transformer-XL, the state-of-the-art autoregressive model, into pretraining. Empirically, under comparable experiment settings, XLNet outperforms BERT on 20 tasks, often by a large margin, including question answering, natural language inference, sentiment analysis, and document ranking.
Note (2019-06-21): a new pretraining method for NLP that significantly improves upon BERT on 20 tasks (e.g., SQuAD, GLUE, RACE).

Kokopelli (association) (2019-06-18)

NLP: Contextualized word embeddings from BERT - Towards Data Science (2019-06-12)

Google AI Blog: Harnessing Organizational Knowledge for Machine Learning (2019) (2019-06-28)
How existing knowledge in an organization can be used as noisier, higher-level supervision (or, as it is often termed, weak supervision) to quickly label large training datasets. Snorkel DryBell, an experimental internal system, adapts the open-source Snorkel framework to **use diverse organizational knowledge resources (internal models, ontologies, legacy rules, knowledge graphs and more) in order to generate training data** for machine learning models at web scale. Enables writing **labeling functions** that label training data programmatically. [paper](/doc/2019/06/_1812_00417_snorkel_drybell_a)

[1812.00417] Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale (arXiv 1812.00417, submitted 2018-12-02) - Stephen H. Bach, Daniel Rodriguez, Yintao Liu, Chong Luo, Haidong Shao, Cassandra Xia, Souvik Sen, Alexander Ratner, Braden Hancock, Houman Alborzi, Rahul Kuchhal, Christopher Ré, Rob Malkin
Abstract: Labeling training data is one of the most costly bottlenecks in developing machine learning-based applications. We present a first-of-its-kind study showing how existing knowledge resources from across an organization can be used as weak supervision in order to bring development time and cost down by an order of magnitude, and introduce Snorkel DryBell, a new weak supervision management system for this setting. Snorkel DryBell builds on the Snorkel framework, extending it in three critical aspects: flexible, template-based ingestion of diverse organizational knowledge, cross-feature production serving, and scalable, sampling-free execution. On three classification tasks at Google, we find that Snorkel DryBell creates classifiers of comparable quality to ones trained with tens of thousands of hand-labeled examples, converts non-servable organizational resources to servable models for an average 52% performance improvement, and executes over millions of data points in tens of minutes.
Note (2019-06-28):
> study showing how existing knowledge resources from across an organization can be used as weak supervision in order to bring development time and cost down by an order of magnitude.
> Snorkel DryBell, a new weak supervision management system for this setting.
[Blog post](/doc/2019/06/google_ai_blog_harnessing_orga)
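To make the "labeling functions" idea in the two Snorkel entries above concrete, here is a minimal plain-Python sketch (hypothetical spam-detection heuristics, not Snorkel's actual API): each function encodes one noisy organizational heuristic and either votes on an unlabeled example or abstains, and the votes are then combined, here by simple majority, whereas Snorkel instead fits a generative label model that estimates each function's accuracy.

```python
from collections import Counter

ABSTAIN, SPAM, NOT_SPAM = None, 1, 0

# Each labeling function encodes one noisy heuristic (a keyword list, a legacy
# rule, an existing model...). It votes on an example or abstains.
def lf_contains_url(text):
    return SPAM if "http://" in text or "https://" in text else ABSTAIN

def lf_money_words(text):
    return SPAM if any(w in text.lower() for w in ("free", "winner", "$$$")) else ABSTAIN

def lf_short_greeting(text):
    return NOT_SPAM if len(text.split()) < 5 and "hi" in text.lower() else ABSTAIN

LFS = [lf_contains_url, lf_money_words, lf_short_greeting]

def weak_label(text):
    """Combine the noisy votes; here by majority vote over non-abstaining LFs."""
    votes = [v for v in (lf(text) for lf in LFS) if v is not ABSTAIN]
    return Counter(votes).most_common(1)[0][0] if votes else ABSTAIN

unlabeled = ["Hi there", "You are a WINNER, claim your $$$ at https://spam.example"]
print([(t, weak_label(t)) for t in unlabeled])
```

According to the abstract above, Snorkel DryBell adds template-based ingestion of such organizational resources, production serving, and sampling-free execution on top of this basic pattern.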