2019-01-27 The Hidden Automation Agenda of the Davos Elite - The New York Times
“The choice isn’t between automation and non-automation,” said Erik Brynjolfsson, the director of M.I.T.’s Initiative on the Digital Economy. “It’s between whether you use the technology in a way that creates shared prosperity, or more concentration of wealth.”

2019-01-23 14 NLP Research Breakthroughs You Can Apply To Your Business - 2018

2019-01-29 Run NLP Experiments using the Feedly API.ipynb - Colaboratory
Colaboratory notebook that shows how you can apply ML and NLP to the content of your own @feedly feeds.

2019-01-29 Paris NLP Season 3 Meetup #3 at Doctrine – Paris NLP
Three presentations:
- The first described the use of textual data in the design of a marketing campaign (defining the direction to give to a brand's image). The NLP techniques used are simple, but their use as a tool to support marketing creatives is original.
- The second, in the legal domain, was very interesting in terms of the techniques involved. The application aims at extracting value from contract databases (a corpus of texts over which one wants to run complex searches), relying both on recent text representation techniques and on a knowledge graph (an ontology of legal terms). For text representation, they use Google BERT. What BERT enables is a form of transfer learning: BERT is a deep neural network trained in an unsupervised way, by and at Google, on a huge amount of text, so as to accumulate knowledge about a language (a "pre-trained language model"). These data (that is, the pre-trained network) are made available by Google. Anyone can then fine-tune the network on their own corpus of texts and their own labeled data for the problem they actually want to solve (for example, in the legal case, recognizing entities in contracts). The speaker reports clearly improved results compared to what they obtained before, both for the quality of sentence representations and for tasks such as sentence classification or entity recognition (he notes that the representation of long documents remains an open problem). Training times for BERT on their problem are not prohibitive (a few hours of GPU, not the days or weeks of TPU needed for the initial pre-training). See the sketch after this list.
- The last speaker presented two research papers on the techniques at the heart of BERT (the "Transformer architecture").

2019-01-29 Cheatsheet · fastText

2019-01-04 Tensor Considered Harmful
> TL;DR: Despite its ubiquity in deep learning, Tensor is broken. It forces bad habits such as exposing private dimensions, broadcasting based on absolute position, and keeping type information in documentation. This post presents a proof-of-concept of an alternative approach, named tensors, with named dimensions

2019-01-16 What is torch.nn really? — PyTorch Tutorials 1.0.0
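Relating to the Paris NLP meetup note above (fine-tuning BERT for entity recognition in contracts): a minimal sketch of how such a fine-tuning setup can be wired up. The talk does not say which toolkit Hyperlex uses; this uses the Hugging Face transformers library as an illustration, `num_labels=5` is an arbitrary placeholder, and the actual training loop on labeled contracts is omitted.

```python
# pip install torch transformers
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Load Google's pre-trained BERT and put a fresh token-classification head on top
# (e.g. for tagging entities in contracts). num_labels=5 is a placeholder.
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForTokenClassification.from_pretrained("bert-base-cased", num_labels=5)

inputs = tokenizer("This agreement is governed by the laws of France.",
                   return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits        # shape: (1, sequence_length, num_labels)
print(logits.argmax(dim=-1))               # one predicted label per token (head still untrained)
```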
2019-01-07 The Basics of NumPy Arrays | Python Data Science Handbook

2019-01-17 python - The meaning of shapes in NumPy - Stack Overflow

2019-01-23 LASER natural language processing toolkit - Facebook Code
> We are open-sourcing a newly expanded and enhanced version of our natural language processing toolkit, LASER. It now performs zero-shot cross-lingual transfer with more than 90 languages, written in 28 different alphabets.

2019-01-01 SageDB: A Learned Database System
cf. [The case for learned index structures](/doc/?uri=https%3A%2F%2Farxiv.org%2Fabs%2F1712.01208v1)

2019-01-11 [1901.02860] Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov (arXiv:1901.02860, submitted 2019-01-09)
Transformers have a potential of learning longer-term dependency, but are limited by a fixed-length context in the setting of language modeling. We propose a novel neural architecture Transformer-XL that enables learning dependency beyond a fixed length without disrupting temporal coherence. It consists of a segment-level recurrence mechanism and a novel positional encoding scheme. Our method not only enables capturing longer-term dependency, but also resolves the context fragmentation problem. As a result, Transformer-XL learns dependency that is 80% longer than RNNs and 450% longer than vanilla Transformers, achieves better performance on both short and long sequences, and is up to 1,800+ times faster than vanilla Transformers during evaluation. Notably, we improve the state-of-the-art results of bpc/perplexity to 0.99 on enwiki8, 1.08 on text8, 18.3 on WikiText-103, 21.8 on One Billion Word, and 54.5 on Penn Treebank (without finetuning). When trained only on WikiText-103, Transformer-XL manages to generate reasonably coherent, novel text articles with thousands of tokens. Our code, pretrained models, and hyperparameters are available in both Tensorflow and PyTorch.

2019-01-29 Using FastText models (not vectors) for robust embeddings | Kaggle

2019-01-27 What should I do when my neural network doesn't learn?

2019-01-03 Another data science student's blog (Sylvain Gugger)

2019-01-27 Troubleshooting Deep Neural Networks

2019-01-15 Finding Data Block Nirvana (a journey through the fastai data block API)

2019-01-21 Training Cutting-Edge Neural Networks with Tensor2Tensor and 10 lines of code

2019-01-29 Spiking Neural Networks, the Next Generation of Machine Learning (2018)

2019-01-02 Douglas Rushkoff: How to be "Team Human" in the digital future | TED Talk

2019-01-29 Querying machine learning distributional semantics with SPARQL - bobdc.blog

2019-01-15 Insect collapse: ‘We are destroying our life support systems’ | Environment | The Guardian
Scientist Brad Lister returned to Puerto Rican rainforest after 35 years to find 98% of ground insects had vanished

2019-01-01 Arvind Narayanan on Twitter: "In 2018 the blockchain/decentralization story fell apart. For example, a study of 43 use cases found a 0% success rate"
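The Transformer-XL abstract above centers on segment-level recurrence: hidden states computed for the previous segment are cached (with gradients stopped) and reused as extra context when processing the current segment. Below is a toy sketch of that caching pattern only, assuming a single standard attention layer; it leaves out the paper's relative positional encoding and everything else that makes the real model work.

```python
import torch
import torch.nn as nn

class TinySegmentRecurrence(nn.Module):
    """Attend over [cached previous segment; current segment] instead of the current segment alone."""
    def __init__(self, d_model=32, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, segment, memory=None):
        # segment: (batch, seg_len, d_model); memory: hidden states cached from the previous segment
        context = segment if memory is None else torch.cat([memory, segment], dim=1)
        out, _ = self.attn(query=segment, key=context, value=context)
        return out, out.detach()   # detach: reuse the states as context, but never backprop through them

layer = TinySegmentRecurrence()
memory = None
for segment in torch.randn(3, 2, 8, 32):   # 3 consecutive segments, batch 2, length 8, d_model 32
    out, memory = layer(segment, memory)
print(out.shape)                            # torch.Size([2, 8, 32])
```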
2019-01-24 Romain Vial (Hyperlex) at Paris NLP meetup, slides
> Hyperlex is a contract analytics and management solution powered by artificial intelligence. Hyperlex helps companies manage and make the most of their contract portfolio by identifying relevant information and data to manage key contractual commitments.

> Take-home message:
>
> - Sentence representation starts to be well understood empirically
> - Large document representation is still an open (and interesting) problem!

2019-01-30 StanfordNLP | StanfordNLP

2019-01-15 What is the difference between gradient boosting and adaboost? - Quora

2019-01-02 Kenya. To recoup its loans, China could seize the port of Mombasa | Courrier international

2019-01-25 Cheat Sheets for AI, Neural Networks, Machine Learning, Deep Learning & Big Data
The Most Complete List of Best AI Cheat Sheets

2019-01-14 A Friendly Introduction to Cross-Entropy Loss
focus on models that assume that classes are mutually exclusive
> If we think of a distribution y as the tool we use to encode symbols, then entropy measures the number of bits we'll need if we use the correct tool. This is optimal, in that we can't encode the symbols using fewer bits on average.
>
> In contrast, cross entropy is the number of bits we'll need if we encode symbols from y using the wrong tool ŷ. This consists of encoding the i-th symbol using log(1/ŷ_i) bits instead of log(1/y_i) bits.

2019-01-29 Training deep neural networks for binary communication with the Whetstone method | Nature Machine Intelligence
> Here, we describe a new approach to training SNNs, where the ANN training is to not only learn the task, but to produce a SNN in the process. Specifically, if the training procedure can include the eventual objective of low-precision communication between nodes, the training process of a SNN can be nearly as effective as a comparable ANN. This method, which we term Whetstone inspired by the tool to sharpen a dull knife, is intentionally agnostic to both the type of ANN being trained and the targeted neuromorphic hardware. Rather, the intent is to provide a straightforward interface for machine learning researchers to leverage the powerful capabilities of low-power neuromorphic hardware on a wide range of deep learning applications

Whetstone can train neural nets through Keras to be "spiking" without an expansion of the network or an expensive temporal code

2019-01-03 clarification about how the process under Article 50 TEU works

2019-01-31 BPEmb: Subword Embeddings
a collection of pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE) and trained on Wikipedia
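A tiny numeric companion to the cross-entropy note above, spelling out the "correct tool vs. wrong tool" reading (using natural logs, i.e. nats rather than bits):

```python
import numpy as np

def entropy(y):
    """Average code length if symbols drawn from y are encoded with the correct tool y."""
    return -np.sum(y * np.log(y))

def cross_entropy(y, y_hat):
    """Average code length if symbols drawn from y are encoded with the wrong tool y_hat."""
    return -np.sum(y * np.log(y_hat))

y = np.array([0.7, 0.2, 0.1])       # true distribution
y_hat = np.array([0.5, 0.3, 0.2])   # model's estimate

print(entropy(y))                             # H(y)
print(cross_entropy(y, y_hat))                # H(y, y_hat), always >= H(y)
print(cross_entropy(y, y_hat) - entropy(y))   # the gap is the KL divergence
# With mutually exclusive classes and a one-hot y, cross-entropy reduces to -log(y_hat[true_class]).
```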
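On the Whetstone entry: the paper trains an ordinary ANN while progressively pushing its activations toward binary, spiking-like outputs. The sketch below is not the Whetstone library's API, just the general idea under my own assumptions: a Keras activation layer whose sigmoid is sharpened after every epoch until it approximates a step function.

```python
import tensorflow as tf

class SharpeningSigmoid(tf.keras.layers.Layer):
    """Sigmoid whose slope is controlled by a non-trainable sharpness variable."""
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.sharpness = tf.Variable(1.0, trainable=False)

    def call(self, x):
        # As sharpness grows, the output approaches a 0/1 step function.
        return tf.sigmoid(self.sharpness * x)

class SharpenCallback(tf.keras.callbacks.Callback):
    """Multiply the sharpness of every SharpeningSigmoid layer after each epoch."""
    def __init__(self, factor=1.5):
        super().__init__()
        self.factor = factor

    def on_epoch_end(self, epoch, logs=None):
        for layer in self.model.layers:
            if isinstance(layer, SharpeningSigmoid):
                layer.sharpness.assign(layer.sharpness * self.factor)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, input_shape=(784,)),
    SharpeningSigmoid(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit(x_train, y_train, epochs=10, callbacks=[SharpenCallback()])
```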
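And for the BPEmb entry just above, a minimal usage sketch with the project's Python package (parameter and method names as I recall them from its README, so double-check against the documentation; `vs` is the BPE vocabulary size, `dim` the embedding dimensionality):

```python
# pip install bpemb
from bpemb import BPEmb

# English subword embeddings: 50k BPE vocabulary, 100-dimensional vectors,
# downloaded automatically on first use.
bpemb_en = BPEmb(lang="en", vs=50000, dim=100)

pieces = bpemb_en.encode("unsupervised")   # BPE subword pieces for a rare-ish word
vectors = bpemb_en.embed("unsupervised")   # one embedding vector per subword piece
print(pieces)
print(vectors.shape)                       # (len(pieces), 100)
```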
2019-01-27 [1601.01343] Joint Learning of the Embedding of Words and Entities for Named Entity Disambiguation
Ikuya Yamada, Hiroyuki Shindo, Hideaki Takeda, Yoshiyasu Takefuji (arXiv:1601.01343, submitted 2016-01-06)
> An embedding method specifically **designed for NED** that jointly **maps words and entities into the same continuous vector space**.
> We extend the skip-gram model by using two models. The KB graph model learns the relatedness of entities using the link structure of the KB, whereas the anchor context model aims to align vectors such that similar words and entities occur close to one another in the vector space by leveraging KB anchors and their context words

Technique later used in [Wikipedia2Vec](doc:?uri=https%3A%2F%2Fwikipedia2vec.github.io%2Fwikipedia2vec%2F), by the same team. [Neural Attentive Bag-of-Entities Model for Text Classification](doc:2020/09/1909_01259_neural_attentive_b) uses the Wikipedia2Vec model.

Named Entity Disambiguation (NED) refers to the task of resolving multiple named entity mentions in a document to their correct references in a knowledge base (KB) (e.g., Wikipedia). In this paper, we propose a novel embedding method specifically designed for NED. The proposed method jointly maps words and entities into the same continuous vector space. We extend the skip-gram model by using two models. The KB graph model learns the relatedness of entities using the link structure of the KB, whereas the anchor context model aims to align vectors such that similar words and entities occur close to one another in the vector space by leveraging KB anchors and their context words. By combining contexts based on the proposed embedding with standard NED features, we achieved state-of-the-art accuracy of 93.1% on the standard CoNLL dataset and 85.2% on the TAC 2010 dataset.

2019-01-30 How to do Deep Learning on Graphs with Graph Convolutional Networks Part 2
Part 2: Semi-Supervised Learning with Spectral Graph Convolutions
[Part 1](/doc/?uri=https%3A%2F%2Ftowardsdatascience.com%2Fhow-to-do-deep-learning-on-graphs-with-graph-convolutional-networks-7d2250723780)

2019-01-30 How to do Deep Learning on Graphs with Graph Convolutional Networks - Part 1
Part 1: A High-Level Introduction to Graph Convolutional Networks
[Part 2](/doc/?uri=https%3A%2F%2Ftowardsdatascience.com%2Fhow-to-do-deep-learning-on-graphs-with-graph-convolutional-networks-62acf5b143d0)
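Since the Yamada et al. note above points to Wikipedia2Vec, here is a minimal usage sketch of that package, showing words and Wikipedia entities living in one vector space. Method names are taken from the project's README as I recall them, and the model file name is a placeholder for one of the pretrained models distributed on the Wikipedia2Vec site (to be downloaded beforehand).

```python
# pip install wikipedia2vec
from wikipedia2vec import Wikipedia2Vec

# Placeholder path: a pretrained English model downloaded from the project site.
wiki2vec = Wikipedia2Vec.load("enwiki_20180420_100d.pkl")

# Words and Wikipedia entities are mapped into the same continuous vector space...
word_vec = wiki2vec.get_word_vector("contract")
entity_vec = wiki2vec.get_entity_vector("Scarlett Johansson")

# ...so nearest-neighbour queries can mix both kinds of items.
print(wiki2vec.most_similar(wiki2vec.get_entity("Scarlett Johansson"), 5))
```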
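As a companion to the two GCN posts above, a minimal NumPy sketch of the standard graph convolution propagation rule they build on, H' = ReLU(D^(-1/2) (A + I) D^(-1/2) H W), with self-loops added to the adjacency matrix; the weights here are random, not learned.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph convolution: normalized neighbourhood aggregation followed by ReLU."""
    A_hat = A + np.eye(A.shape[0])                         # add self-loops
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1))) # D^(-1/2)
    return np.maximum(0, D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)

# Toy graph: 4 nodes, two undirected edges (0-1 and 2-3)
A = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H = np.eye(4)                      # one-hot node features
W = np.random.randn(4, 2)          # would be learned in a real model
print(gcn_layer(A, H, W).shape)    # (4, 2): a 2-dimensional embedding per node
```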
2019-01-27 Beating the Teacher: Neural Ranking Models with Weak Supervision – Mostafa Dehghani
Main Idea: To leverage large amounts of unsupervised data to infer “weak” labels and use that signal for learning supervised models as if we had the ground truth labels.
Blog post about [this paper](/doc/?uri=https%3A%2F%2Farxiv.org%2Fabs%2F1704.08803)

2019-01-27 [1704.08803] Neural Ranking Models with Weak Supervision
Mostafa Dehghani, Hamed Zamani, Aliaksei Severyn, Jaap Kamps, W. Bruce Croft (arXiv:1704.08803, submitted 2017-04-28)
Despite the impressive improvements achieved by unsupervised deep neural networks in computer vision and NLP tasks, such improvements have not yet been observed in ranking for information retrieval. The reason may be the complexity of the ranking problem, as it is not obvious how to learn from queries and documents when no supervised signal is available. Hence, in this paper, we propose to train a neural ranking model using weak supervision, where labels are obtained automatically without human annotators or any external resources (e.g., click data). To this aim, we use the output of an unsupervised ranking model, such as BM25, as a weak supervision signal. We further train a set of simple yet effective ranking models based on feed-forward neural networks. We study their effectiveness under various learning scenarios (point-wise and pair-wise models) and using different input representations (i.e., from encoding query-document pairs into dense/sparse vectors to using word embedding representation). We train our networks using tens of millions of training instances and evaluate them on two standard collections: a homogeneous news collection (Robust) and a heterogeneous large-scale web collection (ClueWeb). Our experiments indicate that employing proper objective functions and letting the networks learn the input representation based on weakly supervised data leads to impressive performance, with over 13% and 35% MAP improvements over the BM25 model on the Robust and the ClueWeb collections. Our findings also suggest that supervised neural ranking models can greatly benefit from pre-training on large amounts of weakly labeled data that can be easily obtained from unsupervised IR models.

Main Idea: To **leverage large amounts of unsupervised data to infer “weak” labels** and use that signal for learning supervised models as if we had the ground truth labels. See [blog post](/doc/?uri=http%3A%2F%2Fmostafadehghani.com%2F2017%2F04%2F23%2Fbeating-the-teacher-neural-ranking-models-with-weak-supervision%2F):

> This is **truly awesome since we have only used BM25 as the supervisor to train a model which performs better than BM25** itself!
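A toy sketch of the weak-supervision recipe described above: take the preference ordering induced by an unsupervised ranker (BM25) as the only training signal and fit a small feed-forward scorer with a pairwise hinge loss. Everything here is a stand-in (random tensors instead of real query/document representations and real BM25 scores); it only illustrates the training signal, not the paper's actual architectures or input representations.

```python
import torch
import torch.nn as nn

class PairwiseRanker(nn.Module):
    """Small feed-forward scorer over a concatenated query/document representation."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, q, d):
        return self.net(torch.cat([q, d], dim=-1)).squeeze(-1)

dim, batch = 50, 32
model = PairwiseRanker(dim)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(100):
    # Stand-ins for real features and for the weak supervision signal.
    q = torch.randn(batch, dim)                                  # query representations
    d_a, d_b = torch.randn(batch, dim), torch.randn(batch, dim)  # two candidate documents per query
    bm25_a, bm25_b = torch.rand(batch), torch.rand(batch)        # BM25 scores play the role of labels

    # Pairwise hinge loss: agree with whichever document BM25 ranked higher.
    sign = (bm25_a > bm25_b).float() * 2 - 1
    loss = torch.clamp(1.0 - sign * (model(q, d_a) - model(q, d_b)), min=0).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```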