Andrej Karpathy on Twitter: most common neural net mistakes (2018-07-01T11:03:14Z)
> 1) you didn't try to overfit a single batch first. ["if you can't overfit on a tiny batch size, things are definitely broken"](https://youtu.be/gYpoJMlgyXA?t=1h1m22s)
> 2) you forgot to toggle train/eval mode for the net.
> 3) you forgot to .zero_grad() (in pytorch) before .backward().
> 4) you passed softmaxed outputs to a loss that expects raw logits.
> 5) you didn't use bias=False for your Linear/Conv2d layers when using BatchNorm, or conversely forgot to include it for the output layer. This one won't make you fail silently, but it adds spurious parameters.
> 6) thinking view() and permute() are the same thing (& incorrectly using view)
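A minimal PyTorch sketch illustrating several items from Karpathy's list: overfitting a single small batch as a sanity check, toggling train/eval mode, zeroing gradients before backward, passing raw logits (not softmax outputs) to the loss, and bias=False on layers followed by BatchNorm. The toy model, data and hyperparameters are made up for illustration.

```python
import torch
import torch.nn as nn

# Toy classifier: bias=False on the layer followed by BatchNorm (BN's shift makes
# the preceding bias redundant); the final output layer keeps its bias.
model = nn.Sequential(
    nn.Linear(20, 64, bias=False),
    nn.BatchNorm1d(64),
    nn.ReLU(),
    nn.Linear(64, 3),            # output layer: bias kept, no BatchNorm after it
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()  # expects raw logits, not softmaxed outputs

# Sanity check: a single tiny batch that the model should be able to overfit.
x = torch.randn(8, 20)
y = torch.randint(0, 3, (8,))

model.train()                    # train mode: BatchNorm/Dropout behave accordingly
for step in range(500):
    optimizer.zero_grad()        # clear gradients accumulated by the previous step
    logits = model(x)            # raw logits go straight into CrossEntropyLoss
    loss = loss_fn(logits, y)
    loss.backward()
    optimizer.step()

model.eval()                     # eval mode for validation / inference
with torch.no_grad():
    acc = (model(x).argmax(dim=1) == y).float().mean()
print(f"final loss {loss.item():.4f}, single-batch accuracy {acc.item():.2f}")
# If the loss doesn't drop close to zero here, something upstream is broken.
```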
"Attention is all you need" 2018-07-09 2018-07-09T17:27:17Z The Illustrated Transformer – Jay Alammar 2018-07-16T12:25:25Z Out of Africa : nos origines multiples 2018-07-16 2018-07-14 2018-07-14T14:27:33Z La Marseillaise du bicentenaire de la Révolution Six scénarios d'un monde sans travail | CNRS Le journal 2018-07-27 2018-07-27T22:55:59Z 2018-07-05 2018-07-05T09:10:15Z The relativistic discriminator: a key element missing from standard GAN (2018) – Alexia Jolicoeur-Martineau > In this paper, I argue that standard GAN (SGAN) is missing a fundamental property, i.e., training the generator should not only increase the probability that fake data is real but also decrease the probability that real data is real 2018-07-26T00:26:25Z Paris NLP Meetup #6 2018-07-26 [blog post](https://nlpparis.wordpress.com/2018/07/26/paris-nlp-meetup-6-season-2-linkvalue/) Who is doing interesting NLP research for low resource languages? - Quora 2018-07-03 2018-07-03T11:14:36Z 2018-07-07 2018-07-07T10:04:33Z France, terre de dinosaures | CNRS Le journal 2018-07-26 2018-07-26T01:14:47Z European Ruling Could Slow Africa’s Push For Crispr Crops | WIRED Bilan de 15 ans de réflexion sur la gestion des données numériques | Les petites cases 2018-07-18T09:22:09Z 2018-07-18 l’interopérabilité n’est finalement pas une préoccupation des organisations 2018-07-12T00:14:08Z World Cup 2018: Kylian Mbappé and France Troll Their Way to the Final | The New Yorker Reason is a lie. Talent makes its own rules. France is in the World Cup final. Vive l’anarchie! 2018-07-12 - Neural Nets : Basics - Introduction to multi-layered neural network - Optimization via back-propagation - Regularization and Dropout - The vanishing gradient issue - Advanced Architectures with NLP applications - n-gram language model - Neural Machine Translation (Overview) - Character based model for sequence tagging Deep learning : background and application to natural language processing 2018-07-07 2018-07-07T14:36:25Z Understanding promises in Javascript – Hacker Noon 2018-07-22T11:45:44Z 2018-07-22 > Two themes were most prominent for me at #ACL2018: > 1. Understanding representations. > 2. Evaluating models in more challenging settings. > Deep Learning has not changed our understanding of language. Its main contribution in this regard is to demonstrate that a neural network aka a computational model can perform certain NLP tasks, which shows that these tasks are not indicators of intelligence" 2018-07-26 2018-07-26T16:49:55Z ACL 2018 Highlights: Understanding Representations and Evaluation in More Challenging Settings - AYLIEN 2018-07-22 2018-07-22T11:53:49Z How it feels to learn JavaScript in 2016 – Hacker Noon javascript fatigue > In order to deal with the issue of the expensive computation of the softmax, Word2Vec uses a technique called noise-contrastive estimation... **The basic idea is to convert a multinomial classification problem (as it is the problem of predicting the next word) to a binary classification problem.** How sampling works in Word2vec? Can someone please make me understand NCE and negative sampling? - Cross Validated 2018-07-07 2018-07-07T15:02:59Z Slides motivating true multitask learning in AI and NLP 2018-07-25T13:10:51Z 2018-07-25 What is Noise Contrastive estimation (NCE)? - Quora 2018-07-21T09:50:08Z 2018-07-21 While supervised learning has enabled great progress in many applications, unsupervised learning has not seen such widespread adoption, and remains an important and challenging endeavor for artificial intelligence. 
[1807.03748] Representation Learning with Contrastive Predictive Coding (Aaron van den Oord, Yazhe Li, Oriol Vinyals; arXiv 1807.03748; 2018-07-10T16:52:11Z, 2019-01-22T18:47:12Z, 2018-07-21T10:05:02Z)
Abstract: While supervised learning has enabled great progress in many applications, unsupervised learning has not seen such widespread adoption, and remains an important and challenging endeavor for artificial intelligence. In this work, we propose a universal unsupervised learning approach to extract useful representations from high-dimensional data, which we call Contrastive Predictive Coding. The key insight of our model is to learn such representations by predicting the future in latent space by using powerful autoregressive models. We use a probabilistic contrastive loss which induces the latent space to capture information that is maximally useful to predict future samples. It also makes the model tractable by using negative sampling. While most prior work has focused on evaluating representations for a particular modality, we demonstrate that our approach is able to learn useful representations achieving strong performance on four distinct domains: speech, images, text and reinforcement learning in 3D environments.
> a universal unsupervised learning approach to extract useful representations from high-dimensional data, which we call Contrastive Predictive Coding. The key insight of our model is to learn such representations by predicting the future in latent space by using powerful [autoregressive models](/tag/autoregressive_model). We use a probabilistic contrastive loss which induces the latent space to capture information that is maximally useful to predict future samples. It also makes the model tractable by using [negative sampling](/tag/negative_sampling).

A contrastive method that can be applied to any form of data that can be expressed as an ordered sequence: text, speech, video...

Survival of the Richest (2018-07-07T09:54:50Z)

SIGIR 2018 Tutorial - Knowledge Extraction and Inference from Text: Shallow, Deep, and Everything in Between (2018-07-09T18:29:04Z)

Révolutions quantiques - CEA (2018-07-09T09:09:47Z)

IJCAI Session Notes: Learning Common Sense · Liza (2018-07-23T12:52:24Z)

Q&D RDF Browser (2018-07-10T18:48:22Z)
[example showing a semanlink tag](http://graphite.ecs.soton.ac.uk/browser/?uri=http%3A%2F%2Fwww.semanlink.net%2Ftag%2Fnlp)

Practical guide to text classification | Google Developers (2018-07-23T22:01:01Z)
F. Chollet: "An important insight is that the ratio between number of training samples and mean number of words per sample can tell you whether you should be using a n-gram model or a sequence model -- and whether you should use pre-trained word embeddings or train your own from scratch."
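A short Python sketch of the heuristic Chollet describes in the guide above: compute the ratio of number of training samples to mean number of words per sample and use it to choose between an n-gram (bag-of-words) model and a sequence model. The 1,500 cutoff is the threshold the guide uses (quoted from memory, so treat it as an assumption), and the corpus below is a made-up placeholder.

```python
def suggested_model(texts, threshold=1500):
    """Compute S/W (number of samples / mean words per sample) and suggest a model family."""
    n_samples = len(texts)
    mean_words = sum(len(t.split()) for t in texts) / n_samples
    ratio = n_samples / mean_words
    if ratio < threshold:
        return ratio, "n-gram model (e.g. tf-idf n-grams + MLP)"
    return ratio, "sequence model (e.g. word embeddings + sepCNN)"

# Hypothetical corpus: 20,000 short reviews of ~40 words each -> ratio 500 -> n-gram model.
corpus = ["some short review text " * 10] * 20_000
ratio, choice = suggested_model(corpus)
print(f"S/W ratio = {ratio:.0f} -> {choice}")
```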
NLP's ImageNet moment has arrived (2018-07-09T17:13:24Z)
Pretrained word embeddings have a major limitation: they only incorporate previous knowledge in the first layer of the model; the rest of the network still needs to be trained from scratch.
> The long reign of word vectors as NLP’s core representation technique has seen an exciting new line of challengers emerge: ELMo, ULMFiT, and the OpenAI transformer. These works made headlines by demonstrating that pretrained language models can be used to achieve state-of-the-art results on a wide range of NLP tasks.
> it only seems to be a question of time until pretrained word embeddings will be dethroned and replaced by pretrained language models in the toolbox of every NLP practitioner. This will likely open many new applications for NLP in settings with limited amounts of labeled data.

The Octonion Math That Could Underpin Physics | Quanta Magazine (2018-07-20T23:25:59Z)
New findings are fueling an old suspicion that fundamental particles and forces spring from strange eight-part numbers called “octonions.”

DjLu - The simple and free tool to organize your research papers (2018-07-24T23:35:15Z)

A Named Entity Recognition Shootout for German (2018) (2018-07-12T08:43:49Z)
The BiLSTM outperforms the CRF when large datasets are available and performs worse on the smallest dataset.

HyperE: Hyperbolic Embeddings for Entities (2018-07-27T12:18:28Z)
Hyperbolic entity embeddings for 100 Wikidata relationships.

Redis (2018-07-12T23:23:57Z)

Journee:TAL | PFIA 2018 (2018-07-11T13:39:42Z)

The Biggest Digital Heist in History Isn’t Over Yet - Bloomberg (2018-07-01T23:52:37Z)

What is Candidate Sampling (2018-07-07T15:04:54Z)
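"What is Candidate Sampling" refers to the family of techniques (NCE and negative sampling above belong to it) that approximate an expensive softmax over a huge label space by scoring the true class against only a small sampled set of candidate classes. The PyTorch sketch below shows that general idea in a sampled-softmax style; it is not the exact estimator from any particular paper, it ignores the correction for sampling probabilities and possible collisions with the true class, and all sizes are arbitrary.

```python
import torch
import torch.nn.functional as F

def sampled_softmax_loss(hidden, target, weight, bias, n_sampled=64):
    """Approximate a softmax over weight.shape[0] classes by scoring the target
    class against a small set of uniformly sampled candidate classes."""
    n_classes, batch = weight.shape[0], hidden.shape[0]
    sampled = torch.randint(0, n_classes, (n_sampled,))            # candidate (negative) classes
    classes = torch.cat([target.unsqueeze(1),
                         sampled.unsqueeze(0).expand(batch, -1)], dim=1)  # (batch, 1 + n_sampled)
    logits = torch.einsum('bd,bkd->bk', hidden, weight[classes]) + bias[classes]
    # In the reduced problem the true class is always in column 0.
    return F.cross_entropy(logits, torch.zeros(batch, dtype=torch.long))

# Hypothetical sizes: 50,000-class output layer, 128-dim hidden states, batch of 32.
W, b = torch.randn(50_000, 128), torch.zeros(50_000)
loss = sampled_softmax_loss(torch.randn(32, 128), torch.randint(0, 50_000, (32,)), W, b)
print(loss.item())
```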