2022-11-04 Pretrained Transformer Language Models for Search | Vespa Blog

2022-11-20 "Ce n'était que la peste", by Ludmila Oulitskaïa: the Soviet past, still virulent

2022-11-05 Nuclear power: the never-ending saga of the Finnish EPR Olkiluoto 3

2022-11-13 Meet CoAuthor, an Experiment in Human-AI Collaborative Writing
> Researchers study how humans and AI can write together by designing large interaction datasets.

2022-11-23 Learning to classify texts using positive and unlabeled data (2003)
> Given a set P of documents of a particular class (called positive class) and a set U of unlabeled documents that contains documents from class P and also other types of documents (called negative class documents), we want to build a classifier to classify the documents in U into documents from P and documents not from P. The key feature of this problem is that there is no labeled negative document.

(A toy sketch of this setting appears after the KnowGL note below.)

2022-11-14 The State of Multilingual AI

2022-11-22 Document AI: LiLT, a better language-agnostic LayoutLM model

2022-11-06 KGF22: Knowledge Graphs and The Not So Quiet Cognitive Revolution - Ontotext

2022-11-17 Andrej Karpathy on Twitter: "Is it the number of examples that matters or the number of presentations to the model during training?..."
> More generally a few remarkable strategies people use during their training:
> 1) skim text because they already know it
> 2) ignore text because it's clearly noise (e.g. they won't memorize SHA256 hashes. LLMs will.)
> 3) ...

2022-11-09 Ecological transition: "The ten years ahead of us are going to be hard," warns economist Jean Pisani-Ferry
> Taking construction and financing costs into account, producing electricity with a coal-fired plant today comes to 22 euros per megawatt-hour (MWh), versus 35 euros for photovoltaics, 50 euros for wind power, and more than 60 euros for an EPR-type nuclear reactor.

2022-11-29 In France, nuclear power produces waste that will remain dangerous for millennia

2022-11-03 The "mégabassines": symbol of an untenable agribusiness or a suitable response to droughts?

2022-11-01 One of the Biggest Problems in Biology Has Finally Been Solved - Scientific American
Google DeepMind CEO Demis Hassabis explains how its AlphaFold AI program predicted the 3-D structure of every known protein.

2022-11-07 Eric Jang on Twitter: "why transformers need certain optimization tricks that aren't needed by other architectures"

2022-11-13 [2210.13952] KnowGL: Knowledge Generation and Linking from Text, by Gaetano Rossiello, Faisal Chowdhury, Nandana Mihindukulasooriya, Owen Cornec, Alfio Gliozzo
How to fine-tune PLMs to read a sentence and generate the corresponding full set of semantic annotations compliant with the terminology of a KG?
> we propose a framework able to convert text into a set of Wikidata statements
> We propose KnowGL, a tool that allows converting text into structured relational data represented as a set of ABox assertions compliant with the TBox of a given Knowledge Graph (KG), such as Wikidata. We address this problem as a sequence generation task by leveraging pre-trained sequence-to-sequence language models, e.g. BART. Given a sentence, we fine-tune such models to detect pairs of entity mentions and jointly generate a set of facts consisting of the full set of semantic annotations for a KG, such as entity labels, entity types, and their relationships. To showcase the capabilities of our tool, we build a web application consisting of a set of UI widgets that help users to navigate through the semantic data extracted from a given input text. We make the KnowGL model available at https://huggingface.co/ibm/knowgl-large.
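A quick way to poke at the released checkpoint: a minimal sketch using the generic Hugging Face seq2seq API (the example sentence and generation settings are my assumptions; the exact output serialization is documented on the model card).

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# KnowGL is BART-based, so the generic seq2seq classes apply
tok = AutoTokenizer.from_pretrained("ibm/knowgl-large")
model = AutoModelForSeq2SeqLM.from_pretrained("ibm/knowgl-large")

sentence = "Leonardo da Vinci painted the Mona Lisa."  # assumption: any input sentence
inputs = tok(sentence, return_tensors="pt")
out = model.generate(**inputs, num_beams=4, max_length=256)

# The decoded string serializes mention / entity-label / type annotations plus
# their relation, to be parsed and mapped to Wikidata identifiers downstream
print(tok.decode(out[0], skip_special_tokens=True))
```

And, as promised above, a toy two-step sketch of the positive-and-unlabeled setting. This is not the 2003 paper's exact algorithm, just the classic recipe that family of methods shares: first treat U as negative, then extract "reliable negatives" and retrain. Data and the median threshold are illustrative assumptions.

```python
import numpy as np
from scipy.sparse import vstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

P = ["great camera quality", "battery lasts long"]                             # labeled positives
U = ["dull plot", "amazing battery life", "slow shipping", "terrible acting"]  # unlabeled mix

vec = TfidfVectorizer().fit(P + U)
Xp, Xu = vec.transform(P), vec.transform(U)

# Step 1: train a biased classifier that treats every unlabeled doc as negative
clf = LogisticRegression().fit(vstack([Xp, Xu]), [1] * len(P) + [0] * len(U))

# Step 2: the least positive-looking docs in U become "reliable negatives"
scores = clf.predict_proba(Xu)[:, 1]
reliable_neg = [d for d, s in zip(U, scores) if s <= np.median(scores)]

# Retrain on P vs. the reliable negatives (full algorithms iterate this step)
final = LogisticRegression().fit(
    vec.transform(P + reliable_neg), [1] * len(P) + [0] * len(reliable_neg)
)
print(final.predict(vec.transform(["battery is awesome", "boring movie"])))
```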
2022-11-14 Martin Görner on Twitter: "how to make sure a 'cow vs. camel' classifier does not end up being a 'grass vs. sand' classifier"
> MaskTune. A method that forces a model to explore new features by masking previously discovered features and finetuning the model over the masked data.

(A toy sketch of this masking-and-finetuning loop appears at the end of this group of notes.)

2022-11-01 Graph representation learning in biomedicine and healthcare | Nature Biomedical Engineering
Methods and opportunities that build models from data using structure, geometry & knowledge.

2022-11-19 COP27: "Erik Orsenna's latest book, 'La Terre a soif', should be sent urgently to ministers"

2022-11-08 Text classification by labeling words | Proceedings of the 19th national conference on Artificial intelligence (2004)

2022-11-18 Corinne Lepage, former environment minister: "Nuclear power is one of the most costly energies, and it makes us dependent on Russia"

2022-11-01 The parallel universes to Facebook, Twitter, Instagram… or why and how to join the fediverse - Blog-notes

2022-11-23 The "hidden costs" of pesticides are estimated at between 370 million and several billion euros per year for France

2022-11-05 Frédéric II (empereur du Saint-Empire) - Wikipédia
Frederick of Hohenstaufen, 1194-1250, grandson of Barbarossa, ruled the Holy Roman Empire from 1215 to 1250. King of Sicily, king of Provence-Burgundy (or Arles), and king of Jerusalem. Excommunicated twice. He spoke six languages: Latin, Greek, Sicilian, Arabic, Norman and German. He welcomed scholars from all over the world to his court, took a keen interest in mathematics and the fine arts, and conducted scientific experiments (sometimes on living beings). Thanks to his good relations with the Muslim world, he carried out the Sixth Crusade, the only peaceful crusade, and was the second to recover the holy places of Christendom, after Godefroy de Bouillon. From his contemporaries he received the nickname "Stupor of the World".
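The MaskTune sketch promised above: my simplification of the idea on synthetic data. The model first latches onto one of two redundant features, input-gradient saliency identifies it, and finetuning on masked inputs pushes the model toward the other. All names and hyperparameters here are assumptions, not the paper's setup.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic data with two redundant predictive features (columns 0 and 1)
X = torch.randn(256, 20)
X[:, 1] = X[:, 0] + 0.1 * torch.randn(256)
y = (X[:, 0] > 0).float()

model = nn.Linear(20, 1)
opt = torch.optim.Adam(model.parameters(), lr=0.05)
loss_fn = nn.BCEWithLogitsLoss()

def train(inputs, steps):
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(inputs).squeeze(1), y).backward()
        opt.step()

train(X, 200)  # initial training: the model may rely on either redundant feature

# Saliency: gradient of the loss w.r.t. the inputs, averaged over the data
Xg = X.clone().requires_grad_(True)
loss_fn(model(Xg).squeeze(1), y).backward()
top = Xg.grad.abs().mean(0).argmax()

# Mask the discovered feature and finetune, forcing the model to explore others
X_masked = X.clone()
X_masked[:, top] = 0.0
train(X_masked, 100)
print(f"masked feature {top.item()}; new top weight: {model.weight.abs().argmax().item()}")
```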
2022-11-17 Whisper

2022-11-17 Tony Rinaudo, the agronomist who regrows the trees of the Sahel

2022-11-26 Food plants in France through history

2022-11-22 Accelerating Document AI

2022-11-07 Bonaventure Dossou on Twitter: "We open-sourced the code and datasets of our recent #EMNLP22 paper `AfroLM`"

2022-11-27 Talking to Models: Stanford U & Microsoft Method Enables Developers to Correct Model Bugs via Natural Language Patches | Synced
While current methods for fixing bugs in language models typically rely on brittle patches or much data for finetuning, a novel approach uses declarative statements.

2022-11-01 REPORTAGE. In Deux-Sèvres, faced with drought, storing water in "méga-bassines" is far from self-evident

2022-11-12 FTX, the bankruptcy shaking cryptocurrencies

2022-11-07 Shubham Saboo on Twitter: "Build a Google-like search for your data in 30 mins..." (using LLMs)

2022-11-15 Marcel Fröhlich on Twitter: "what enterprise knowledge graphs are about..."

2022-11-20 [2211.03318] Fixing Model Bugs with Natural Language Patches, by Shikhar Murty, Christopher D. Manning, Scott Lundberg, Marco Tulio Ribeiro
> How can users fix "bugs" in trained classifiers post-hoc without finetuning on additional data? In our EMNLP 2022 paper, we show that corrective feedback expressed as a *library of conditional natural language statements* is a promising direction.
> Current approaches for fixing systematic problems in NLP models (e.g. regex patches, finetuning on more data) are either brittle, or labor-intensive and liable to shortcuts. In contrast, humans often provide corrections to each other through natural language. Taking inspiration from this, we explore natural language patches -- declarative statements that allow developers to provide corrective feedback at the right level of abstraction, either overriding the model ("if a review gives 2 stars, the sentiment is negative") or providing additional information the model may lack ("if something is described as the bomb, then it is good"). We model the task of determining if a patch applies separately from the task of integrating patch information, and show that with a small amount of synthetic data, we can teach models to effectively use real patches on real data -- 1 to 7 patches improve accuracy by ~1-4 accuracy points on different slices of a sentiment analysis dataset, and F1 by 7 points on a relation extraction dataset. Finally, we show that finetuning on as many as 100 labeled examples may be needed to match the performance of a small set of language patches.

(A toy sketch of this gate-then-integrate decomposition follows these notes.)

2022-11-08 Alex on Twitter: "Semantic search: how you can leverage both @CohereAI and @pinecone libraries to quickly build a POC..."

2022-11-08 Florence Arthaud - Wikipédia
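As flagged in the natural-language-patches note above, the paper separates deciding *whether* a patch applies from *integrating* its information. A toy, runnable illustration of that control flow (the paper uses finetuned models for both steps; the keyword stubs, patch library, and example texts here are my assumptions):

```python
# (condition, consequence) pairs, as declarative patch statements from the paper
PATCHES = [
    ("review gives 2 stars", {"override": "negative"}),
    ("described as the bomb", {"hint": "good"}),
]

def gate(text: str, condition: str) -> bool:
    # Stand-in for the learned patch-applicability ("gating") model
    key = {"review gives 2 stars": "2 stars", "described as the bomb": "the bomb"}[condition]
    return key in text.lower()

def base_model(text: str) -> str:
    # Stand-in for the original sentiment classifier
    return "positive" if "good" in text.lower() else "negative"

def patched_predict(text: str) -> str:
    hints = []
    for cond, cons in PATCHES:
        if gate(text, cond):
            if "override" in cons:
                return cons["override"]   # patch overrides the model outright
            hints.append(cons["hint"])    # patch supplies information the model lacks
    return base_model(text + " " + " ".join(hints))

print(patched_predict("This movie was the bomb!"))    # -> positive (hint integrated)
print(patched_predict("2 stars, mostly good ideas"))  # -> negative (override)
```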
2022-11-24 Few-Shot Text Classification (Cloudera 2020)
> Sentence-BERT has been optimized… well, for sentences! It's reasonable to suspect that SBERT's representations of single words or short phrases like "Business" or "Science & Technology" won't be as semantically relevant as representations derived from a word-level method, like word2vec or GloVe.

(A quick way to check this claim is sketched at the end of these notes.)

2022-11-12 The enigma of the "living unknown soldier"

2022-11-25 [2210.16637] Beyond Prompting: Making Pre-trained Language Models Better Zero-shot Learners by Clustering Representations, by Yu Fei, Zhao Meng, Ping Nie, Roger Wattenhofer, Mrinmaya Sachan
> In this work, we show that zero-shot text classification can be improved simply by clustering texts in the embedding spaces of PLMs.
> Recent work has demonstrated that pre-trained language models (PLMs) are zero-shot learners. However, most existing zero-shot methods involve heavy human engineering or complicated self-training pipelines, hindering their application to new situations. In this work, we show that zero-shot text classification can be improved simply by clustering texts in the embedding spaces of PLMs. Specifically, we fit the unlabeled texts with a Bayesian Gaussian Mixture Model after initializing cluster positions and shapes using class names. Despite its simplicity, this approach achieves superior or comparable performance on both topic and sentiment classification datasets and outperforms prior works significantly on unbalanced datasets. We further explore the applicability of our clustering approach by evaluating it on 14 datasets with more diverse topics, text lengths, and numbers of classes. Our approach achieves an average of 20% absolute improvement over prompt-based zero-shot learning. Finally, we compare different PLM embedding spaces and find that texts are well-clustered by topics even if the PLM is not explicitly pre-trained to generate meaningful sentence embeddings. This work indicates that PLM embeddings can categorize texts without task-specific fine-tuning, thus providing a new way to analyze and utilize their knowledge and zero-shot learning ability.

(A minimal clustering sketch also appears at the end of these notes.)

2022-11-07 The spelled-out intro to neural networks and backpropagation: building micrograd - YouTube

2022-11-25 Labelling Data Using Snorkel - KDnuggets (2020)

2022-11-06 The mystery of the Amazon's flying rivers | ARTE
On the role of trees in the hydrological cycle.
> Why are there no deserts on the South American continent [east of the Andes, at the latitude of São Paulo], an exception at these latitudes? And why do the trade winds coming from the Northern Hemisphere manage to cross the Equator, which everywhere else on the globe acts as an impassable wall? To answer these questions, Professor Nobre took an interest in the recent theory of the "biotic pump", according to which forests, by creating low atmospheric pressure, move humid air inland and help generate precipitation.

cf. [A controversial Russian theory claims forests don't just make rain—they make wind](doc:2022/11/a_controversial_russian_theory_)

2022-11-06 A controversial Russian theory claims forests don't just make rain—they make wind
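A quick check of the Cloudera note's claim above: compare how a sentence encoder and plain word vectors rank short label-like phrases against a query word. The specific models are assumptions; any SBERT checkpoint and any word-vector set would do.

```python
import gensim.downloader
from sentence_transformers import SentenceTransformer, util

sbert = SentenceTransformer("all-MiniLM-L6-v2")            # assumption: any SBERT model
glove = gensim.downloader.load("glove-wiki-gigaword-100")  # assumption: any word vectors

labels = ["business", "science", "sports"]
query = "stocks"

# Similarity of the query to each label under both representations
emb = sbert.encode([query] + labels, convert_to_tensor=True)
print("SBERT:", {l: float(util.cos_sim(emb[0], emb[i + 1])) for i, l in enumerate(labels)})
print("GloVe:", {l: float(glove.similarity(query, l)) for l in labels})
```

And a minimal sketch of the clustering idea from the "Beyond Prompting" note above. The paper fits a Bayesian Gaussian Mixture Model initialized from class names; as an assumption I substitute sklearn's plain GaussianMixture, which accepts `means_init`, and an off-the-shelf sentence encoder. Class names and texts are illustrative.

```python
from sentence_transformers import SentenceTransformer
from sklearn.mixture import GaussianMixture

encoder = SentenceTransformer("all-MiniLM-L6-v2")
class_names = ["sports", "politics", "technology"]
texts = [
    "the striker scored twice in the final",
    "the coach praised the goalkeeper",
    "parliament passed the new budget",
    "the senator announced her campaign",
    "the chip maker unveiled a faster GPU",
    "a new smartphone with a foldable screen",
]

class_emb = encoder.encode(class_names)  # class-name embeddings seed the clusters
text_emb = encoder.encode(texts)

gmm = GaussianMixture(
    n_components=len(class_names),
    covariance_type="diag",
    means_init=class_emb,  # each component keeps the identity of its class name
).fit(text_emb)

for text, cluster in zip(texts, gmm.predict(text_emb)):
    print(f"{class_names[cluster]:10s} <- {text}")
```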