2022-08-16 · Andrew Carr on Twitter: "Writing clearly is not easy because knowledge forms a graph and papers are linear narratives..."

2022-08-16 · [2102.12627] How to represent part-whole hierarchies in a neural network
Geoffrey Hinton (arXiv, 2021-02-25)
This paper does not describe a working system. Instead, it presents a single idea about representation which allows advances made by several different groups to be combined into an imaginary system called GLOM. The advances include transformers, neural fields, contrastive representation learning, distillation and capsules. GLOM answers the question: How can a neural network with a fixed architecture parse an image into a part-whole hierarchy which has a different structure for each image? The idea is simply to use islands of identical vectors to represent the nodes in the parse tree. If GLOM can be made to work, it should significantly improve the interpretability of the representations produced by transformer-like systems when applied to vision or language.

2022-08-27 · The 30th anniversary of Blood Meridian

2022-08-28 · AI And The Limits Of Language
> An artificial intelligence system trained on words and sentences alone will never approximate human understanding.

2022-08-26 · Materials for ACL-2022 tutorial: Knowledge-Augmented Methods for NLP

2022-08-04 · Les nouvelles frontières du vivant | CNRS Le journal

2022-08-02 · [2208.00635] DictBERT: Dictionary Description Knowledge Enhanced Language Model Pre-training via Contrastive Learning
Qianglong Chen, Guohai Xu, Ming Yan, Ji Zhang, Yin Zhang, Feng-Lin Li (arXiv, 2022-08-01)
Although pre-trained language models (PLMs) have achieved state-of-the-art performance on various natural language processing (NLP) tasks, they are shown to be lacking in knowledge when dealing with knowledge-driven tasks. Despite the many efforts made to inject knowledge into PLMs, this problem remains open. To address the challenge, we propose DictBERT, a novel approach that enhances PLMs with dictionary knowledge, which is easier to acquire than a knowledge graph (KG). During pre-training, we present two novel pre-training tasks to inject dictionary knowledge into PLMs via contrastive learning: *dictionary entry prediction* and *entry description discrimination*. In fine-tuning, we use the pre-trained DictBERT as a plugin knowledge base (KB) to retrieve implicit knowledge for identified entries in an input sequence, and infuse the retrieved knowledge into the input to enhance its representation via a novel extra-hop attention mechanism. We evaluate our approach on a variety of knowledge-driven and language understanding tasks, including NER, relation extraction, CommonsenseQA, OpenBookQA and GLUE. Experimental results demonstrate that our model can significantly improve typical PLMs: it gains substantial improvements of 0.5%, 2.9%, 9.0%, 7.1% and 3.3% respectively on BERT-large, and is also effective on RoBERTa-large.
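The abstract only names the two contrastive pre-training tasks; a minimal, illustrative sketch of the entry/description contrastive idea (not the authors' code; the checkpoint name, mean pooling, and temperature are my assumptions) could look like this:

```python
# Illustrative sketch (not DictBERT's actual implementation): an InfoNCE-style
# "entry description discrimination" objective, pairing each dictionary entry with
# its description and treating the other descriptions in the batch as negatives.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def embed(texts):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    out = encoder(**batch).last_hidden_state            # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1)        # mean-pool over real tokens
    return (out * mask).sum(1) / mask.sum(1)

entries = ["photosynthesis", "inflation"]
descriptions = [
    "the process by which green plants use sunlight to synthesize food",
    "a general increase in prices and fall in the purchasing value of money",
]

e, d = embed(entries), embed(descriptions)
logits = F.normalize(e, dim=-1) @ F.normalize(d, dim=-1).T / 0.05   # temperature
# the matching description is the positive; in-batch descriptions are negatives
loss = F.cross_entropy(logits, torch.arange(len(entries)))
```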
2022-08-27 · What You Do Out Here, When You’re Alone | The New Yorker

2022-08-28 · Transformers from Scratch

2022-08-27 · [2208.11857] Shortcut Learning of Large Language Models in Natural Language Understanding: A Survey
Mengnan Du, Fengxiang He, Na Zou, Dacheng Tao, Xia Hu (arXiv, 2022-08-25)
Large language models (LLMs) have achieved state-of-the-art performance on a series of natural language understanding tasks. However, these LLMs might rely on dataset bias and artifacts as shortcuts for prediction. This has significantly hurt their Out-of-Distribution (OOD) generalization and adversarial robustness. In this paper, we provide a review of recent developments that address the robustness challenge of LLMs. We first introduce the concepts and robustness challenge of LLMs. We then introduce methods to identify shortcut learning behavior in LLMs, characterize the reasons for shortcut learning, and introduce mitigation solutions. Finally, we identify key challenges and introduce the connections of this line of research to other directions.

2022-08-03 · Nyströmformer: Approximating self-attention in linear time and memory via the Nyström method

2022-08-13 · Train and Fine-Tune Sentence Transformers Models

2022-08-19 · [1805.09906] Diffusion Maps for Textual Network Embedding
Xinyuan Zhang, Yitong Li, Dinghan Shen, Lawrence Carin (arXiv, 2018-05-24)
Cited by [[2004.07180] SPECTER: Document-level Representation Learning using Citation-informed Transformers](doc:2022/01/2004_07180_specter_document_).
Textual network embedding leverages rich text information associated with the network to learn low-dimensional vectorial representations of vertices. Rather than using typical natural language processing (NLP) approaches, recent research exploits the relationship of texts on the same edge to graphically embed text. However, these models neglect to measure the complete level of connectivity between any two texts in the graph. We present diffusion maps for textual network embedding (DMTE), integrating global structural information of the graph to capture the semantic relatedness between texts, with a diffusion-convolution operation applied on the text inputs. In addition, a new objective function is designed to efficiently preserve the high-order proximity using the graph diffusion. Experimental results show that the proposed approach outperforms state-of-the-art methods on the vertex-classification and link-prediction tasks.

2022-08-23 · François Scharffe on Twitter: "Exciting developments for query answering over graph neural networks"

2022-08-28 · Sayak Paul on Twitter: "If you dig these kinds of performance benefits for text preprocessing..."

2022-08-18 · Rajiv Shah on Twitter: "How to explain predictions from 🤗 transformer models?..."

2022-08-30 · Brésil : mort de l’« Indien du trou », dernier survivant du peuple indigène Tanaru

2022-08-20 · Karl Higley on Twitter: "Many ANN search tools (e.g. FAISS, ScaNN) allow you to provide multiple points as part of the same query..."
> Puzzled why more retrieval models don’t take advantage of this. Give me 100 neighbors of ten points, not 1000 neighbors of one point! (Then score and order them.)
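FAISS does take a matrix of query points, so "100 neighbors of ten points" is a single search call; a minimal sketch with random vectors (index type and sizes are arbitrary, not from the tweet):

```python
# Batched ANN query pattern: one call returns the neighbor pool for all query points,
# which can then be merged, re-scored and ranked downstream.
import numpy as np
import faiss

d = 128
corpus = np.random.rand(10_000, d).astype("float32")
index = faiss.IndexFlatIP(d)              # exact inner-product index for the demo
index.add(corpus)

queries = np.random.rand(10, d).astype("float32")    # ten query points
scores, ids = index.search(queries, 100)              # 100 neighbors per point
candidates = np.unique(ids)                           # pooled candidate set to re-rank
```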
2022-08-09 · BlenderBot 3: A 175B parameter, publicly available chatbot that improves its skills and safety over time

2022-08-29 · On Stability of Few-Sample Transformer Fine-Tuning | Kaggle
See [[2006.05987] Revisiting Few-sample BERT Fine-tuning](doc:2022/03/2006_05987_revisiting_few_sam).

2022-08-28 · [2208.05388] ATLAS: Universal Function Approximator for Memory Retention
Heinrich van Deventer, Anna Bosman (arXiv, 2022-08-10)
Artificial neural networks (ANNs), despite their universal function approximation capability and practical success, are subject to catastrophic forgetting. Catastrophic forgetting refers to the abrupt unlearning of a previous task when a new task is learned. It is an emergent phenomenon that hinders continual learning. Existing universal function approximation theorems for ANNs guarantee function approximation ability, but do not predict catastrophic forgetting. This paper presents a novel universal approximation theorem for multi-variable functions using only single-variable functions and exponential functions. Furthermore, we present ATLAS: a novel ANN architecture based on the new theorem. It is shown that ATLAS is a universal function approximator capable of some memory retention, and continual learning. The memory of ATLAS is imperfect, with some off-target effects during continual learning, but it is well-behaved and predictable. An efficient implementation of ATLAS is provided. Experiments are conducted to evaluate both the function approximation and memory retention capabilities of ATLAS.

2022-08-17 · Guerre de Canudos

2022-08-17 · La Guerre de la fin du monde
See [Guerre de Canudos](doc:2022/08/guerre_de_canudos).

2022-08-25 · AllenNLP on Twitter: "Dataset: training data for @MetaAI 's No Language Left Behind NLLB-200 models!..."
See [No Language Left Behind](doc:2022/07/no_language_left_behind).

2022-08-25 · Les trahisons intimes des géants du numérique

2022-08-16 · This weekend, I watched a hacker jailbreak a John Deere tractor live on stage - Pluralistic: 15 Aug 2022 – Cory Doctorow

2022-08-08 · [2208.03299] Few-shot Learning with Retrieval Augmented Language Model
Gautier Izacard, Patrick Lewis, Maria Lomeli, Lucas Hosseini, Fabio Petroni, Timo Schick, Jane Yu, Armand Joulin, Sebastian Riedel, Edouard Grave (arXiv, 2022-08-05)
> Atlas, a retrieval-augmented language model capable of strong few-shot learning, despite having lower parameter counts than other powerful recent few-shot learners. [tweet](https://twitter.com/davisblalock/status/1564148889996836864?s=20&t=BnLM_O1HkTp7qJILF0DW8g)
Large language models have shown impressive few-shot results on a wide range of tasks. However, when knowledge is key for such results, as is the case for tasks such as question answering and fact checking, massive parameter counts to store knowledge seem to be needed. Retrieval augmented models are known to excel at knowledge intensive tasks without the need for as many parameters, but it is unclear whether they work in few-shot settings. In this work we present Atlas, a carefully designed and pre-trained retrieval augmented language model able to learn knowledge intensive tasks with very few training examples. We perform evaluations on a wide range of tasks, including MMLU, KILT and NaturalQuestions, and study the impact of the content of the document index, showing that it can easily be updated. Notably, Atlas reaches over 42% accuracy on Natural Questions using only 64 examples, outperforming a 540B-parameter model by 3% despite having 50x fewer parameters.
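This is not the Atlas codebase, but a generic retrieve-then-read sketch of the underlying idea (retrieval stores the knowledge, so the reader can stay small); the retriever and reader checkpoints and the tiny passage list are stand-ins I chose for illustration:

```python
# Retrieve-then-read sketch: dense retrieval over a passage store, then a small
# seq2seq reader conditioned on question + retrieved context.
from sentence_transformers import SentenceTransformer, util
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

passages = [
    "The Eiffel Tower was completed in 1889 for the Exposition Universelle.",
    "Mount Everest is the highest mountain above sea level.",
]
retriever = SentenceTransformer("all-MiniLM-L6-v2")
passage_emb = retriever.encode(passages, convert_to_tensor=True)

question = "When was the Eiffel Tower completed?"
hits = util.semantic_search(
    retriever.encode(question, convert_to_tensor=True), passage_emb, top_k=1
)[0]
context = passages[hits[0]["corpus_id"]]

reader_name = "google/flan-t5-small"     # any seq2seq reader would do here
tok = AutoTokenizer.from_pretrained(reader_name)
reader = AutoModelForSeq2SeqLM.from_pretrained(reader_name)
inputs = tok(f"question: {question} context: {context}", return_tensors="pt")
answer_ids = reader.generate(**inputs, max_new_tokens=20)
print(tok.decode(answer_ids[0], skip_special_tokens=True))
```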
2022-08-14 · Beasts Clawing at Straws
French title: "Lucky Strike".

2022-08-29 · [2006.10713] Zero-Shot Learning with Common Sense Knowledge Graphs
Nihal V. Nayak, Stephen H. Bach (arXiv, 2020-06-18)
Zero-shot learning relies on semantic class representations such as hand-engineered attributes or learned embeddings to predict classes without any labeled examples. We propose to learn class representations by embedding nodes from common sense knowledge graphs in a vector space. Common sense knowledge graphs are an untapped source of explicit high-level knowledge that requires little human effort to apply to a range of tasks. To capture the knowledge in the graph, we introduce ZSL-KG, a general-purpose framework with a novel transformer graph convolutional network (TrGCN) for generating class representations. Our proposed TrGCN architecture computes non-linear combinations of node neighbourhoods. Our results show that ZSL-KG improves over existing WordNet-based methods on five out of six zero-shot benchmark datasets in language and vision.

2022-08-21 · [Tweet](https://twitter.com/Rainmaker1973/status/1561413733003497480?s=20&t=x3uu9Az0aIhBZjRdUihAhA) Insect zombies! Mind-altered crickets, ants, beetles, flies, and cicadas — Bug of the Week

2022-08-26 · A la centrale nucléaire de Zaporijia, « le scénario d’un accident ne peut être exclu » : entretien avec le patron de l’AIEA

2022-08-21 · Who’s the First Person in History Whose Name We Know?
See [wikipedia](https://en.wikipedia.org/wiki/Kushim_(Uruk_period)).

2022-08-10 · SwiftOnSecurity on Twitter: "PSA: If an iPhone is stolen, DO NOT REMOVE IT FROM YOUR ACCOUNT. Some underworld cretin may make empty threats about data they don't have, but it's to TRICK YOU to MARK THE PHONE UNUSED SO THEY CAN SELL IT. Do not respond. They will sell it for parts instead. They're powerless. https://t.co/7K7XC0aHkP"

2022-08-26 · Maria Servant (Research Gate)

2022-08-04 · Un nouveau fléau, la mouche des fruits, menace la mangue sénégalaise

2022-08-26 · Massimo on Twitter: "Of the many structures Leonardo da Vinci designed..."

2022-08-08 · [2012.15156] A Memory Efficient Baseline for Open Domain Question Answering
Gautier Izacard, Fabio Petroni, Lucas Hosseini, Nicola De Cao, Sebastian Riedel, Edouard Grave (arXiv, 2020-12-30)
Recently, retrieval systems based on dense representations have led to important improvements in open-domain question answering, and related tasks. While very effective, this approach is also memory intensive, as the dense vectors for the whole knowledge source need to be kept in memory. In this paper, we study how the memory footprint of dense retriever-reader systems can be reduced. We consider three strategies to reduce the index size: dimension reduction, vector quantization and passage filtering. We evaluate our approach on two question answering benchmarks: TriviaQA and NaturalQuestions, showing that it is possible to get competitive systems using less than 6GB of memory.
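Two of the paper's three strategies map directly onto standard FAISS building blocks; a rough sketch of dimension reduction plus product quantization (the factory string, dimensions and data are illustrative choices, not the paper's settings; passage filtering would simply shrink the vector set before indexing):

```python
# Compress a dense passage index: PCA to a lower dimension, then product quantization
# so each vector is stored as a short code instead of full float32 values.
import numpy as np
import faiss

d = 768                                    # e.g. DPR-style passage embedding size
vectors = np.random.rand(50_000, d).astype("float32")

index = faiss.index_factory(d, "PCA256,PQ32")   # 256-dim PCA, 32-byte PQ codes
index.train(vectors)                            # learn the PCA and PQ codebooks
index.add(vectors)

query = np.random.rand(1, d).astype("float32")
scores, ids = index.search(query, 10)           # approximate top-10 passages
```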
2022-08-16 · What Makes a Good Classification Example?
> With Large Language Models, we only need a few examples to train a Classifier. What makes a good example? Find out here.

2022-08-20 · Unsupervised Learning — Sentence-Transformers documentation
> In our paper TSDAE we compare approaches for sentence embedding tasks, and in GPL we compare them for semantic search tasks (given a query, find relevant passages). While the unsupervised approaches achieve acceptable performances for sentence embedding tasks, they perform poorly for semantic search tasks.

2022-08-25 · Anthropic on Twitter: "We examine which safety techniques for LMs are more robust to human-written, adversarial inputs ..."

2022-08-11 · L’écologie, une terre de conflits

2022-08-03 · Ev Fedorenko on Twitter: "a thread on language and thought"

2022-08-03 · Corinne Lepage : « Le pari sur le nucléaire risque de nous exposer à une pénurie d’électricité »

2022-08-28 · [2112.07708] Learning to Retrieve Passages without Supervision
Ori Ram, Gal Shachaf, Omer Levy, Jonathan Berant, Amir Globerson (arXiv, 2021-12-14)
Dense retrievers for open-domain question answering (ODQA) have been shown to achieve impressive performance by training on large datasets of question-passage pairs. In this work we ask whether this dependence on labeled data can be reduced via unsupervised pretraining that is geared towards ODQA. We show this is in fact possible, via a novel pretraining scheme designed for retrieval. Our "recurring span retrieval" approach uses recurring spans across passages in a document to create pseudo examples for contrastive learning. Our pretraining scheme directly controls for term overlap across pseudo queries and relevant passages, thus allowing to model both lexical and semantic relations between them. The resulting model, named Spider, performs surprisingly well without any labeled training examples on a wide range of ODQA datasets. Specifically, it significantly outperforms all other pretrained baselines in a zero-shot setting, and is competitive with BM25, a strong sparse baseline. Moreover, a hybrid retriever over Spider and BM25 improves over both, and is often competitive with DPR models, which are trained on tens of thousands of examples. Last, notable gains are observed when using Spider as an initialization for supervised training.

2022-08-17 · Dom Casmurro

2022-08-06 · PyKEEN, Python package for reproducible, facile knowledge graph embeddings.
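A minimal PyKEEN usage sketch via its pipeline API; the dataset, model and hyperparameters below are chosen only for illustration:

```python
# Train a TransE knowledge graph embedding on a small built-in dataset with PyKEEN.
from pykeen.pipeline import pipeline

result = pipeline(
    dataset="Nations",                    # toy knowledge graph shipped with PyKEEN
    model="TransE",
    training_kwargs=dict(num_epochs=50),
    random_seed=42,                       # reproducibility is a core design goal
)
print(result.metric_results.get_metric("hits@10"))
result.save_to_directory("nations_transe")   # model, metrics and config on disk
```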
2022-08-24 · [2208.09982] GRETEL: Graph Contrastive Topic Enhanced Language Model for Long Document Extractive Summarization
Qianqian Xie, Jimin Huang, Tulika Saha, Sophia Ananiadou (arXiv, 2022-08-21)
Recently, neural topic models (NTMs) have been incorporated into pre-trained language models (PLMs), to capture the global semantic information for text summarization. However, in these methods, there remain limitations in the way they capture and integrate the global semantic information. In this paper, we propose a novel model, the graph contrastive topic enhanced language model (GRETEL), that incorporates the graph contrastive topic model with the pre-trained language model, to fully leverage both the global and local contextual semantics for long document extractive summarization. To better capture and incorporate the global semantic information into PLMs, the graph contrastive topic model integrates the hierarchical transformer encoder and graph contrastive learning to fuse the semantic information from the global document context and the gold summary. To this end, GRETEL encourages the model to efficiently extract salient sentences that are topically related to the gold summary, rather than redundant sentences that cover sub-optimal topics. Experimental results on both general domain and biomedical datasets demonstrate that our proposed method outperforms SOTA methods.

2022-08-06 · [2208.01815] Effidit: Your AI Writing Assistant
Shuming Shi, Enbo Zhao, Duyu Tang, Xinting Huang, Yong Dai, Haiyun Jiang, Guoping Huang, Cong Zhou, Yan Wang, Wei Bi, Piji Li, Leyang Cui, Dongyang Ma (arXiv, 2022-08-03)
In this technical report, we introduce Effidit (Efficient and Intelligent Editing), a digital writing assistant that helps users write higher-quality text more efficiently by using artificial intelligence (AI) technologies. Previous writing assistants typically provide the function of error checking (to detect and correct spelling and grammatical errors) and limited text-rewriting functionality. With the emergence of large-scale neural language models, some systems support automatically completing a sentence or a paragraph. In Effidit, we significantly expand the capacities of a writing assistant by providing functions in five categories: text completion, error checking, text polishing, keywords to sentences (K2S), and cloud input methods (cloud IME). In the text completion category, Effidit supports generation-based sentence completion, retrieval-based sentence completion, and phrase completion. In contrast, many other writing assistants so far only provide one or two of the three functions. For text polishing, we have three functions: (context-aware) phrase polishing, sentence paraphrasing, and sentence expansion, whereas many other writing assistants often support one or two functions in this category. The main contents of this report include major modules of Effidit, methods for implementing these modules, and evaluation results of some key methods.

2022-08-14 · Chapelle Notre-Dame-des-Fontaines de La Brigue

2022-08-11 · Connor Shorten on Twitter: "Wow, incredible to see this from NASA! 🚀 Weaviate's integration of Vector Search with Graph Data and Symbolic Relations is a very interesting combination of technologies!..."

2022-08-04 · Uncommon Uses of Python in Commonly Used Libraries

2022-08-29 · Santiago on Twitter: "Here is a confusion matrix..."
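For reference, a toy confusion matrix with scikit-learn (the labels and predictions below are made up for the example):

```python
# Rows of the matrix are true classes, columns are predicted classes.
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

y_true = ["spam", "spam", "ham", "ham", "spam", "ham"]
y_pred = ["spam", "ham",  "ham", "ham", "spam", "spam"]

cm = confusion_matrix(y_true, y_pred, labels=["spam", "ham"])
print(cm)                                              # counts per (true, predicted) pair
ConfusionMatrixDisplay(cm, display_labels=["spam", "ham"]).plot()   # needs matplotlib
```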
2022-08-18 · Au Rwanda, le succès trop éclatant de la protection des gorilles

2022-08-05 · Elicit: The AI Research Assistant
> Elicit uses language models to help you automate research workflows, like parts of literature review. Elicit can find relevant papers without perfect keyword match, summarize takeaways from the paper specific to your question, and extract key information from the papers.

2022-08-31 · [1904.04458] Knowledge-Augmented Language Model and its Application to Unsupervised Named-Entity Recognition
Angli Liu, Jingfei Du, Veselin Stoyanov (arXiv, 2019-04-09)
Knowledge Augmented Language Model (KALM): a language model with access to information available in a KB, making no assumptions about the availability of additional components (such as named entity taggers) or annotations.
> While classes of named entities (e.g., person or location) occur frequently, each individual name (e.g., Atherton or Zhouzhuang) may be observed infrequently even in a very large corpus of text. As a result language models learn to represent accurately only the most popular named entities
> knowing that Alice is a name used to refer to a person should give ample information about the context in which the word may occur (e.g., Bob visited Alice).
> ---
> extends a traditional **RNN LM**
> we enhance a traditional LM with a gating mechanism that controls whether a particular word is modeled as a general word or as a reference to an entity
>
> We train the model end-to-end with only the traditional predictive language modeling perplexity objective
>
> KALM is trained end-to-end using a predictive objective on a large corpus of text.
> To the best of our knowledge, KALM is the first unsupervised neural NER approach.
> KALM extends a traditional, RNN-based neural LM.
Traditional language models are unable to efficiently model entity names observed in text. All but the most popular named entities appear infrequently in text, providing insufficient context. Recent efforts have recognized that context can be generalized between entity names that share the same type (e.g., *person* or *location*) and have equipped language models with access to an external knowledge base (KB). Our Knowledge-Augmented Language Model (KALM) continues this line of work by augmenting a traditional model with a KB. Unlike previous methods, however, we train with an end-to-end predictive objective optimizing the perplexity of text. We do not require any additional information such as named entity tags. In addition to improving language modeling performance, KALM learns to recognize named entities in an entirely unsupervised way by using entity type information latent in the model. On a Named Entity Recognition (NER) task, KALM achieves performance comparable with state-of-the-art supervised models. Our work demonstrates that named entities (and possibly other types of world knowledge) can be modeled successfully using predictive learning and training on large corpora of text without any additional information.
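The gating idea in the notes above can be caricatured as a latent mixture over token "sources" (general vocabulary vs. entity types), trained with the usual next-token objective. The sketch below is only an illustration of that idea under my own simplifying assumptions; it is not the paper's architecture and omits the KB integration entirely:

```python
# Illustrative gated LM sketch (not KALM's implementation): the hidden state drives a
# gate over sources, and the word distribution marginalizes over that latent source.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedEntityLM(nn.Module):
    def __init__(self, vocab_size, hidden_size=256, num_entity_types=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.rnn = nn.LSTM(hidden_size, hidden_size, batch_first=True)
        # one output head per source: general words + each latent entity type
        self.heads = nn.ModuleList(
            nn.Linear(hidden_size, vocab_size) for _ in range(1 + num_entity_types)
        )
        self.gate = nn.Linear(hidden_size, 1 + num_entity_types)

    def forward(self, tokens):
        h, _ = self.rnn(self.embed(tokens))                 # (B, T, H)
        mix = F.softmax(self.gate(h), dim=-1)               # P(source | context)
        per_source = torch.stack(
            [F.log_softmax(head(h), dim=-1) for head in self.heads], dim=-2
        )                                                    # (B, T, S, V)
        # marginalize over the latent source: log sum_s P(s | ctx) P(word | s, ctx)
        return torch.logsumexp(per_source + mix.unsqueeze(-1).log(), dim=-2)

model = GatedEntityLM(vocab_size=1000)
tokens = torch.randint(0, 1000, (2, 12))
log_probs = model(tokens)
# standard predictive (perplexity) objective on the next token
loss = F.nll_loss(log_probs[:, :-1].reshape(-1, 1000), tokens[:, 1:].reshape(-1))
```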
2022-08-13 · GitHub - raphaelsty/kgsearch: Query and visualize knowledge graphs

2022-08-26 · [2208.11663] PEER: A Collaborative Language Model
Timo Schick, Jane Dwivedi-Yu, Zhengbao Jiang, Fabio Petroni, Patrick Lewis, Gautier Izacard, Qingfei You, Christoforos Nalmpantis, Edouard Grave, Sebastian Riedel (arXiv, 2022-08-24)
Textual content is often the output of a collaborative writing process: We start with an initial draft, ask for suggestions, and repeatedly make changes. Agnostic of this process, today's language models are trained to generate only the final result. As a consequence, they lack several abilities crucial for collaborative writing: They are unable to update existing texts, difficult to control and incapable of verbally planning or explaining their actions. To address these shortcomings, we introduce PEER, a collaborative language model that is trained to imitate the entire writing process itself: PEER can write drafts, add suggestions, propose edits and provide explanations for its actions. Crucially, we train multiple instances of PEER able to infill various parts of the writing process, enabling the use of self-training techniques for increasing the quality, amount and diversity of training data. This unlocks PEER's full potential by making it applicable in domains for which no edit histories are available and improving its ability to follow instructions, to write useful comments, and to explain its actions. We show that PEER achieves strong performance across various domains and editing tasks.

2022-08-25 · Timo Schick on Twitter: "PEER, a language model trained to incrementally write texts & collaborate w/ humans ..."
See [[2208.11663] PEER: A Collaborative Language Model](doc:2022/08/2208_11663_peer_a_collaborat).

2022-08-26 · La Nigérienne Mariam Kamara, étoile montante de l’architecture en Afrique

2022-08-04 · The Computer Scientist Trying to Teach AI to Learn Like We Do | Quanta Magazine

2022-08-11 · Sylvain Gugger on Twitter: "Load any HuggingFace model in Int8 precision and save half the memory..."
`load_in_8bit=True`, available on the main branch of Transformers.
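The tweet boils down to one extra argument at load time; a short sketch assuming a CUDA GPU and the bitsandbytes package are available (the checkpoint name is just an example):

```python
# Load a causal LM in int8: weights are quantized on the fly, roughly halving memory
# compared to fp16, and generation works as usual.
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "bigscience/bloom-1b7"                    # any causal LM checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, device_map="auto", load_in_8bit=True)

inputs = tokenizer("Int8 inference saves memory because", return_tensors="pt").to("cuda")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```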