Glyphosate clears a key step toward its reauthorization in Europe (2022-05-31)

Isaac R Caswell on Twitter: "How many languages can we support with Machine Translation?..." (2022-05-18)
> We train a translation model on 1000+ languages, using it to launch 24 new languages on Google Translate without any parallel data for these languages...

International Workshop on Knowledge Graph Generation from Text (Text2KG) 2022 (2022-05-30)

Francesco De Toni on Twitter: "Can we use pre-trained Large Language Models to study historical texts with no fine tuning?..." (2022-05-12)

[2203.08913] Memorizing Transformers (2022-03-16 / 2022-05-07)
Yuhuai Wu, DeLesley Hutchins, Markus N. Rabe
Language models typically need to be trained or finetuned in order to acquire new knowledge, which involves updating their weights. We instead envision language models that can simply read and memorize new data at inference time, thus acquiring new knowledge immediately. In this work, we extend language models with the ability to memorize the internal representations of past inputs. We demonstrate that an approximate kNN lookup into a non-differentiable memory of recent (key, value) pairs improves language modeling across various benchmarks and tasks, including generic webtext (C4), math papers (arXiv), books (PG-19), code (Github), as well as formal theorems (Isabelle). We show that the performance steadily improves when we increase the size of memory up to 262K tokens. On benchmarks including code and mathematics, we find that the model is capable of making use of newly defined functions and theorems during test time.
Christian Szegedy [tweet](https://twitter.com/LiamFedus/status/1522605777961119745?s=20&t=Jt9GBjNcFw6TqeqYvz_BRA): Memorizing Transformers increase the context length up to 262k via an external memory of (key, value) pairs for the document.
- Matches the quality of Transformers 5x larger
- Can fine-tune an already pre-trained model to use the memory
> Language models typically need to be trained or finetuned in order to acquire new knowledge, which involves updating their weights. We instead envision language models that can simply read and memorize new data at inference time, thus acquiring new knowledge immediately

[2202.10054] Fine-Tuning can Distort Pretrained Features and Underperform Out-of-Distribution (2022-02-21 / 2022-05-01)
Ananya Kumar, Aditi Raghunathan, Robbie Jones, Tengyu Ma, Percy Liang
When transferring a pretrained model to a downstream task, two popular methods are full fine-tuning (updating all the model parameters) and linear probing (updating only the last linear layer -- the "head"). It is well known that fine-tuning leads to better accuracy in-distribution (ID). However, in this paper, we find that fine-tuning can achieve worse accuracy than linear probing out-of-distribution (OOD) when the pretrained features are good and the distribution shift is large. On 10 distribution shift datasets (Breeds-Living17, Breeds-Entity30, DomainNet, CIFAR $\to$ STL, CIFAR10.1, FMoW, ImageNetV2, ImageNet-R, ImageNet-A, ImageNet-Sketch), fine-tuning obtains on average 2% higher accuracy ID but 7% lower accuracy OOD than linear probing. We show theoretically that this tradeoff between ID and OOD accuracy arises even in a simple setting: fine-tuning overparameterized two-layer linear networks. We prove that the OOD error of fine-tuning is high when we initialize with a fixed or random head -- this is because while fine-tuning learns the head, the lower layers of the neural network change simultaneously and distort the pretrained features. Our analysis suggests that the easy two-step strategy of linear probing then full fine-tuning (LP-FT), sometimes used as a fine-tuning heuristic, combines the benefits of both fine-tuning and linear probing. Empirically, LP-FT outperforms both fine-tuning and linear probing on the above datasets (1% better ID, 10% better OOD than full fine-tuning).
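The LP-FT recipe from the paper above is easy to express in code. Below is a minimal sketch in PyTorch, assuming a generic torchvision backbone, a placeholder data loader, and arbitrary hyperparameters rather than the paper's actual experimental setup:

```python
# Hedged sketch of LP-FT (linear probe, then full fine-tune) in PyTorch.
# The backbone, data loader and hyperparameters are placeholders, not the paper's setup.
import torch
import torch.nn as nn
import torchvision

def lp_ft(train_loader, num_classes, lp_epochs=5, ft_epochs=5):
    model = torchvision.models.resnet50(weights="IMAGENET1K_V2")  # any pretrained backbone
    model.fc = nn.Linear(model.fc.in_features, num_classes)       # fresh linear head
    loss_fn = nn.CrossEntropyLoss()

    # Step 1: linear probing -- freeze the backbone, train only the head.
    for p in model.parameters():
        p.requires_grad = False
    for p in model.fc.parameters():
        p.requires_grad = True
    head_opt = torch.optim.SGD(model.fc.parameters(), lr=1e-2)
    for _ in range(lp_epochs):
        for x, y in train_loader:
            head_opt.zero_grad()
            loss_fn(model(x), y).backward()
            head_opt.step()

    # Step 2: full fine-tuning starting from the probed head, typically with a smaller LR.
    for p in model.parameters():
        p.requires_grad = True
    ft_opt = torch.optim.SGD(model.parameters(), lr=1e-4)
    for _ in range(ft_epochs):
        for x, y in train_loader:
            ft_opt.zero_grad()
            loss_fn(model(x), y).backward()
            ft_opt.step()
    return model
```

The point of step 1 is that full fine-tuning then starts from a near-optimal head, so the gradients flowing into the lower layers are smaller and the pretrained features are distorted less.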
Max Irwin on Twitter: "Instant Neural Search for your website! ..." (2022-05-19)

[2205.08184] SKILL: Structured Knowledge Infusion for Large Language Models (2022-05-17 / 2022-05-18)
Fedor Moiseev, Zhe Dong, Enrique Alfonseca, Martin Jaggi
Large language models (LLMs) have demonstrated human-level performance on a vast spectrum of natural language tasks. However, it is largely unexplored whether they can better internalize knowledge from structured data, such as a knowledge graph, or from text. In this work, we propose a method to infuse structured knowledge into LLMs, by directly training T5 models on factual triples of knowledge graphs (KGs). We show that models pre-trained on the Wikidata KG with our method outperform the T5 baselines on FreebaseQA and WikiHop, as well as the Wikidata-answerable subset of TriviaQA and NaturalQuestions. The models pre-trained on factual triples compare competitively with the ones on natural language sentences that contain the same knowledge. Trained on a smaller KG, WikiMovies, we saw a 3x improvement of the exact match score on the MetaQA task compared to the T5 baseline. The proposed method has the advantage that no alignment between the knowledge graph and text corpus is required in curating training data. This makes our method particularly useful when working with industry-scale knowledge graphs.
> a method to infuse structured knowledge into LLMs, by directly training T5 models on factual triples of knowledge graphs
> The models pre-trained on factual triples compare competitively with the ones on natural language sentences that contain the same knowledge.
> The proposed method has an advantage that no alignment between the knowledge graph and text corpus is required

BERT-INT: A BERT-based Interaction Model For Knowledge Graph Alignment (2022-05-11)

[2012.12624] Learning Dense Representations of Phrases at Scale (2020-12-23 / 2021-06-02 / 2022-05-11)
Jinhyuk Lee, Mujeen Sung, Jaewoo Kang, Danqi Chen
Open-domain question answering can be reformulated as a phrase retrieval problem, without the need for processing documents on-demand during inference (Seo et al., 2019). However, current phrase retrieval models heavily depend on sparse representations and still underperform retriever-reader approaches. In this work, we show for the first time that we can learn dense representations of phrases alone that achieve much stronger performance in open-domain QA. We present an effective method to learn phrase representations from the supervision of reading comprehension tasks, coupled with novel negative sampling methods. We also propose a query-side fine-tuning strategy, which can support transfer learning and reduce the discrepancy between training and inference. On five popular open-domain QA datasets, our model DensePhrases improves over previous phrase retrieval models by 15%-25% absolute accuracy and matches the performance of state-of-the-art retriever-reader models. Our model is easy to parallelize due to pure dense representations and processes more than 10 questions per second on CPUs. Finally, we directly use our pre-indexed dense phrase representations for two slot filling tasks, showing the promise of utilizing DensePhrases as a dense knowledge base for downstream tasks.
> In this work, we show for the first time that we can learn dense representations of phrases alone that achieve much stronger performance in open-domain QA
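At inference time, DensePhrases reduces open-domain QA to maximum inner-product search over pre-indexed phrase vectors. The sketch below illustrates only that retrieval step, with an off-the-shelf sentence-transformers encoder and a FAISS inner-product index standing in for the actual DensePhrases phrase encoder and index; the model name and example phrases are placeholders:

```python
# Hedged sketch: dense retrieval over a pre-encoded "phrase" collection with FAISS.
# A generic off-the-shelf encoder stands in for the DensePhrases models.
import faiss
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("multi-qa-MiniLM-L6-cos-v1")  # placeholder dual encoder

phrases = [
    "Barack Obama was born in Honolulu, Hawaii.",
    "The Transformer architecture was introduced in 2017.",
    "Paris is the capital of France.",
]
phrase_vecs = encoder.encode(phrases, normalize_embeddings=True, convert_to_numpy=True)

index = faiss.IndexFlatIP(phrase_vecs.shape[1])  # inner-product (MIPS) index
index.add(phrase_vecs)                           # in practice, millions of pre-indexed phrases

question = "Where was Obama born?"
q_vec = encoder.encode([question], normalize_embeddings=True, convert_to_numpy=True)
scores, ids = index.search(q_vec, 2)             # top-scoring phrases answer the query directly
for score, i in zip(scores[0], ids[0]):
    print(f"{score:.3f}  {phrases[i]}")
```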
Corinne Lepage: "Let's remove everything that holds back the development of renewable energy" (2022-05-30)

[2205.04260] EASE: Entity-Aware Contrastive Learning of Sentence Embedding (2022-05-09 / 2022-05-11)
Sosuke Nishikawa, Ryokan Ri, Ikuya Yamada, Yoshimasa Tsuruoka, Isao Echizen
We present EASE, a novel method for learning sentence embeddings via contrastive learning between sentences and their related entities. The advantage of using entity supervision is twofold: (1) entities have been shown to be a strong indicator of text semantics and thus should provide rich training signals for sentence embeddings; (2) entities are defined independently of languages and thus offer useful cross-lingual alignment supervision. We evaluate EASE against other unsupervised models both in monolingual and multilingual settings. We show that EASE exhibits competitive or better performance in English semantic textual similarity (STS) and short text clustering (STC) tasks and it significantly outperforms baseline methods in multilingual settings on a variety of tasks. Our source code, pre-trained models, and newly constructed multilingual STC dataset are available at https://github.com/studio-ousia/ease.
> we explore a type of supervision that has been under-explored in the literature: entity hyperlink annotations from Wikipedia.
> entities have been shown to be a strong indicator of text semantics
> a method for mining hard negatives based on the entity type
Uses wikipedia2vec.
> the reliance on Wikipedia for training data may limit the application of the models to specific domains (e.g., general or encyclopedia domains). To apply EASE to other domains, one may need to annotate text from the domain either manually or automatically.
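The core of the EASE entry above is a contrastive objective that pulls a sentence embedding toward the embedding of its hyperlinked entity, with the other entities in the batch acting as negatives. Below is a toy PyTorch sketch of that objective, assuming an arbitrary sentence encoder and a plain nn.Embedding entity table; the real method also uses entity-type hard negatives and a SimCSE-style self-supervised term:

```python
# Toy sketch of a sentence-vs-entity InfoNCE loss in the spirit of EASE.
# Not the authors' code: entity-type hard negatives and the dropout-based
# self-supervised objective are omitted; the entity table is a plain nn.Embedding.
import torch
import torch.nn.functional as F
from torch import nn

class SentenceEntityContrast(nn.Module):
    def __init__(self, encoder, num_entities, dim, temperature=0.05):
        super().__init__()
        self.encoder = encoder                            # any sentence encoder -> (batch, dim)
        self.entity_emb = nn.Embedding(num_entities, dim) # one vector per Wikipedia entity
        self.temperature = temperature

    def forward(self, sentences, entity_ids):
        s = F.normalize(self.encoder(sentences), dim=-1)       # (B, d) sentence embeddings
        e = F.normalize(self.entity_emb(entity_ids), dim=-1)   # (B, d) hyperlinked entities
        logits = s @ e.t() / self.temperature                  # (B, B) similarity matrix
        labels = torch.arange(s.size(0), device=logits.device) # i-th sentence <-> i-th entity
        # in-batch negatives: every other entity in the batch is a negative for each sentence
        return F.cross_entropy(logits, labels)
```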
[2205.05131] Unifying Language Learning Paradigms (2022-05-12)
Huaixiu Steven Zheng, Xavier Garcia, Vinh Q. Tran, Dara Bahri

The sequencing of the human genome is (almost) finished (2022-05-09)

The uncultured eukaryotic species of plankton finally reveal their secrets | INEE (2022-05-04)

KGC Workshop (2022-05-30)

DataBorg - Knowledge management simplified (2022-05-14)
> DataBorg provides an all-in-one AI-powered platform for consumers and businesses that allows them to improve data understanding through knowledge extraction, integration and analysis.
Includes text -> knowledge graph conversion.

BERTopic: The Future of Topic Modeling | Pinecone (2022-05-12)

Stocamine, the never-ending story of "Alsace's toxic dumping ground" (2022-05-06)

[2204.08173] TABi: Type-Aware Bi-Encoders for Open-Domain Entity Retrieval (2022-04-18 / 2022-05-11)
Megan Leszczynski, Daniel Y. Fu, Mayee F. Chen, Christopher Ré
A method for training entity retrievers on knowledge graph types and unstructured text.
Entity retrieval--retrieving information about entity mentions in a query--is a key step in open-domain tasks, such as question answering or fact checking. However, state-of-the-art entity retrievers struggle to retrieve rare entities for ambiguous mentions due to biases towards popular entities. Incorporating knowledge graph types during training could help overcome popularity biases, but there are several challenges: (1) existing type-based retrieval methods require mention boundaries as input, but open-domain tasks run on unstructured text, (2) type-based methods should not compromise overall performance, and (3) type-based methods should be robust to noisy and missing types. In this work, we introduce TABi, a method to jointly train bi-encoders on knowledge graph types and unstructured text for entity retrieval for open-domain tasks. TABi leverages a type-enforced contrastive loss to encourage entities and queries of similar types to be close in the embedding space. TABi improves retrieval of rare entities on the Ambiguous Entity Retrieval (AmbER) sets, while maintaining strong overall retrieval performance on open-domain tasks in the KILT benchmark compared to state-of-the-art retrievers. TABi is also robust to incomplete type systems, improving rare entity retrieval over baselines with only 5% type coverage of the training dataset. We make our code publicly available at https://github.com/HazyResearch/tabi.
> Entity retrieval--retrieving information about entity mentions in a query--is a key step in open-domain tasks, such as question answering or fact checking. However, state-of-the-art entity retrievers struggle to retrieve rare entities for ambiguous mentions
> A promising approach to overcome popularity biases is to incorporate types (e.g., athlete or politician) from a knowledge graph into the retriever. A key advantage of types is that contextual cues learned over popular entities can generalize to rare entities of the same types.
> Our key insight is that type information should also be learned contrastively, as opposed to more straightforward approaches like adding the type as textual input
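TABi's key idea, per the notes above, is to learn type information contrastively rather than feeding types in as extra text. Below is a rough sketch of what a type-enforced contrastive term can look like for a bi-encoder, written as a supervised-contrastive loss in which a query's positives are its gold entity plus the in-batch entities sharing its type; this is an illustration of the idea, not the released TABi loss:

```python
# Rough sketch of a "type-enforced" contrastive term in the spirit of TABi:
# queries are pulled toward their gold entity AND toward in-batch entities that
# share the same knowledge-graph type; everything else serves as a negative.
# Illustration only, not the official TABi implementation.
import torch
import torch.nn.functional as F

def type_enforced_contrastive_loss(q_emb, e_emb, gold, q_types, e_types, tau=0.05):
    """q_emb: (B, d) query embeddings; e_emb: (N, d) in-batch entity embeddings;
    gold: (B,) index of each query's gold entity; q_types / e_types: type ids."""
    q = F.normalize(q_emb, dim=-1)
    e = F.normalize(e_emb, dim=-1)
    logits = q @ e.t() / tau                                    # (B, N) similarities
    gold_mask = F.one_hot(gold, e.size(0)).bool()               # gold-entity positives
    type_mask = q_types.unsqueeze(1).eq(e_types.unsqueeze(0))   # same-type positives
    pos_mask = gold_mask | type_mask
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    # average log-likelihood over all positives of each query (supervised-contrastive style)
    return -(log_prob * pos_mask).sum(1).div(pos_mask.sum(1).clamp(min=1)).mean()
```

Because same-type entities count as positives, contextual cues learned over popular entities can transfer to rare entities of the same type, which is exactly the generalization the notes describe.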
jboynyc/textnets: Text analysis with networks (2022-05-30)
Textnets: a network-based approach to automated text analysis with spaCy.
> textnets represents collections of texts as networks of documents and words. This provides novel possibilities for the visualization and analysis of texts.

Alon Albalak on Twitter: FETA, a benchmark for task transfer (2022-05-14)

The great African regreening: millions of 'magical' new trees bring renewal | Niger | The Guardian (2022-05-12)

Zeta Alpha on Twitter: "📈 Our monthly report on trends of AI for May written by @SergiCastellaSa..." (2022-05-05)

A review of some techniques for inclusion of domain-knowledge into deep neural networks (2022-05-19)

[2205.03983] Building Machine Translation Systems for the Next Thousand Languages (2022-05-09 / 2022-05-10)
Apurva Shah, Vera Axelrod, Yonghui Wu, Yanping Huang, Macduff Hughes, Theresa Breiner, Pallavi Baljekar, Maxim Krikun, Alexander Gutkin, Wolfgang Macherey, Ankur Bapna, Yuan Cao, Xavier Garcia, Aditya Siddhant, Julia Kreutzer, Jason Riesa, Mia Xu Chen, Zhifeng Chen, Mengmeng Niu, Isaac Caswell, Daan van Esch, Pidong Wang, Orhan Firat, Klaus Macherey
In this paper we share findings from our effort to build practical machine translation (MT) systems capable of translating across over one thousand languages. We describe results in three research domains: (i) Building clean, web-mined datasets for 1500+ languages by leveraging semi-supervised pre-training for language identification and developing data-driven filtering techniques; (ii) Developing practical MT models for under-served languages by leveraging massively multilingual models trained with supervised parallel data for over 100 high-resource languages and monolingual datasets for an additional 1000+ languages; and (iii) Studying the limitations of evaluation metrics for these languages and conducting qualitative analysis of the outputs from our MT models, highlighting several frequent error modes of these types of models. We hope that our work provides useful insights to practitioners working towards building MT systems for currently understudied languages, and highlights research directions that can complement the weaknesses of massively multilingual models in data-sparse settings.
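A large part of the 1000-languages effort above is language-identification-based filtering of web-mined text. The paper trains its own semi-supervised LangID models and applies further data-driven filters; as a rough stand-in, the sketch below only shows the basic confidence-threshold filtering pattern, using the off-the-shelf fastText lid.176 model (the model path, language code, and threshold are placeholders):

```python
# Hedged sketch of confidence-threshold LangID filtering for web-mined text.
# The paper uses its own semi-supervised LangID models and richer data-driven
# filters; here the off-the-shelf fastText lid.176 model stands in for them.
import fasttext

lid = fasttext.load_model("lid.176.bin")  # downloadable from fasttext.cc (placeholder path)

def keep_for_language(lines, target_lang="yo", min_conf=0.9):
    """Keep lines confidently identified as the target language (e.g. 'yo' = Yoruba)."""
    kept = []
    for line in lines:
        text = line.strip().replace("\n", " ")
        if not text:
            continue
        labels, probs = lid.predict(text, k=1)          # e.g. ('__label__yo',), [0.97]
        lang = labels[0].replace("__label__", "")
        if lang == target_lang and probs[0] >= min_conf:
            kept.append(text)
    return kept
```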