Daniel Cer Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models [2108.08877] Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models Noah Constant 2021-12-14T06:19:33Z Ji Ma Yinfei Yang Jianmo Ni 2023-02-17T18:20:47Z 2023-02-17 We provide the first exploration of sentence embeddings from text-to-text transformers (T5). Sentence embeddings are broadly useful for language processing tasks. While T5 achieves impressive performance on language tasks cast as sequence-to-sequence mapping problems, it is unclear how to produce sentence embeddings from encoder-decoder models. We investigate three methods for extracting T5 sentence embeddings: two utilize only the T5 encoder and one uses the full T5 encoder-decoder model. To support our investigation, we establish a new sentence representation transfer benchmark, SentGLUE, which extends the SentEval toolkit to nine tasks from the GLUE benchmark. Our encoder-only models outperform Sentence-BERT and SimCSE sentence embeddings on both SentEval and SentGLUE transfer tasks, including semantic textual similarity (STS). Scaling up T5 from millions to billions of parameters is found to produce consistent further improvements. Finally, our encoder-decoder method achieves a new state-of-the-art on STS when using sentence embeddings. Our models are released at https://tfhub.dev/google/collections/sentence-t5/1. Keith B. Hall Gustavo Hernández Ábrego 2021-08-19T18:58:02Z 2108.08877 Jianmo Ni « Les services publics sont écrasés par les logiques comptables de réduction des coûts » 2023-02-06 2023-02-06T20:21:21Z 2023-02-12T12:49:30Z 2023-02-12 Biodiversité : « Ni l’ampleur, ni la rapidité, ni le caractère systémique de l’écroulement des insectes n’ont été anticipés par les scientifiques » 2023-02-19 2023-02-19T12:07:58Z Amin Karbasi sur Twitter : "...Computation of Transformer models incurs quadratic time/memory complexities in sequence length... Can we dramatically accelerate them?" Nick Sorros sur Twitter : "If you want to use LLMs like GPT3 to annotate data... recipes from @explosion_ai" 2023-02-23T22:48:01Z 2023-02-23 2023-02-27 2022-10-10T16:36:47Z We present a very simple algorithm for attention that requires $O(1)$ memory with respect to sequence length and an extension to self-attention that requires $O(\log n)$ memory. This is in contrast with the frequently stated belief that self-attention requires $O(n^2)$ memory. While the time complexity is still $O(n^2)$, device memory rather than compute capability is often the limiting factor on modern accelerators. Thus, reducing the memory requirements of attention allows processing of longer sequences than might otherwise be feasible. We provide a practical implementation for accelerators that requires $O(\sqrt{n})$ memory, is numerically stable, and is within a few percent of the runtime of the standard implementation of attention. We also demonstrate how to differentiate the function while remaining memory-efficient. For sequence length 16384, the memory overhead of self-attention is reduced by 59X for inference and by 32X for differentiation. Markus N. Rabe 2023-02-27T12:58:02Z 2112.05682 Markus N. Rabe 2021-12-10T17:25:07Z Charles Staats [2112.05682] Self-attention Does Not Need O(n^2) Memory Self-attention Does Not Need $O(n^2)$ Memory 2023-02-02 deepset sur Twitter : " Generative models have taken the world of NLP by storm. But LLMs do not know about your personal data. 
This makes personal assistants, enterprise knowledge management and many other applications challenging. Retrieval augmented pipelines are the answer" 2023-02-02T22:47:09Z 2302.04761 2023-02-09T16:49:57Z Toolformer: Language Models Can Teach Themselves to Use Tools Roberto Dessì 2023-02-09T16:49:57Z Timo Schick Timo Schick Language models (LMs) exhibit remarkable abilities to solve new tasks from just a few examples or textual instructions, especially at scale. They also, paradoxically, struggle with basic functionality, such as arithmetic or factual lookup, where much simpler and smaller models excel. In this paper, we show that LMs can teach themselves to use external tools via simple APIs and achieve the best of both worlds. We introduce Toolformer, a model trained to decide which APIs to call, when to call them, what arguments to pass, and how to best incorporate the results into future token prediction. This is done in a self-supervised way, requiring nothing more than a handful of demonstrations for each API. We incorporate a range of tools, including a calculator, a Q\&A system, two different search engines, a translation system, and a calendar. Toolformer achieves substantially improved zero-shot performance across a variety of downstream tasks, often competitive with much larger models, without sacrificing its core language modeling abilities. Maria Lomeli Jane Dwivedi-Yu Roberta Raileanu 2023-02-13 Jay Hack sur Twitter : "My thoughts on Toolformer IMO the most important paper in the past few weeks..." 2023-02-13T15:16:31Z Nicola Cancedda 2023-02-13 [2302.04761] Toolformer: Language Models Can Teach Themselves to Use Tools > Toolformer, **a model trained to decide which APIs to call, when to call them, what arguments to pass, and how to best incorporate the results into future token prediction**. > fulfills the following desiderata: > - The use of tools should be learned in a self-supervised way without requiring large amounts of human annotations >- The LM should be able to decide for itself when and how to use which tool. > Approach based on the recent idea of using large LMs with incontext learning (Brown et al., 2020) to generate entire datasets from scratch. > > Given just a handful of human-written examples of how an API can be used, > - we let a LM annotate a huge language modeling dataset with potential API calls. > - We then use a self-supervised loss to determine which of these API calls actually help the model in predicting future tokens. >- Finally, we finetune the LM itself on the API calls that it considers useful. [Jay Hack @mathemagic1an sur twitter](https://twitter.com/mathemagic1an/status/1624870248221663232): > from a small seed set of human inputs (essentially demonstrating usage of APIs), the training set for this behavior is generated by the LLM itself. > > So what does this mean? We've found a promising way to tightly integrate arbitrary APIs with our best-performing models. 2023-02-13T15:18:25Z Luke Zettlemoyer Thomas Scialom Generating step-by-step "chain-of-thought" rationales improves language model performance on complex reasoning tasks like mathematics or commonsense question-answering. However, inducing language model rationale generation currently requires either constructing massive rationale datasets or sacrificing accuracy by using only few-shot inference. 
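A minimal sketch of the Toolformer-style filtering idea quoted above: an API call is kept only if conditioning on its result lowers the LM's loss on the following tokens. The scorer model, the toy calculator tool, and the threshold are assumptions of this illustration, not the paper's code.

```python
# Sketch: keep an API call only if its result helps the LM predict the continuation.
# Assumptions: GPT-2 as a stand-in scorer; `calculator` is a toy tool; threshold is arbitrary.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def continuation_loss(prefix: str, continuation: str) -> float:
    """Average negative log-likelihood of `continuation` given `prefix`."""
    ids = tok(prefix + continuation, return_tensors="pt").input_ids
    n_prefix = tok(prefix, return_tensors="pt").input_ids.shape[1]
    labels = ids.clone()
    labels[:, :n_prefix] = -100          # score only the continuation tokens
    with torch.no_grad():
        return lm(ids, labels=labels).loss.item()

def calculator(expr: str) -> str:        # toy tool for the illustration
    return str(round(eval(expr, {"__builtins__": {}}), 2))

prefix = "Out of 1400 participants, 400 passed the test, i.e. "
continuation = "28.57 percent."
call = "[Calculator(400 / 1400 * 100) -> " + calculator("400 / 1400 * 100") + "] "

loss_without = continuation_loss(prefix, continuation)
loss_with = continuation_loss(prefix + call, continuation)
if loss_without - loss_with > 0.1:       # arbitrary usefulness threshold
    print("keep this API call as a fine-tuning example")
```

Calls that pass this filter are the ones folded back into the training text, which is the self-supervised part of the recipe ("we finetune the LM itself on the API calls that it considers useful").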
We propose a technique to iteratively leverage a small number of rationale examples and a large dataset without rationales, to bootstrap the ability to perform successively more complex reasoning. This technique, the "Self-Taught Reasoner" (STaR), relies on a simple loop: generate rationales to answer many questions, prompted with a few rationale examples; if the generated answers are wrong, try again to generate a rationale given the correct answer; fine-tune on all the rationales that ultimately yielded correct answers; repeat. We show that STaR significantly improves performance on multiple datasets compared to a model fine-tuned to directly predict final answers, and performs comparably to fine-tuning a 30$\times$ larger state-of-the-art language model on CommensenseQA. Thus, STaR lets a model improve itself by learning from its own generated reasoning. 2023-02-07T16:40:38Z 2203.14465 Eric Zelikman "Self-Taught Reasoner" (STaR) > (to our knowledge) the first technique to allow a pre-trained large language model to iteratively use its language modeling capacity to improve itself > Generating step-by-step "chain-of-thought" rationales improves language model performance on complex reasoning tasks like mathematics or commonsense question-answering. However, inducing language model rationale generation currently requires either constructing massive rationale datasets or sacrificing accuracy by using only few-shot inference. We propose **a technique to iteratively leverage a small number of rationale examples and a large dataset without rationales**, to bootstrap the ability to perform successively more complex reasoning. [2203.14465] STaR: Bootstrapping Reasoning With Reasoning Eric Zelikman 2022-03-28T03:12:15Z Yuhuai Wu Noah D. Goodman Jesse Mu STaR: Bootstrapping Reasoning With Reasoning 2023-02-07 2022-05-20T13:52:54Z 2023-02-25T01:40:30Z 2023-02-25 Planning for AGI and beyond Dans un second « Livre noir des refoulements », des ONG alertent sur les violences contre les migrants aux frontières de l’UE (dec 2022) 2023-02-26T14:16:59Z 2023-02-26 > 1 600 témoignages, qui concernent environ 25 000 migrants, recueillis dans plus de la moitié des pays européens, y compris les Etats balkaniques. Toutes les personnes interrogées ont été refoulées, battues, humiliées, volées, maltraitées, privées d’eau ou de nourriture alors qu’elles demandaient l’asile. 2023-02-11T10:31:19Z 2023-02-11 ChatGPT Is a Blurry JPEG of the Web | The New Yorker New Yorker's article is very good, but the point is: ChatGPT is not a Knowledge Base, it is a system that masters Natural Language. 2023-02-17 Embedding Recycling: Making Language Model Development More Sustainable | AI2 Blog 2023-02-17T00:45:07Z [2302.05019] A Comprehensive Survey on Automatic Knowledge Graph Construction Jia Wu Lingfeng Zhong 2023-02-10T02:29:21Z 2023-02-15 Xindong Wu A Comprehensive Survey on Automatic Knowledge Graph Construction 2302.05019 Automatic knowledge graph construction aims to manufacture structured human knowledge. To this end, much effort has historically been spent extracting informative fact patterns from different data sources. However, more recently, research interest has shifted to acquiring conceptualized structured knowledge beyond informative data. In addition, researchers have also been exploring new ways of handling sophisticated construction tasks in diversified scenarios. Thus, there is a demand for a systematic review of paradigms to organize knowledge structures beyond data-level mentions. 
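The STaR loop described above is compact enough to sketch directly. The three callables below are hypothetical stand-ins for an LLM sampling call, an answer extractor, and a fine-tuning job, not the paper's code.

```python
# STaR-style bootstrapping loop (sketch). The callables wrap an LLM:
#   generate_fn(few_shot, question, hint=None) -> rationale string
#   answer_of(rationale) -> extracted final answer
#   finetune_fn(model, data) -> fine-tuned model
def star(model, dataset, few_shot, generate_fn, answer_of, finetune_fn, n_iters=5):
    for _ in range(n_iters):
        keep = []
        for question, gold_answer in dataset:
            # 1) try to reason to the answer, prompted with a few rationale examples
            rationale = generate_fn(few_shot, question)
            if answer_of(rationale) == gold_answer:
                keep.append((question, rationale))
            else:
                # 2) rationalization: retry with the correct answer given as a hint
                hinted = generate_fn(few_shot, question, hint=gold_answer)
                if answer_of(hinted) == gold_answer:
                    keep.append((question, hinted))
        # 3) fine-tune on rationales that ultimately yielded correct answers, repeat
        model = finetune_fn(model, keep)
    return model
```

The `hint` branch is the rationalization step: when the model cannot reach the right answer on its own, it is shown the answer and asked to produce a rationale for it.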
To meet this demand, we comprehensively survey more than 300 methods to summarize the latest developments in knowledge graph construction. A knowledge graph is built in three steps: knowledge acquisition, knowledge refinement, and knowledge evolution. The processes of knowledge acquisition are reviewed in detail, including obtaining entities with fine-grained types and their conceptual linkages to knowledge graphs; resolving coreferences; and extracting entity relationships in complex scenarios. The survey covers models for knowledge refinement, including knowledge graph completion, and knowledge fusion. Methods to handle knowledge evolution are also systematically presented, including condition knowledge acquisition, condition knowledge graph completion, and knowledge dynamic. We present the paradigms to compare the distinction among these methods along the axis of the data environment, motivation, and architecture. Additionally, we also provide briefs on accessible resources that can help readers to develop practical knowledge graph systems. The survey concludes with discussions on the challenges and possible directions for future exploration. Lingfeng Zhong Hao Peng Qian Li 2023-02-15T16:59:51Z 2023-02-10T02:29:21Z j'hallucine 2023-02-15T19:33:15Z 2023-02-15 Class Labels for Custom Datasets - 🤗Datasets - Hugging Face Forums Maarten Grootendorst sur Twitter : "The v0.14 release of BERTopic is here. Fine-tune your topic keywords and labels with models from @OpenAI, @huggingface, @CohereAI, @spacy_io, and @LangChainAI... An overview thread" 2023-02-15T13:56:16Z 2023-02-15 what? 2023-02-04T02:42:34Z 2023-02-04 Shital Shah sur Twitter : "Surprising and important paper: TLDR; All the gains we get by first pre-training on large dataset and then fine tuning on small dataset could be obtained by just small dataset but with pre-training objective!!" 2023-02-02T01:24:11Z 2023-02-02 karpathy/nanoGPT: The simplest, fastest repository for training/finetuning medium-sized GPTs. abacaj/awesome-transformers: A curated list of awesome transformer models. 2023-02-25 2023-02-25T14:27:29Z 1. Captured ideas are better than missed ones (our tool has to be fast, and can’t burden you with questions like “In what folder should I put this?” that aren’t relevant in the moment.) 2. Adding new ideas is better than updating old ones (our memory grows by remembering new things rather than “updating” old memories) 3. Ideas that can’t be recalled are worse than useless 4. Time is essential to how we remember The Knowledge Graph Conference 2023 2023-02-24 2023-02-24T13:43:43Z [Incremental note-taking | thesephist.com](doc:2023/02/incremental_note_taking_%7C_these) 2023-02-25 Incremental note-taking | thesephist.com 2023-02-25T12:18:57Z 2023-02-25T12:17:53Z 2023-02-25 Building Monocle, a universal personal search engine for life | thesephist.com 2023-02-25 Linus sur Twitter : "I built a personal chatbot from my personal corpus a couple weeks ago on fully open-source LMs... it made a huge difference in how it feels to interact. Much more natural... 2023-02-25T11:08:25Z 2023-02-16T11:35:46Z 2023-02-16 Efficient Training on a Single GPU (((ل()(ل() 'yoav))))👾 sur Twitter : "there is this genre of papers that show you can train/tune only some subset of a network's weights, freezing the rest, and things still work as well as (or better than) full training/tuning..." 
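One concrete instance of the "tune only a subset of the weights" genre from the tweet above: freeze everything except bias terms (BitFit-style). The base model here is just an example.

```python
# Freeze all weights except bias terms (BitFit-style subset tuning).
# Assumption: any Hugging Face classification model works the same way here.
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)
for name, param in model.named_parameters():
    # train only biases and the freshly initialized classifier head
    param.requires_grad = name.endswith(".bias") or name.startswith("classifier")

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"training {trainable / total:.2%} of parameters")
```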
2023-02-15T10:47:10Z 2023-02-15 2023-02-04T02:09:32Z 2023-02-04 Generative AI with Cohere: Part 1 - Model Prompting The Origins of ChatGPT and InstructGPT - DZone 2023-02-09 2023-02-09T09:14:24Z some technical details, avec des graphiques 2023-02-18T11:17:11Z 2023-02-18 ‘I want to destroy whatever I want’: Bing’s AI chatbot unsettles US reporter | Artificial intelligence (AI) | The Guardian 2023-02-26 Ohio rail crash: toxic waste removal suspended amid contamination fears | Ohio train derailment | The Guardian > “If some of these chemicals are so bad that the only way to get rid of them is to bury them in a deep hole, then why are we producing these chemicals in the first place?” 2023-02-26T15:54:22Z 2023-02-04T02:04:59Z 2023-02-04 Ramsri Goutham Golla sur Twitter : "The most practical open-source competitor to @OpenAI 's GPT-3 is Google's Flan-T5 Here are 5 Flan-T5 resources to try out easily, deploy, or fine-tune it! 🧵" / Twitter Tillya Tepe - Wikipedia 2023-02-12T11:18:26Z 2023-02-12 2023-02-26 2023-02-26T01:41:58Z Andrew Lampinen sur Twitter : "What is emergence, and why is it of recent interest in AI... A thread: 1/" / Twitter 2023-02-28T20:42:52Z 2023-02-28 Paris 2024 : tirés au sort, ils renoncent à acheter des billets trop chers 2023-02-07T01:20:05Z 2023-02-07 Microsoft launches Teams Premium with features powered by OpenAI - The Verge Sebastian Raschka sur Twitter : "A question that often comes up when introducing colleagues to the attention mechanism: how are attention scores different from weights in a fully-connected layer?..." 2023-02-26T20:09:42Z 2023-02-26 2023-02-11 > example code on how to combine zero- and few-shot learning with a small annotation effort 2023-02-11T10:45:36Z explosion/prodigy-openai-recipes: ✨ Bootstrap annotation with zero- & few-shot learning via OpenAI GPT-3 Open Graph Benchmark | A collection of benchmark datasets, data-loaders and evaluators for graph machine learning in PyTorch. 2023-02-07T14:02:45Z 2023-02-07 2023-02-23T08:14:42Z 2023-02-23 Delip Rao sur Twitter : "Let's talk about PDF Parsers. What are the best paid/free PDF parsers?" 2023-02-04T16:34:37Z 2023-02-04 Bojan Tunguz sur Twitter : "What I would *REALLY* love to have is a private version of ChatGPT that’s been trained on your internal org documents..." c'est pourquoi microsoft y investit (!?) 2023-02-17T00:04:45Z 2023-02-17 How should AI systems behave, and who should decide? 2302.04907 2023-02-09T19:27:34Z > One-bit weight-only Transformer can achieve the same quality as a float one on WMT dataset and scale and generalize well, while being 16x smaller in size. Ankush Garg Orhan Firat Łukasz Lew Binarized Neural Machine Translation Behrooz Ghorbani Zhiru Zhang [2302.04907] Binarized Neural Machine Translation Yuan Cao The rapid scaling of language models is motivating research using low-bitwidth quantization. In this work, we propose a novel binarization technique for Transformers applied to machine translation (BMT), the first of its kind. We identify and address the problem of inflated dot-product variance when using one-bit weights and activations. Specifically, BMT leverages additional LayerNorms and residual connections to improve binarization quality. Experiments on the WMT dataset show that a one-bit weight-only Transformer can achieve the same quality as a float one, while being 16x smaller in size. One-bit activations incur varying degrees of quality drop, but mitigated by the proposed architectural changes. 
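To make the one-bit-weight idea concrete, here is a generic sketch of a sign-binarized linear layer with a straight-through estimator. It illustrates weight binarization in general, not the BMT architecture itself (which additionally relies on extra LayerNorms and residual connections).

```python
# Generic one-bit weight linear layer with a straight-through estimator (STE).
# Illustrative only; scaling choice and initialization are assumptions.
import torch
import torch.nn as nn

class BinarizedLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)

    def forward(self, x):
        scale = self.weight.abs().mean()          # per-tensor scaling factor
        w_bin = torch.sign(self.weight) * scale   # weights live in {-scale, +scale}
        # STE: forward uses binarized weights, gradients flow to the float weights
        w = self.weight + (w_bin - self.weight).detach()
        return x @ w.t()

layer = BinarizedLinear(512, 512)
out = layer(torch.randn(4, 512))
out.sum().backward()                              # gradients reach layer.weight
print(layer.weight.grad.shape)
```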
We further conduct a scaling law study using production-scale translation datasets, which shows that one-bit weight Transformers scale and generalize well in both in-domain and out-of-domain settings. Implementation in JAX/Flax will be open sourced. Yichi Zhang 2023-02-13T14:51:45Z Yichi Zhang 2023-02-13 2023-02-09T19:27:34Z 2023-02-13T18:48:37Z Fiasco de la finale de la Ligue des champions : un rapport de l’UEFA accable la police française 2023-02-13 2023-02-09T18:59:55Z Ji Lin Guangxuan Xiao Song Han Transfer learning is important for foundation models to adapt to downstream tasks. However, many foundation models are proprietary, so users must share their data with model owners to fine-tune the models, which is costly and raise privacy concerns. Moreover, fine-tuning large foundation models is computation-intensive and impractical for most downstream users. In this paper, we propose Offsite-Tuning, a privacy-preserving and efficient transfer learning framework that can adapt billion-parameter foundation models to downstream data without access to the full model. In offsite-tuning, the model owner sends a light-weight adapter and a lossy compressed emulator to the data owner, who then fine-tunes the adapter on the downstream data with the emulator's assistance. The fine-tuned adapter is then returned to the model owner, who plugs it into the full model to create an adapted foundation model. Offsite-tuning preserves both parties' privacy and is computationally more efficient than the existing fine-tuning methods that require access to the full model weights. We demonstrate the effectiveness of offsite-tuning on various large language and vision foundation models. Offsite-tuning can achieve comparable accuracy as full model fine-tuning while being privacy-preserving and efficient, achieving 6.5x speedup and 5.6x memory reduction. Code is available at https://github.com/mit-han-lab/offsite-tuning. Offsite-Tuning: Transfer Learning without Full Model 2023-02-11T18:33:24Z 2302.04870 2023-02-11 > Achieves comparable accuracy as full model fine-tuning while being privacy-preserving and efficient I'd wish it to be related with this: "[Microsoft will let companies create their own ChatGPT](https://twitter.com/DrJimFan/status/1623354315594432512?s=20&t=wQpsuFehMrgP1720n2wtJw)" 2023-02-09T18:59:55Z Guangxuan Xiao [2302.04870] Offsite-Tuning: Transfer Learning without Full Model 2023-02-15 Jim Fan sur Twitter : "Do you know that DeepMind has actually open-sourced the heart of AlphaGo & AlphaZero?... " 2023-02-15T10:20:43Z 2023-02-27T15:03:10Z 2023-02-27 EDF : les raisons d’une descente aux enfers Hyperlink maximalism | thesephist.com 2023-02-26 2023-02-26T02:48:48Z Jerry Liu sur Twitter : "A key goal of @gpt_index is to enable end users to ask an LLM *any* questions over their own data..." 
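Returning to Offsite-Tuning above: the object exchanged between the two parties can be sketched as an adapter (a few trainable boundary blocks) plus a frozen, lossily compressed emulator of the middle blocks. The block type, layer counts, and uniform layer-drop used here are illustrative assumptions, not the paper's exact compression scheme.

```python
# Offsite-Tuning-style split (sketch): the data owner receives a small trainable
# adapter plus a frozen, compressed "emulator" of the middle of the network;
# only the adapter is sent back after fine-tuning.
import torch.nn as nn

def split_for_offsite_tuning(blocks: nn.ModuleList, n_adapter=2, keep_every=3):
    head, middle, tail = blocks[:n_adapter], blocks[n_adapter:-n_adapter], blocks[-n_adapter:]
    # lossy emulator: keep a uniform subset of the middle blocks, then freeze it
    emulator = nn.ModuleList(middle[::keep_every])
    for p in emulator.parameters():
        p.requires_grad = False
    adapter = nn.ModuleList(list(head) + list(tail))   # trainable, returned to the owner
    return adapter, emulator

blocks = nn.ModuleList(nn.TransformerEncoderLayer(d_model=256, nhead=4) for _ in range(12))
adapter, emulator = split_for_offsite_tuning(blocks)
```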
2023-02-27T14:57:25Z 2023-02-27 2023-02-10 2023-02-10T21:13:07Z Comparing Africa-centric Models to OpenAI's GPT3.5 - Lelapa 2023-02-11T19:06:31Z 2023-02-11 The Generative AI Race Has a Dirty Secret | WIRED [Wolfram|Alpha as the Way to Bring Computational Knowledge Superpowers to ChatGPT—Stephen Wolfram Writings](doc:2023/03/wolfram%7Calpha_as_the_way_to_bri) 2023-02-20 2023-02-20T18:40:43Z Creating a super-powered assistant with ChatGPT and Wolfram Alpha Simple API thesephist.com (Linus) 2023-02-26T02:02:42Z My research investigates the future of knowledge representation and creative work aided by machine understanding of language 2023-02-26 Jinyeong Yim 2023-02-13 2023-02-13T23:54:43Z 2022-10-06T06:50:39Z Jinyoung Park Sangdoo Yun OCR-free Document Understanding Transformer Geewook Kim Moonbin Yim Wonseok Hwang 2021-11-30T18:55:19Z Teakgyu Hong Dongyoon Han > The #LayoutLM family, used by a lot of document AI companies, gets a strong competitor: Donut, now available in Hugging Face Transformers! [src](https://www.linkedin.com/posts/niels-rogge-a3b7a3127_layoutlm-huggingface-transformers-activity-6963894171640205313-N2_U/) [HuggingFace Docs](https://huggingface.co/docs/transformers/main/en/model_doc/donut) ; [Gradio demo](https://huggingface.co/spaces/nielsr/donut-cord) ; [Tutorial notebooks](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/Donut) Seunghyun Park 2111.15664 Geewook Kim Understanding document images (e.g., invoices) is a core but challenging task since it requires complex functions such as reading text and a holistic understanding of the document. Current Visual Document Understanding (VDU) methods outsource the task of reading text to off-the-shelf Optical Character Recognition (OCR) engines and focus on the understanding task with the OCR outputs. Although such OCR-based approaches have shown promising performance, they suffer from 1) high computational costs for using OCR; 2) inflexibility of OCR models on languages or types of document; 3) OCR error propagation to the subsequent process. To address these issues, in this paper, we introduce a novel OCR-free VDU model named Donut, which stands for Document understanding transformer. As the first step in OCR-free VDU research, we propose a simple architecture (i.e., Transformer) with a pre-training objective (i.e., cross-entropy loss). Donut is conceptually simple yet effective. Through extensive experiments and analyses, we show a simple OCR-free VDU model, Donut, achieves state-of-the-art performances on various VDU tasks in terms of both speed and accuracy. In addition, we offer a synthetic data generator that helps the model pre-training to be flexible in various languages and domains. The code, trained model and synthetic data are available at https://github.com/clovaai/donut. Jeongyeon Nam [2111.15664] OCR-free Document Understanding Transformer 2023-02-14T00:58:24Z 2023-02-14 Data-Efficient Information Extraction from Documents with Pre-Trained Language Models 2023-02-25 2023-02-25T00:59:01Z > LLaMA-13B outperforms OPT and GPT-3 175B on most benchmarks. LLaMA-65B is competitive with Chinchilla 70B and PaLM 540B. > > The weights for all models are open and available > > trained on at least 1T tokens, > > Unlike Chinchilla, PaLM, or GPT-3, we only use datasets publicly available, > > We also briefly tried instruction finetuning LLaMA-13B is competitive with GPT-3, despite being 10x smaller. 
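For the Donut entry above, inference through the Hugging Face integration looks roughly like this (checkpoint name and task prompt follow the linked CORD demo; the image path is a placeholder, and the demo applies some extra output post-processing).

```python
# Querying the OCR-free Donut model via Hugging Face Transformers.
from PIL import Image
from transformers import DonutProcessor, VisionEncoderDecoderModel

processor = DonutProcessor.from_pretrained("naver-clova-ix/donut-base-finetuned-cord-v2")
model = VisionEncoderDecoderModel.from_pretrained("naver-clova-ix/donut-base-finetuned-cord-v2")

image = Image.open("receipt.png").convert("RGB")           # placeholder document image
pixel_values = processor(image, return_tensors="pt").pixel_values
task_prompt = "<s_cord-v2>"                                # task-specific start token
decoder_input_ids = processor.tokenizer(
    task_prompt, add_special_tokens=False, return_tensors="pt"
).input_ids

outputs = model.generate(
    pixel_values,
    decoder_input_ids=decoder_input_ids,
    max_length=model.decoder.config.max_position_embeddings,
)
sequence = processor.batch_decode(outputs, skip_special_tokens=True)[0]
print(processor.token2json(sequence))                      # structured key/value output
```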
But that's not really open-source [github](https://github.com/facebookresearch/llama) "The license prohibits using the models or any data produced by the models for any type of commercial or production purpose." Guillaume Lample sur Twitter : "Today we release LLaMA, 4 foundation models ranging from 7B to 65B parameters..." 2023-02-20T22:54:30Z Andrej Karpathy sur Twitter : "The hottest new programming language is English" / Twitter 2023-02-20 Chau Tran sur Twitter : "Some "in the trenches" learnings from integrating vector search into an enterprise search system..." Blog post: [Unlocking the Power of Vector Search in Enterprise](doc:2023/02/unlocking_the_power_of_vector_s) > 1. As of Feb 2023, open source text embedding models on @huggingface (E5-large, Instructor-XL, and MPNet) are > to other commercial providers > 2. on out-of-domain data (enterprise search being an extreme case of this)... finetuning embedding models extremely helpful > 3. Vector search, while helpful, is not the whole story! We still need traditional keyword search and personalization 2023-02-17 Unlocking the Power of Vector Search in Enterprise 2023-02-17T18:02:35Z > we've developed a method for fine-tuning embeddings to the unique language of our clients 2023-02-17T17:57:25Z 2023-02-17 2023-02-23T22:43:34Z Maria Khalusova @maria@recsys.social sur Twitter : "Did you know that you can tweak the text output generated by a LLM without changing any of the trainable parameters?..." 2023-02-23 just tweak the text generation strategy Fangxiaoyu Feng Yamini Bansal 2302.01398 2023-02-07T18:49:52Z Colin Cherry George Foster 2023-02-02T20:19:46Z 2023-02-02T20:19:46Z Melvin Johnson Maxim Krikun Orhan Firat Xavier Garcia [2302.01398] The unreasonable effectiveness of few-shot learning for machine translation 2023-02-07 We demonstrate the potential of few-shot translation systems, trained with unpaired language data, for both high and low-resource language pairs. We show that with only 5 examples of high-quality translation data shown at inference, a transformer decoder-only model trained solely with self-supervised learning, is able to match specialized supervised state-of-the-art models as well as more general commercial translation systems. In particular, we outperform the best performing system on the WMT'21 English - Chinese news translation task by only using five examples of English - Chinese parallel data at inference. Moreover, our approach in building these models does not necessitate joint multilingual training or back-translation, is conceptually simple and shows the potential to extend to the multilingual setting. Furthermore, the resulting models are two orders of magnitude smaller than state-of-the-art language models. We then analyze the factors which impact the performance of few-shot translation systems, and highlight that the quality of the few-shot demonstrations heavily determines the quality of the translations generated by our models. Finally, we show that the few-shot paradigm also provides a way to control certain attributes of the translation -- we show that we are able to control for regional varieties and formality using only a five examples at inference, paving the way towards controllable machine translation systems. The unreasonable effectiveness of few-shot learning for machine translation > We demonstrate the potential of few-shot translation systems, trained with unpaired language data, for both high and low-resource language pairs. 
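The "tweak the output without changing any trainable parameters" point from the tweet above amounts to choosing a decoding strategy at generation time. A small sketch, with an arbitrary model and prompt:

```python
# Same model, different decoding strategies via transformers `generate`.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tok("The easiest way to index PDFs is", return_tensors="pt")

greedy = model.generate(**inputs, max_new_tokens=30)                        # deterministic
beam = model.generate(**inputs, max_new_tokens=30, num_beams=5)             # beam search
sampled = model.generate(**inputs, max_new_tokens=30, do_sample=True,
                         top_p=0.9, temperature=0.8)                        # nucleus sampling
contrastive = model.generate(**inputs, max_new_tokens=30,
                             penalty_alpha=0.6, top_k=4)                    # contrastive search

for out in (greedy, beam, sampled, contrastive):
    print(tok.decode(out[0], skip_special_tokens=True))
```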
We show that with only 5 examples of high-quality translation data shown at inference, a transformer decoder-only model trained solely with self-supervised learning, is able to match specialized supervised state-of-the-art models as well as more general commercial translation systems [tweet](https://twitter.com/mr_cheu/status/1622648632867422211?s=20&t=DLVMU-Qrp9DksDse99fkjQ) Xavier Garcia 2023-02-02T02:09:52Z 1er Juillet 1916 : la journée la plus meurtrière de la première guerre mondiale. - Collège Robert Doisneau 2023-02-02 > Au cours de la première guerre mondiale, près de 5000 soldats trouvent la mort, chaque jour sur l’ensemble des fronts. > 1 juillet 1916, début de l'offensive de la Somme. 60 000 victimes chez les Britanniques, dont au moins 20 000 morts. 20 000 morts chez les Allemands 22 août 1914: Environ 27 000 soldats français ont été tués, faisant de cette journée le jour le plus meurtrier de l'Histoire de France. [src](https://fr.wikipedia.org/wiki/115e_régiment_d%27infanterie) 2023-02-25T15:07:48Z « Polluants éternels » : le plan de bataille des industriels pour éviter l’interdiction du « poison du siècle » 2023-02-25 2023-02-02 2023-02-02T01:30:40Z Yann LeCun sur Twitter : "Language abilities != Thinking. Or why LLMs such as ChatGPT can eloquently spew complete nonsense..." ChatGPT Burns Millions Every Day. Can Computer Scientists Make AI One Million Times More Efficient? 2023-02-21T01:25:43Z 2023-02-21 > Training a large language model like that used by ChatGPT is expensive — likely in the tens of millions of dollars — but running it is the true expense. > “Deploying current ChatGPT into every search done by Google would require 512,820 A100 HGX servers with a total of 4,102,568 A100 GPUs,” they write. “The total cost of these servers and networking exceeds $100 billion of Capex alone, of which Nvidia would receive a large portion.” Towards a Tagalog NLP pipeline 2023-02-04T16:41:56Z 2023-02-04 2023-02-14T00:47:58Z Avant l'Islam : chrétiens d’Éthiopie vs juifs d'Arabie -- Himyar et Aksoum 2023-02-14 C Thi Nguyen sur Twitter : "There's this quality I've been thinking about, call it "failure clarity"...." / Twitter 2023-02-02T01:59:04Z 2023-02-02 > My fave example is climbing knots. The figure-eight not isn't the strongest possible knot, *but it's easy to see if you screwed it up*. When the process fails, the failure isn't hidden. 2023-02-05 2023-02-05T09:37:45Z Yann LeCun sur Twitter : "On the highway towards Human-Level AI, Large Language Model is an off-ramp." 2023-02-21T15:20:37Z 2302.10724 Mateusz Kochanek 2023-02-22 Dominika Szydło Piotr Miłkowski Łukasz Radliński 2023-02-21T15:20:37Z Przemysław Kazienko Marcin Gruza Stanisław Woźniak OpenAI has released the Chat Generative Pre-trained Transformer (ChatGPT) and revolutionized the approach in artificial intelligence to human-model interaction. The first contact with the chatbot reveals its ability to provide detailed and precise answers in various areas. There are several publications on ChatGPT evaluation, testing its effectiveness on well-known natural language processing (NLP) tasks. However, the existing studies are mostly non-automated and tested on a very limited scale. In this work, we examined ChatGPT's capabilities on 25 diverse analytical NLP tasks, most of them subjective even to humans, such as sentiment analysis, emotion recognition, offensiveness and stance detection, natural language inference, word sense disambiguation, linguistic acceptability and question answering. 
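Operationally, the few-shot translation setup quoted above is just prompt construction: show a decoder-only LM a handful of translation pairs and let it continue the pattern. The pairs and the commented-out `generate_fn` call below are placeholders, not the paper's data.

```python
# Few-shot MT as pure prompting (sketch): k high-quality pairs shown at inference,
# then the decoder-only LM completes the last line.
def few_shot_translation_prompt(pairs, source_sentence,
                                src_lang="English", tgt_lang="Chinese"):
    lines = [f"{src_lang}: {src}\n{tgt_lang}: {tgt}" for src, tgt in pairs]
    lines.append(f"{src_lang}: {source_sentence}\n{tgt_lang}:")
    return "\n\n".join(lines)

pairs = [("The weather is nice today.", "今天天气很好。"),
         ("Where is the train station?", "火车站在哪里？")]   # the paper uses five pairs
prompt = few_shot_translation_prompt(pairs, "I would like a cup of tea.")
# translation = generate_fn(prompt)   # any decoder-only LM's sampling call
print(prompt)
```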
We automated ChatGPT's querying process and analyzed more than 38k responses. Our comparison of its results with available State-of-the-Art (SOTA) solutions showed that the average loss in quality of the ChatGPT model was about 25% for zero-shot and few-shot evaluation. We showed that the more difficult the task (lower SOTA performance), the higher the ChatGPT loss. It especially refers to pragmatic NLP problems like emotion recognition. We also tested the ability of personalizing ChatGPT responses for selected subjective tasks via Random Contextual Few-Shot Personalization, and we obtained significantly better user-based predictions. Additional qualitative analysis revealed a ChatGPT bias, most likely due to the rules imposed on human trainers by OpenAI. Our results provide the basis for a fundamental discussion of whether the high quality of recent predictive NLP models can indicate a tool's usefulness to society and how the learning and validation procedures for such systems should be established. Marcin Oleksy ChatGPT: Jack of all trades, master of none Julita Bielaniewicz Igor Cichecki Anna Kocoń Bartłomiej Koptyra Wiktoria Mieleszczenko-Kowszewicz 2023-02-22T13:41:17Z Jan Kocoń Kamil Kanclerz Joanna Baran [2302.10724] ChatGPT: Jack of all trades, master of none Arkadiusz Janz Konrad Wojtasik Oliwier Kaszyca Jan Kocoń Maciej Piasecki 2023-02-02 2023-02-02T16:50:37Z BioGPT François Chollet sur Twitter : "The near future of AI is to serve as a universal assistant..." 2023-02-02 2023-02-02T01:35:17Z anton sur Twitter : "announcing the chroma embedding database. it's the easiest and best way to work with embeddings in your a.i. app. ..." 2023-02-16T09:30:20Z 2023-02-16 [Chroma](doc:2023/04/chroma) 2023-02-07T08:03:58Z 2023-02-07 Google announces ChatGPT rival Bard, with wider availability in ‘coming weeks’ - The Verge 2023-02-16T22:57:26Z 2023-02-16 Nils Reimers sur Twitter : "Building search products that support many languages was always a nightmare..." Convention citoyenne sur la fin de vie : les doutes par-delà les votes 2023-02-26 2023-02-26T11:38:10Z 2023-02-27 2023-02-27T23:18:48Z Shayne Longpre sur Twitter : "A 🧵 on @OpenAI LLM "Alignment" (e.g. #ChatGPT)..." The Night of the Hunter - La nuit du chasseur 2023-02-26 2023-02-26T19:08:30Z 2023-02-14T10:42:51Z 2023-02-14 Guiding Frozen Language Models with Learned Soft Prompts – Google AI Blog 2023-02-27 Même sans gaz russe, l’Allemagne poursuit sa croissance et sa transition énergétique 2023-02-27T01:18:52Z Hongjin Su Hongjin Su [2212.09741] One Embedder, Any Task: Instruction-Finetuned Text Embeddings Mari Ostendorf 2212.09741 One Embedder, Any Task: Instruction-Finetuned Text Embeddings Yushi Hu Luke Zettlemoyer Weijia Shi > INSTRUCTOR is a single embedder that can generate text embeddings tailored to different downstream tasks and domains, without any further training. > every text input is embedded together with instructions explaining the use case (e.g., task and domain descriptions). [Documentation](https://instructor-embedding.github.io) ; [At Hugging Face](doc:2023/02/hkunlp_instructor_xl_·_hugging_) ex of use [here](https://postgresml.org/blog/generating-llm-embeddings-with-open-source-models-in-postgresml) Noah A. Smith 2023-02-17 2022-12-19T18:57:05Z We introduce INSTRUCTOR, a new method for computing text embeddings given task instructions: every text input is embedded together with instructions explaining the use case (e.g., task and domain descriptions). 
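In practice, INSTRUCTOR is called through the authors' customized sentence-transformer wrapper, with each input passed as an [instruction, text] pair; the instruction and sentence strings below are illustrative, written in the style of the documentation.

```python
# Instruction-prefixed embeddings with INSTRUCTOR: the same text gets a different
# embedding depending on the task/domain instruction it is paired with.
from InstructorEmbedding import INSTRUCTOR

model = INSTRUCTOR("hkunlp/instructor-xl")
pairs = [
    ["Represent the Medicine sentence for clustering:",
     "Metformin is a first-line treatment for type 2 diabetes."],
    ["Represent the Finance sentence for retrieval:",
     "The central bank raised interest rates by 50 basis points."],
]
embeddings = model.encode(pairs)      # one vector per [instruction, text] pair
print(embeddings.shape)               # e.g. (2, 768)
```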
Unlike encoders from prior work that are more specialized, INSTRUCTOR is a single embedder that can generate text embeddings tailored to different downstream tasks and domains, without any further training. We first annotate instructions for 330 diverse tasks and train INSTRUCTOR on this multitask mixture with a contrastive loss. We evaluate INSTRUCTOR on 70 embedding evaluation tasks (66 of which are unseen during training), ranging from classification and information retrieval to semantic textual similarity and text generation evaluation. INSTRUCTOR, while having an order of magnitude fewer parameters than the previous best model, achieves state-of-the-art performance, with an average improvement of 3.4% compared to the previous best results on the 70 diverse datasets. Our analysis suggests that INSTRUCTOR is robust to changes in instructions, and that instruction finetuning mitigates the challenge of training a single model on diverse datasets. Our model, code, and data are available at https://instructor-embedding.github.io. Yizhong Wang Jungo Kasai 2023-02-17T18:12:26Z Wen-tau Yih Tao Yu 2022-12-20T05:11:06Z Prompting, Instruction Finetuning, and RLHF (CS224N) 2023-02-16 2023-02-16T23:12:04Z stanfordnlp/dspy: 𝗗𝗦𝗣: Demonstrate-Search-Predict. A framework for composing retrieval and language models for knowledge-intensive NLP. 2023-02-18T11:32:46Z (initially called DSP, rebranded as DSPy) > The DSP framework provides a programming abstraction for building grounded AI systems. In a few lines of code, a DSP program expresses rich interactions between retrieval models (RMs) and language models (LMs) to tackle difficult knowledge-intensive NLP tasks (e.g., complex question answering or conversational search). > DSP discourages ["prompt engineering"](tag:prompted_models), which we view much the same way as hyperparameter tuning in traditional ML [@matei_zaharia](https://twitter.com/matei_zaharia/status/1626705622585716737?s=20): >Who are the World Cup champions? I knew ChatGPT would get it wrong when it launched, but it's surprising that all the new search+LLM engines do too. > > **Combining retrieval+LMs won't just be a matter of prompting**. That's why we've been building tools like DSP at Stanford to do it. 2023-02-18 Project's goal: A truly open ChatGPT like assistant 2023-02-06T18:12:32Z LAION-AI/Open-Assistant: OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so. 2023-02-06 2023-02-01T18:29:11Z 2023-02-01 Shayne Longpre sur Twitter : "What’s the best completely public competitor to #ChatGPT? Flan-T5 beats all public models we tested..." > It's promising these results don't use any [#RLHF](tag:reinforcement_learning_from_human_feedback) data, or human "alignment", which is expensive to collect and less publicly available. 
> Key takeaway: finetuning Flan-T5 is better and more compute-efficient than finetuning T5.[src](https://twitter.com/_jasonwei/status/1620864198262804481?s=20&t=hMXLCdqcOFAEbjsfwc_yog) 2023-02-09T08:03:00Z 2023-02-09 Derrière la fin brutale de l’empire hittite au XIIe siècle avant J.-C., une période de sécheresse exceptionnelle 2023-02-02T14:36:21Z 2023-02-02 hwchase17/langchain: ⚡ Building applications with LLMs through composability ⚡ Un an après, les trajectoires brisées des jeunes Africains qui ont fui l’Ukraine pour la France 2023-02-22 2023-02-22T22:53:53Z 2023-02-26T23:28:59Z 2023-02-26 LLM Powered Assistants for Complex Interfaces - Nick Arner 2023-02-02T09:14:36Z 2023-02-02 The Flan Collection: Advancing open source methods for instruction tuning – Google AI Blog > The ability to reason on new tasks is mostly credited to training models on a wide variety of unique instructions, known as “instruction tuning”, which was introduced by FLAN and extended in T0, Super-Natural Instructions, MetaICL, and InstructGPT. 2023-02-14T10:30:22Z 2023-02-14 Chelsea Finn sur Twitter : "Not sure how best to use your pre-trained model? Try projecting your features onto a low-dim basis before adding a linear head..." Chatbots Gone Wild, Surveillance Takes Hold, Rules for Military AI, Robot Training Streamlined 2023-02-23 2023-02-23T12:13:45Z 2023-02-17T10:38:12Z Eric Lehman Diwakar Mahajan Alistair Johnson Peter Szolovits TL;DR: yes > These findings highlight the importance of developing models for highly specialized domains such as clinical text 2302.08091 2023-02-16T05:08:34Z Micah J. Smith [2302.08091] Do We Still Need Clinical Language Models? Zachary Ziegler 2023-02-17 Eric Lehman Emily Alsentzer 2023-02-16T05:08:34Z Jonas Wulff Daniel Nadler Although recent advances in scaling large language models (LLMs) have resulted in improvements on many NLP tasks, it remains unclear whether these models trained primarily with general web text are the right tool in highly specialized, safety critical domains such as clinical text. Recent results have suggested that LLMs encode a surprising amount of medical knowledge. This raises an important question regarding the utility of smaller domain-specific language models. With the success of general-domain LLMs, is there still a need for specialized clinical models? To investigate this question, we conduct an extensive empirical analysis of 12 language models, ranging from 220M to 175B parameters, measuring their performance on 3 different clinical tasks that test their ability to parse and reason over electronic health records. As part of our experiments, we train T5-Base and T5-Large models from scratch on clinical notes from MIMIC III and IV to directly investigate the efficiency of clinical tokens. We show that relatively small specialized clinical models substantially outperform all in-context learning approaches, even when finetuned on limited annotated data. Further, we find that pretraining on clinical tokens allows for smaller, more parameter-efficient models that either match or outperform much larger language models trained on general text. We release the code and the models used under the PhysioNet Credentialed Health Data license and data use agreement. Evan Hernandez Do We Still Need Clinical Language Models? 2023-02-23 2023-02-23T13:25:12Z Ivan Vulić [tweet](https://twitter.com/seb_ruder/status/1628721434162765827?s=20) Jonas Pfeiffer Transfer learning has recently become the dominant paradigm of machine learning. 
Pre-trained models fine-tuned for downstream tasks achieve better performance with fewer labelled examples. Nonetheless, it remains unclear how to develop models that specialise towards multiple tasks without incurring negative interference and that generalise systematically to non-identically distributed tasks. Modular deep learning has emerged as a promising solution to these challenges. In this framework, units of computation are often implemented as autonomous parameter-efficient modules. Information is conditionally routed to a subset of modules and subsequently aggregated. These properties enable positive transfer and systematic generalisation by separating computation from routing and updating modules locally. We offer a survey of modular architectures, providing a unified view over several threads of research that evolved independently in the scientific literature. Moreover, we explore various additional purposes of modularity, including scaling language models, causal inference, programme induction, and planning in reinforcement learning. Finally, we report various concrete applications where modularity has been successfully deployed such as cross-lingual and cross-modal knowledge transfer. Related talks and projects to this survey, are available at https://www.modulardeeplearning.com/. 2302.11529 2023-02-22T18:11:25Z Modular Deep Learning Edoardo Maria Ponti 2023-02-22T18:11:25Z Sebastian Ruder Jonas Pfeiffer [2302.11529] Modular Deep Learning 2023-02-13T00:47:32Z Timo Schick sur Twitter : "Introducing the Toolformer, a language model that teaches itself to use various tools in a self-supervised way..." 2023-02-13 > Instructor, an **instruction-finetuned text embedding model that can generate text embeddings tailored to any task** (e.g., classification, retrieval, clustering, text evaluation, etc.) and domains (e.g., science, finance, etc.) **by simply providing the task instruction, without any finetuning.** > > easy to use with our **customized sentence-transformer library** hkunlp/instructor-xl · Hugging Face 2023-02-17T18:18:01Z 2023-02-17 2023-02-21 2023-02-21T17:14:22Z Aran Komatsuzaki sur Twitter : "Poisoning Web-Scale Training Datasets is Practical Shows how to effectively poison 0.01% of datasets like LAION-400M for just $60 USD" Parameter-Efficient Fine-Tuning using 🤗 PEFT 2023-02-10 2023-02-10T22:55:03Z
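A minimal sketch of what the 🤗 PEFT entry above covers: wrap a base model with a LoRA config so that only small injected low-rank matrices are trained while the original weights stay frozen. The base model and hyperparameters are examples.

```python
# Parameter-efficient fine-tuning with 🤗 PEFT: inject trainable low-rank (LoRA)
# matrices and freeze the base model.
from transformers import AutoModelForSeq2SeqLM
from peft import LoraConfig, TaskType, get_peft_model

base = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")
config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=8,                         # rank of the low-rank update
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q", "v"],   # T5 attention projections
)
model = get_peft_model(base, config)
model.print_trainable_parameters()   # typically well under 1% of the weights
# ...then train `model` with a normal Trainer / training loop.
```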