2023-02-09 The Origins of ChatGPT and InstructGPT - DZone
Some technical details, with graphs.

2023-02-06 "Public services are being crushed by the cost-cutting logic of accounting"

2023-02-12 Biodiversity: "Neither the scale, nor the speed, nor the systemic nature of the collapse of insects was anticipated by scientists"

2023-02-04 Ramsri Goutham Golla on Twitter: "The most practical open-source competitor to @OpenAI 's GPT-3 is Google's Flan-T5. Here are 5 Flan-T5 resources to try out easily, deploy, or fine-tune it! 🧵"

2023-02-12 Tillya Tepe - Wikipedia

2023-02-02 deepset on Twitter: "Generative models have taken the world of NLP by storm. But LLMs do not know about your personal data. This makes personal assistants, enterprise knowledge management and many other applications challenging. Retrieval augmented pipelines are the answer"

2023-02-13 Jay Hack on Twitter: "My thoughts on Toolformer IMO the most important paper in the past few weeks..."
[Jay Hack @mathemagic1an on Twitter](https://twitter.com/mathemagic1an/status/1624870248221663232):
> from a small seed set of human inputs (essentially demonstrating usage of APIs), the training set for this behavior is generated by the LLM itself.
>
> So what does this mean? We've found a promising way to tightly integrate arbitrary APIs with our best-performing models.

2023-02-13 [2302.04761] Toolformer: Language Models Can Teach Themselves to Use Tools (arXiv 2023-02-09)
Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, Thomas Scialom
> Language models (LMs) exhibit remarkable abilities to solve new tasks from just a few examples or textual instructions, especially at scale. They also, paradoxically, struggle with basic functionality, such as arithmetic or factual lookup, where much simpler and smaller models excel. In this paper, we show that LMs can teach themselves to use external tools via simple APIs and achieve the best of both worlds. We introduce Toolformer, a model trained to decide which APIs to call, when to call them, what arguments to pass, and how to best incorporate the results into future token prediction. This is done in a self-supervised way, requiring nothing more than a handful of demonstrations for each API. We incorporate a range of tools, including a calculator, a Q&A system, two different search engines, a translation system, and a calendar. Toolformer achieves substantially improved zero-shot performance across a variety of downstream tasks, often competitive with much larger models, without sacrificing its core language modeling abilities.
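A rough, self-contained sketch of the self-supervised filtering idea described in the abstract above: insert a candidate API call and its result into the text, and keep the example only if it lowers the language model's loss on the tokens that follow. GPT-2 is used here purely as a stand-in scorer, the bracketed call format and the hard-coded Calculator candidate are illustrative, and the filtering criterion is a simplification of the paper's; this is not the authors' code.

```python
# Toolformer-style filtering of a candidate API call (illustrative sketch).
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def suffix_loss(prefix: str, suffix: str) -> float:
    """Average cross-entropy of the suffix tokens, conditioned on the prefix."""
    prefix_ids = tokenizer(prefix, return_tensors="pt").input_ids
    suffix_ids = tokenizer(suffix, return_tensors="pt").input_ids
    input_ids = torch.cat([prefix_ids, suffix_ids], dim=1)
    labels = input_ids.clone()
    labels[:, : prefix_ids.shape[1]] = -100  # score only the suffix tokens
    with torch.no_grad():
        return model(input_ids, labels=labels).loss.item()

prefix = "Out of 1400 participants, 400 passed the test, i.e. "
suffix = "29 percent."
api_call, result = 'Calculator("400/1400")', "0.29"  # candidate call and its result

loss_without = suffix_loss(prefix, suffix)
loss_with = suffix_loss(f"{prefix}[{api_call} -> {result}] ", suffix)

# Keep the augmented text as future fine-tuning data only if the call helps
# the model predict what comes next.
keep = (loss_without - loss_with) > 0.0
print(f"loss without call: {loss_without:.3f}  with call: {loss_with:.3f}  keep: {keep}")
```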
2023-02-13 [2111.15664] OCR-free Document Understanding Transformer (arXiv 2021-11-30, rev. 2022-10-06)
Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park
> The #LayoutLM family, used by a lot of document AI companies, gets a strong competitor: Donut, now available in Hugging Face Transformers!
[src](https://www.linkedin.com/posts/niels-rogge-a3b7a3127_layoutlm-huggingface-transformers-activity-6963894171640205313-N2_U/) ; [HuggingFace Docs](https://huggingface.co/docs/transformers/main/en/model_doc/donut) ; [Gradio demo](https://huggingface.co/spaces/nielsr/donut-cord) ; [Tutorial notebooks](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/Donut) (a minimal inference sketch appears after the next bookmark)
> Understanding document images (e.g., invoices) is a core but challenging task since it requires complex functions such as reading text and a holistic understanding of the document. Current Visual Document Understanding (VDU) methods outsource the task of reading text to off-the-shelf Optical Character Recognition (OCR) engines and focus on the understanding task with the OCR outputs. Although such OCR-based approaches have shown promising performance, they suffer from 1) high computational costs for using OCR; 2) inflexibility of OCR models on languages or types of document; 3) OCR error propagation to the subsequent process. To address these issues, in this paper, we introduce a novel OCR-free VDU model named Donut, which stands for Document understanding transformer. As the first step in OCR-free VDU research, we propose a simple architecture (i.e., Transformer) with a pre-training objective (i.e., cross-entropy loss). Donut is conceptually simple yet effective. Through extensive experiments and analyses, we show a simple OCR-free VDU model, Donut, achieves state-of-the-art performances on various VDU tasks in terms of both speed and accuracy. In addition, we offer a synthetic data generator that helps the model pre-training to be flexible in various languages and domains. The code, trained model and synthetic data are available at https://github.com/clovaai/donut.

2023-02-14 Data-Efficient Information Extraction from Documents with Pre-Trained Language Models
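A minimal inference sketch for the Donut model bookmarked above, following the pattern documented in the Hugging Face docs linked there; the CORD-finetuned checkpoint, the `<s_cord-v2>` task prompt, and the `receipt.png` input are illustrative assumptions.

```python
# Minimal Donut inference sketch: document image in, structured JSON out, no OCR step.
import re
import torch
from PIL import Image
from transformers import DonutProcessor, VisionEncoderDecoderModel

checkpoint = "naver-clova-ix/donut-base-finetuned-cord-v2"   # receipt-parsing checkpoint
processor = DonutProcessor.from_pretrained(checkpoint)
model = VisionEncoderDecoderModel.from_pretrained(checkpoint)

image = Image.open("receipt.png").convert("RGB")             # any document image
pixel_values = processor(image, return_tensors="pt").pixel_values

# The decoder is started with a task-specific prompt token.
task_prompt = "<s_cord-v2>"
decoder_input_ids = processor.tokenizer(
    task_prompt, add_special_tokens=False, return_tensors="pt"
).input_ids

with torch.no_grad():
    outputs = model.generate(
        pixel_values,
        decoder_input_ids=decoder_input_ids,
        max_length=model.decoder.config.max_position_embeddings,
        pad_token_id=processor.tokenizer.pad_token_id,
        eos_token_id=processor.tokenizer.eos_token_id,
        bad_words_ids=[[processor.tokenizer.unk_token_id]],
    )

sequence = processor.batch_decode(outputs)[0]
sequence = sequence.replace(processor.tokenizer.eos_token, "").replace(processor.tokenizer.pad_token, "")
sequence = re.sub(r"<.*?>", "", sequence, count=1).strip()   # drop the task prompt token
print(processor.token2json(sequence))                        # extracted fields as JSON
```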
2023-02-07 [2203.14465] STaR: Bootstrapping Reasoning With Reasoning (arXiv 2022-03-28, rev. 2022-05-20)
Eric Zelikman, Yuhuai Wu, Jesse Mu, Noah D. Goodman
"Self-Taught Reasoner" (STaR)
> (to our knowledge) the first technique to allow a pre-trained large language model to iteratively use its language modeling capacity to improve itself
> Generating step-by-step "chain-of-thought" rationales improves language model performance on complex reasoning tasks like mathematics or commonsense question-answering. However, inducing language model rationale generation currently requires either constructing massive rationale datasets or sacrificing accuracy by using only few-shot inference. We propose **a technique to iteratively leverage a small number of rationale examples and a large dataset without rationales**, to bootstrap the ability to perform successively more complex reasoning. This technique, the "Self-Taught Reasoner" (STaR), relies on a simple loop: generate rationales to answer many questions, prompted with a few rationale examples; if the generated answers are wrong, try again to generate a rationale given the correct answer; fine-tune on all the rationales that ultimately yielded correct answers; repeat. We show that STaR significantly improves performance on multiple datasets compared to a model fine-tuned to directly predict final answers, and performs comparably to fine-tuning a 30× larger state-of-the-art language model on CommonsenseQA. Thus, STaR lets a model improve itself by learning from its own generated reasoning.
(A code sketch of the STaR loop appears after the next two bookmarks.)

2023-02-07 Microsoft launches Teams Premium with features powered by OpenAI - The Verge

2023-02-06 LAION-AI/Open-Assistant: OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
Project goal: a truly open, ChatGPT-like assistant.
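A compact sketch of the STaR loop summarized in the abstract above. The `complete` and `finetune` callables are hypothetical placeholders for an LM sampling call and a fine-tuning run, and the answer matching is deliberately naive; this is not the authors' released code.

```python
# STaR loop sketch: generate rationales, keep the ones that reach the right answer,
# fine-tune on them, repeat (assumptions noted in the lead-in above).
from typing import Callable, Iterable, List, Tuple

def star(
    complete: Callable[[str], str],                           # LM sampling: prompt -> generated text
    finetune: Callable[[List[str]], Callable[[str], str]],    # returns an updated LM
    few_shot_prompt: str,                                     # a few worked rationale examples
    dataset: Iterable[Tuple[str, str]],                       # (question, gold_answer) pairs
    n_outer_loops: int = 5,
) -> Callable[[str], str]:
    for _ in range(n_outer_loops):
        finetune_set: List[str] = []
        for question, gold in dataset:
            # 1) Generate a rationale + answer from the few-shot prompt alone.
            generation = complete(f"{few_shot_prompt}\nQ: {question}\nA:")
            if generation.strip().endswith(gold):             # naive answer check
                finetune_set.append(f"Q: {question}\nA:{generation}")
            else:
                # 2) "Rationalization": retry with the correct answer given as a hint,
                #    keeping the rationale only if it now reaches that answer.
                hinted = complete(f"{few_shot_prompt}\nQ: {question} (answer: {gold})\nA:")
                if hinted.strip().endswith(gold):
                    finetune_set.append(f"Q: {question}\nA:{hinted}")
        # 3) Fine-tune on all rationales that yielded correct answers, then
        #    run the whole loop again with the improved model.
        complete = finetune(finetune_set)
    return complete
```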
2023-02-07 [2302.01398] The unreasonable effectiveness of few-shot learning for machine translation (arXiv 2023-02-02)
Xavier Garcia, Yamini Bansal, Colin Cherry, George Foster, Maxim Krikun, Melvin Johnson, Fangxiaoyu Feng, Orhan Firat
[tweet](https://twitter.com/mr_cheu/status/1622648632867422211?s=20&t=DLVMU-Qrp9DksDse99fkjQ) (a prompt-construction sketch appears at the end of this batch of bookmarks)
> **We demonstrate the potential of few-shot translation systems, trained with unpaired language data, for both high and low-resource language pairs. We show that with only 5 examples of high-quality translation data shown at inference, a transformer decoder-only model trained solely with self-supervised learning is able to match specialized supervised state-of-the-art models as well as more general commercial translation systems.** In particular, we outperform the best performing system on the WMT'21 English-Chinese news translation task by only using five examples of English-Chinese parallel data at inference. Moreover, our approach in building these models does not necessitate joint multilingual training or back-translation, is conceptually simple and shows the potential to extend to the multilingual setting. Furthermore, the resulting models are two orders of magnitude smaller than state-of-the-art language models. We then analyze the factors which impact the performance of few-shot translation systems, and highlight that the quality of the few-shot demonstrations heavily determines the quality of the translations generated by our models. Finally, we show that the few-shot paradigm also provides a way to control certain attributes of the translation -- we show that we are able to control for regional varieties and formality using only five examples at inference, paving the way towards controllable machine translation systems.

2023-02-11 explosion/prodigy-openai-recipes: ✨ Bootstrap annotation with zero- & few-shot learning via OpenAI GPT-3
> example code on how to combine zero- and few-shot learning with a small annotation effort

2023-02-01 Shayne Longpre on Twitter: "What’s the best completely public competitor to #ChatGPT? Flan-T5 beats all public models we tested..."
> It's promising that these results don't use any [#RLHF](tag:reinforcement_learning_from_human_feedback) data, or human "alignment", which is expensive to collect and less publicly available.
> Key takeaway: finetuning Flan-T5 is better and more compute-efficient than finetuning T5. [src](https://twitter.com/_jasonwei/status/1620864198262804481?s=20&t=hMXLCdqcOFAEbjsfwc_yog)

2023-02-02 July 1, 1916: the deadliest day of the First World War - Collège Robert Doisneau
> During the First World War, nearly 5,000 soldiers were killed every day across all fronts.

2023-02-02 Yann LeCun on Twitter: "Language abilities != Thinking. Or why LLMs such as ChatGPT can eloquently spew complete nonsense..."

2023-02-09 Behind the abrupt end of the Hittite Empire in the 12th century BC, a period of exceptional drought

2023-02-07 Open Graph Benchmark | A collection of benchmark datasets, data-loaders and evaluators for graph machine learning in PyTorch.

2023-02-11 ChatGPT Is a Blurry JPEG of the Web | The New Yorker
The New Yorker article is very good, but the point is: ChatGPT is not a knowledge base, it is a system that masters natural language.

2023-02-04 Towards a Tagalog NLP pipeline

2023-02-02 hwchase17/langchain: ⚡ Building applications with LLMs through composability ⚡

2023-02-14 Before Islam: Christians of Ethiopia vs Jews of Arabia -- Himyar and Aksum

2023-02-04 Bojan Tunguz on Twitter: "What I would *REALLY* love to have is a private version of ChatGPT that’s been trained on your internal org documents..."
That is why Microsoft is investing in it (!?)

2023-02-02 C Thi Nguyen on Twitter: "There's this quality I've been thinking about, call it "failure clarity"...." / Twitter
> My fave example is climbing knots. The figure-eight knot isn't the strongest possible knot, *but it's easy to see if you screwed it up*. When the process fails, the failure isn't hidden.

2023-02-05 Yann LeCun on Twitter: "On the highway towards Human-Level AI, Large Language Model is an off-ramp."

2023-02-04 Shital Shah on Twitter: "Surprising and important paper: TLDR; All the gains we get by first pre-training on large dataset and then fine tuning on small dataset could be obtained by just small dataset but with pre-training objective!!"
what?
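Referring back to the few-shot machine translation entry at the top of this batch: a tiny sketch of what "five examples shown at inference" can look like, i.e. a prompt that concatenates five parallel sentence pairs before the sentence to translate and is fed to a decoder-only LM. The template and the English-French pairs are illustrative assumptions; the paper's headline result uses English-Chinese WMT'21 data.

```python
# Few-shot translation prompt: five parallel examples prepended at inference.

examples = [
    ("The meeting starts at nine.", "La réunion commence à neuf heures."),
    ("Where is the train station?", "Où est la gare ?"),
    ("This model was trained without parallel data.", "Ce modèle a été entraîné sans données parallèles."),
    ("The weather is nice today.", "Il fait beau aujourd'hui."),
    ("Please send me the report.", "Veuillez m'envoyer le rapport."),
]

def build_prompt(source_sentence: str) -> str:
    blocks = [f"English: {src}\nFrench: {tgt}" for src, tgt in examples]
    blocks.append(f"English: {source_sentence}\nFrench:")
    return "\n\n".join(blocks)

prompt = build_prompt("The translation quality depends heavily on these five demonstrations.")
print(prompt)
# The completion the decoder-only LM produces after the final "French:" line is
# taken as the translation; swapping the five pairs (e.g. formal vs. informal)
# is how the paper controls formality and regional variety.
```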
2023-02-02 The Flan Collection: Advancing open source methods for instruction tuning – Google AI Blog
> The ability to reason on new tasks is mostly credited to training models on a wide variety of unique instructions, known as “instruction tuning”, which was introduced by FLAN and extended in T0, Super-Natural Instructions, MetaICL, and InstructGPT.

2023-02-13 [2302.04907] Binarized Neural Machine Translation (arXiv 2023-02-09)
Yichi Zhang, Ankush Garg, Yuan Cao, Łukasz Lew, Behrooz Ghorbani, Zhiru Zhang, Orhan Firat
> A one-bit weight-only Transformer can achieve the same quality as a float one on the WMT dataset, and scales and generalizes well while being 16x smaller in size.
> The rapid scaling of language models is motivating research using low-bitwidth quantization. In this work, we propose a novel binarization technique for Transformers applied to machine translation (BMT), the first of its kind. We identify and address the problem of inflated dot-product variance when using one-bit weights and activations. Specifically, BMT leverages additional LayerNorms and residual connections to improve binarization quality. Experiments on the WMT dataset show that a one-bit weight-only Transformer can achieve the same quality as a float one, while being 16x smaller in size. One-bit activations incur varying degrees of quality drop, but this is mitigated by the proposed architectural changes. We further conduct a scaling law study using production-scale translation datasets, which shows that one-bit weight Transformers scale and generalize well in both in-domain and out-of-domain settings. Implementation in JAX/Flax will be open sourced.
(A toy binarization sketch appears at the end of this batch of bookmarks.)

2023-02-02 karpathy/nanoGPT: The simplest, fastest repository for training/finetuning medium-sized GPTs.

2023-02-13 Champions League final fiasco: a UEFA report damns the French police
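Referring back to the Binarized Neural Machine Translation entry above: a toy PyTorch sketch of the ingredients the abstract mentions, one-bit (sign) weights trained with a straight-through estimator, plus an extra LayerNorm to keep the dot-product variance under control. This is an illustrative approximation, not the authors' JAX/Flax implementation.

```python
# Toy one-bit-weight linear layer (illustrative approximation of the BMT ideas).
import torch
import torch.nn as nn

class BinaryLinear(nn.Module):
    """Linear layer whose weights are binarized to {-1, +1} in the forward pass."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        # Extra LayerNorm after the binary matmul, in the spirit of the paper's
        # fix for inflated dot-product variance.
        self.norm = nn.LayerNorm(out_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w_bin = torch.sign(self.weight)
        # Straight-through estimator: the forward pass uses binary weights, while
        # gradients flow to the latent float weights as if binarization were identity.
        w = self.weight + (w_bin - self.weight).detach()
        return self.norm(x @ w.t())

layer = BinaryLinear(512, 512)
out = layer(torch.randn(4, 512))
print(out.shape)  # torch.Size([4, 512]); at inference each weight needs only one bit
```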
2023-02-11 [2302.04870] Offsite-Tuning: Transfer Learning without Full Model (arXiv 2023-02-09)
Guangxuan Xiao, Ji Lin, Song Han
> Achieves accuracy comparable to full model fine-tuning while being privacy-preserving and efficient
I'd like to see this connected with: "[Microsoft will let companies create their own ChatGPT](https://twitter.com/DrJimFan/status/1623354315594432512?s=20&t=wQpsuFehMrgP1720n2wtJw)"
> Transfer learning is important for foundation models to adapt to downstream tasks. However, many foundation models are proprietary, so users must share their data with model owners to fine-tune the models, which is costly and raises privacy concerns. Moreover, fine-tuning large foundation models is computation-intensive and impractical for most downstream users. In this paper, we propose Offsite-Tuning, a privacy-preserving and efficient transfer learning framework that can adapt billion-parameter foundation models to downstream data without access to the full model. In offsite-tuning, the model owner sends a light-weight adapter and a lossy compressed emulator to the data owner, who then fine-tunes the adapter on the downstream data with the emulator's assistance. The fine-tuned adapter is then returned to the model owner, who plugs it into the full model to create an adapted foundation model. Offsite-tuning preserves both parties' privacy and is computationally more efficient than the existing fine-tuning methods that require access to the full model weights. We demonstrate the effectiveness of offsite-tuning on various large language and vision foundation models. Offsite-tuning can achieve comparable accuracy as full model fine-tuning while being privacy-preserving and efficient, achieving 6.5x speedup and 5.6x memory reduction. Code is available at https://github.com/mit-han-lab/offsite-tuning.

2023-02-02 BioGPT

2023-02-10 Comparing Africa-centric Models to OpenAI's GPT3.5 - Lelapa

2023-02-02 François Chollet on Twitter: "The near future of AI is to serve as a universal assistant..."

2023-02-07 Google announces ChatGPT rival Bard, with wider availability in ‘coming weeks’ - The Verge

2023-02-11 The Generative AI Race Has a Dirty Secret | WIRED

2023-02-04 Generative AI with Cohere: Part 1 - Model Prompting

2023-02-13 Timo Schick on Twitter: "Introducing the Toolformer, a language model that teaches itself to use various tools in a self-supervised way..."

2023-02-10 Parameter-Efficient Fine-Tuning using 🤗 PEFT
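A minimal sketch of what parameter-efficient fine-tuning looks like with the Hugging Face PEFT library bookmarked just above: wrap a frozen base model with LoRA adapters so that only a small fraction of the parameters is trained. The model name and hyperparameters are illustrative choices, not taken from the blog post.

```python
# LoRA adapters on a frozen seq2seq model via the PEFT library.
from transformers import AutoModelForSeq2SeqLM
from peft import LoraConfig, TaskType, get_peft_model

base_model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,  # tells PEFT which modules to adapt by default
    r=8,                              # rank of the low-rank update matrices
    lora_alpha=32,
    lora_dropout=0.1,
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()    # only the adapter weights are trainable

# `model` can now be passed to a normal Trainer / training loop; after training,
# only the small adapter needs to be saved and shared.
```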