What is OpenAI's CLIP and how to use it? (2023-03-07T19:12:56Z)

[2303.12712] Sparks of Artificial General Intelligence: Early experiments with GPT-4 (2023-03-24T00:29:39Z)
Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, Yi Zhang. arXiv 2303.12712 (submitted 2023-03-22T16:51:28Z).
> We demonstrate that, **beyond its mastery of language**, GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more, without needing any special prompting. Moreover, in all of these tasks, GPT-4's performance is strikingly close to human-level performance, and **often vastly surpasses prior models such as ChatGPT**.

Abstract: Artificial intelligence (AI) researchers have been developing and refining large language models (LLMs) that exhibit remarkable capabilities across a variety of domains and tasks, challenging our understanding of learning and cognition. The latest model developed by OpenAI, GPT-4, was trained using an unprecedented scale of compute and data. In this paper, we report on our investigation of an early version of GPT-4, when it was still in active development by OpenAI. We contend that (this early version of) GPT-4 is part of a new cohort of LLMs (along with ChatGPT and Google's PaLM for example) that exhibit more general intelligence than previous AI models. We discuss the rising capabilities and implications of these models. We demonstrate that, beyond its mastery of language, GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more, without needing any special prompting. Moreover, in all of these tasks, GPT-4's performance is strikingly close to human-level performance, and often vastly surpasses prior models such as ChatGPT. Given the breadth and depth of GPT-4's capabilities, we believe that it could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system. In our exploration of GPT-4, we put special emphasis on discovering its limitations, and we discuss the challenges ahead for advancing towards deeper and more comprehensive versions of AGI, including the possible need for pursuing a new paradigm that moves beyond next-word prediction. We conclude with reflections on societal influences of the recent technological leap and future research directions.

Jimmy Lin on Twitter: "GPT-4 and its ilk are awesome for rapid prototyping and one-offs, but at the end of the day, enterprises will deploy far smaller distilled models in production. Here's my contrarian take -" (2023-03-21T18:06:46Z)

On the Surprising Behavior of Distance Metrics in High Dimensional Space (Aggarwal et al.) (2023-03-18T16:43:03Z)

Clustering sentence embeddings to identify intents in short text | by David Borrelli | Towards Data Science (2023-03-18T16:45:40Z)
[GitHub](https://github.com/dborrelli/chat-intents) refers to [On the Surprising Behavior of Distance Metrics in High Dimensional Space](doc:2023/03/on_the_surprising_behavior_of_d). (A minimal sketch of this embed-then-cluster approach appears after the next two entries.)

raphaelsty.github.io/knowledge demo (2023-03-15T01:33:51Z)

Riddle solved: Why was Roman concrete so durable? | MIT News (2023-03-08T01:49:35Z)
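A rough sketch of the embed-then-cluster recipe behind the Borrelli article and the chat-intents repo bookmarked above. The model name, parameters and example sentences are my own illustrative choices, not taken from either source; chat-intents itself combines sentence embeddings with UMAP and HDBSCAN.

```python
# Hypothetical sketch: group short texts by intent using sentence embeddings.
# Requires: sentence-transformers, umap-learn, hdbscan.
from sentence_transformers import SentenceTransformer
import umap
import hdbscan

texts = [
    "where is my order?",
    "track my package",
    "I want my money back",
    "how do I request a refund?",
]

# 1. Encode each sentence into a dense vector.
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(texts)

# 2. Reduce dimensionality first: distance metrics degrade in high-dimensional
#    spaces (the point of the Aggarwal paper bookmarked above).
reduced = umap.UMAP(n_neighbors=2, n_components=2, metric="cosine").fit_transform(embeddings)

# 3. Density-based clustering; HDBSCAN marks outliers with label -1.
labels = hdbscan.HDBSCAN(min_cluster_size=2).fit_predict(reduced)
print(list(zip(texts, labels)))
```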
Overview - OpenAI API (2023-03-12T15:38:07Z)

La Vieille Fille (Balzac) (2023-03-05T13:38:54Z)

Hannah Fry on Twitter: "Every single person who confuses correlation and causation ends up dying." (2023-03-03T23:35:15Z)

LCP on Twitter: "Le billet de @CamilleEtienne_ : 'Alors que le vivant s'effondre, que le plancher se dérobe sous nos pieds, il faudrait rester assis, derrière nos écrans, produire pour la mégamachine. Il est urgent de prendre du temps, pour ne plus en perdre.'" (2023-03-13T13:40:56Z)
> Producing for the megamachine ("While the living world collapses and the floor gives way beneath our feet, we are expected to stay seated behind our screens, producing for the megamachine. It is urgent to take time, so as to stop wasting it.")

Tutorial: VS Code with Google Cloud AI Platform as a backend | by Kyle Ziegler | Medium (2023-03-21T17:44:32Z)

PyTorch on Google Cloud: How To train and tune PyTorch models on Vertex AI | Google Cloud Blog (Sept. 2021) (2023-03-23T08:46:50Z)

Enabling Python VirtualEnv in JupyterLab | My Shitty Code (2023-03-08T13:59:47Z)

antimatter15/alpaca.cpp: Locally run an Instruction-Tuned Chat-Style LLM (2023-03-21T23:27:11Z)
> This combines the LLaMA foundation model with an open reproduction of Stanford Alpaca, a fine-tuning of the base model to obey instructions (akin to the RLHF used to train ChatGPT), and a set of modifications to llama.cpp to add a chat interface.

Evidence of a predictive coding hierarchy in the human brain listening to speech | Nature Human Behaviour (2023-03-03T22:26:12Z)
> Language models still fail to match the language abilities of humans. **[Predictive coding theory](tag:predictive_coding.html)** offers a tentative explanation to this discrepancy: while language models are optimized to predict nearby words, **the human brain would continuously predict a hierarchy of representations that spans multiple timescales**. To test this hypothesis, we analysed the functional magnetic resonance imaging brain signals of 304 participants listening to short stories. First, we confirmed that the activations of modern language models linearly map onto the brain responses to speech. Second, we showed that enhancing these algorithms with predictions that span multiple timescales improves this brain mapping. Finally, we showed that these predictions are organized hierarchically: frontoparietal cortices predict higher-level, longer-range and more contextual representations than temporal cortices. Overall, **these results strengthen the role of hierarchical predictive coding in language processing and illustrate how the synergy between neuroscience and artificial intelligence can unravel the computational bases of human cognition.**
> our results support the idea that, unlike current language algorithms, the brain is not limited to predict word-level representations but rather predicts multiple levels of representations
> our study thus calls for systematically training algorithms to predict multiple timescales and levels of representations

FP Servant on Twitter: "while #LanguageModels are optimized to predict nearby words, the human brain [...] continuously predict a hierarchy of representations that spans multiple timescales" (2023-03-19T18:59:53Z)
[Evidence of a predictive coding hierarchy in the human brain listening to speech | Nature Human Behaviour](doc:2023/03/evidence_of_a_predictive_coding)

Bizarre Buildings on Twitter: "This building in Guizhou, China." (2023-03-19T18:44:38Z)

Lior⚡ on Twitter: "Quick tip, you can use pip-chill instead of pip freeze to get the packages you are actually using." (2023-03-19T19:30:50Z)
GPT-4 (OpenAI blog post) (2023-03-15T02:14:03Z)

David Chalmers on Twitter: "what are some new and interesting results about the relative capacities of multimodal models and pure language models... (thinking about 'do language models need sensory grounding for meaning and understanding?')" (2023-03-15T22:51:05Z)
> the new GPT-4 data seem quite relevant here: the version with vision only slightly outperforms the language-only version on some standard tests.

Inria Paris NLP (ALMAnaCH team) on Twitter: "Writing in two languages: Neural machine translation as an assistive bilingual writing tool" (2023-03-13T13:46:51Z)

Brigadoon (film) - Wikipedia (2023-03-15T18:25:42Z)
> Americans Tommy Albright (Gene Kelly) and Jeff Douglas (Van Johnson) are on a hunting trip in Scotland and become lost in the woodlands. They happen upon Brigadoon, a miraculously blessed village that rises out of the mists every hundred years for only a day.

Support of very large dataset? - 🤗Datasets - Hugging Face Forums (2023-03-12T12:14:56Z)
See [Big data? 🤗 Datasets to the rescue! - Hugging Face Course](doc:2023/03/big_data_🤗_datasets_to_the_re)

GPT-4 Technical Report (2023-03-15T02:17:58Z)

anton on Twitter: "Since ChatGPT has recently lost the ability to maintain conversations I moved over to self-hosted chatbot-ui... Everything is saved locally." (2023-03-22T20:21:16Z)

Alpaca: A Strong Open-Source Instruction-Following Model (2023-03-14T10:58:45Z)

Wolfram|Alpha as the Way to Bring Computational Knowledge Superpowers to ChatGPT—Stephen Wolfram Writings (2023-03-01T23:10:30Z)

ChatGPT plugins (2023-03-24T00:13:21Z)
> help ChatGPT access up-to-date information, run computations, or use third-party services
Wolfram is among the first plugins (remember [Wolfram|Alpha as the Way to Bring Computational Knowledge Superpowers to ChatGPT](doc:2023/03/wolfram%7Calpha_as_the_way_to_bri)).

Bibliothèque de MariaGambina (2023-03-05T14:05:11Z)

ShareGPT: Share your wildest ChatGPT conversations with one click. (2023-03-04T23:37:45Z)

Jim Fan on Twitter: "GPT-4 is HERE. Most important bits you need to know..." (2023-03-15T02:07:30Z)
<https://twitter.com/DrJimFan/status/1635694095460102145?s=20>

La Supplication : Tchernobyl, chronique du monde après l'apocalypse (2023-03-05T13:50:19Z)

Improve OCR quality for receipt processing with Tesseract and Label Studio (2023-03-13T16:24:50Z)

Le Seigneur des porcheries — Wikipédia (2023-03-02T23:59:07Z)

Muhammad Ali - "I'm as pretty as a girl!" (2023-03-09T00:15:40Z)
> You look at me, I'm loaded with confidence, I can't be beat! I had 180 amateur fights, 22 professional fights, and I'm pretty as a girl!

1177 B.C.: The Year Civilization Collapsed (2023-03-05T12:58:38Z)

Welcome to LangChain (2023-03-24T00:04:56Z)
Library aimed at assisting in the development of applications that combine LLMs with other sources of computation or knowledge. (A minimal usage sketch appears after the next three entries.)

La fermeture de la Silicon Valley Bank marque la plus grosse faillite bancaire depuis la crise de 2008 (2023-03-11T09:56:37Z)

Introducing ChatGPT and Whisper APIs (2023-03-01T21:20:01Z)

Epsilons, no. 1: The geometric series - by Tivadar Danka (2023-03-08T23:42:40Z)
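To make the LangChain entry above a bit more concrete, here is a minimal sketch written against the early-2023 style of the library's API; the prompt, model choice and the `OPENAI_API_KEY` requirement are my own assumptions, and the API has been evolving quickly, so check the current documentation.

```python
# Hypothetical minimal LangChain usage (early-2023-style API; verify against the docs).
# Requires: langchain, openai, and an OPENAI_API_KEY environment variable.
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# A prompt template with one input variable.
prompt = PromptTemplate(
    input_variables=["topic"],
    template="Give me three bullet points summarizing what {topic} is.",
)

# Chain = LLM + prompt; the chain fills in the template and calls the model.
llm = OpenAI(temperature=0)
chain = LLMChain(llm=llm, prompt=prompt)

print(chain.run(topic="low-rank adaptation of language models"))
```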
L'Europe enfin mûre pour une politique industrielle (2023-03-19T18:49:27Z)
> Without the American spur of the Inflation Reduction Act (IRA), a gigantic subsidy plan tied to the energy transition, the Twenty-Seven would still be debating the merits of free and undistorted competition.

John H. Meyer 🚀 on Twitter: "@emerywells That's actually what I built it for👀 Context: I unfortunately lost my dad unexpectedly at the young age of 50, back in 2017. There was a lot left un-said, and a lot I wish I could've spoken to him about in my adult life.…" (2023-03-21T23:21:31Z)

tloen/alpaca-lora: Instruct-tune LLaMA on consumer hardware (2023-03-22T00:23:50Z)
Uses [LoRA: Low-Rank Adaptation of Large Language Models](doc:2023/03/2106_09685_lora_low_rank_ada)

[2106.09685] LoRA: Low-Rank Adaptation of Large Language Models (2023-03-21T23:51:38Z)
Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen. arXiv 2106.09685 (submitted 2021-06-17T17:37:18Z, revised 2021-10-16T18:40:34Z). (A minimal code sketch of the idea appears a few entries below.)

Abstract: An important paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes less feasible. Using GPT-3 175B as an example -- deploying independent instances of fine-tuned models, each with 175B parameters, is prohibitively expensive. We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. Compared to GPT-3 175B fine-tuned with Adam, LoRA can reduce the number of trainable parameters by 10,000 times and the GPU memory requirement by 3 times. LoRA performs on-par or better than fine-tuning in model quality on RoBERTa, DeBERTa, GPT-2, and GPT-3, despite having fewer trainable parameters, a higher training throughput, and, unlike adapters, no additional inference latency. We also provide an empirical investigation into rank-deficiency in language model adaptation, which sheds light on the efficacy of LoRA. We release a package that facilitates the integration of LoRA with PyTorch models and provide our implementations and model checkpoints for RoBERTa, DeBERTa, and GPT-2 at https://github.com/microsoft/LoRA.

Jim Fan on Twitter: "OpenAI just announced ChatGPT Plugins. If ChatGPT's debut was the 'iPhone event', today is the 'iOS App Store' event." (2023-03-24T00:40:32Z)
> 3 official plugins available now:
>
> - Web browser: adding Bing in the loop
> - Code interpreter: adding a live Python interpreter in a sandboxed & firewalled execution environment
> - Retrieval: semantic search for your personal & organizational docs. [GitHub](doc:2023/03/openai_chatgpt_retrieval_plugin)

openai/chatgpt-retrieval-plugin (2023-03-24T00:46:01Z)
> a flexible solution for semantic search and retrieval of personal or organizational documents using natural language queries

« La préservation de l'unique lettre de Robespierre à Danton est une cause nationale » (2023-03-21T17:40:42Z)

Six Nations 2023 : Le résumé d'Angleterre vs France - YouTube (2023-03-12T23:43:03Z)

Jordan Jacobs on Twitter: "AI is eating software. Why? Traditional software never improves. AI enables 'smart' software to learn & improve constantly. The world runs on software and AI is changing everything. Yet few people see or understand the massive AI wave on the horizon." (2023-03-16T23:02:01Z)
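Going back to the LoRA paper bookmarked above, here is a minimal sketch of the core idea: freeze a pre-trained weight matrix W and learn a low-rank update BA that is added to its output. This is my own illustrative PyTorch toy, not the microsoft/LoRA package; the layer sizes, rank and scaling are arbitrary.

```python
# Toy illustration of the LoRA idea (not the official microsoft/LoRA code).
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """y = x W^T + (x A^T B^T) * scale, with W frozen and only A, B trainable."""

    def __init__(self, in_features, out_features, rank=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)           # freeze the pre-trained weight
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))  # BA = 0 at init
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scale

layer = LoRALinear(768, 768, rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable params: {trainable} / {total}")  # only the two small matrices train
```

Because the learned update is just the matrix product BA, it can be merged into W after training, which is why the paper reports no additional inference latency, unlike adapter layers.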
« Pour la première fois peut-être, des avocats et des adversaires du nucléaire tentent de préserver ce qui les rassemble » (2023-03-12T11:07:28Z)

La Fin de l'homme rouge (2023-03-05T13:53:03Z)

Défenestration de Prague (1618) — Wikipédia (2023-03-12T01:44:16Z)

LLM Zoo at Home: LLaMA & Alpaca | bergis universe of software, hardware and ideas (2023-03-20T11:27:16Z)

Andrej Karpathy on Twitter: "Base LLMs (non-finetuned) make very strong few-shot classifiers. Describe task in English, give few examples, read off the label probabilities on test example. No gradient-based optimization necessary. It brings a cannon to a knife fight but is fast, convenient, strong baseline." (2023-03-19T14:50:11Z)
(A minimal sketch of this recipe appears at the end of this block of entries.)

Leo Boytsov on Twitter: "🧵A lot of multi-modal models were announced recently. A little summary..." (2023-03-08T23:54:33Z)
> they all seem to be similar in key aspects:
> 1. combining language & vision embeddings using (mostly) pre-trained backbones.
> 2. training composite Transformer model using next token prediction.

JB Rubinovitz on Twitter: "The 'Will GPT automate all the jobs?' paper is out With participation from @OpenAI, OpenResearch and @penn..." (2023-03-20T11:05:25Z)

Sergey Levine on Twitter: "What if we train a language model on images & robot data? That's the idea behind PaLM-E: a huge LLM (562B params) that is trained on language and 'multimodal sentences' that include images and language" (2023-03-08T01:34:32Z)

raphaelsrty on Twitter: "I made Knowledge, an open-source tool the automatically index content I interact with on Github, Twitter, HackerNews and Zotero. It will make my life easier to retrieve and share documents I like..." (2023-03-15T00:59:45Z)

ChatGPT : la philosophie du baratin - YouTube (2023-03-06T14:30:23Z)

[2303.09752] CoLT5: Faster Long-Range Transformers with Conditional Computation (2023-03-20T07:52:59Z)
Joshua Ainslie, Tao Lei, Michiel de Jong, Santiago Ontañón, Siddhartha Brahma, Yury Zemlyanskiy, David Uthus, Mandy Guo, James Lee-Thorp, Yi Tay, Yun-Hsuan Sung, Sumit Sanghai. arXiv 2303.09752 (submitted 2023-03-17T03:28:17Z).

Abstract: Many natural language processing tasks benefit from long inputs, but processing long documents with Transformers is expensive -- not only due to quadratic attention complexity but also from applying feedforward and projection layers to every token. However, not all tokens are equally important, especially for longer documents. We propose CoLT5, a long-input Transformer model that builds on this intuition by employing conditional computation, devoting more resources to important tokens in both feedforward and attention layers. We show that CoLT5 achieves stronger performance than LongT5 with much faster training and inference, achieving SOTA on the long-input SCROLLS benchmark. Moreover, CoLT5 can effectively and tractably make use of extremely long inputs, showing strong gains up to 64k input length.

Remote to a VM over an IAP tunnel with VSCode | by Albert Brand | Medium (2023-03-23T11:41:27Z)

GPT-4 Developer Livestream - YouTube (2023-03-15T01:32:11Z)
> it's not perfect, neither are you. Together...
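Karpathy's few-shot-classifier recipe bookmarked above ("describe the task, give a few examples, read off the label probabilities") can be made concrete roughly as follows. This is my own sketch using a small Hugging Face causal LM; the model name, prompt and labels are illustrative, and in practice multi-token labels should be scored by summing log-probabilities over all of their tokens, as done below.

```python
# Sketch: zero-gradient few-shot classification by comparing label log-probabilities.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # illustrative small model
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = (
    "Classify the sentiment of each review as positive or negative.\n"
    "Review: I loved this film. Sentiment: positive\n"
    "Review: A complete waste of time. Sentiment: negative\n"
    "Review: The acting was superb. Sentiment:"
)
labels = [" positive", " negative"]   # note the leading space

scores = {}
with torch.no_grad():
    for label in labels:
        ids = tokenizer(prompt + label, return_tensors="pt").input_ids
        n_label = tokenizer(label, return_tensors="pt").input_ids.shape[1]
        logits = model(ids).logits
        # log-probability distribution over the *next* token at each position
        log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
        label_positions = range(ids.shape[1] - n_label, ids.shape[1])
        scores[label] = sum(log_probs[pos - 1, ids[0, pos]].item() for pos in label_positions)

print(max(scores, key=scores.get), scores)
```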
Hugging Face on Twitter: "We're releasing JS libraries to interact with Hugging Face..." (2023-03-03T20:49:49Z)

« Le Cavalier de la nuit » : Robert Penn Warren, l'œil perçant du Sud (2023-03-09T00:24:16Z)

raphaelsty/knowledge: Open-source personal bookmarks search engine (2023-03-15T01:02:34Z)

Judas de Amos Oz (2023-03-12T02:21:02Z)

Jim Fan on Twitter: "I don't give a damn about what is or isn't AGI. It doesn't matter. Below is GPT-4's performance on many standardized exams: BAR, LSAT, GRE, AP, etc. The truth is, GPT-4 can apply to Stanford as a student now. AI's reasoning ability is OFF THE CHARTS. Exponential growth is the… https://t.co/on8XKqOazg" (2023-03-16T10:20:17Z)

Meta AI on Twitter: "New in Nature Human Behavior, Meta AI researchers show how current language models differ from the human brain & highlight the role of long-range & hierarchical predictions..." (2023-03-03T22:18:56Z)
[Evidence of a predictive coding hierarchy in the human brain listening to speech | Nature Human Behaviour](doc:2023/03/evidence_of_a_predictive_coding)

BioBootloader on Twitter: "Today I used GPT-4 to make 'Wolverine' - it gives your python scripts regenerative healing abilities! Run your scripts with it and when they crash, GPT-4 edits them and explains what went wrong" (2023-03-18T08:54:37Z)
<https://github.com/biobootloader/wolverine>

OpenAI co-founder on company's past approach to openly sharing research: 'We were wrong' - The Verge (2023-03-16T19:06:46Z)
> When asked why OpenAI changed its approach to sharing its research, Sutskever replied simply, "We were wrong. Flat out, we were wrong. If you believe, as we do, that at some point, AI — AGI — is going to be extremely, unbelievably potent, then it just does not make sense to open-source. It is a bad idea... I fully expect that in a few years it's going to be completely obvious to everyone that open-sourcing AI is just not wise."

Lior⚡ on Twitter: "OpenAI just announced a new experimental model that knows when and how to browse the internet within ChatGPT!" (2023-03-24T01:29:14Z)
???

Nine ChatGPT Tricks for Knowledge Graph Workers - The Cagle Report (2023-03-18T13:43:14Z)

Brian Roemmele on Twitter: "I am very excited to announce I have been successful in installing and operating a full ChatGPT knowledge set and interface fully trained on my local computer and it needs no Internet once installed." (2023-03-21T07:54:49Z)

Les Cavaliers (Kessel) (2023-03-10T02:37:55Z)
Adventure novel by Joseph Kessel, set in Afghanistan and devoted to the game of buzkashi.

PyTorch on Google Cloud: How to deploy PyTorch models on Vertex AI | Google Cloud Blog (2023-03-23T11:42:32Z)

xenova/transformers.js: Run 🤗 Transformers in your browser! (2023-03-08T23:31:37Z)

Contre le tout-plastique, le combat de la chercheuse Nathalie Gontard (2023-03-12T02:55:48Z)
> All plastics, even recycled ones, will end up as waste.

A full training - Hugging Face Course (2023-03-14T02:50:07Z)

Big data? 🤗 Datasets to the rescue! - Hugging Face Course (2023-03-12T12:18:42Z)
> A simple way to measure memory usage in Python is with the psutil library (#[Python tips](tag:python_tips))
Note: to create a dataset from, e.g., a CSV file, you don't need to read the file yourself: <https://huggingface.co/docs/datasets/v1.12.0/loading.html#csv>
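Two small sketches for the notes above: the psutil call reads the current process's resident memory, and the 🤗 Datasets call builds a Dataset directly from a CSV file without parsing it yourself. The file name is a placeholder.

```python
# Sketch: measure this process's memory, then load a CSV with 🤗 Datasets.
# Requires: psutil, datasets. "my_data.csv" is a placeholder file name.
import psutil
from datasets import load_dataset

# Resident set size of the current Python process, in MB.
rss_mb = psutil.Process().memory_info().rss / (1024 ** 2)
print(f"RAM used: {rss_mb:.2f} MB")

# Build a Dataset from the CSV; no manual reading/parsing needed.
dataset = load_dataset("csv", data_files="my_data.csv", split="train")
print(dataset)
```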
De Londres à Tahiti, le mystère de la statue du dieu A'a (2023-03-12T23:24:35Z)
> One of the most beautiful works of primitive art, carved on the small Polynesian island of Rurutu, has been back at the Musée de Tahiti since March 4, thanks to a loan from the [British Museum](tag:british_museum).
> Its history is most curious. At the beginning of the 19th century, Western ships sailed through these islands and left behind bacteria that decimated the indigenous populations. Rurutu's roughly 6,000 inhabitants had dwindled to 200 by 1820 and began to believe that the gods were punishing them, or worse, that they were perhaps powerless. Some of the islanders set sail for Tubuai, a "neighbouring" island to the south (216 km); twenty-five others, led by the chief Au'ura, set out northwards and drifted as far as Maupiti, in the Leeward Islands (Îles Sous-le-Vent), more than 650 kilometres away.

[1904.10509] Generating Long Sequences with Sparse Transformers (2023-03-19T15:27:06Z)
Rewon Child, Scott Gray, Alec Radford, Ilya Sutskever. arXiv 1904.10509 (submitted 2019-04-23T19:29:47Z).

Abstract: Transformers are powerful sequence models, but require time and memory that grows quadratically with the sequence length. In this paper we introduce sparse factorizations of the attention matrix which reduce this to $O(n \sqrt{n})$. We also introduce a) a variation on architecture and initialization to train deeper networks, b) the recomputation of attention matrices to save memory, and c) fast attention kernels for training. We call networks with these changes Sparse Transformers, and show they can model sequences tens of thousands of timesteps long using hundreds of layers. We use the same architecture to model images, audio, and text from raw bytes, setting a new state of the art for density modeling of Enwik8, CIFAR-10, and ImageNet-64. We generate unconditional samples that demonstrate global coherence and great diversity, and show it is possible in principle to use self-attention to model sequences of length one million or more.

[2104.07186] COIL: Revisit Exact Lexical Match in Information Retrieval with Contextualized Inverted List (2023-03-08T17:46:59Z)
Luyu Gao, Zhuyun Dai, Jamie Callan. arXiv 2104.07186 (submitted 2021-04-15T00:53:54Z).

Abstract: Classical information retrieval systems such as BM25 rely on exact lexical match and carry out search efficiently with inverted list index. Recent neural IR models shifts towards soft semantic matching all query document terms, but they lose the computation efficiency of exact match systems. This paper presents COIL, a contextualized exact match retrieval architecture that brings semantic lexical matching. COIL scoring is based on overlapping query document tokens' contextualized representations. The new architecture stores contextualized token representations in inverted lists, bringing together the efficiency of exact match and the representation power of deep language models. Our experimental results show COIL outperforms classical lexical retrievers and state-of-the-art deep LM retrievers with similar or smaller latency.
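A toy sketch of the COIL scoring rule described in the abstract above: for each query token, look only at document positions carrying the same token id (exact lexical match), take the maximum dot product between their contextualized vectors, and sum over query tokens. The tensor shapes are illustrative simplifications of the paper, and this is not the authors' code.

```python
# Toy COIL-style scorer (illustrative, not the official implementation).
import torch

def coil_score(q_ids, q_vecs, d_ids, d_vecs):
    """q_ids: (m,) query token ids, q_vecs: (m, dim) contextualized vectors,
    d_ids: (n,) document token ids, d_vecs: (n, dim)."""
    score = 0.0
    for i, tok in enumerate(q_ids.tolist()):
        mask = d_ids == tok                      # exact lexical match positions
        if mask.any():
            sims = d_vecs[mask] @ q_vecs[i]      # dot products with matching tokens
            score += sims.max().item()           # keep the best-matching occurrence
    return score

# Tiny fake example: 3-d vectors, arbitrary integer token ids.
q_ids = torch.tensor([5, 9])
q_vecs = torch.randn(2, 3)
d_ids = torch.tensor([2, 5, 5, 7])
d_vecs = torch.randn(4, 3)
print(coil_score(q_ids, q_vecs, d_ids, d_vecs))
```

In the real system these per-token vectors come from a deep LM encoder and are stored in inverted lists keyed by token id, so the max-over-matches can be computed with the efficiency of a classical inverted index.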
Cinq lieux à voir en Normandie (2023-03-12T10:57:34Z)

Analyzing Document Layout with LayoutParser | by Ruben Winastwan | Towards Data Science (2023-03-21T17:34:27Z)
How to use the LayoutParser library to detect the layout of document images and extract text from them. (A minimal usage sketch follows the next entry.)

Using AI to make teaching easier & more impactful (2023-03-18T09:17:49Z)
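For the LayoutParser article above, a minimal sketch of the library's usual detect-then-OCR flow, written from memory of its documentation rather than taken from the article; the model config, label map and file name are illustrative, and the Detectron2 backend plus Tesseract must be installed.

```python
# Hypothetical LayoutParser usage sketch (verify against the official docs).
import cv2
import layoutparser as lp

image = cv2.imread("page.png")[..., ::-1]   # BGR -> RGB; placeholder file name

# Pre-trained layout detection model (PubLayNet labels: Text, Title, List, Table, Figure).
model = lp.Detectron2LayoutModel(
    "lp://PubLayNet/faster_rcnn_R_50_FPN_3x/config",
    label_map={0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"},
    extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.8],
)
layout = model.detect(image)

# OCR each detected text block with Tesseract.
ocr_agent = lp.TesseractAgent(languages="eng")
for block in [b for b in layout if b.type == "Text"]:
    segment = block.crop_image(image)
    print(ocr_agent.detect(segment))
```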