e.g. > We create a node parser that extracts the document title and hypothetical question embeddings relevant to the document chunk. Extracting Metadata for Better Document Indexing and Understanding - LlamaIndex 🦙 0.7.4 2023-07-10 2023-07-10T12:29:29Z 2023-07-10T23:55:36Z 2023-07-10 Le rapport de la Ligue des droits de l’homme après la manifestation de Sainte Soline : « Il est urgent d’envisager une désescalade » > According to the president of the LDH, the clashes at Sainte Soline on 25 March are emblematic, "in terms of police violence", of "a disproportionate and indiscriminate use of force" **5000 grenades in 3 hours!** (1 every 2 seconds) SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval 2023-07-26T23:28:40Z 2109.10086 Thibault Formal Thibault Formal Stéphane Clinchant 2023-07-26 2021-09-21T10:43:42Z cf. [[2107.05720] SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking](doc:2023/05/2107_05720_splade_sparse_lex) [2109.10086] SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval Carlos Lassance 2021-09-21T10:43:42Z Benjamin Piwowarski In neural Information Retrieval (IR), ongoing research is directed towards improving the first retriever in ranking pipelines. Learning dense embeddings to conduct retrieval using efficient approximate nearest neighbors methods has proven to work well. Meanwhile, there has been a growing interest in learning *sparse* representations for documents and queries, that could inherit from the desirable properties of bag-of-words models such as the exact matching of terms and the efficiency of inverted indexes. Introduced recently, the SPLADE model provides highly sparse representations and competitive results with respect to state-of-the-art dense and sparse approaches. In this paper, we build on SPLADE and propose several significant improvements in terms of effectiveness and/or efficiency. 
More specifically, we modify the pooling mechanism, benchmark a model solely based on document expansion, and introduce models trained with distillation. We also report results on the BEIR benchmark. Overall, SPLADE is considerably improved with more than $9$\% gains on NDCG@10 on TREC DL 2019, leading to state-of-the-art results on the BEIR benchmark. Breaking barriers with OpenBB and LlamaIndex: simplifying data access to 100+ trusted sources | OpenBB > As LLMs gain traction in finance, OpenBB takes a unique path, using LlamaIndex to map natural language, allowing newcomers to easily use 900+ commands and access 100+ sources. > Rather than index financial data directly with a vector store, they used @llama_index to index their commands. > These commands are fetched during query-time, creating a natural language layer over their rich query system. [Jerry Liu sur Twitter](doc:2023/07/jerry_liu_sur_twitter_if_you_1) 2023-07-20 2023-07-20T23:09:54Z Jerry Liu sur Twitter : "if you have access to a rich query language (e.g. SQL / any DSL), use a vector db to index additional metadata to help the LLM execute queries using this query language, while preventing prompt overflows! 2023-07-20 2023-07-20T23:14:16Z Personal Knowledge Graphs - strategic structures 2023-07-04 2023-07-04T01:23:08Z > MS MARCO(Microsoft Machine Reading Comprehension) is a large scale dataset focused on machine reading comprehension, question answering, and passage ranking, Keyphrase Extraction, and Conversational Search Studies, or what the community thinks would be useful. 1 million unique real queries that were generated by sampling and anonymizing [Bing](tag:bing) usage logs. 2023-07-14T10:28:08Z 2023-07-14 MSMARCO | MSMARCO-Question-Answering > Your AI assistant to discover and understand research papers SciSpace Literature Review - Get to the bottom of scientific literature 2023-07-03T07:48:00Z 2023-07-03 > DON’T just use out of the box RAG (e.g. 
default VectorStoreIndex in @llama_index, RetrieverQAChain in langchain,… 2023-07-16 2023-07-16T22:27:06Z Jerry Liu sur Twitter : Hot take: if you want to... deliver technical differentiation, you will need to learn LLM development in a “bottoms-up” fashion 2023-07-14 Vaiva Imbrasaite Chitta Baral 2023-05-23T14:55:25Z Man Luo 2305.14128 > While early studies primarily used a fixed or random set of demonstrations for all test queries, recent research suggests that retrieving semantically similar demonstrations to the input from a pool of available demonstrations results in better performance. This work expands the applicability of retrieval-based ICL approaches by demonstrating that even simple word-overlap similarity measures such as BM25 outperform randomly selected demonstrations. Dr.ICL: Demonstration-Retrieved In-context Learning Zhuyun Dai Vincent Y Zhao Man Luo [2305.14128] Dr.ICL: Demonstration-Retrieved In-context Learning Mehran Kazemi Panupong Pasupat In-context learning (ICL), teaching a large language model (LLM) to perform a task with few-shot demonstrations rather than adjusting the model parameters, has emerged as a strong paradigm for using LLMs. While early studies primarily used a fixed or random set of demonstrations for all test queries, recent research suggests that retrieving semantically similar demonstrations to the input from a pool of available demonstrations results in better performance. This work expands the applicability of retrieval-based ICL approaches by demonstrating that even simple word-overlap similarity measures such as BM25 outperform randomly selected demonstrations. Furthermore, we extend the success of retrieval-based ICL to instruction-finetuned LLMs as well as Chain-of-Thought (CoT) prompting. 
For instruction-finetuned LLMs, we find that although a model has already seen the training data at training time, retrieving demonstrations from the training data at test time yields better results compared to using no demonstrations or random demonstrations. Last but not least, we train a task-specific demonstration retriever that outperforms off-the-shelf retrievers. 2023-07-14T12:25:23Z Xin Xu 2023-05-23T14:55:25Z 2023-07-07T00:00:51Z > Excited about implication for continual learning, interpretability etc. 2023-07-07 Sanjeev Arora sur Twitter : "new `skills' induced by LLM fine-tuning can be localized in tiny fraction of the model." Customizing Agent to Chat with Your Documents | Haystack 2023-07-25T20:50:24Z 2023-07-25 2023-07-23T00:10:52Z > Using naive RAG techniques (naive text chunking, simple top-k retrieval -> LLM) is fine for hackathons, but will lead to lots of failure cases. [slides](https://docs.google.com/presentation/d/1wTEt3sy7ZHk3rYO3nFYhPZEFrfpG70l6WzY12wIaycE/edit#slide=id.p) among the points: - Good parser - Augmenting chunks with context. Eg. 
keeping the page number with each chunk allows for inline citation - Right indexes over your data - Using LLMs for Automatic Metadata Extraction Jerry Liu sur Twitter : "Some critical data considerations that you must take into account to make your LLM application production-ready" 2023-07-23 Llama 2 - Meta AI 2023-07-19T01:43:52Z 2023-07-19 Le Haut-Commissariat des Nations unies aux droits de l’homme épingle la France pour les « profonds problèmes de racisme et de discrimination raciale parmi les forces de l’ordre » 2023-07-01 2023-07-01T10:42:27Z The spokesperson for the Office of the UN High Commissioner for Human Rights: > "This is the moment for the country to seriously address the deep-rooted problems of racism and racial discrimination within law enforcement" > > "We understand that there has been a lot of looting and violence by certain elements who are using the protests for these purposes, and that a large number of police officers have also been injured", ... **But "we call on the authorities to ensure that the use of force by the police against violent elements during protests respects the principles of legality, necessity, proportionality, non-discrimination, precaution and accountability"** 2023-07-05T22:51:46Z 2023-07-05 Sebastian Roché : « Les mauvaises pratiques policières sapent les fondements de la République » Comment des « gènes sauteurs » sont passés d’une espèce animale à une autre 2023-07-12 2023-07-12T07:54:40Z 2023-07-14T01:59:01Z 2023-07-14 LlamaIndex sur Twitter : "Stop building API connectors - build data agents that can automatically access to ANY API defined with an OpenAPI spec..." Gradio sur Twitter : "build a Chatbot UI in Python -- including streaming, undo/retry, API, all out of the box!..." 2023-07-18 2023-07-18T00:13:28Z Tom Toro "Yes, the planet got destroyed. But for a beautiful moment in time we created a lot of value for shareholders." 
2023-07-02 2023-07-02T20:06:55Z > The film is inspired by the true story of a 1962 tour of the Deep South by African American pianist Don Shirley and Italian American bouncer and later actor Frank "Tony Lip" Vallelonga, who served as Shirley's driver and bodyguard 2023-07-10T23:09:48Z 2023-07-10 Green Book (film) > - Supports state-of-the-art models, including LLMs like Falcon & LLaMA > - 4-bit & 8-bit inference > - Built from composable, reusable components 2023-07-14T02:11:49Z 2023-07-14 spaCy sur Twitter : "NEW transformer library for PyTorch: curated-transformers!" 2023-07-21T00:05:58Z 2023-07-21 > using a LoRA script to fine-tune an [intfloat/e5-large-v2](tag:e5) model on the smangrul/amazon_esci dataset (query, product_title, relevance_label) for semantic similarity tasks LoRA for semantic similarity tasks Santiago sur Twitter : "Deep TDA, a new algorithm using self-supervised learning, overcomes the limitations of traditional dimensionality reduction algorithms (t-SNE and UMAP)..." 2023-07-06T23:56:25Z 2023-07-06 > [Jacques Attali](tag:attali) foresees the disappearance of humanity under the battering of **"the economy of death": fossil fuels, "disastrous" agriculture, drugs, the tyranny of the short term and of "me first"**. In his view, it accounts for 60% of our current wealth production as measured by GDP. To avert the looming catastrophe, he proposes a massive shift towards "the economy of life", centred on education, health, sustainable mobility and energy, healthy food, culture and democracy. 
> Amend the Constitution to specify that any decision contrary to the interest of future generations would be declared unconstitutional 2023-07-10T21:09:28Z 2023-07-10 Aux Rencontres économiques d’Aix-en-Provence, les tensions d’un monde qui bascule Andrej Karpathy sur Twitter : "My fun weekend hack: llama2.c Lets you train a baby Llama 2 model in PyTorch, then inference it with one 500-line file with no dependencies, in pure C." 2023-07-24 2023-07-24T08:37:06Z 2023-07-10 2023-07-10T07:56:23Z LlamaIndex 0.7.0: Better Enabling Bottoms-Up LLM Application Development | by Jerry Liu | LlamaIndex Blog | Jul, 2023 | Medium 2023-07-26T23:36:33Z 2023-07-26 retrieval model that learns sparse lexical representations with contextual embeddings > we **combine the strengths of both the sparse and dense representations** for first-stage retrieval. > > Compared with [SPLADE](tag:splade), our model leverages the contextual embeddings to improve model expressiveness. Compared with [ColBERT](tag:colbert), our sparse representations are trained end-to-end to optimize both efficiency and effectiveness. SparseEmbed: Learning Sparse Lexical Representations with Contextual Embeddings for Retrieval 2023-07-09 2023-07-09T10:31:17Z LlamaIndex: Unleash the power of LLMs over your data | Hacker News ML Blog - Improve ChatGPT with Knowledge Graphs 2023-07-04T22:47:22Z 2023-07-04 2023-07-07 > Naive text splitting + top-k on your tables == bad results! > A nuanced, hierarchical data representation over your PDF can help 2023-07-07T00:32:21Z Jerry Liu sur Twitter : "If you’re building “chat over your PDFs” with LLMs, you need to deal with the pesky issue of how to parse embedded tables/diagrams..." ChatGPT and Elasticsearch: A plugin to use ChatGPT with your Elastic data | Elastic Blog 2023-07-07T17:59:56Z 2023-07-07 Un gisement géant d’hydrogène en Lorraine ? 
| CNRS Le journal 2023-07-07T00:25:51Z 2023-07-07 Jerry Liu sur Twitter : "Adding metadata to text can help w/ disambiguation and boost retrieval performance for LLM QA systems, using LLMs to... extract rich context to augment each chunk" 2023-07-09T10:07:29Z 2023-07-09 LongNet: Scaling Transformers to 1,000,000,000 Tokens 2023-07-05T17:59:38Z Xingxing Zhang Shaohan Huang 2023-07-05T17:59:38Z Shuming Ma Jiayu Ding [2307.02486] LongNet: Scaling Transformers to 1,000,000,000 Tokens Wenhui Wang 2307.02486 Scaling sequence length has become a critical demand in the era of large language models. However, existing methods struggle with either computational complexity or model expressivity, rendering the maximum sequence length restricted. In this work, we introduce LongNet, a Transformer variant that can scale sequence length to more than 1 billion tokens, without sacrificing the performance on shorter sequences. Specifically, we propose dilated attention, which expands the attentive field exponentially as the distance grows. LongNet has significant advantages: 1) it has a linear computation complexity and a logarithm dependency between tokens; 2) it can be served as a distributed trainer for extremely long sequences; 3) its dilated attention is a drop-in replacement for standard attention, which can be seamlessly integrated with the existing Transformer-based optimization. Experiments results demonstrate that LongNet yields strong performance on both long-sequence modeling and general language tasks. Our work opens up new possibilities for modeling very long sequences, e.g., treating a whole corpus or even the entire Internet as a sequence. Furu Wei 2023-07-06T23:49:37Z Jiayu Ding Li Dong 2023-07-06 Generating labeled data via instruction-prompting Large Language Models to train ranking models > The approach uses a handful of human-annotated labeled examples (few-shot) and prompts the LLM to generate synthetic queries for documents in the corpus. 
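A minimal sketch of the few-shot prompting step described in the note above, assuming a plain-text prompt layout. The function name `build_query_generation_prompt`, the example pairs and the prompt wording are illustrative assumptions, not taken from the Vespa post. The sketch only assembles the prompt; sending it to an LLM is deliberately left out.

```python
from typing import List, Tuple


def build_query_generation_prompt(examples: List[Tuple[str, str]], document: str) -> str:
    """Assemble a few-shot prompt asking an LLM to write a query for `document`.

    `examples` holds (document, query) pairs labeled by humans (hypothetical data here).
    """
    parts = ["Write a search query that the document answers.", ""]
    for doc_text, query in examples:
        parts.append(f"Document: {doc_text}")
        parts.append(f"Query: {query}")
        parts.append("")
    # The final block is left open so the model completes the query.
    parts.append(f"Document: {document}")
    parts.append("Query:")
    return "\n".join(parts)


# Hypothetical labeled examples, for illustration only.
few_shot = [
    ("LoRA learns low-rank weight updates.", "what is lora fine-tuning"),
    ("SPLADE produces sparse lexical vectors.", "how does splade represent documents"),
]
prompt = build_query_generation_prompt(few_shot, "RetNet supports recurrent inference.")
print(prompt)
```

The generated (document, synthetic query) pairs would then serve as training data for a ranking model, as the note says.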
Improving Search Ranking with Few-Shot Prompting of LLMs | Vespa Blog 2023-07-07T20:29:55Z 2023-07-07 Parameter-Efficient LLM Finetuning With Low-Rank Adaptation (LoRA) - Lightning AI - Sebastian Raschka 2023-07-27 2023-07-27T01:54:57Z > how to tune an LLM with Low-Rank Adaptation (LoRA) in a computationally efficient manner [tweet](https://twitter.com/rasbt/status/1651226178353614854) [Karpathy](https://twitter.com/karpathy/status/1651288867247640578) > the paper LoRA: Low-Rank Adaptation of Large Language Models proposes to decompose the weight changes, ΔW, into a lower-rank representation. (To be technically correct, LoRA does not decompose the matrices directly, but it learns the decomposed matrices via backpropagation). > > suppose ΔW is the weight update for an A × B weight matrix. Then, we can decompose the weight update matrix into two smaller matrices: ΔW = W_A W_B, where W_A is an A × r-dimensional matrix, and W_B is an r × B-dimensional matrix. LoRA and LLaMA: > Lit-LLaMA repository: a simple, readable reimplementation of Meta’s popular LLaMA model. Besides code for training and running LLaMA itself (with the original Meta LLaMA weights), it also contains code for finetuning LLaMA using LLaMA-Adapter and LoRA. 2023-07-02 2023-07-02T00:11:36Z La mission Euclid part sonder le côté obscur de l’Univers Jeremy Howard sur Twitter : "regulation designed to increase AI safety may backfire badly!" 2023-07-11T23:22:14Z 2023-07-11 2023-07-04 2023-07-04T22:49:59Z LlamaIndex 0.7.0: Better Enabling Bottoms-Up LLM Application Development | by Jerry Liu | Jul, 2023 | Medium 2023-07-18T00:17:24Z > the magistrates consider that "changing practices, or even reforming the agricultural model, appears to be a necessity". 
> The report is also very reserved about the government-backed mega-basin projects intended for agricultural irrigation > The Cour des comptes recommends making "public funding" of farmland irrigation infrastructure conditional on commitments to reduce the quantity of water used 2023-07-18 Face à la raréfaction de la ressource en eau, l’« unique solution » est de « réduire les prélèvements », estime la Cour des comptes PromptHub 2023-07-07 2023-07-07T00:12:08Z 2023-07-17T07:54:02Z 2023-07-17 « Il y a des violences que l’Etat affronte et d’autres auxquelles il consent » 2023-07-27 Dans le détroit de Gibraltar, la vendetta de l’orque Gladis 2023-07-27T12:04:40Z Jerry Liu sur Twitter : "The `camelot` package is an awesome module for extracting tables from PDFs..." 2023-07-03T07:43:02Z 2023-07-03 Jack Rae sur Twitter : "Pretty wild that simple text compression algorithms demonstrate few-shot learning." 2023-07-14T01:36:20Z 2023-07-14 The paper says that gzip + kNN is better than embeddings at similarity search on out-of-domain data. [Yoav Goldberg](tag:yoav_goldberg)'s [tweet](https://twitter.com/yoavgo/status/1679669236082388992) > Gzip does *not* produce an embedding. The gzip paper only defines a distance measure (not a metric) for two strings. Distance measures are great for building nonparametric learners. Which is what the paper does. [@deliprao](https://twitter.com/deliprao/status/1679851151074705409?s=20) 2023-07-27T01:50:25Z 2023-07-27 What is low-rank adaptation (LoRA)? - TechTalks 2023-07-20 2023-07-20T08:33:39Z > find counterfactual statements in customer reviews from 8 examples: > - Fine-tuning: 13% accuracy > - Embedding based: 61% accuracy For classification: nearest neighbour < nearest centroid < logistic regression classifier: > lightweight logistic regression classifier is the fastest and best method, especially with more training data. 
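To make the "lightweight logistic regression classifier on top of embeddings" idea concrete, here is a small sketch in pure Python. It is not the article's code: the 2-D vectors are made-up stand-ins for sentence embeddings, and the learning rate and epoch count are arbitrary assumptions. In practice the inputs would be real embedding vectors and a library classifier would normally be used.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(
    X: Sequence[Sequence[float]], y: Sequence[int], lr: float = 0.5, epochs: int = 500
) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the log loss."""
    dim = len(X[0])
    weights = [0.0] * dim
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for features, label in zip(X, y):
            score = sum(w * f for w, f in zip(weights, features)) + bias
            error = sigmoid(score) - label  # derivative of log loss w.r.t. the score
            for i, f in enumerate(features):
                grad_w[i] += error * f
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights: Sequence[float], bias: float, features: Sequence[float]) -> int:
    score = sum(w * f for w, f in zip(weights, features)) + bias
    return int(sigmoid(score) >= 0.5)


# Toy "embeddings" (assumed data): label 1 = counterfactual, 0 = not.
embeddings = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
labels = [1, 1, 0, 0]
weights, bias = train_logistic_regression(embeddings, labels)
print(predict(weights, bias, [0.7, 0.3]), predict(weights, bias, [0.3, 0.7]))
```

Only the small classifier is trained here; the embedding model stays frozen, which is why this approach is cheap and works with few labeled examples.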
[Unlocking the Power of Cross-Lingual Classification in NLP](doc:2023/07/unlocking_the_power_of_cross_li) Nils Reimers sur Twitter : "Cross-Lingual Text-Classification just from English Data" Unlocking the Power of Cross-Lingual Classification in NLP 2023-07-20 2023-07-20T08:41:06Z LlamaIndex 🦙 (GPT Index) sur Twitter : "Pretty much everyone building LLM apps over data has to figure out how to... cram arbitrary data into limited context windows?" > Our 0.7.0 response synthesis modules eliminate the need to write this boilerplate. Here’s an overview of strategies > with 0.7.0 they’re standalone modules, so you can use them with OR without the rest of LlamaIndex! 2023-07-09 2023-07-09T10:47:08Z clem 🤗 sur Twitter : "Llama 2 by @Meta is already integrated with @huggingface transformers, TGI, inference endpoints, PEFT and much more..." 2023-07-19T02:06:00Z 2023-07-19 Jerry Liu sur Twitter : "Using cross-encoding as a reranking step can dramatically speed up LLM inference time AND improve accuracy!" 2023-07-20T08:24:24Z (this speeds up inference because you can pass fewer nodes into the context) > We use an [MSMarco SBERT cross-encoder from @huggingface](https://www.sbert.net/docs/pretrained-models/ce-msmarco.html)
```
from sentence_transformers import CrossEncoder

# 'model_name' is a placeholder: substitute one of the MS MARCO cross-encoders listed on the page linked above
model = CrossEncoder('model_name', max_length=512)
# Score each (query, passage) pair; a higher score means the passage is more relevant to the query
scores = model.predict([('Query', 'Paragraph1'), ('Query', 'Paragraph2'), ('Query', 'Paragraph3')])
```
(cf. https://www.sbert.net/docs/pretrained-models/ce-msmarco.html) 2023-07-20 Euclid, l’énergie noire en ligne de mire | CNRS Le journal 2023-07-01 2023-07-01T15:38:04Z Li Dong 2023-07-20T23:43:53Z 2023-07-20 Yutao Sun 2307.08621 Shaohan Huang Jianyong Wang Shuming Ma In this work, we propose Retentive Network (RetNet) as a foundation architecture for large language models, simultaneously achieving training parallelism, low-cost inference, and good performance. We theoretically derive the connection between recurrence and attention. 
Then we propose the retention mechanism for sequence modeling, which supports three computation paradigms, i.e., parallel, recurrent, and chunkwise recurrent. Specifically, the parallel representation allows for training parallelism. The recurrent representation enables low-cost $O(1)$ inference, which improves decoding throughput, latency, and GPU memory without sacrificing performance. The chunkwise recurrent representation facilitates efficient long-sequence modeling with linear complexity, where each chunk is encoded parallelly while recurrently summarizing the chunks. Experimental results on language modeling show that RetNet achieves favorable scaling results, parallel training, low-cost deployment, and efficient inference. The intriguing properties make RetNet a strong successor to Transformer for large language models. Code will be available at https://aka.ms/retnet. Yutao Sun [2307.08621] Retentive Network: A Successor to Transformer for Large Language Models Yuqing Xia Retentive Network: A Successor to Transformer for Large Language Models Furu Wei 2023-07-17T16:40:01Z 2023-07-19T05:56:42Z Jilong Xue 2023-07-25T14:00:57Z 2023-07-25 Welcome to Looker Studio! - Looker Studio Help > embeddings may fail to capture the importance of individual words 2023-07-01T08:04:35Z 2023-07-01 Scott Condron sur Twitter : "Embedding-based retrieval alone might be insufficient"... Andrej Karpathy sur Twitter : "Promising. Everyone should hope that we can throw away tokenization in LLMs..." 
2023-07-01 2023-07-01T09:09:04Z [[2305.07185] MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers](doc:2023/07/2305_07185_megabyte_predicti) MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers Mike Lewis Armen Aghajanyan Lili Yu 2023-07-01 Lili Yu 2305.07185 Dániel Simig 2023-05-19T21:09:11Z 2023-05-12T00:55:41Z > these results establish the viability of tokenization-free autoregressive sequence modeling at scale Colin Flaherty 2023-07-01T09:10:39Z [2305.07185] MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers Luke Zettlemoyer Autoregressive transformers are spectacular models for short sequences but scale poorly to long sequences such as high-resolution images, podcasts, code, or books. We proposed Megabyte, a multi-scale decoder architecture that enables end-to-end differentiable modeling of sequences of over one million bytes. Megabyte segments sequences into patches and uses a local submodel within patches and a global model between patches. This enables sub-quadratic self-attention, much larger feedforward layers for the same compute, and improved parallelism during decoding -- unlocking better performance at reduced cost for both training and generation. Extensive experiments show that Megabyte allows byte-level models to perform competitively with subword models on long context language modeling, achieve state-of-the-art density estimation on ImageNet, and model audio from raw files. Together, these results establish the viability of tokenization-free autoregressive sequence modeling at scale. 
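As a rough illustration of the first step in the abstract above ("segments sequences into patches"), the sketch below splits a byte string into fixed-size patches. It covers only the segmentation; the local and global models are not shown. The function name `to_patches` and the choice to pad the last patch with zero bytes are assumptions made for this example, not details taken from the paper.

```python
from typing import List


def to_patches(data: bytes, patch_size: int, pad_byte: int = 0) -> List[bytes]:
    """Split `data` into consecutive patches of `patch_size` bytes.

    The last patch is right-padded with `pad_byte` (an assumed convention).
    """
    if patch_size <= 0:
        raise ValueError("patch_size must be positive")
    remainder = len(data) % patch_size
    if remainder:
        data = data + bytes([pad_byte]) * (patch_size - remainder)
    return [data[i:i + patch_size] for i in range(0, len(data), patch_size)]


patches = to_patches(b"tokenization-free", patch_size=4)
print(patches)
```

With a sequence of length T and patch size P, a global model would attend over T/P patch positions while a local model works within each patch of P bytes, which is where the sub-quadratic attention cost mentioned in the abstract comes from.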
2023-07-03T23:34:07Z 2023-07-03 How to wrap without using tape - YouTube 2023-07-06T23:43:50Z 2023-07-06 Les leçons d’émeutes urbaines sans précédent : une crise sécuritaire, sociale, politique et éducative Llama 2 is here - get it on Hugging Face 2023-07-19 2023-07-19T02:13:41Z 2023-07-03T23:23:42Z 2023-07-03 Jerry Liu sur Twitter : "LLMs can directly extract structured data (esp w/ Function API), but can be slow/expensive. 🤔 Instead: use LLMs to generate code, run code to extract data..."