Semanlink : new entries
http://www.semanlink.net/rss/
RAG makes LLMs better and equal | Pinecone
http://www.semanlink.net/doc/2024/03/rag_makes_llms_better_and_equal
> The study demonstrates that RAG significantly improves LLM performance, **even on questions within their training domain**.
> RAG could enable smaller, less costly, or private models to deliver high-quality results in tasks requiring simple factual reasoning
2024-03-13T22:49:36Z
Command-R: RAG at Production Scale
http://www.semanlink.net/doc/2024/03/command_r_rag_at_production_sc
2024-03-13T23:27:05Z
Nils Reimers on X: "Embeddings can store only 1 aspect/topic per embedding well."
http://www.semanlink.net/doc/2024/03/nils_reimers_sur_x_smlpth_w
> On Wikipedia, one paragraph typically focuses on one topic. So this gives you a good chunking for Wikipedia.
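
A minimal sketch of the paragraph-level chunking the quote suggests, assuming paragraphs are separated by blank lines. The function name `paragraph_chunks` is hypothetical and not taken from the linked thread.

```python
import re


def paragraph_chunks(text):
    """Split text into one chunk per paragraph (blank-line separated)."""
    # A paragraph break is a newline, optional whitespace, then another newline.
    parts = re.split(r"\n\s*\n", text)
    # Drop empty pieces and trim surrounding whitespace.
    return [p.strip() for p in parts if p.strip()]


sample = "First topic, line one.\nFirst topic, line two.\n\nSecond topic."
chunks = paragraph_chunks(sample)
```

Each chunk would then be embedded on its own, so that one embedding covers roughly one topic.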
2024-03-13T23:20:09Z
We tested Le Chat, Mistral AI's astonishing French-style ChatGPT
http://www.semanlink.net/doc/2024/03/on_a_teste_le_chat_l%E2%80%99etonnant_
2024-03-12T08:10:20Z
In the Netherlands, the possible relocation of the giant ASML unsettles political authorities
http://www.semanlink.net/doc/2024/03/aux_pays_bas_l%E2%80%99eventuelle_delo
**The high-tech group denounces an increasingly restrictive immigration policy** and an environment that is not "pro-business" enough.
2024-03-12T22:46:41Z
What you should know about RAG (from beginner to advanced) | by Jonathan Nguyen | Medium
http://www.semanlink.net/doc/2024/03/what_you_should_know_about_rag_
2024-03-11T10:09:22Z
RAG CLI - LlamaIndex
http://www.semanlink.net/doc/2024/03/rag_cli_llamaindex
CLI tool to ingest local files into a local vector database that is then used for a Chat Q&A repl within your terminal.
2024-03-10T11:25:15Z
Hrishi on X: "RAPTOR is... one of the very few [RAG architectures] that actively presumes and uses the structure in a document...."
http://www.semanlink.net/doc/2024/03/hrishi_sur_x_bookmarked_pape
(thread by the author of [WalkingRAG](tag:walkingrag))
> The similarities between WalkingRAG and RAPTOR are that both attempt to capture relationships in the data into a higher structure using LLMs... This is a tree in RAPTOR's case, with WalkingRAG it's a graph.
2024-03-09T11:30:15Z
Answer.AI - You can now train a 70b language model at home
http://www.semanlink.net/doc/2024/03/answer_ai_you_can_now_train_a
2024-03-09T10:06:03Z
Akshay 🚀 on X: "Let's build a "Chat with your code" RAG application, step-by-step"
http://www.semanlink.net/doc/2024/03/akshay_%F0%9F%9A%80_sur_x_let_s_build_
2024-03-09T11:55:54Z
Krista Opsahl-Ong on X: "Got a pipeline with **multiple prompts**, like a DSPy program? ... Introducing MIPRO, a Multi-prompt Instruction Proposal Optimizer...."
http://www.semanlink.net/doc/2024/03/krista_opsahl_ong_sur_x_got_
2024-03-09T11:37:47Z
Hrishi on X: "WalkingRAG is finally out!..."
http://www.semanlink.net/doc/2024/03/hrishi_sur_x_walkingrag_is_f
2024-03-09T11:28:51Z
ColBERT gist:c1182551fa609736d47df4af82f7c5ab
http://www.semanlink.net/doc/2024/03/colbert_gist_c1182551fa609736d4
> a quick gist that does synthetic data gen, fine-tuning, eval. Just add your own documents, or try it on a PG essay.
@JoshPurtell
2024-03-08T23:31:23Z
Understanding Hierarchical Navigable Small Worlds (HNSW)
http://www.semanlink.net/doc/2024/03/understanding_hierarchical_navi
2024-03-08T12:33:18Z
ColBERT Inference in the Browser
http://www.semanlink.net/doc/2024/03/colbert_inference_in_the_browser
Demo of ColBERT query-passage scoring interpretability
- try with the following: "what are the mentioned EICPS?" and passage "There is a security risk related to EICPS 67"
    - MaxSim Score: 20.71
    - Estimated Relevance: 64.71%
    - highlights: There related
- then "what are the mentioned animals?" and "There is a security risk related to lions"
    - MaxSim Score: 9.18
    - Estimated Relevance: 28.68%
    - highlights: related lions
```
Effects of climate change on marine ecosystems
MaxSim Score: 27.90
Estimated Relevance: 87.17%
Effects of global warming on marine ecosystems
MaxSim Score: 24.62
Estimated Relevance: 76.94%
Effects of global warming on life in the oceans
MaxSim Score: 19.64
Estimated Relevance: 61.39%
Effects of global warming on life on Mars
MaxSim Score: 13.65
Estimated Relevance: 42.65%
```
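The MaxSim scores above come from ColBERT's late-interaction scoring: for each query token, take the best similarity with any passage token, then sum over the query tokens. The sketch below illustrates that rule only. The 2-d token embeddings are made up for illustration, and the helper name `maxsim` is hypothetical rather than taken from the demo's code.

```python
def maxsim(query_vecs, passage_vecs):
    """Sum, over query tokens, of the best dot product with any passage token."""
    return sum(
        max(sum(q * p for q, p in zip(qv, pv)) for pv in passage_vecs)
        for qv in query_vecs
    )


# Hypothetical token embeddings (a real model uses e.g. 128-d normalized vectors).
query = [[1.0, 0.0], [0.0, 1.0]]
passage = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]

# Best matches are 0.9 for the first query token and 0.8 for the second.
score = maxsim(query, passage)
```

In the figures listed above, the estimated relevance is approximately the MaxSim score divided by 32. This is an observation from those numbers, not a documented formula of the demo.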
2024-03-08T18:07:53Z
Topics | IBM Research
http://www.semanlink.net/doc/2024/03/topics_%7C_ibm_research
2024-03-07T16:10:39Z
huggingface/text-clustering: Easily embed, cluster and semantically label text datasets
http://www.semanlink.net/doc/2024/03/huggingface_text_clustering_ea
tools to easily embed and cluster texts as well as label clusters semantically
2024-03-07T13:04:38Z
rabbit - research
http://www.semanlink.net/doc/2024/03/rabbit_research
learning human actions on computer applications
2024-03-07T16:02:46Z
KGC23 Keynote: The Future of Knowledge Graphs in a World of LLMs — Denny Vrandečić, Wikimedia - YouTube
http://www.semanlink.net/doc/2024/03/kgc23_keynote_the_future_of_kn
2024-03-07T15:38:33Z
GraphRAG: Unlocking LLM discovery on narrative private data - Microsoft Research
http://www.semanlink.net/doc/2024/03/graphrag_unlocking_llm_discove
> GraphRAG uses **LLM generated knowledge graphs** to provide substantial improvements in question-and-answer performance when conducting document analysis of complex information.
>
> power of **prompt augmentation** when performing **discovery** on private datasets (data that the LLM is not trained on and has never seen before, such as an enterprise’s proprietary research, business documents, etc.)
>
> GraphRAG uses the LLM to **create a knowledge graph based on the private dataset**. This graph is then used alongside graph machine learning to perform **prompt augmentation** at query time.
>
> the GraphRAG approach [can] **discover entities in the query**. This allows the LLM to ground itself in the graph and results in superior answers that contain provenance through links to the original supporting text
GraphRAG can answer queries such as "**what are the top five themes in the data?**"
2024-03-07T14:12:15Z
Yrjänä Rankka 🌻 on X: "@ParisHilton You are a disambiguation benchmark for semantic reasoning." / X (2018)
http://www.semanlink.net/doc/2024/03/yrjana_rankka_%F0%9F%8C%BB_sur_x_pari
> Tell me something I don't know...
2024-03-07T01:35:13Z
Risks related to "new GMOs": ANSES recommends a case-by-case assessment, in an opinion that remained confidential
http://www.semanlink.net/doc/2024/03/risques_lies_aux_%C2%AB_nouveaux_ogm
2024-03-05T22:38:53Z
How Opendatasoft became the key player in opening up public data
http://www.semanlink.net/doc/2024/03/comment_opendatasoft_est_devenu
2024-03-05T14:04:14Z
Mexico 1968 Official Film | The Olympics in Mexico
http://www.semanlink.net/doc/2024/03/mexico_1968_official_film_%7C_the
2024-03-04T08:15:12Z
Announcing Vespa Long-Context ColBERT | Vespa Blog
http://www.semanlink.net/doc/2024/03/announcing_vespa_long_context_c
2024-03-03T09:01:52Z
How to Build a RAG System With LlamaIndex, OpenAI, and MongoDB Vector Database | MongoDB
http://www.semanlink.net/doc/2024/03/how_to_build_a_rag_system_with_
2024-03-03T10:21:00Z
Jerry Liu on X: "To better augment LLMs with context, it makes a lot of sense to organize context not just as a flat list of text chunks, but as a hierarchy of high-level to low-level details. RAPTOR..."
http://www.semanlink.net/doc/2024/03/jerry_liu_sur_x_to_better_au
> To better augment LLMs with context, it makes a lot of sense to organize context not just as a flat list of text chunks, but as a hierarchy of high-level to low-level details.
>
> RAPTOR is a super simple but neat idea towards this direction. Hierarchically cluster and summarize the text into a tree (the clustering is important, allows semantically related concepts to be grouped together and doesn't purely rely on spatial positioning!). During query-time dynamically retrieve the most relevant context to the question.
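
A toy sketch of the "cluster, then summarize, into a tree" idea described in the quote. It is not RAPTOR's actual method: a greedy cosine-similarity grouping stands in for its clustering step, and a caller-supplied `summarize` function stands in for the LLM summarizer. All names, vectors, and the threshold are hypothetical.

```python
from math import sqrt


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cluster(nodes, threshold):
    """Greedy grouping by similarity to each group's mean vector (not by position)."""
    groups = []
    for node in nodes:
        for group in groups:
            if cosine(node["vec"], mean([m["vec"] for m in group])) >= threshold:
                group.append(node)
                break
        else:
            groups.append([node])
    return groups


def build_tree(leaves, summarize, threshold=0.8):
    """Repeatedly cluster the current level and summarize each cluster into a parent."""
    level = leaves
    while len(level) > 1:
        groups = cluster(level, threshold)
        if len(groups) == len(level):  # nothing merged: place everything under one root
            groups = [level]
        level = [
            {
                "text": summarize([m["text"] for m in g]),
                "vec": mean([m["vec"] for m in g]),
                "children": g,
            }
            for g in groups
        ]
    return level[0]


# Chunks "a" and "b" are related but not adjacent; same for "c" and "d".
leaves = [
    {"text": "a", "vec": [1.0, 0.0], "children": []},
    {"text": "c", "vec": [0.0, 1.0], "children": []},
    {"text": "b", "vec": [0.9, 0.1], "children": []},
    {"text": "d", "vec": [0.1, 0.9], "children": []},
]
root = build_tree(leaves, summarize=lambda texts: " + ".join(texts))
```

At query time, a retriever could compare the question embedding against nodes at every level of this tree, so that both summaries and raw chunks are candidates.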
2024-03-03T10:14:19Z
On the Surprising Behavior of Distance Metrics in High Dimensional Space (Aggarwal 2001)
http://www.semanlink.net/doc/2024/03/on_the_surprising_behavior_of_d
> in high dimensional space, the concept of proximity, distance or nearest neighbor may not even be qualitatively meaningful.
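
A small experiment that illustrates the effect, assuming uniformly random points and Euclidean distance (the paper studies a family of norms; this sketch checks only one of them). The function name `relative_contrast` and the sample sizes are illustrative choices.

```python
import random
from math import sqrt


def relative_contrast(dim, n_points=200, seed=0):
    """(farthest - nearest) / nearest distance from a random query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(sqrt(sum((a - b) ** 2 for a, b in zip(query, point))))
    return (max(dists) - min(dists)) / min(dists)


low_dim = relative_contrast(2)      # nearest and farthest neighbors differ a lot
high_dim = relative_contrast(1000)  # distances concentrate around a common value
```

As the dimension grows, the contrast shrinks toward zero, which is the sense in which "nearest" becomes less meaningful.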
2024-03-03T21:33:43Z
Naomi Oreskes, historian of science: "In the United States, we are implementing political ideas that do not work. We are paying a heavy price for the free market"
http://www.semanlink.net/doc/2024/03/naomi_oreskes_historienne_des_
2024-03-03T14:23:09Z
Raptor Retriever LlamaPack
http://www.semanlink.net/doc/2024/03/raptor_retriever_llamapack
2024-03-03T22:17:10Z
Intro to DSPy: Goodbye Prompting, Hello Programming! | by Leonie Monigatti | Feb, 2024 | Towards Data Science
http://www.semanlink.net/doc/2024/03/intro_to_dspy_goodbye_promptin
2024-03-01T02:17:40Z
Fine-tuning transformers: Vocabulary transfer: Artificial Intelligence: Vol 317, No C
http://www.semanlink.net/doc/2024/02/fine_tuning_transformers_voc
2024-02-29T14:16:41Z
Omar Khattab on X: "ColBERT in 81 languages by generalizing from English training! ..."
http://www.semanlink.net/doc/2024/02/omar_khattab_sur_x_imo_one_o
2024-02-28T21:54:34Z
raphaelsty/neural-tree: Tree-based indexes for neural-search
http://www.semanlink.net/doc/2024/02/raphaelsty_neural_tree_tree_ba
> Are tree-based indexes the counterpart of standard ANN algorithms for token-level embeddings IR models?
2024-02-28T21:47:27Z
The viruses that colonized our genome: friends or foes?
http://www.semanlink.net/doc/2024/02/ces_virus_qui_ont_colonise_notr
2024-02-27T00:27:22Z
Ravi Theja on X: "ActiveRAG: Revealing the Treasures of Knowledge via Active Learning..."
http://www.semanlink.net/doc/2024/02/ravi_theja_sur_x_%F0%9F%9A%80_%F0%9D%90%80%F0%9D%90%9C%F0%9D%90%AD%F0%9D%90%A2
2024-02-25T10:12:14Z
[2307.15936] A Theory for Emergence of Complex Skills in Language Models
http://www.semanlink.net/doc/2024/02/2307_15936_a_theory_for_emerg
[New Theory Suggests Chatbots Can Understand Text | Quanta Magazine](doc:2024/02/new_theory_suggests_chatbots_ca)
2024-02-24T00:11:29Z
"The ecological transition is off to a bad start"
http://www.semanlink.net/doc/2024/02/%C2%AB_la_transition_ecologique_est_
Humanity deserves to disappear, anyway.
What bothers me is the hedgehogs. Hedgehogs are cute.
2024-02-23T18:06:52Z
Jerry Liu on X: "a big step towards better RAG... is to just have a really nice PDF parser. It’s so important because a good parser unlocks way more interesting indexing/retrieval strategies…"
http://www.semanlink.net/doc/2024/02/jerry_liu_sur_x_i%E2%80%99ve_talked_
2024-02-23T18:12:06Z
"On the environment, the divorce between Macron's camp and the scientific community is now final"
http://www.semanlink.net/doc/2024/02/%C2%AB_sur_l%E2%80%99environnement_le_divor
> It is never trivial for the government in power to block the publication of an expert report
(cf. [Risks related to "new GMOs": ANSES opinion kept confidential](doc:2024/03/risques_lies_aux_«_nouveaux_ogm))
« Make our planet dead after all »
2024-02-18T10:18:31Z