In Russia's shadow, China advances its pawns in the mines of the Sahel (2025-01-04)

In Wuhan, taxis no longer have drivers (2025-01-17)

LlamaIndex 🦙 on X: "a new multilingual, open-source visual embedding model and training set on Huggingface..." (2025-01-11)

The field of Artificial Intelligence (AI) continues to drive transformative innovations, with significant progress in conversational interfaces, autonomous vehicles, and intelligent content creation. Since the launch of ChatGPT in late 2022, the rise of Generative AI has marked a pivotal era, with the term Large Language Models (LLMs) becoming a ubiquitous part of daily life. LLMs have demonstrated exceptional capabilities in tasks such as text summarization, code generation, and creative writing. However, these models are inherently limited by their token-level processing, which restricts their ability to perform abstract reasoning, conceptual understanding, and efficient generation of long-form content. To address these limitations, Meta has introduced Large Concept Models (LCMs), representing a significant shift from traditional token-based frameworks. LCMs use concepts as foundational units of understanding, enabling more sophisticated semantic reasoning and context-aware decision-making. Given the limited academic research on this emerging technology, our study aims to bridge the knowledge gap by collecting, analyzing, and synthesizing existing grey literature to provide a comprehensive understanding of LCMs. Specifically, we (i) identify and describe the features that distinguish LCMs from LLMs, (ii) explore potential applications of LCMs across multiple domains, and (iii) propose future research directions and practical strategies to advance LCM development and adoption.
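The abstract above contrasts token-level LLMs with concept-level LCMs, whose basic unit is a sentence-level "concept" embedding. A minimal sketch of that representational shift, using a toy stand-in for a real sentence encoder (Meta's LCM work uses SONAR embeddings; `toy_sentence_embedding` is purely hypothetical):

```python
import re

def toy_sentence_embedding(sentence, dim=8):
    """Hypothetical stand-in for a real sentence encoder (e.g. SONAR).

    Maps a sentence to a fixed-size unit vector from character
    statistics; purely for illustration."""
    vec = [0.0] * dim
    for i, ch in enumerate(sentence.lower()):
        vec[i % dim] += ord(ch) / 1000.0
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]

def to_concept_sequence(text):
    """Token-level models see one unit per (sub)word; a concept-level
    model sees one unit per sentence ('concept')."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [toy_sentence_embedding(s) for s in sentences]

doc = "LLMs process tokens. LCMs reason over sentence-level concepts."
concepts = to_concept_sequence(doc)
print(len(concepts))  # one vector per sentence, not per token
```

The point of the sketch is only the change of granularity: downstream reasoning then operates on the short sequence of concept vectors instead of a long token sequence.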
[2501.05487] The Future of AI: Exploring the Potential of Large Concept Models, Hussain Ahmad, Diksha Goel (2025-01-08)

A little pooling goes a long way for multi-vector representations – Answer.AI (2025-01-24)
> Intuition: for documents focusing on a low number of topics, a lot of the tokens are likely to carry somewhat redundant semantic information, meaning keeping all of them is likely not useful.

Improving Retrieval Augmented Generation accuracy with GraphRAG | AWS Machine Learning Blog (2025-01-09)
> In this post, we explore why GraphRAG is more comprehensive and explainable than vector RAG alone, and how you can use this approach using AWS services and Lettria.

GitHub - unclecode/crawl4ai: 🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper (2025-01-03)

MatthewBerman on X: "Titans: Learning to Memorize at Test Time"... (a nice schema) (2025-01-17)
> human-like memory structures to overcome the limits of Transformers, with one "SURPRISING" feature.
>
> - Short-term memory (real-time processing)
> - Long-term memory (retaining key past information)
> - Persistent memory (task-specific baked-in knowledge)
>
> Titans can learn and adapt during inference (test time), unlike Transformers, which rely on pre-training.

GitHub - bRAGAI/bRAG-langchain: Everything you need to know to build your own RAG application (2025-01-06)

Charles Borderie | LinkedIn (2025-01-08)

L'instruction (2025-01-28)
> Film adaptation of Peter Weiss's play "L'instruction" (The Investigation), based on depositions by defendants and witnesses at the Frankfurt trial, where those responsible for the crimes committed at Auschwitz were tried between 1963 and 1965.
> In 1963, in a Germany that seemed to have forgotten the war and National Socialism, the opening of the Frankfurt trial put an end to the collective amnesia: former SS soldiers were tried for their involvement in the operation of the Auschwitz camp, nearly twenty years after the end of the Nuremberg trials. Witnesses and defendants took the stand over one hundred and eighty days, some describing the worst that humanity is capable of, others taking refuge behind procedures and orders, proof of a persistent denial. (see at 1:25:00)

"Populism is also a crisis of education" (2025-01-05)

ChatGPT: looking for a NER solution, where the entities to extract are provided as a list of phrases (2025-01-03)

How to Implement Graph RAG Using Knowledge Graphs and Vector Databases | by Steve Hedden | Towards Data Science (2025-01-30)

How consumer applications facilitate tracking users without their knowledge (2025-01-15)

LangGraph AI Agents with Knowledge Graph | by Kshitij Kutumbe | Globant | Jan, 2025 | Medium (2025-01-31)

AI versus quantum computer: a very physical fight (2025-01-13)

Stanford STORM Research Project (2025-01-02)
> STORM, a writing system focusing on the pre-writing stage to generate long, grounded, Wikipedia-like articles for a given topic from scratch.

We need to talk about human genome editing (2025-01-22)
> ...
> And then there's the risk that these technologies will widen inequality and social divisions

Un remède de chameau - La révolution des nanocorps - ARTE (2025-01-25)
> How uncooperative students in a lab class were at the origin of an important medical discovery.

[Mark Zuckerberg wants more "masculine energy" and less diversity policy](https://www.lemonde.fr/economie/article/2025/01/11/mark-zuckerberg-veut-plus-d-energie-masculine-et-moins-de-politique-de-diversite_6493340_3234.html) (2025-01-10)
> We are going to get rid of the fact-checkers

After Musk and Bezos... Zuckerberg: tech falling in line behind Trump (2025-01-06)

In South Korea, a country ultraconnected to the Internet, YouTube channels exacerbate the political crisis

jack morris on X: "New state-of-the-art small text embedding model... (cde-small-v2)" (2025-01-15)
(cf. [cde-small-v1](doc:2024/10/philipp_schmid_sur_x_can_we_): creating "context-aware" embeddings using neighboring document information)

Paul Couvert on X: "Microsoft has released its new open source model Phi-4... you can run it locally on your laptop..." (2025-01-11)

GitHub - thesephist/libsearch: Simple, index-free full-text search for JavaScript (2025-01-04)

Chinese start-up DeepSeek sends a shock wave through the AI sector (2025-01-28)
[Nvidia loses $600 billion in market capitalization and drags the Nasdaq down with it](https://www.lemonde.fr/economie/article/2025/01/28/nvidia-perd-600-milliards-et-entraine-dans-sa-chute-le-nasdaq_6519363_3234.html)

Xenova on X: "Vision Transformer Explorer: a web app to interactively explore the self-attention maps produced by ViTs..." (2025-01-04)
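The NER bookmark above asks how to extract entities when they are given up front as a list of phrases. That case needs no trained model at all; a minimal sketch, assuming case-insensitive exact matching with longest-match-first priority (a real solution might use spaCy's `PhraseMatcher` or FlashText instead):

```python
def find_entities(text, phrases):
    """Find occurrences of known phrases in text, longest match first.

    Returns (start_char, end_char, matched_text) spans. Purely
    illustrative: case-insensitive exact matching, no lemmatization,
    overlapping shorter matches are discarded."""
    lowered = text.lower()
    spans = []
    occupied = set()  # character positions already claimed by a match
    for phrase in sorted(phrases, key=len, reverse=True):
        needle = phrase.lower()
        start = 0
        while True:
            i = lowered.find(needle, start)
            if i == -1:
                break
            j = i + len(needle)
            if not any(k in occupied for k in range(i, j)):
                spans.append((i, j, text[i:j]))
                occupied.update(range(i, j))
            start = j
    return sorted(spans)

text = "Knowledge graphs and large language models complement each other."
phrases = ["knowledge graph", "large language model", "language model"]
print(find_entities(text, phrases))
```

Sorting candidate phrases longest-first means "large language model" wins over its substring "language model", which is usually the desired behavior for gazetteer-style NER.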
nomic-ai/modernbert-embed-base · Hugging Face (2025-01-02)
> embedding model trained from ModernBERT-base, bringing the new advances of ModernBERT to embeddings!

Akshay 🚀 on X: "Microsoft has released its own document parser for LLM use! ... MarkItDown" (2025-01-02)

Daniel San on X: "Deepseek running locally and privately for autocompletion in VSCode! ..." (2025-01-28)

MIT's "Mathematics for Computer Science" (2018) (2025-01-13)

"The contamination of the world by PFAS forms a perfect catastrophe, dystopian in all its dimensions" (2025-01-26)
> "There is not enough money on Earth to clean the environment of PFAS at the rate at which we are currently producing them." (Ali Ling, University of St. Thomas, Minnesota)
> Small minds are on the lookout for small profits; they produce great catastrophes.

Numance (2025-01-11)
> After fifteen months of siege, the city fell in the summer of 133 BC, defeated by famine. Its inhabitants chose suicide over surrender, and set fire to the city so that it would not fall into the hands of the enemy.

NLP/Information Retrieval internship offer at Renault (2025-01-17)
The goal is to improve RAG-type solutions; the means is search over knowledge graphs that model the automotive domain.

The letters of Kanesh (2025-01-26)
> At the end of the 19th century, Anatolian peasants discovered, in the village of Kültepe, clay tablets imprinted with 4,000-year-old cuneiform writing. Deciphered and analyzed by Assyriologists, these texts offer a fascinating glimpse into the organization of the cosmopolitan societies of the Bronze Age Near East.
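The Renault internship entry above (like the GraphRAG bookmarks earlier) describes improving RAG by searching a knowledge graph instead of, or alongside, a vector store. A minimal sketch of the retrieval half, assuming a toy triple store and hypothetical automotive data (none of this comes from the internship offer or the AWS post):

```python
# Toy knowledge graph as subject-predicate-object triples (hypothetical data).
TRIPLES = [
    ("Clio", "is_a", "car_model"),
    ("Clio", "made_by", "Renault"),
    ("Clio", "has_engine", "TCe_90"),
    ("TCe_90", "fuel_type", "petrol"),
]

def retrieve_context(question, triples, hops=1):
    """Graph-flavored retrieval for RAG: seed entities are the graph
    nodes mentioned in the question; their neighborhood (up to `hops`
    extra hops) is verbalized into context for the LLM prompt."""
    nodes = {s for s, _, _ in triples} | {o for _, _, o in triples}
    frontier = {n for n in nodes if n.lower() in question.lower()}
    selected = []
    for _ in range(hops + 1):
        new_frontier = set()
        for s, p, o in triples:
            if (s in frontier or o in frontier) and (s, p, o) not in selected:
                selected.append((s, p, o))
                new_frontier.update({s, o})
        frontier |= new_frontier
    return [f"{s} {p} {o}" for s, p, o in selected]

print(retrieve_context("What engine does the Clio have?", TRIPLES))
```

Expanding one hop beyond the seed entity is what lets the graph answer multi-step questions (Clio → TCe_90 → petrol) that a single vector lookup can miss.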
[2501.06699] Large Language Models, Knowledge Graphs and Search Engines: A Crossroads for Answering Users' Questions, Aidan Hogan, Xin Luna Dong, Denny Vrandečić, Gerhard Weikum (2025-01-17)

Much has been discussed about how Large Language Models, Knowledge Graphs and Search Engines can be combined in a synergistic manner. A dimension largely absent from current academic discourse is the user perspective. In particular, there remain many open questions regarding how best to address the diverse information needs of users, incorporating varying facets and levels of difficulty. This paper introduces a taxonomy of user information needs, which guides us to study the pros, cons and possible synergies of Large Language Models, Knowledge Graphs and Search Engines. From this study, we derive a roadmap for future research.

(Not much there, actually. Yet [KGC23 Keynote: The Future of Knowledge Graphs in a World of LLMs — Denny Vrandečić, Wikimedia - YouTube](doc:2024/03/kgc23_keynote_the_future_of_kn) was very good.)

Benjamin Clavié on X: "Stella Embeddings: What's the big deal?..." (2025-01-13)
> Training based on unsupervised distillation.
> The current dominant way of training retrieval models is via the use of a contrastive loss, with little-to-no knowledge distillation.
> Stella's training works within the embedding space, seeking to minimize the geometric distances between the teachers' vectors and the student model's (Stella's) outputs.
>
> Stella models (and Jasper models) generalize amazingly well because of this.

PFAS: the staggering cost of cleaning up Europe (2025-01-14)

Mixture-of-Experts (MoE) LLMs - by Cameron R. Wolfe, Ph.D. (2025-01-31)
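The distillation idea quoted above, aligning a student's embeddings with a teacher's instead of training with a contrastive loss, can be sketched as gradient descent on a geometric distance between the two vector sets. A minimal sketch with a one-parameter "student" (a single scaling weight; nothing here comes from the actual Stella training code):

```python
def mse(student_vecs, teacher_vecs):
    """Mean squared distance between student and teacher embeddings."""
    total, count = 0.0, 0
    for s, t in zip(student_vecs, teacher_vecs):
        for a, b in zip(s, t):
            total += (a - b) ** 2
            count += 1
    return total / count

def distill_step(w, inputs, teacher_vecs, lr=0.1):
    """One gradient step pulling a one-parameter student (scaling by w)
    toward the teacher's vectors: distillation in embedding space,
    no contrastive positive/negative pairs needed."""
    grad, n = 0.0, 0
    for x, t in zip(inputs, teacher_vecs):
        for xi, ti in zip(x, t):
            grad += 2 * (w * xi - ti) * xi
            n += 1
    return w - lr * grad / n

inputs = [[1.0, 2.0], [0.5, -1.0]]
teacher = [[2.0, 4.0], [1.0, -2.0]]  # here the teacher is exactly 2 * input
w = 0.0
for _ in range(50):
    w = distill_step(w, inputs, teacher)
print(round(w, 3))  # converges toward 2.0
```

A real setup would replace the scalar `w` with a full student network and possibly a cosine rather than Euclidean objective, but the training signal is the same: match the teacher's geometry, with no labeled query-document pairs.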
DeepSeek (2025-01-27)

"Farmers' distress is above all the fruit of political choices that are never owned up to" (2025-01-02)
> Contrary to government rhetoric, not only do standards protect the environment and health, they are also the only argument against an agreement with Mercosur.

Audrey Tang (2025-01-02)

Excavating soil, changing the water of lakes, not playing outside: PFAS pollution plunges Flanders into a dystopian disaster (2025-01-15)

Omar Khattab on X: "When building ColBERT, I assumed it would pave the way for hypernetwork-based, pruning-capable retrieval indexes. Let me explain..." (2025-01-04)
> The big insight in ColBERT is that we can encode each document upfront *not* into a vector, but into a rich scoring function, f: query -> float, which simultaneously supports pruning, so you can skip most computation.
>
> In v1/v2, the choice of function was "a matrix + MaxSim".
> But in the future, the function could also be a small DNN constructed out of each document!

ChatGPT - Graph Knowledge Representation Models (2025-01-09)

Foreign Devils on the Silk Road: The Search for the Lost Cities and Treasures of Chinese Central Asia, 1980 (2025-01-05)
Archaeologists:
- [Sven Hedin](doc:2024/11/sven_hedin)
- <https://en.wikipedia.org/wiki/Aurel_Stein>
- <https://fr.wikipedia.org/wiki/Paul_Pelliot>
- ...

Places:
- <https://en.wikipedia.org/wiki/Turpan>
- <https://en.wikipedia.org/wiki/Bezeklik_Caves>
- [Caves of the Thousand Buddhas](doc:2025/01/dunhuang_manuscripts_mogao_cav)
- ...

[Zhang Qian](tag:zhang_qian)
cf. [Bouddhas et rôdeurs sur la route de la soie](doc:2025/01/bouddhas_et_rodeurs_sur_la_rout)
Mostly Chinese writing, but also Tibetan, undeciphered Nam language, Khotanese, Sanskrit, Sogdian, Old Uyghur, Hebrew and Old Turkic.
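The ColBERT thread above describes each document as a scoring function f: query -> float, which in v1/v2 is "a matrix + MaxSim". A minimal sketch of MaxSim late interaction over toy token vectors (illustrative only, not taken from the ColBERT codebase):

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def maxsim_score(query_vecs, doc_vecs):
    """ColBERT-style late interaction: each query token vector is
    matched against its best-scoring document token vector, and the
    per-token maxima are summed. The document's token matrix thus acts
    as a function f: query -> float."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

# Toy 2-d "token embeddings": doc_a covers both query tokens, doc_b only one.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[0.9, 0.1], [0.1, 0.9]]
doc_b = [[0.9, 0.1], [0.8, 0.2]]
print(maxsim_score(query, doc_a) > maxsim_score(query, doc_b))  # doc_a wins
```

Because each query token only needs its *maximum* similarity, large parts of the document-side computation can be pruned, which is the property the thread says pointed toward richer per-document scoring functions.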
Dunhuang manuscripts (Caves of the Thousand Buddhas) (2025-01-13)

Bouddhas et rôdeurs sur la route de la soie (2025-01-05)