Implementing an Interface in Python – Real Python (2021-11-27)

[1705.06476] ParlAI: A Dialog Research Software Platform (saved 2021-11-21)
Alexander H. Miller, Will Feng, Adam Fisch, Jiasen Lu, Dhruv Batra, Antoine Bordes, Devi Parikh, Jason Weston (arXiv 1705.06476, 2017-05-18)
We introduce ParlAI (pronounced "par-lay"), an open-source software platform for dialog research implemented in Python, available at http://parl.ai. Its goal is to provide a unified framework for sharing, training and testing dialog models; integration of Amazon Mechanical Turk for data collection, human evaluation, and online/reinforcement learning; and a repository of machine learning models for comparing with others' models, and improving upon existing architectures. Over 20 tasks are supported in the first release, including popular datasets such as SQuAD, bAbI tasks, MCTest, WikiQA, QACNN, QADailyMail, CBT, bAbI Dialog, Ubuntu, OpenSubtitles and VQA. Several models are integrated, including neural models such as memory networks, seq2seq and attentive LSTMs.

Sea - Internet-Augmented Dialogue Generation (2021-11-25)
> An approach that learns to generate an internet search query based on the context, and then conditions on the search results to finally generate a response; a method that can employ up-to-the-minute relevant information.

Énergie en France — Wikipédia (2021-11-03)
**Final consumption (at the consumer level) in 2018, by primary energy source:**
> - fossil fuels: 67.4% (coal 1.9%, oil 44.0%, natural gas 21.5%);
> - nuclear: 17.7%;
> - renewables: 14.9% (biomass and waste 9.8%, hydro 3.0%, wind 1.2%, solar 0.5% (mostly photovoltaic), other 0.3%).

**Final consumption:**
> - 43.1% petroleum products,
> - 24.3% electricity,
> - 19.3% natural gas,
> - 10.0% thermal renewables and waste,
> - 2.4% heat,
> - 0.8% coal.

**Primary energy supply** 241.4 Mtoe, **by energy carrier** -- ??? : THE TOTAL DOES NOT ADD UP TO 100!
> - electricity: 45%
> - oil: 29%
> - natural gas: 15.5%
> - wood, biomass, waste: 7.5%
> - coal: 3.1%
> - renewables: 11.5%

**Primary energy consumed, by form**, all sectors combined:
> - nuclear electricity: 40%
> - oil: 29%
> - gas: 16%
> - renewables and waste: 12%
> - coal: 3%

**Electricity by source:**
> - nuclear: 69.9%
> - hydro: 11%
> - thermal: 9%
> - wind: 6.1%
> - other: 2.1%
> - biomass/waste: 2%

**Final consumption by sector:**
> The 142.9 Mtoe of actual (non-corrected) final energy consumption break down into:
> - transport: 45.2 Mtoe (31.6%),
> - residential (household consumption in dwellings): 39.8 Mtoe (27.9%),
> - industry: 31 Mtoe (21.7%),
> - services: 22.4 Mtoe (15.7%),
> - agriculture: 4.4 Mtoe (3.1%).

Multilingual Sentence Transformers | Pinecone (2021-11-04)
How to make a text encoder multilingual using sentence transformers and multilingual knowledge distillation (see the distillation sketch below).

How to Fine-Tune Sentence-BERT for Question Answering | Capital One (2021-11-21)
> Tutorial on using the sentence-transformers library to fine-tune Sentence-BERT for question matching.
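A minimal sketch of the kind of fine-tuning the Capital One tutorial above describes, using the sentence-transformers training API; the checkpoint name, the example question pairs, and the output path are illustrative placeholders, not the tutorial's actual setup. Each `InputExample` is a pair of matching questions; the loss treats the pair as a positive and uses the other in-batch pairs as negatives.

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# any transformer checkpoint works; a mean-pooling layer is added automatically
model = SentenceTransformer("distilbert-base-uncased")

# placeholder pairs of questions that should match (duplicates / paraphrases)
train_examples = [
    InputExample(texts=["How do I cook rice?", "What is the best way to cook rice?"]),
    InputExample(texts=["How can I learn Python?", "What's a good way to start learning Python?"]),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)

# in-batch negatives: for each anchor question, the other pairs' questions are negatives
train_loss = losses.MultipleNegativesRankingLoss(model)

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=100)
model.save("sbert-question-matching")  # placeholder output path
```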
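And the distillation sketch promised in the Pinecone multilingual note above, following the teacher-student recipe from sentence-transformers' multilingual training example: a multilingual student is trained so that its embeddings of a sentence and of its translation both regress onto the English teacher's embedding. The model names and the parallel-sentences TSV path are assumptions.

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, losses
from sentence_transformers.datasets import ParallelSentencesDataset

teacher = SentenceTransformer("paraphrase-MiniLM-L6-v2")  # monolingual (English) teacher
student = SentenceTransformer("xlm-roberta-base")         # multilingual student, mean pooling added automatically

# hypothetical TSV file, one pair per line: "english sentence<TAB>french translation"
data = ParallelSentencesDataset(student_model=student, teacher_model=teacher)
data.load_data("parallel-sentences-en-fr.tsv")

loader = DataLoader(data, shuffle=True, batch_size=32)
loss = losses.MSELoss(model=student)  # student embeddings must match the teacher's

student.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=100)
```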
Babel Minute Zéro - Guy-Philippe Goldstein: chaos in real time (2021-11-03)

Pre-training + Massive Multi-tasking, Benchmarking in NLP, EMNLP primer, 🤗 NLP Course, ACL 2021 recap | Revue (2021-11-07)

[1911.02655] Towards Domain Adaptation from Limited Data for Question Answering Using Deep Neural Networks (saved 2021-11-19)
Timothy J. Hazen, Shehzaad Dhuliawala, Daniel Boies (arXiv 1911.02655, 2019-11-06)
Domain adaptation for enabling QA systems to answer questions posed against documents in new specialized domains.
> In experiments on question answering in the **automobile manual domain** we demonstrate that **standard DNN transfer learning techniques work surprisingly well** in adapting DNN models to a new domain **using limited amounts of annotated training data** in the new domain.
> **Unsupervised domain adaptation techniques applied to a base model can provide some improvement in the absence of in-domain labeled training data**, but there may be **no advantage to these methods once standard transfer learning methods are able to use even limited amounts of annotated training data** in a new domain.
This paper explores domain adaptation for enabling question answering (QA) systems to answer questions posed against documents in new specialized domains. Current QA systems using deep neural network (DNN) technology have proven effective for answering general purpose factoid-style questions. However, current general purpose DNN models tend to be ineffective for use in new specialized domains. This paper explores the effectiveness of transfer learning techniques for this problem. In experiments on question answering in the automobile manual domain we demonstrate that standard DNN transfer learning techniques work surprisingly well in adapting DNN models to a new domain using limited amounts of annotated training data in the new domain.

Master thesis: Term extraction from domain specific texts (2019) (2021-11-30)
> How do we extract relevant information from a huge volume of highly technical and domain-specific texts?

Blenderbot2 (2021-11-19)
> A chatbot with its own **long-term memory** and **the ability to access the internet**.

How to Build an Open-Domain Question Answering System? (2021-11-18)
- Open-book QA: Retriever-Reader
  - Retriever Model
  - Reader Model
  - End-to-end Joint Training (REALM, [DPR](tag:dense_passage_retrieval))
- Open-book QA: Retriever-Generator ("Generative Question Answering"): generate free text directly to answer the question, rather than extract a start/end position in a retrieved passage
- Closed-book QA: Generative Language Model
- Related Techniques
  - Fast Maximum Inner Product Search (MIPS) (e.g. [faiss](tag:faiss); see the faiss sketch below)
  - Language Model Pre-training
  - Inverse Cloze Task
  - Salient Spans Masking

Extractive Question Answering - Hugging Face transformers doc (2021-11-18)
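For the Hugging Face doc just above, the shortest path to extractive QA (the "Reader Model" in the outline) is the `question-answering` pipeline, which returns a start/end span from the context rather than generated text. The checkpoint name is one example among many:

```python
from transformers import pipeline

# a SQuAD-fine-tuned checkpoint; any extractive QA model works here
qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

result = qa(
    question="What does the retriever do?",
    context="In open-book QA, a retriever selects relevant passages and a reader "
            "extracts the answer span from them.",
)
print(result["answer"], result["score"])  # extracted span and its confidence
```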
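The faiss sketch referenced in the MIPS bullet above: exact maximum inner product search over a matrix of passage embeddings. Random vectors stand in for real encoder outputs; the dimensions are arbitrary.

```python
import numpy as np
import faiss

d = 768  # embedding dimension (e.g. a BERT-sized encoder)
passage_embeddings = np.random.rand(10_000, d).astype("float32")  # stand-in for encoded passages

index = faiss.IndexFlatIP(d)   # exact inner-product (MIPS) index
index.add(passage_embeddings)

question_embedding = np.random.rand(1, d).astype("float32")       # stand-in for an encoded question
scores, ids = index.search(question_embedding, 5)                 # top-5 passages by inner product
```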
[2108.13854] Contrastive Domain Adaptation for Question Answering using Limited Text Corpora (saved 2021-11-19)
Zhenrui Yue, Bernhard Kratzwald, Stefan Feuerriegel (arXiv 2108.13854, 2021-08-31)
> A framework for answering out-of-domain questions in QA settings with limited text corpora. It combines techniques from question generation and domain-invariant learning: we train a QA system on both source data and generated data from the target domain, with a contrastive adaptation loss that is incorporated in the training objective.
Question generation has recently shown impressive results in customizing question answering (QA) systems to new domains. These approaches circumvent the need for manually annotated training data from the new domain and, instead, generate synthetic question-answer pairs that are used for training. However, existing methods for question generation rely on large amounts of synthetically generated datasets and costly computational resources, which render these techniques widely inaccessible when the text corpora is of limited size. This is problematic as many niche domains rely on small text corpora, which naturally restricts the amount of synthetic data that can be generated. In this paper, we propose a novel framework for domain adaptation called contrastive domain adaptation for QA (CAQA). Specifically, CAQA combines techniques from question generation and domain-invariant learning to answer out-of-domain questions in settings with limited text corpora. Here, we train a QA system on both source data and generated data from the target domain with a contrastive adaptation loss that is incorporated in the training objective. By combining techniques from question generation and domain-invariant learning, our model achieved considerable improvements compared to state-of-the-art baselines.

Unsupervised Training for Sentence Transformers | Pinecone (2021-11-24)
Blog post about [[2104.06979] TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning](doc:2021/09/2104_06979_tsdae_using_trans) (see the TSDAE sketch below).
> Fine-tuning with TSDAE simply cannot compete in terms of performance against supervised methods. However, **the point and value of TSDAE is that it allows us to fine-tune models for use cases where we have no data**: specific domains with unique terminology, or low-resource languages.

Train embeddings by using the Two-Tower built-in algorithm | Vertex AI (2021-11-04)
> The Two-Tower model pairs similar types of objects, such as user profiles, search queries, web documents, answer passages, or images, in the same vector space, so that related items are close to each other. **The Two-Tower model consists of two encoder towers: the query tower and the candidate tower.** These towers embed independent items into a shared embedding space, which lets Matching Engine retrieve similarly matched items.
> To train a Two-Tower model, Google uses **pairs of relevant items**. Each pair consists of a query document and a candidate document. Documents contain arbitrary customer-defined features, including text, numeric, and categorical features. After training, the Two-Tower built-in algorithm exports two TensorFlow SavedModels: a query encoder and a candidate encoder... Given a query item, Matching Engine uses the query encoder to generate a query embedding and uses the index to find similar candidate embeddings. Matching Engine uses the candidate encoder to index all the items and serve them by using an approximate nearest neighbor solution.
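A sketch of the two-tower architecture described above, written in PyTorch rather than the TensorFlow SavedModels Vertex AI actually exports; the layer sizes, input dimensions, and the in-batch softmax loss are illustrative assumptions, not Google's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Tower(nn.Module):
    """Encodes one item type (query or candidate) into the shared embedding space."""
    def __init__(self, in_dim: int, emb_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, emb_dim))

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)  # unit norm, so dot product = cosine similarity

query_tower = Tower(in_dim=300)      # two separate encoders...
candidate_tower = Tower(in_dim=300)  # ...mapping into one shared space

# a batch of (query, relevant candidate) feature pairs; random tensors stand in for real features
q = query_tower(torch.randn(32, 300))
c = candidate_tower(torch.randn(32, 300))

logits = q @ c.T                                   # similarity of every query to every candidate
loss = F.cross_entropy(logits, torch.arange(32))   # candidate i is the positive for query i
loss.backward()
```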
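And the TSDAE sketch promised in the Pinecone entry above, following the training recipe from the sentence-transformers documentation: only raw, unlabeled in-domain sentences are needed; the dataset corrupts each sentence on the fly and the loss trains the encoder so a tied decoder can reconstruct the original. The sentence list is a stand-in for a real in-domain corpus.

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, models, losses
from sentence_transformers.datasets import DenoisingAutoEncoderDataset

model_name = "bert-base-uncased"
encoder = models.Transformer(model_name)
pooling = models.Pooling(encoder.get_word_embedding_dimension(), "cls")  # TSDAE uses CLS pooling
model = SentenceTransformer(modules=[encoder, pooling])

train_sentences = [  # stand-in: in practice, thousands of raw sentences from the target domain
    "The torque converter housing must be inspected before reassembly.",
    "Replace the filter cartridge whenever the pressure drop exceeds the rated limit.",
]
dataset = DenoisingAutoEncoderDataset(train_sentences)  # adds noise (token deletion) on the fly
loader = DataLoader(dataset, batch_size=8, shuffle=True)
loss = losses.DenoisingAutoEncoderLoss(model, decoder_name_or_path=model_name,
                                       tie_encoder_decoder=True)

model.fit(train_objectives=[(loader, loss)], epochs=1, weight_decay=0, scheduler="constantlr")
```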
Écrire l'histoire des Hausas du Niger (Writing the history of the Hausa of Niger) (2021-11-12)
> This work (a master's thesis in African history) examines the means available for writing the history of the Hausa of Niger. As this region remained on the margins of the great states of West Africa, the problem of sources, a classic one in sub-Saharan Africa, is particularly acute here, all the more so since the center of gravity of the Hausa country lies in Nigeria. The work begins with a survey of this question of sources. It summarizes the history of the various Hausa entities of the Nigerien area. Finally, it returns to some major questions: the origin of the Hausa, the development of the city-states, trade, Islamization, and slavery.

VizKG · PyPI (2021-11-02)
VizKG, a visualization library for SPARQL query results over KGs. VizKG links SPARQL query results and external visualization libraries by mapping query variables to the visualization components needed, currently allowing for 24 types of visualizations.

Tutorial: Training AI bots to chat using MTurk and Facebook's ParlAI | by Amazon Mechanical Turk | Happenings at MTurk (2021-11-19)

[1706.03610] Neural Domain Adaptation for Biomedical Question Answering (saved 2021-11-19)
Georg Wiese, Dirk Weissenborn, Mariana Neves (arXiv 1706.03610, 2017-06-12)
Datasets are generally too small to train a DL system for QA from scratch.
> We adapt a neural QA system trained on a large open-domain dataset (SQuAD) to a biomedical dataset (BioASQ) by employing various transfer learning techniques. Our network architecture is based on a state-of-the-art QA system, extended with biomedical word embeddings and a novel mechanism to answer list questions. In contrast to existing biomedical QA systems, our system does not rely on domain-specific ontologies, parsers or entity taggers, which are expensive to create.
Factoid question answering (QA) has recently benefited from the development of deep learning (DL) systems. Neural network models outperform traditional approaches in domains where large datasets exist, such as SQuAD (ca. 100,000 questions) for Wikipedia articles. However, these systems have not yet been applied to QA in more specific domains, such as biomedicine, because datasets are generally too small to train a DL system from scratch. For example, the BioASQ dataset for biomedical QA comprises less than 900 factoid (single answer) and list (multiple answers) QA instances. In this work, we adapt a neural QA system trained on a large open-domain dataset (SQuAD, source) to a biomedical dataset (BioASQ, target) by employing various transfer learning techniques. Our network architecture is based on a state-of-the-art QA system, extended with biomedical word embeddings and a novel mechanism to answer list questions. In contrast to existing biomedical QA systems, our system does not rely on domain-specific ontologies, parsers or entity taggers, which are expensive to create. Despite this fact, our systems achieve state-of-the-art results on factoid questions and competitive results on list questions.

Glyphosate-based herbicides influence DNA methylation patterns in Coturnix japonica (2021-11-12)

raphaelsty/nlapi (2021-11-02)

Mixed Negative Sampling for Learning Two-tower Neural Networks in Recommendations – Google Research (WWW 2020) (2021-11-04)
> A novel negative sampling approach called **Mixed Negative Sampling (MNS)**. In particular, different from commonly used batch or unigram sampling methods, MNS uses a mixture of batch and uniformly sampled negatives to tackle the selection bias of implicit user feedback.
(Check whether this relates to [Multiple Negatives Ranking Loss](doc:2021/10/next_gen_sentence_embeddings_wi).)
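A sketch of the idea in the MNS note above, not Google's implementation: each query's logits are computed against the in-batch candidates plus an extra pool of uniformly sampled candidates, so the softmax sees both kinds of negatives. Shapes and the random stand-in tensors are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

B, B_extra, d = 32, 64, 128  # batch size, uniformly sampled negatives, embedding dim

q = F.normalize(torch.randn(B, d), dim=-1)              # query-tower outputs
pos = F.normalize(torch.randn(B, d), dim=-1)            # candidate-tower outputs (positives)
uniform = F.normalize(torch.randn(B_extra, d), dim=-1)  # candidates drawn uniformly from the corpus

candidates = torch.cat([pos, uniform], dim=0)    # in-batch negatives + uniform negatives
logits = q @ candidates.T                        # shape (B, B + B_extra)
loss = F.cross_entropy(logits, torch.arange(B))  # candidate i is the positive for query i
```

Mixing in uniform negatives counters the selection bias of batch sampling: popular items dominate the batch, so without the uniform pool, rare items are almost never seen as negatives.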
Andrea Volpini on Twitter: New blog post on #ContentHub for #Semantic #SEO. How to cluster content using Knowledge Graph Embeddings to detect: ✓ Clusters ✓ Sub-topics ✓ Author's topical authority (2021-11-16)

Efficient Open-Domain Question Answering | Getting Started with Baselines (2021-11-25)
Including [Dense Passage Retrieval](tag:dense_passage_retrieval).

Glyphosate: the European expert assessment excluded most of the scientific literature from its analysis (2021-11-16)

ParlAI (2021-11-19)
> Unified platform for sharing, training and evaluating dialogue models across many tasks.
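A minimal sketch of inspecting one of ParlAI's bundled tasks in Python, assuming `pip install parlai` and the current script layout (`squad` is one of the 20+ tasks mentioned in the paper above; the exact script API may differ across ParlAI versions):

```python
# prints a few (context, question, answer) examples from the SQuAD task
# CLI equivalent: parlai display_data --task squad --num-examples 3
from parlai.scripts.display_data import DisplayData

DisplayData.main(task="squad", num_examples=3)
```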