[1909.04120] Span Selection Pre-training for Question Answering (2019) > a **new pre-training task inspired by reading
comprehension** and an **effort to avoid encoding general knowledge in the transformer network itself**
Current transformer architectures store general knowledge in their parameters -> large models, long pre-training times. Better to offload the storage of general knowledge to a sparsely activated network.
"Span selection" as an additional auxiliary task: the query is a sentence drawn from a corpus
with one term replaced by a special token, [BLANK]; the replaced term is the answer term. The passage is
a relevant one as determined by a BM25 search, and must be answer-bearing (i.e., it contains the answer
term). Unlike BERT's cloze-style masked language modeling, where the answer must be produced from the model's own parameters, here the answer is located in the passage
through language understanding.
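A minimal sketch of how such a pre-training instance could be constructed, assuming a toy in-memory corpus and a simplified BM25 scorer written from scratch (the paper uses a full BM25 retrieval system; the function and field names below are illustrative, not from the paper):

```python
import math
import re

def tokenize(text):
    # Lowercased word tokens; a stand-in for a real tokenizer.
    return re.findall(r"\w+", text.lower())

def bm25_scores(query_tokens, docs, k1=1.2, b=0.75):
    # docs: list of token lists. Standard Okapi BM25 scoring.
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = {}
    for d in docs:
        for t in set(d):
            df[t] = df.get(t, 0) + 1
    scores = []
    for d in docs:
        score = 0.0
        for t in query_tokens:
            if t not in df:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            tf = d.count(t)
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(score)
    return scores

def make_span_selection_instance(sentence, answer_term, corpus):
    # Blank out the answer term in the query sentence.
    query = sentence.replace(answer_term, "[BLANK]")
    docs = [tokenize(p) for p in corpus]
    scores = bm25_scores(tokenize(query), docs)
    ranked = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)
    # Keep only an answer-bearing passage: it must contain the answer term.
    for i in ranked:
        if answer_term.lower() in corpus[i].lower():
            return {"query": query, "passage": corpus[i], "answer": answer_term}
    return None  # no answer-bearing passage retrieved

corpus = [
    "Paris is the capital of France and its largest city.",
    "Berlin is the capital of Germany.",
    "The Eiffel Tower attracts millions of visitors.",
]
inst = make_span_selection_instance("Paris is the capital of France.", "Paris", corpus)
print(inst["query"])    # [BLANK] is the capital of France.
```

The model is then trained to select the span in `passage` that fills the `[BLANK]`, so the answer comes from the retrieved text rather than from the model's parameters.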
> **We hope to progress to a model of general purpose language modeling that uses an indexed long
term memory to retrieve world knowledge, rather than holding it in the densely activated transformer encoder layers.**