[1909.04120] Span Selection Pre-training for Question Answering (2019)
> a **new pre-training task inspired by reading comprehension** and an **effort to avoid encoding general knowledge in the transformer network itself**

Notes:
- Current transformer architectures store general knowledge in their weights, which drives large model sizes and long pre-training times; it would be better to offload the requirement for general knowledge to a sparsely activated network.
- "Span selection" is an additional auxiliary pre-training task: the query is a sentence drawn from a corpus with one term replaced by a special token, [BLANK]. The term replaced by the blank is the answer term.
- The passage is relevant (as determined by a BM25 search) and answer-bearing (it contains the answer term).
- Unlike BERT's cloze task, where the answer must be drawn from the model itself, here the answer is found in a passage using language understanding.

> **We hope to progress to a model of general purpose language modeling that uses an indexed long term memory to retrieve world knowledge, rather than holding it in the densely activated transformer encoder layers.**
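The instance-construction procedure described above can be sketched as follows. This is a minimal illustration, not the paper's pipeline: the function name `make_span_selection_instance` is invented here, and a simple token-overlap score stands in for the BM25 retrieval the paper actually uses.

```python
import re


def make_span_selection_instance(sentence, answer_term, passages):
    """Build one span-selection pre-training example (illustrative sketch).

    The query is `sentence` with `answer_term` replaced by [BLANK].
    A passage qualifies only if it is answer-bearing (contains the
    answer term); among those, the most relevant one is kept. Token
    overlap with the query is used here as a stand-in for BM25.
    """
    query = sentence.replace(answer_term, "[BLANK]", 1)
    query_tokens = set(re.findall(r"\w+", sentence.lower())) - {answer_term.lower()}

    def relevance(passage):
        # Count shared tokens between query and passage (BM25 stand-in).
        return len(query_tokens & set(re.findall(r"\w+", passage.lower())))

    # Keep only answer-bearing passages, as in the paper.
    candidates = [p for p in passages if answer_term.lower() in p.lower()]
    if not candidates:
        return None
    passage = max(candidates, key=relevance)
    return {"query": query, "passage": passage, "answer": answer_term}
```

During pre-training, the model would then be asked to locate the answer span inside `passage` rather than recall it from its own parameters.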