Related Tags:
2 Documents (Long List)
  • [1909.04120] Span Selection Pre-training for Question Answering (2019) (About)
    > a **new pre-training task inspired by reading comprehension** and an **effort to avoid encoding general knowledge in the transformer network itself**
    Current transformer architectures store general knowledge in their weights, which drives large model sizes and long pre-training times; it would be better to offload general knowledge to a sparsely activated network. "Span selection" is an additional auxiliary task: the query is a sentence drawn from a corpus with one term replaced by a special token, [BLANK]; the replaced term is the answer term. The passage is relevant (as determined by a BM25 search) and answer-bearing (it contains the answer term). Unlike BERT's cloze task, where the answer must be drawn from the model itself, here the answer is found in a passage through language understanding.
    > **We hope to progress to a model of general purpose language modeling that uses an indexed long term memory to retrieve world knowledge, rather than holding it in the densely activated transformer encoder layers.**
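    A minimal sketch of how one such training instance could be constructed. Assumptions: the blanked term is picked at random (the paper selects terms more carefully) and the BM25 retrieval step is stood in for by simple substring containment; all function names here are illustrative, not from the paper.

    ```python
    import random
    import re

    def make_span_selection_instance(sentence, passages):
        """Build one span-selection instance: blank out a term in the query
        sentence and pair it with an answer-bearing passage.
        Sketch only: random term choice and substring matching stand in for
        the paper's term selection and BM25 retrieval."""
        terms = re.findall(r"\w+", sentence)
        answer = random.choice(terms)
        # replace the first occurrence of the chosen term with [BLANK]
        query = re.sub(r"\b" + re.escape(answer) + r"\b", "[BLANK]", sentence, count=1)
        # keep only answer-bearing passages (stand-in for a BM25 search)
        bearing = [p for p in passages if answer in p]
        if not bearing:
            return None
        return {"query": query, "answer": answer, "passage": bearing[0]}
    ```

    The model then has to locate the answer span inside the passage, so the supervision rewards passage understanding rather than memorization in the encoder weights.
    
    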
  • Word Mover's Embedding: From Word2Vec to Document Embedding (2018) (About)
    Unsupervised embeddings of variable-length sentences built from pre-trained word embeddings (works better on short texts). (Builds on the Word Mover's Distance but, using ideas borrowed from kernel-method approximation, obtains a representation of sentences instead of just a distance between them.)
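  A rough sketch of the random-features idea behind this: each embedding coordinate is a kernel evaluation of the document against a random reference document. Assumptions: the exact Word Mover's Distance (an optimal-transport problem) is replaced here by the cheaper relaxed WMD lower bound, and `gamma`, the reference documents, and all names are illustrative, not the paper's exact recipe.

  ```python
  import numpy as np

  def relaxed_wmd(X, Y):
      """Relaxed Word Mover's Distance: each word vector in X moves to its
      nearest word vector in Y (a cheap lower bound on the exact WMD,
      which would solve a full optimal-transport problem).
      X, Y: arrays of shape (n_words, dim)."""
      d = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
      return max(d.min(axis=1).mean(), d.min(axis=0).mean())

  def wme_embedding(doc, random_docs, gamma=1.0):
      """Word Mover's Embedding sketch: coordinate j is the kernel value
      exp(-gamma * WMD(doc, omega_j)) against random reference document
      omega_j, scaled by 1/sqrt(R) as in random-feature kernel
      approximation. gamma and the reference docs are assumptions."""
      R = len(random_docs)
      feats = [np.exp(-gamma * relaxed_wmd(doc, w)) for w in random_docs]
      return np.array(feats) / np.sqrt(R)
  ```

  The inner product of two such embeddings then approximates a WMD-based kernel between the documents, which is what turns a distance into a usable fixed-length representation.
  
  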