About This Document
- sl:arxiv_author : Iz Beltagy, Matthew E. Peters, Arman Cohan
- sl:arxiv_firstAuthor : Iz Beltagy
- sl:arxiv_num : 2004.05150
- sl:arxiv_published : 2020-04-10T17:54:09Z
- sl:arxiv_summary : Transformer-based models are unable to process long sequences due to their self-attention operation, which scales quadratically with the sequence length. To address this limitation, we introduce the Longformer with an attention mechanism that scales linearly with sequence length, making it easy to process documents of thousands of tokens or longer. Longformer's attention mechanism is a drop-in replacement for the standard self-attention and combines a local windowed attention with a task-motivated global attention. Following prior work on long-sequence transformers, we evaluate Longformer on character-level language modeling and achieve state-of-the-art results on text8 and enwik8. In contrast to most prior work, we also pretrain Longformer and finetune it on a variety of downstream tasks. Our pretrained Longformer consistently outperforms RoBERTa on long document tasks and sets new state-of-the-art results on WikiHop and TriviaQA.@en
- sl:arxiv_title : Longformer: The Long-Document Transformer@en
- sl:arxiv_updated : 2020-04-10T17:54:09Z
- sl:bookmarkOf : https://arxiv.org/abs/2004.05150
- sl:creationDate : 2020-04-13
- sl:creationTime : 2020-04-13T11:06:40Z
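As a rough illustration of the attention pattern the abstract describes, the sketch below builds the combined sliding-window + global attention mask in NumPy. This is not the paper's implementation (which uses custom banded-matrix kernels and also considers dilated windows); the window size `w` and the choice of global positions here are illustrative assumptions. The point is that the number of allowed query-key pairs grows as O(n·w) rather than O(n²).

```python
# Minimal sketch (not the paper's optimized kernel) of the Longformer
# attention pattern: each token attends to a local window of +/- w
# neighbors, and a few task-chosen "global" tokens attend to, and are
# attended by, every position. `window` and `global_idx` are
# illustrative assumptions, not values from the paper.
import numpy as np

def longformer_attention_mask(seq_len: int, window: int, global_idx):
    """Boolean mask where mask[i, j] is True if token i may attend to token j."""
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for i in range(seq_len):
        lo, hi = max(0, i - window), min(seq_len, i + window + 1)
        mask[i, lo:hi] = True      # local sliding window
    for g in global_idx:
        mask[g, :] = True          # global token attends everywhere
        mask[:, g] = True          # every token attends to the global token
    return mask

if __name__ == "__main__":
    n, w = 4096, 256
    mask = longformer_attention_mask(n, w, global_idx=[0])  # e.g. a [CLS]-like token
    # Attended pairs grow as O(n * w), not O(n^2):
    print(f"attended pairs: {mask.sum():,} of {n * n:,} "
          f"({mask.sum() / (n * n):.1%} of full attention)")
```

For n = 4096 and w = 256 the mask keeps roughly an eighth of the full n² attention pairs, which is what makes sequences of thousands of tokens tractable; a sparse-aware kernel would exploit this band structure rather than materializing the dense mask as this sketch does.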