About This Document
- sl:arxiv_author : Arman Cohan, Sergey Feldman, Iz Beltagy, Doug Downey, Daniel S. Weld
- sl:arxiv_firstAuthor : Arman Cohan
- sl:arxiv_num : 2004.07180
- sl:arxiv_published : 2020-04-15T16:05:51Z
- sl:arxiv_summary : Representation learning is a critical ingredient for natural language
processing systems. Recent Transformer language models like BERT learn powerful
textual representations, but these models are targeted towards token- and
sentence-level training objectives and do not leverage information on
inter-document relatedness, which limits their document-level representation
power. For applications on scientific documents, such as classification and
recommendation, the embeddings power strong performance on end tasks. We
propose SPECTER, a new method to generate document-level embeddings of
scientific documents based on pretraining a Transformer language model on a
powerful signal of document-level relatedness: the citation graph. Unlike
existing pretrained language models, SPECTER can be easily applied to
downstream applications without task-specific fine-tuning. Additionally, to
encourage further research on document-level models, we introduce SciDocs, a
new evaluation benchmark consisting of seven document-level tasks ranging from
citation prediction to document classification and recommendation. We show
that SPECTER outperforms a variety of competitive baselines on the benchmark.
- sl:arxiv_title : SPECTER: Document-level Representation Learning using Citation-informed Transformers
- sl:arxiv_updated : 2020-05-20T17:39:52Z
- sl:bookmarkOf : https://arxiv.org/abs/2004.07180
- sl:creationDate : 2022-01-29
- sl:creationTime : 2022-01-29T15:18:20Z
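The "powerful signal of document-level relatedness" mentioned in the summary above is, concretely, a triplet margin loss over the citation graph: the Transformer's embedding of a query paper is pulled toward the embedding of a paper it cites and pushed away from that of a paper it does not cite. A minimal PyTorch sketch of that objective (the L2 distance and unit margin follow the paper; the function name is illustrative):

```python
import torch
import torch.nn.functional as F

def citation_triplet_loss(query, positive, negative, margin=1.0):
    """Triplet margin loss over citation links: a query paper's embedding
    should be closer to a cited paper's embedding than to an uncited one's."""
    d_pos = F.pairwise_distance(query, positive, p=2)  # distance to cited paper
    d_neg = F.pairwise_distance(query, negative, p=2)  # distance to uncited paper
    return torch.clamp(d_pos - d_neg + margin, min=0).mean()
```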
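Because SPECTER "can be easily applied to downstream applications without task-specific fine-tuning", producing embeddings from the released model takes only a few lines. A sketch assuming the authors' allenai/specter checkpoint on the Hugging Face hub (the sample paper text is a placeholder):

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("allenai/specter")
model = AutoModel.from_pretrained("allenai/specter")

papers = [{"title": "SPECTER: Document-level Representation Learning using "
                    "Citation-informed Transformers",
           "abstract": "Representation learning is a critical ingredient..."}]
# SPECTER's input is the title and abstract joined by the separator token.
texts = [p["title"] + tokenizer.sep_token + p["abstract"] for p in papers]
inputs = tokenizer(texts, padding=True, truncation=True,
                   max_length=512, return_tensors="pt")
outputs = model(**inputs)
# The final hidden state of the [CLS] token serves as the document embedding.
embeddings = outputs.last_hidden_state[:, 0, :]
```

These fixed embeddings can then feed a nearest-neighbor recommender or a linear classifier directly, which is the no-fine-tuning usage the summary describes.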