About This Document
- sl:arxiv_author : Stefan Schweter, Alan Akbik
- sl:arxiv_firstAuthor : Stefan Schweter
- sl:arxiv_num : 2011.06993
- sl:arxiv_published : 2020-11-13T16:13:59Z
- sl:arxiv_summary : Current state-of-the-art approaches to named entity recognition (NER) with
BERT-style transformers typically follow one of two strategies: (1) fine-tune the
transformer itself on the NER task, adding only a simple linear layer for
word-level predictions, or (2) use the transformer solely to provide features to
a standard LSTM-CRF sequence labeling architecture, performing no fine-tuning.
In this paper, we present a comparative analysis of both approaches across a
variety of settings currently considered in the literature. In particular, we
evaluate how well they work when document-level features are leveraged. Our
evaluation on the classic CoNLL benchmark datasets for four languages shows that
document-level features significantly improve NER quality and that fine-tuning
generally outperforms the feature-based approach. We present recommendations for
parameters as well as several new state-of-the-art numbers. Our approach is
integrated into the Flair framework to facilitate reproduction of our
experiments.@en
- sl:arxiv_title : FLERT: Document-Level Features for Named Entity Recognition@en
- sl:arxiv_updated : 2020-11-13T16:13:59Z
- sl:bookmarkOf : https://arxiv.org/abs/2011.06993
- sl:creationDate : 2020-12-01
- sl:creationTime : 2020-12-01T09:25:14Z
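
The abstract above contrasts two ways of using a transformer for NER: fine-tuning with a simple linear prediction head versus feeding transformer features into an LSTM-CRF without fine-tuning, in both cases optionally with document-level context. Below is a minimal sketch of the fine-tuning setup in the Flair framework the paper is integrated into. The model name, hyperparameters, and exact calls are assumptions based on Flair's public FLERT tutorial and may differ across Flair versions; this is a sketch, not the paper's verified configuration.

```python
# Sketch of a FLERT-style fine-tuning setup in Flair.
# Assumptions: a recent Flair release (>= 0.8); hyperparameter values
# mirror Flair's FLERT tutorial and are not verified against the paper.
from flair.datasets import CONLL_03
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# Load the classic CoNLL-03 English benchmark corpus
# (the dataset files must be obtained separately due to licensing).
corpus = CONLL_03()
label_dict = corpus.make_label_dictionary(label_type="ner")

# Transformer embeddings with document-level features:
# use_context=True adds surrounding sentences as left/right context,
# fine_tune=True updates the transformer weights during training.
embeddings = TransformerWordEmbeddings(
    model="xlm-roberta-large",
    layers="-1",
    subtoken_pooling="first",
    fine_tune=True,
    use_context=True,
)

# Fine-tuning approach: a bare linear prediction head, so no CRF and
# no LSTM on top of the transformer. (A feature-based baseline would
# instead set fine_tune=False, use_rnn=True, use_crf=True.)
tagger = SequenceTagger(
    hidden_size=256,
    embeddings=embeddings,
    tag_dictionary=label_dict,
    tag_type="ner",
    use_crf=False,
    use_rnn=False,
    reproject_embeddings=False,
)

# Fine-tune with a small learning rate, as is typical for this setup.
trainer = ModelTrainer(tagger, corpus)
trainer.fine_tune(
    "resources/taggers/ner-flert",
    learning_rate=5.0e-6,
    mini_batch_size=4,
)
```

Toggling `use_context` on the embeddings is what switches document-level features on and off, so the same script can reproduce both the sentence-level and document-level conditions compared in the paper.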