About This Document
- sl:arxiv_author :
- sl:arxiv_firstAuthor : Andy Coenen
- sl:arxiv_num : 1906.02715
- sl:arxiv_published : 2019-06-06T17:33:22Z
- sl:arxiv_summary : Transformer architectures show significant promise for natural language processing. Given that a single pretrained model can be fine-tuned to perform well on many different tasks, these networks appear to extract generally useful linguistic features. A natural question is how such networks represent this information internally. This paper describes qualitative and quantitative investigations of one particularly effective model, BERT. At a high level, linguistic features seem to be represented in separate semantic and syntactic subspaces. We find evidence of a fine-grained geometric representation of word senses. We also present empirical descriptions of syntactic representations in both attention matrices and individual word embeddings, as well as a mathematical argument to explain the geometry of these representations.
- sl:arxiv_title : Visualizing and Measuring the Geometry of BERT
- sl:arxiv_updated : 2019-10-28T17:53:14Z
- sl:bookmarkOf : https://arxiv.org/abs/1906.02715
- sl:creationDate : 2019-06-07
- sl:creationTime : 2019-06-07T23:33:36Z