About This Document
- sl:arxiv_author :
- sl:arxiv_firstAuthor : Andy Coenen
- sl:arxiv_num : 1906.02715
- sl:arxiv_published : 2019-06-06T17:33:22Z
- sl:arxiv_summary : Transformer architectures show significant promise for natural language processing. Given that a single pretrained model can be fine-tuned to perform well on many different tasks, these networks appear to extract generally useful linguistic features. A natural question is how such networks represent this information internally. This paper describes qualitative and quantitative investigations of one particularly effective model, BERT. At a high level, linguistic features seem to be represented in separate semantic and syntactic subspaces. We find evidence of a fine-grained geometric representation of word senses. We also present empirical descriptions of syntactic representations in both attention matrices and individual word embeddings, as well as a mathematical argument to explain the geometry of these representations.
- sl:arxiv_title : Visualizing and Measuring the Geometry of BERT
- sl:arxiv_updated : 2019-10-28T17:53:14Z
- sl:bookmarkOf : https://arxiv.org/abs/1906.02715
- sl:creationDate : 2019-06-07
- sl:creationTime : 2019-06-07T23:33:36Z