About This Document
- sl:arxiv_author :
- sl:arxiv_firstAuthor : Timothy Niven
- sl:arxiv_num : 1907.07355
- sl:arxiv_published : 2019-07-17T06:26:20Z
- sl:arxiv_summary : We are surprised to find that BERT's peak performance of 77% on the Argument
Reasoning Comprehension Task reaches just three points below the average
untrained human baseline. However, we show that this result is entirely
accounted for by exploitation of spurious statistical cues in the dataset. We
analyze the nature of these cues and demonstrate that a range of models all
exploit them. This analysis informs the construction of an adversarial dataset
on which all models achieve random accuracy. Our adversarial dataset provides a
more robust assessment of argument comprehension and should be adopted as the
standard in future work.@en
- sl:arxiv_title : Probing Neural Network Comprehension of Natural Language Arguments@en
- sl:arxiv_updated : 2019-09-16T04:07:54Z
- sl:bookmarkOf : https://arxiv.org/abs/1907.07355
- sl:creationDate : 2019-07-24
- sl:creationTime : 2019-07-24T01:34:54Z
- sl:relatedDoc : http://www.semanlink.net/doc/2019/07/bert_s_success_in_some_benchmar