About This Document
- sl:arxiv_author : Peter Clark, Oyvind Tafjord, Kyle Richardson
- sl:arxiv_firstAuthor : Peter Clark
- sl:arxiv_num : 2002.05867
- sl:arxiv_published : 2020-02-14T04:23:28Z
- sl:arxiv_summary : AI has long pursued the goal of having systems reason over *explicitly
provided* knowledge, but building suitable representations has proved
challenging. Here we explore whether transformers can similarly learn to reason
(or emulate reasoning), but using rules expressed in language, thus bypassing a
formal representation. We provide the first demonstration that this is
possible, and characterize the extent of this capability. To do this, we use a
collection of synthetic datasets that test increasing levels of reasoning
complexity (number of rules, presence of negation, and depth of chaining). We
find transformers appear to learn rule-based reasoning with high (99%) accuracy
on these datasets, and in a way that generalizes to test data requiring
substantially deeper chaining than in the training data (95%+ scores). We also
demonstrate that the models transfer well to two hand-authored rulebases, and
to rulebases paraphrased into more natural language. These findings are
significant as they suggest a new role for transformers, namely as a limited
"soft theorem prover" operating over explicit theories in language. This in
turn suggests new possibilities for explainability, correctability, and
counterfactual reasoning in question-answering. All datasets and a live demo
are available at http://rule-reasoning.apps.allenai.org/
- sl:arxiv_title : Transformers as Soft Reasoners over Language
- sl:arxiv_updated : 2020-02-14T04:23:28Z
- sl:bookmarkOf : https://arxiv.org/abs/2002.05867
- sl:creationDate : 2020-02-17
- sl:creationTime : 2020-02-17T09:06:44Z
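
To make the abstract's "depth of chaining" concrete, here is a minimal sketch (not the authors' code; all names in it are hypothetical) of the kind of example the paper's synthetic datasets contain: a context of facts and if-then rules stated in English, plus true/false questions whose gold labels come from symbolic forward chaining. The paper's finding is that a fine-tuned transformer predicts these labels from the language alone; the toy `entails` function below only illustrates the ground-truth side.

```python
# Minimal sketch (not the authors' code): the flavor of the paper's synthetic
# datasets, where a "theory" is facts and rules expressed in English and the
# task is to label questions True/False, with harder questions requiring
# longer chains of rule applications.

# Context the transformer would see, as natural language:
context = (
    "Alan is cold. "
    "If someone is cold then they shiver. "
    "If someone shivers then they are tired."
)

# (question, gold label, chaining depth needed)
questions = [
    ("Alan shivers.", True, 1),
    ("Alan is tired.", True, 2),   # needs two rule applications
    ("Alan is happy.", False, 0),  # unprovable; False under closed-world
]

# Gold labels for such data can be generated by naive forward chaining over a
# symbolic rendering of the same theory; the model itself never sees this form.
def entails(facts: set, rules: list, query: str) -> bool:
    """Forward-chain (premise -> conclusion) rules to a fixed point."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            if premise in known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return query in known

facts = {"cold(Alan)"}
rules = [("cold(Alan)", "shivers(Alan)"),
         ("shivers(Alan)", "tired(Alan)")]

for q in ["shivers(Alan)", "tired(Alan)", "happy(Alan)"]:
    print(f"{q}: {entails(facts, rules, q)}")
```

Deeper chaining in this sense (answers needing more rule applications than seen in training) is exactly what the abstract reports the models generalizing to with 95%+ scores.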