About This Document
- sl:arxiv_author :
- sl:arxiv_firstAuthor : Aditya Prakash
- sl:arxiv_num : 2104.09224
- sl:arxiv_published : 2021-04-19T11:48:13Z
- sl:arxiv_summary : How should representations from complementary sensors be integrated for
autonomous driving? Geometry-based sensor fusion has shown great promise for
perception tasks such as object detection and motion forecasting. However, for
the actual driving task, the global context of the 3D scene is key, e.g. a
change in traffic light state can affect the behavior of a vehicle
geometrically distant from that traffic light. Geometry alone may therefore be
insufficient for effectively fusing representations in end-to-end driving
models. In this work, we demonstrate that imitation learning policies based on
existing sensor fusion methods underperform in the presence of a high density
of dynamic agents and complex scenarios, which require global contextual
reasoning, such as handling traffic oncoming from multiple directions at
uncontrolled intersections. Therefore, we propose TransFuser, a novel
Multi-Modal Fusion Transformer, to integrate image and LiDAR representations
using attention. We experimentally validate the efficacy of our approach in
urban settings involving complex scenarios using the CARLA urban driving
simulator. Our approach achieves state-of-the-art driving performance while
reducing collisions by 76% compared to geometry-based fusion.@en
- sl:arxiv_title : Multi-Modal Fusion Transformer for End-to-End Autonomous Driving@en
- sl:arxiv_updated : 2021-04-19T11:48:13Z
- sl:bookmarkOf : https://arxiv.org/abs/2104.09224
- sl:creationDate : 2022-09-16
- sl:creationTime : 2022-09-16T19:03:51Z
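
The abstract above describes fusing image and LiDAR representations with attention rather than geometry. The following is a minimal sketch of that general idea, assuming PyTorch: features from each modality are flattened into tokens, concatenated, and passed through a standard transformer encoder so every location can attend to the full scene. It is not the authors' TransFuser implementation; the single fusion stage, channel sizes, and omission of positional embeddings are illustrative assumptions.

```python
# Minimal sketch of attention-based image/LiDAR feature fusion (assumes PyTorch).
# Illustrative only: not the TransFuser architecture from the paper.
import torch
import torch.nn as nn


class AttentionFusion(nn.Module):
    """Fuse an image feature map and a LiDAR BEV feature map via self-attention
    over their concatenated tokens (one token per spatial location)."""

    def __init__(self, channels: int = 64, num_heads: int = 4, num_layers: int = 1):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=channels, nhead=num_heads,
            dim_feedforward=4 * channels, batch_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, img_feat: torch.Tensor, lidar_feat: torch.Tensor):
        # img_feat:   (B, C, Hi, Wi)  image-branch features
        # lidar_feat: (B, C, Hl, Wl)  LiDAR bird's-eye-view branch features
        b, c, hi, wi = img_feat.shape
        _, _, hl, wl = lidar_feat.shape
        img_tokens = img_feat.flatten(2).transpose(1, 2)        # (B, Hi*Wi, C)
        lidar_tokens = lidar_feat.flatten(2).transpose(1, 2)    # (B, Hl*Wl, C)
        tokens = torch.cat([img_tokens, lidar_tokens], dim=1)   # joint token set
        fused = self.encoder(tokens)                             # global attention
        # Split back and restore each modality's spatial layout.
        img_out = fused[:, : hi * wi].transpose(1, 2).reshape(b, c, hi, wi)
        lidar_out = fused[:, hi * wi :].transpose(1, 2).reshape(b, c, hl, wl)
        return img_out, lidar_out


if __name__ == "__main__":
    fusion = AttentionFusion(channels=64)
    img = torch.randn(2, 64, 8, 8)    # downsampled image features
    lidar = torch.randn(2, 64, 8, 8)  # downsampled BEV LiDAR features
    img_f, lidar_f = fusion(img, lidar)
    print(img_f.shape, lidar_f.shape)
```

Because every image token can attend to every LiDAR token (and vice versa), a cue such as a distant traffic light in the camera view can influence the representation of a geometrically distant BEV location, which is the kind of global contextual reasoning the abstract argues geometry-only fusion lacks.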