Semanlink - [1802.07044] The Description Length of Deep Learning Models

About This Document

sl:arxiv_author :
- Léonard Blier
- Yann Ollivier
sl:arxiv_firstAuthor : Léonard Blier
sl:arxiv_num : 1802.07044
sl:arxiv_published : 2018-02-20T10:15:26Z
sl:arxiv_summary : Solomonoff's general theory of inference and the Minimum Description Length principle formalize Occam's razor, and hold that a good model of data is a model that is good at losslessly compressing the data, including the cost of describing the model itself. Deep neural networks might seem to go against this principle given the large number of parameters to be encoded. We demonstrate experimentally the ability of deep neural networks to compress the training data even when accounting for parameter encoding. The compression viewpoint originally motivated the use of variational methods in neural networks. Unexpectedly, we found that these variational methods provide surprisingly poor compression bounds, despite being explicitly built to minimize such bounds. This might explain the relatively poor practical performance of variational methods in deep learning. On the other hand, simple incremental encoding methods yield excellent compression values on deep networks, vindicating Solomonoff's approach.
sl:arxiv_title : The Description Length of Deep Learning Models
sl:arxiv_updated : 2018-11-01T11:23:09Z
sl:bookmarkOf : https://arxiv.org/abs/1802.07044
sl:creationDate : 2019-10-11
sl:creationTime : 2019-10-11T01:59:35Z
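
The "incremental encoding" idea in the summary can be illustrated with a toy sketch. Each symbol is encoded using a model fitted only on the symbols before it, so no parameters have to be transmitted separately. The Laplace-smoothed counting model below is an assumed stand-in for a deep network, and the function names are hypothetical; this is not the paper's experimental setup.

```python
import math
from collections import Counter


def prequential_code_length(symbols, alphabet_size):
    """Bits to encode `symbols` one at a time, each predicted from the earlier ones."""
    counts = Counter()
    total_bits = 0.0
    for seen, symbol in enumerate(symbols):
        # Laplace-smoothed predictive probability from the symbols seen so far
        # (assumed toy model standing in for a trained network).
        p = (counts[symbol] + 1) / (seen + alphabet_size)
        total_bits += -math.log2(p)
        counts[symbol] += 1
    return total_bits


def uniform_code_length(symbols, alphabet_size):
    """Bits to encode `symbols` with a fixed uniform code (no learning)."""
    return len(symbols) * math.log2(alphabet_size)


if __name__ == "__main__":
    data = [0] * 90 + [1] * 10  # a skewed, hence compressible, binary sequence
    print("prequential bits:", prequential_code_length(data, 2))
    print("uniform bits:    ", uniform_code_length(data, 2))
```

On a skewed sequence like this one, the incremental code is shorter than the uniform code, even though the cost of learning the model is already included in it.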

Documents with similar tags (experimental)

[2012.15156] A Memory Efficient Baseline for Open Domain Question Answering

2022-08-08 About

[2208.03299] Few-shot Learning with Retrieval Augmented Language Model

2022-08-08 About

[2112.09118] Towards Unsupervised Dense Information Retrieval with Contrastive Learning

2021-12-21 About

[2012.04584] Distilling Knowledge from Reader to Retriever for Question Answering

> a method to train an information retrieval module for downstream tasks, **without using pairs of queries and documents as annotations**.

Uses two models (standard pipeline for open-domain QA):

- the first one retrieves documents from a large source of knowledge (the retriever)
- the second one processes the support documents to solve the task (the reader).

> First the retriever selects support passages in a large knowledge source. Then these passages are processed by the reader, along with the question, to generate an answer

Inspired by knowledge distillation: the reader model is the teacher and the retriever is the student.

> More precisely, we use a sequence-to-sequence model as the reader, and use the attention activations over the input documents as synthetic labels to train the retriever.
> (**train the retriever by learning to approximate the attention score of the reader**)
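
A minimal sketch of what "learning to approximate the attention score of the reader" could look like as a loss. It assumes a KL-divergence objective between softmax-normalized scores, which the note above does not specify, and it takes one aggregated reader attention score per passage as given. The function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def distillation_loss(reader_attention, retriever_scores):
    """KL(target || predicted) over passages.

    reader_attention: one aggregated attention score per passage (teacher).
    retriever_scores: one query-passage similarity per passage (student).
    """
    target = softmax(reader_attention)
    predicted = softmax(retriever_scores)
    return sum(t * math.log(t / p) for t, p in zip(target, predicted))


if __name__ == "__main__":
    attention = [2.0, 0.5, -1.0]  # reader attends mostly to passage 0
    good = [1.9, 0.6, -1.1]       # retriever roughly agrees
    bad = [-1.0, 0.5, 2.0]        # retriever ranks passages the other way
    print(distillation_loss(attention, good))
    print(distillation_loss(attention, bad))
```

The loss is zero when the retriever's distribution over passages matches the reader's, and grows as the rankings diverge, so no query-document annotation is needed.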

Refers to:

- [REALM: Retrieval-Augmented Language Model Pre-Training](doc:2020/12/2002_08909_realm_retrieval_a)
- [Dehghani: Neural Ranking Models with Weak Supervision](doc:?uri=https%3A%2F%2Farxiv.org%2Fabs%2F1704.08803)

2020-12-11 About
