Unsupervised machine learning
Tutorial
Occam's razor
Information theory
> The best hypothesis (a model and its parameters) for a given set of data is the one that leads to the best compression of the data.
> In information theory and Minimum Description Length (MDL), learning a good model of the data
> is recast as using the model to losslessly transmit the data in as few bits as possible. ([source](/doc/2019/10/_1802_07044_the_description_le))
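As a toy illustration of this recasting (a hedged sketch, not taken from the paper): under an entropy coder, the cost of transmitting data with a probabilistic model is the Shannon codelength, $-\log_2 p(x)$ bits per symbol, so a model that assigns higher probability to the data compresses it into fewer bits.

```python
import math
from collections import Counter

def codelength_bits(data, prob):
    """Shannon codelength: bits needed to losslessly transmit `data`
    with an entropy coder driven by the model `prob`."""
    return -sum(math.log2(prob[x]) for x in data)

data = list("abracadabra")
symbols = sorted(set(data))

# Uninformed model: uniform over the 5-symbol alphabet.
uniform = {s: 1 / len(symbols) for s in symbols}

# Fitted model: empirical symbol frequencies.
counts = Counter(data)
fitted = {s: counts[s] / len(data) for s in symbols}

print(codelength_bits(data, uniform))  # more bits under the uniform model
print(codelength_bits(data, fitted))   # fewer bits: a better model compresses better
```

Here "learning" the empirical frequencies shortens the message, which is exactly the sense in which a good model is a good compressor.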
Minimum Description Length Principle
Information theory AND Deep Learning
This tutorial on Unsupervised Deep Learning covers in detail the approach of simply 'predicting everything' in the data, typically with a probabilistic model, which can be seen through the lens of the Minimum Description Length principle as an effort to compress the data as compactly as possible.
2018-12-06
Neural Information Processing Systems - Tutorial Sessions: Unsupervised Deep Learning "predict everything"
2019-10-11
Facebook FAIR
> Solomonoff’s general theory of inference (Solomonoff, 1964) and the [Minimum Description Length Principle](tag:minimum_description_length_principle) (Grünwald, 2007; Rissanen, 2007) formalize [Occam's razor](tag:occam_s_razor), and hold that **a good model of data is a model that is good at losslessly compressing the data, including the cost of describing the model itself**. Deep neural networks might seem to go against this principle given the large number of parameters to be encoded.
>
> We demonstrate experimentally the ability of deep neural networks to compress the training data even when accounting for parameter encoding.
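The "including the cost of describing the model itself" clause is the two-part code: total description length is the bits for the parameters plus the bits for the data given those parameters. A minimal sketch, assuming a crude cost of 32 bits per parameter (the paper uses far more refined codes, such as variational and prequential ones):

```python
import math

BITS_PER_PARAM = 32  # crude assumption: each parameter transmitted as a 32-bit float

def two_part_codelength(data, probs, n_params):
    """Two-part MDL code: bits to describe the model, plus bits to
    describe the data given the model (Shannon codelength)."""
    data_bits = -sum(math.log2(probs[x]) for x in data)
    return n_params * BITS_PER_PARAM + data_bits

# Coin-flip data that is mostly heads.
data = [1] * 90 + [0] * 10

# Model A: fair coin, nothing fitted, zero parameters to transmit.
fair = {0: 0.5, 1: 0.5}
# Model B: one fitted parameter, p(heads) = 0.9.
fitted = {0: 0.1, 1: 0.9}

print(two_part_codelength(data, fair, n_params=0))    # 100 data bits, 0 model bits
print(two_part_codelength(data, fitted, n_params=1))  # 32 model bits + ~47 data bits
```

Even after paying for its parameter, the fitted model yields a shorter total code, which is the MDL sense in which it is the better model; the paper's experiments make the analogous accounting work out for deep networks with millions of parameters.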
NLP@ENS
Overfitting/Generalization
[1802.07044] The Description Length of Deep Learning Models
DL: why does it work?
Arxiv Doc