[1802.07044] The Description Length of Deep Learning Models (2018)(About) > Solomonoff’s general theory of inference (Solomonoff, 1964) and the Minimum
Description Length principle (Grünwald, 2007; Rissanen, 2007) formalize Occam’s
razor, and hold that **a good model of data is a model that is good at losslessly
compressing the data, including the cost of describing the model itself**. Deep neural
networks might seem to go against this principle given the large number of
parameters to be encoded.
We demonstrate experimentally the ability of deep neural networks to compress
the training data even when accounting for parameter encoding.
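The two-part code at the heart of this claim can be made concrete with a toy calculation. The sketch below (my illustration, not the paper's experiment) computes a description length as bits-to-encode-the-model plus bits-to-encode-the-data-given-the-model, and shows a simple one-parameter model beating the naive encoding:

```python
import numpy as np

# Two-part MDL: description length = L(model) + L(data | model), where
# L(data | model) is the data's negative log-likelihood in base 2
# (its Shannon code length). A model "compresses" the data when this
# total is smaller than encoding the data directly.

def two_part_description_length(param_bits, probs):
    """param_bits: cost of encoding the model itself, in bits.
    probs: model probabilities assigned to each observed data point."""
    data_bits = -np.sum(np.log2(probs))
    return param_bits + data_bits

# Toy example: 1000 binary labels, 90% of which are 1.
labels = np.array([1] * 900 + [0] * 100)
uniform_bits = float(len(labels))              # naive encoding: 1 bit per label
p = 0.9                                        # one-parameter Bernoulli "model"
model_probs = np.where(labels == 1, p, 1 - p)
mdl_bits = two_part_description_length(32, model_probs)  # 32 bits for the parameter

print(uniform_bits, mdl_bits)  # ~501 bits total: the model compresses the data
```

The paper's point is that the same accounting can come out in favor of deep networks, despite their parameter count, once the parameters are encoded cleverly.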
Information theory and neural coding (1999) (Alexander Borst and Frédéric E. Theunissen)(About) > we review information-theory basics before
demonstrating its use in neural coding. We show how to use information theory to validate simple
stimulus–response models of neural coding of dynamic stimuli. Because these models require specification of
spike timing precision, they can reveal which time scales contain information in neural coding. This
approach shows that dynamic stimuli can be encoded efficiently by single neurons and that each spike
contributes to information transmission. We argue, however, that the data obtained so far do not suggest
a temporal code, in which the placement of spikes relative to each other yields additional information.
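The quantity underlying this kind of validation is the mutual information between stimulus and response, I(S; R) = H(R) − H(R|S). A hedged sketch (my own minimal version, not the review's estimator) over a discrete joint distribution of stimuli and binned responses:

```python
import numpy as np

# Mutual information between stimulus S and response R from a discrete
# joint distribution: total response entropy minus the "noise entropy"
# (average response entropy at a fixed stimulus), in bits.

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(joint):
    """joint[s, r]: joint probability of stimulus s and response r."""
    h_r = entropy(joint.sum(axis=0))        # H(R)
    p_s = joint.sum(axis=1)
    h_r_given_s = sum(p_s[s] * entropy(joint[s] / p_s[s])
                      for s in range(joint.shape[0]) if p_s[s] > 0)
    return h_r - h_r_given_s                # I(S; R) = H(R) - H(R|S)

# Toy joint: two equiprobable stimuli whose responses are perfectly
# distinguishable, so the response carries exactly 1 bit about the stimulus.
joint = np.array([[0.5, 0.0],
                  [0.0, 0.5]])
print(mutual_information(joint))  # → 1.0
```

Choosing the response binning is where spike timing precision enters: finer time bins can only increase the information estimate, which is how the authors probe which time scales carry information.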
[1910.03524] Beyond Vector Spaces: Compact Data Representation as Differentiable Weighted Graphs (2019)(About) > In this paper, we aim to eliminate the inductive bias imposed by the embedding space geometry. Namely, we propose to map data into more general non-vector metric spaces: a weighted graph with a shortest path distance. By design, such graphs can model arbitrary geometry with a proper configuration of edges and weights. Our main contribution is PRODIGE (Probabilistic Differentiable Graph Embeddings): a method that learns a weighted graph representation of data end-to-end by gradient descent.
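The representational idea can be illustrated without the learning part: a weighted graph's shortest-path distances form a metric whose geometry is set entirely by the edges, not by an ambient vector space. The sketch below (my illustration; PRODIGE additionally learns the weights and edge probabilities by gradient descent, and the graph here is made up) computes that distance with Dijkstra's algorithm:

```python
import heapq

# Shortest-path distance on a weighted graph: the metric that a
# graph-based data representation exposes in place of a vector-space norm.

def shortest_path_distance(weights, src, dst):
    """weights: dict {node: {neighbor: edge_weight}}, undirected assumed."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in weights.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")

# A small graph: distances are whatever the edge weights say, with no
# constraint that they embed isometrically into a flat vector space.
g = {"a": {"b": 1.0, "c": 4.0},
     "b": {"a": 1.0, "c": 1.5},
     "c": {"a": 4.0, "b": 1.5, "d": 2.0},
     "d": {"c": 2.0}}
print(shortest_path_distance(g, "a", "d"))  # → 4.5 (via a-b-c-d)
```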
Feature-wise transformations. A simple and surprisingly effective family of conditioning mechanisms. (2018)(About) > Many real-world problems require integrating multiple sources of information... When approaching such problems, it often makes sense to process one source of information in the context of another. In machine learning, we often refer to this context-based processing as conditioning: the computation carried out by a model is **conditioned** or **modulated** by information extracted from an auxiliary input. E.g.: **extract meaning from the image in the context of the question**.
Related to this talk at Paris NLP meetup: ["Language and Perception in Deep Learning"](/doc/2019/10/language_and_perception_in_deep)
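The best-known instance of this family is FiLM-style modulation: the auxiliary input is mapped to a per-feature scale and shift applied to the main pathway. A minimal sketch, with toy shapes and a plain linear generator standing in for whatever network the article's examples use:

```python
import numpy as np

# Feature-wise affine conditioning: features x (e.g. image features) are
# modulated by gamma(z) and beta(z) computed from an auxiliary input z
# (e.g. a question encoding): y = gamma * x + beta, feature by feature.

rng = np.random.default_rng(0)
n_features, z_dim = 8, 4

x = rng.normal(size=(2, n_features))   # main-pathway features (batch of 2)
z = rng.normal(size=(2, z_dim))        # conditioning input

# Small linear "generator" producing per-feature gamma and beta from z.
W_gamma = rng.normal(size=(z_dim, n_features))
W_beta = rng.normal(size=(z_dim, n_features))
gamma = z @ W_gamma
beta = z @ W_beta

# The modulated features: each feature scaled and shifted by the context.
y = gamma * x + beta
print(y.shape)  # → (2, 8)
```

The appeal of the mechanism is its cost: conditioning an entire feature map takes only two numbers per feature, yet it lets the auxiliary input amplify, suppress, or flip individual features.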