A method for finding the best trade-off between accuracy and complexity (compression) when summarizing (e.g. clustering) a random variable X, given a joint probability distribution p(X,Y) between X and an observed relevant variable Y.
Intuitively, the IB principle preserves the information of the hidden representations about the label while compressing information about the input data. ([source](/doc/2019/08/_1908_01580_the_hsic_bottlenec))
Efficient compression in color naming and its evolution (About): The Information Bottleneck principle applied to linguistics.
>We argue that **languages efficiently compress ideas into words by optimizing the information bottleneck trade-off** between the complexity and accuracy of the lexicon. We test this proposal in the domain of color naming.
> word meanings may reflect adaptation to pressure for efficient communication—that is, communication that is precise yet requires only minimal cognitive resources.
[1908.01580] The HSIC Bottleneck: Deep Learning without Back-Propagation (2019) (About) > we show that it is possible to learn classification tasks at near competitive accuracy **without
backpropagation**, by maximizing a surrogate of the mutual information between hidden representations and labels and
simultaneously minimizing the mutual dependency between hidden representations and the inputs...
the hidden units of a network trained in this way form useful representations. Specifically, fully competitive accuracy
can be obtained by freezing the network trained without backpropagation and appending and training a one-layer
network using conventional SGD to convert the representation to the desired format.
The training method uses an approximation of the [#information bottleneck](/tag/information_bottleneck_method).
> - The method facilitates parallel processing and requires significantly fewer operations.
> - It does not suffer from exploding or vanishing gradients.
> - It is biologically more plausible than backpropagation.
[1503.02406] Deep Learning and the Information Bottleneck Principle (2015) (About) > Deep Neural Networks (DNNs) are analyzed via the theoretical framework of the information bottleneck (IB) principle. We first show that any DNN can be quantified by the mutual information between the layers and the input and output variables. Using this representation we can calculate the optimal information theoretic limits of the DNN.
[physics/0004057] The information bottleneck method (1999) (About) > We define the relevant information in a signal x ∈ X as being the information that this signal provides about another signal y ∈ Y. Examples include the information that face images provide about the names of the people portrayed, or the information that speech sounds provide about the words spoken. **Understanding the signal x requires more than just predicting y, it also requires specifying which features of X play a role in the prediction. We formalize this problem as that of finding a short code for X that preserves the maximum information about Y.** That is, we squeeze the information that X provides about Y through a ‘bottleneck’ formed by a limited set of codewords X̃... This approach yields an exact set of self-consistent equations for the coding rules X → X̃ and X̃ → Y.
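The self-consistent equations can be iterated numerically, Blahut-Arimoto style: alternate between updating the cluster marginal p(x̃), the decoder p(y|x̃), and the encoder p(x̃|x) ∝ p(x̃) exp(−β D_KL[p(y|x) ‖ p(y|x̃)]). A minimal sketch under the assumption that the joint p(x,y) is given as a small discrete matrix (function name and API are illustrative, not from the paper):

```python
import numpy as np

def iterative_ib(p_xy, n_clusters, beta, n_iter=200, seed=0):
    """Iterate the IB self-consistent equations for a discrete joint p(x, y).

    p_xy: (nx, ny) joint distribution. Returns the soft encoder q(x_tilde | x).
    """
    rng = np.random.default_rng(seed)
    nx, ny = p_xy.shape
    p_x = p_xy.sum(axis=1)               # p(x)
    p_y_given_x = p_xy / p_x[:, None]    # p(y | x)
    eps = 1e-12

    # Random soft assignment q(x_tilde | x); rows sum to 1
    q = rng.random((nx, n_clusters))
    q /= q.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        p_t = q.T @ p_x                                          # p(x_tilde)
        # Decoder: p(y | t) = sum_x q(t|x) p(x) p(y|x) / p(t)
        p_y_given_t = (q * p_x[:, None]).T @ p_y_given_x / p_t[:, None]
        # KL[p(y|x) || p(y|t)] for every pair (x, t)
        kl = np.einsum('xy,xty->xt', p_y_given_x,
                       np.log((p_y_given_x[:, None, :] + eps)
                              / (p_y_given_t[None, :, :] + eps)))
        # Encoder update: q(t|x) ∝ p(t) exp(-beta * KL)
        logits = np.log(p_t + eps)[None, :] - beta * kl
        q = np.exp(logits - logits.max(axis=1, keepdims=True))
        q /= q.sum(axis=1, keepdims=True)
    return q
```

At small β the encoder collapses (maximal compression); at large β it approaches a hard clustering of the x values by their conditional distributions p(y|x).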
(from the intro): how to define "meaningful / relevant" information? An issue deliberately left out of information theory by Shannon (who focused on the problem of transmitting information rather than judging its value to the recipient) -> this leads some to consider statistical and information-theoretic principles as almost irrelevant to the question of meaning.
> In contrast, **we argue here that information theory,
in particular lossy source compression, provides a natural quantitative
approach to the question of “relevant information.”** Specifically, we formulate
a **variational principle** for the extraction or efficient representation of relevant information.
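The variational principle in question is the IB Lagrangian: find a stochastic encoding p(x̃|x) that compresses X (small I(X;X̃)) while preserving information about Y (large I(X̃;Y)), with the multiplier β setting the trade-off:

```latex
\min_{p(\tilde{x}\mid x)} \; \mathcal{L}\big[p(\tilde{x}\mid x)\big]
  \;=\; I(X;\tilde{X}) \;-\; \beta\, I(\tilde{X};Y)
```

Small β emphasizes compression (few effective codewords); large β emphasizes accuracy about Y.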