Semanlink - [physics/0004057] The information bottleneck method

> We define the relevant information in a signal x ∈ X as being the information that this signal provides about another signal y ∈ Y. Examples include the information that face images provide about the names of the people portrayed, or the information that speech sounds provide about the words spoken. **Understanding the signal x requires more than just predicting y, it also requires specifying which features of X play a role in the prediction. We formalize this problem as that of finding a short code for X that preserves the maximum information about Y.** That is, we squeeze the information that X provides about Y through a ‘bottleneck’ formed by a limited set of codewords X̃... This approach yields an exact set of self consistent equations for the coding rules X → X̃ and X̃ → Y.
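
For reference, a sketch of the formal statement behind this summary, written in the notation usually used for the method. The trade-off parameter β, the normalization Z(x, β), and the symbol D_KL for the Kullback-Leibler divergence are not named in the excerpt above and are introduced here as assumptions about notation. The first line is the functional to be minimized over the coding rule p(x̃|x); the remaining three are the self-consistent equations its minimizers satisfy.

```latex
\begin{align}
\mathcal{L}\big[p(\tilde{x}\mid x)\big] &= I(\tilde{X};X) - \beta\, I(\tilde{X};Y) \\
p(\tilde{x}\mid x) &= \frac{p(\tilde{x})}{Z(x,\beta)}
  \exp\!\Big(-\beta\, D_{KL}\big[\,p(y\mid x)\,\big\|\,p(y\mid\tilde{x})\,\big]\Big) \\
p(y\mid\tilde{x}) &= \frac{1}{p(\tilde{x})}\sum_{x} p(y\mid x)\,p(\tilde{x}\mid x)\,p(x) \\
p(\tilde{x}) &= \sum_{x} p(\tilde{x}\mid x)\,p(x)
\end{align}
```

Small β favors compression (few effective codewords), while large β favors preserving information about Y. The KL term plays the role of the distortion measure d(x, x̃), which is why it is said to emerge from the joint statistics of X and Y rather than being chosen by hand.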

(from the intro): How should "meaningful" or "relevant" information be defined? Shannon left this issue out of information theory, focusing on the problem of transmitting information rather than on judging its value to the recipient. This leads one to consider statistical and information theoretic principles as almost irrelevant to the question of meaning.

> In contrast, **we argue here that information theory,
in particular lossy source compression, provides a natural quantitative
approach to the question of “relevant information.”** Specifically, we formulate
a **variational principle** for the extraction or efficient representation of
relevant information.
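
To make the idea concrete, here is a minimal sketch (not the authors' code) of an alternating re-estimation loop for a small discrete joint distribution. It assumes the standard information bottleneck updates: recompute p(x̃), then p(y|x̃), then set p(x̃|x) proportional to p(x̃)·exp(−β·KL[p(y|x) ‖ p(y|x̃)]). The function name `information_bottleneck`, its parameters, the random initialization, and the index `t` standing for a codeword x̃ are all illustrative choices, and the loop may settle in a local optimum that depends on the seed.

```python
import math
import random

EPS = 1e-12  # guards against log(0) and division by zero


def information_bottleneck(p_xy, n_codewords, beta, n_iter=200, seed=0):
    """Iterate the self-consistent equations for a joint table p_xy[x][y].

    Returns (q, q_y_t): q[x][t] approximates p(t|x) and q_y_t[t][y]
    approximates p(y|t), where t indexes the codewords.
    Assumes every row of p_xy has positive total mass and n_iter >= 1.
    """
    rng = random.Random(seed)
    n_x, n_y = len(p_xy), len(p_xy[0])
    p_x = [sum(row) for row in p_xy]                              # p(x)
    p_y_x = [[p / p_x[i] for p in p_xy[i]] for i in range(n_x)]   # p(y|x)

    # Random soft initial assignment p(t|x); each row sums to 1.
    q = []
    for _ in range(n_x):
        w = [rng.random() + 1e-3 for _ in range(n_codewords)]
        total = sum(w)
        q.append([v / total for v in w])

    q_y_t = []
    for _ in range(n_iter):
        # p(t) = sum_x p(x) p(t|x)
        q_t = [sum(p_x[i] * q[i][t] for i in range(n_x))
               for t in range(n_codewords)]
        # p(y|t) = sum_x p(x, y) p(t|x) / p(t)
        q_y_t = [
            [sum(p_xy[i][j] * q[i][t] for i in range(n_x)) / max(q_t[t], EPS)
             for j in range(n_y)]
            for t in range(n_codewords)
        ]
        # p(t|x) proportional to p(t) * exp(-beta * KL[p(y|x) || p(y|t)])
        for i in range(n_x):
            logits = []
            for t in range(n_codewords):
                kl = sum(p * math.log(p / max(q_y_t[t][j], EPS))
                         for j, p in enumerate(p_y_x[i]) if p > 0)
                logits.append(math.log(max(q_t[t], EPS)) - beta * kl)
            top = max(logits)  # subtract the max for numerical stability
            w = [math.exp(v - top) for v in logits]
            total = sum(w)
            q[i] = [v / total for v in w]
    return q, q_y_t


# Toy joint distribution: x=0,1 mostly predict y=0; x=2,3 mostly predict y=1.
p_xy = [[0.20, 0.05], [0.20, 0.05], [0.05, 0.20], [0.05, 0.20]]
q, q_y_t = information_bottleneck(p_xy, n_codewords=2, beta=10.0)
```

In this toy table, x=0 and x=1 share the same p(y|x), so the update gives them the same distribution over codewords, and likewise for x=2 and x=3. Varying `beta` is the simplest way to explore the trade-off between compression and preserved information.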

About This Document

sl:arxiv_author :
- Naftali Tishby Hebrew University and NEC Research Institute
- Fernando C. Pereira ATT Shannon Laboratory
- William Bialek NEC Research Institute
sl:arxiv_firstAuthor : Naftali Tishby Hebrew University and NEC Research Institute
sl:arxiv_num : physics/0004057
sl:arxiv_published : 2000-04-24T15:22:30Z
sl:arxiv_summary : We define the relevant information in a signal $x\in X$ as being the information that this signal provides about another signal $y\in Y$. Examples include the information that face images provide about the names of the people portrayed, or the information that speech sounds provide about the words spoken. Understanding the signal $x$ requires more than just predicting $y$, it also requires specifying which features of $X$ play a role in the prediction. We formalize this problem as that of finding a short code for $X$ that preserves the maximum information about $Y$. That is, we squeeze the information that $X$ provides about $Y$ through a ‘bottleneck’ formed by a limited set of codewords $\tilde{X}$. This constrained optimization problem can be seen as a generalization of rate distortion theory in which the distortion measure $d(x,\tilde{x})$ emerges from the joint statistics of $X$ and $Y$. This approach yields an exact set of self consistent equations for the coding rules $X \to \tilde{X}$ and $\tilde{X} \to Y$. Solutions to these equations can be found by a convergent re-estimation method that generalizes the Blahut-Arimoto algorithm. Our variational principle provides a surprisingly rich framework for discussing a variety of problems in signal processing and learning, as will be described in detail elsewhere.@en
sl:arxiv_title : The information bottleneck method@en
sl:arxiv_updated : 2000-04-24T15:22:30Z
sl:bookmarkOf : https://arxiv.org/abs/physics/0004057
sl:creationDate : 2019-08-15
sl:creationTime : 2019-08-15T11:31:33Z

Documents with similar tags (experimental)

[1503.02406] Deep Learning and the Information Bottleneck Principle
