Efficient compression in color naming and its evolution(About) The Information Bottleneck principle applied to linguistics.
> We argue that **languages efficiently compress ideas into words by optimizing the information bottleneck trade-off** between the complexity and accuracy of the lexicon. We test this proposal in the domain of color naming.
word meanings may reflect adaptation to pressure for efficient communication—
that is, communication that is precise yet requires only minimal cognitive resources.
[1908.01580] The HSIC Bottleneck: Deep Learning without Back-Propagation (2019)(About) > we show that it is possible to learn classification tasks at near competitive accuracy **without
backpropagation**, by maximizing a surrogate of the mutual information between hidden representations and labels and
simultaneously minimizing the mutual dependency between hidden representations and the inputs...
the hidden units of a network trained in this way form useful representations. Specifically, fully competitive accuracy
can be obtained by freezing the network trained without backpropagation and appending and training a one-layer
network using conventional SGD to convert the representation to the desired format.
The training method uses an approximation of the [#information bottleneck](/tag/information_bottleneck_method).
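As a rough illustration of the dependence measure the title refers to, here is a minimal NumPy sketch of the (biased) empirical HSIC estimator with RBF kernels; the bandwidth `sigma` and the estimator details are illustrative assumptions, not the paper's training code.

```python
import numpy as np

def rbf_gram(X, sigma=1.0):
    # Gaussian (RBF) Gram matrix from pairwise squared distances
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-d2 / (2.0 * sigma ** 2))

def hsic(X, Y, sigma=1.0):
    """Biased empirical HSIC estimate between paired samples X and Y.
    Higher HSIC ~ stronger statistical dependence between the two views."""
    n = X.shape[0]
    K, L = rbf_gram(X, sigma), rbf_gram(Y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2
```

In the HSIC-bottleneck objective, each hidden layer would maximize such a dependence measure with the labels while minimizing it with the inputs.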
[1503.02406] Deep Learning and the Information Bottleneck Principle (2015)(About) Deep Neural Networks (DNNs) are analyzed via the theoretical framework of the information bottleneck (IB) principle. We first show that any DNN can be quantified by the mutual information between the layers and the input and output variables. Using this representation we can calculate the optimal information theoretic limits of the DNN.
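One crude way to put numbers on "the mutual information between the layers and the input and output variables" is a plug-in histogram estimator; discretizing continuous activations into bins is a common but lossy shortcut, sketched here for scalar variables only.

```python
import numpy as np

def mi_binned(x, y, bins=30):
    """Plug-in estimate of I(X;Y) in bits for scalar samples x, y,
    obtained by discretizing both variables into histogram bins."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))
```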
[physics/0004057] The information bottleneck method (1999)(About) > We define the relevant information in a signal x ∈ X as being the information that this signal provides about another signal y ∈ Y. Examples include the information that face images provide about the names of the people portrayed, or the information that speech sounds provide about the words spoken. **Understanding the signal x requires more than just predicting y, it also requires specifying which features of X play a role in the prediction. We formalize this problem as that of finding a short code for X that preserves the maximum information about Y.** That is, we squeeze the information that X provides about Y through a ‘bottleneck’ formed by a limited set of codewords X̃... This approach yields an exact set of self consistent equations for the coding rules X → X̃ and X̃ → Y.
(from the intro): how to define "meaningful / relevant" information? An issue intentionally left out of information theory by Shannon (the focus being on the problem of transmitting information rather than judging its value to the recipient), which has led many to
consider statistical and information-theoretic principles as almost irrelevant
to the question of meaning.
> In contrast, **we argue here that information theory,
in particular lossy source compression, provides a natural quantitative
approach to the question of "relevant information."** Specifically, we formulate
a **variational principle** for the extraction or efficient representation of relevant information.
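The variational principle in question is the IB Lagrangian: find a stochastic encoding $p(\tilde{x}\mid x)$ that compresses $X$ while preserving information about $Y$,

$$\min_{p(\tilde{x}\mid x)} \; \mathcal{L} = I(X;\tilde{X}) - \beta\, I(\tilde{X};Y)$$

where $\beta$ controls the trade-off: small $\beta$ favors compression, large $\beta$ favors keeping relevant information.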
4th Workshop on Representation Learning for NLP(About) Talks:
- Language emergence as representation learning (Marco Baroni)
> language emergence among deep neural network agents that have to jointly solve a task. Recent findings suggest that the language-like code developed by such agents both differs from and resembles natural language in interesting ways. For example, the emergent code does not naturally represent general concepts, but rather very specific invariances in the perceptual input
- Representations shaped by dialogue interaction (Raquel Fernández)
> When we use language to communicate with each other in conversation, we build an internal representation of our evolving common ground. Traditionally, in dialogue systems this is captured by an explicit dialogue state defined a priori. Can we develop dialogue agents that learn their own (joint) representations?
- Knowledgeable and Adversarially-Robust Representation Learning (Mohit Bansal)
- Modeling Output Spaces in Continuous-Output Language Generation (Yulia Tsvetkov)
Learning Text Similarity with Siamese Recurrent Networks(About) A deep architecture for
**learning a similarity metric** on variable length
character sequences. The model
combines a stack of character-level bidirectional
LSTMs with a Siamese architecture.
It learns to project variable length
strings into a fixed-dimensional embedding
space **by using only information
about the similarity between pairs of
strings**. This model is applied to the task
of job title normalization based on a manually
annotated taxonomy. A small data set
is incrementally expanded and augmented
with new sources of variance.
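A minimal PyTorch sketch of the architecture's shape; the layer sizes, mean-pooling, and cosine-similarity head here are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseCharBiLSTM(nn.Module):
    """Siamese character-level BiLSTM encoder (hyperparameters assumed)."""

    def __init__(self, vocab_size, char_dim=64, hidden=128, out_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, char_dim, padding_idx=0)
        self.lstm = nn.LSTM(char_dim, hidden, num_layers=2,
                            bidirectional=True, batch_first=True)
        self.proj = nn.Linear(2 * hidden, out_dim)

    def encode(self, char_ids):
        # char_ids: (batch, seq_len) integer-encoded characters
        h, _ = self.lstm(self.embed(char_ids))
        return self.proj(h.mean(dim=1))   # average over time steps

    def forward(self, a, b):
        # Both branches share the same weights: that is the Siamese part.
        return F.cosine_similarity(self.encode(a), self.encode(b))
```

Training would feed pairs of strings labeled similar/dissimilar and apply a contrastive loss to this cosine similarity, so the embedding space is shaped purely by pairwise similarity information.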
Reasoning With Neural Tensor Networks for Knowledge Base Completion (2013)(About) **Predicting the likely truth of additional facts based on existing facts in the knowledge base.**
> we introduce an expressive neural
tensor network suitable for reasoning over relationships between two entities.
Most similar work: [Bordes et al.](http://127.0.0.1:8080/semanlink/doc/2019/08/learning_structured_embeddings_) (2011)
1. a new neural tensor
network (NTN) suitable for reasoning over relationships between two entities. Generalizes several previous neural network models and provides a more
powerful way to model relational information than a standard neural network layer.
2. a new way to represent entities in knowledge bases, as the
average of their constituting word vectors, allowing the sharing of statistical strength between the words describing
each entity (e.g., Bank of China and China).
3. incorporation of word vectors which are trained on large unlabeled text
> We learn to modify word representations
via grounding in world knowledge. This essentially allows us to analyze word embeddings and
query them for specific relations. Furthermore, the resulting vectors could be used in other tasks
such as named entity recognition or relation classification in natural language
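The NTN scores a triple $(e_1, R, e_2)$ with a bilinear tensor layer, $g(e_1,R,e_2)=u_R^\top \tanh\big(e_1^\top W_R^{[1:k]} e_2 + V_R[e_1;e_2]+b_R\big)$; a NumPy sketch of this scoring function for a single relation:

```python
import numpy as np

def ntn_score(e1, e2, W, V, b, u):
    """NTN score g(e1, R, e2) for one relation R.
    Shapes (d = entity dim, k = number of tensor slices):
    e1, e2: (d,);  W: (k, d, d);  V: (k, 2*d);  b, u: (k,)."""
    bilinear = np.einsum('i,kij,j->k', e1, W, e2)   # e1^T W^[1:k] e2
    linear = V @ np.concatenate([e1, e2]) + b       # standard layer part
    return float(u @ np.tanh(bilinear + linear))
```

Per point 2 above, the entity vectors e1 and e2 themselves would be averages of the word vectors making up each entity name.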
Bloom filter(About) To test whether an element is a member of a set. False positives are possible, but false negatives are not (a query returns either "possibly in set" or "definitely not in set").
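A minimal Python sketch; the bit-array size, number of hashes, and salted-SHA-1 hashing scheme are illustrative choices, not canonical ones.

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: an m-bit array with k hash functions derived
    from salted SHA-1 digests (illustrative parameters)."""

    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = bytearray(m)  # all zeros initially

    def _positions(self, item):
        # k bit positions per item, one per salt
        for i in range(self.k):
            h = hashlib.sha1(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # True means "possibly in set"; False means "definitely not in set"
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
bf.add("alice")
print("alice" in bf)  # True (possibly in set)
print("bob" in bf)    # False with high probability (definitely not in set)
```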
A2N: Attending to Neighbors for Knowledge Graph Inference - ACL 2019(About) > State-of-the-art models for knowledge graph completion aim at learning a fixed embedding representation of entities in a multi-relational graph which can generalize to infer unseen entity relationships at test time. This can be sub-optimal as it requires memorizing and generalizing to all possible entity relationships using these fixed representations. We thus propose a novel **attention-based method to learn query-dependent representation of entities** which adaptively combines the relevant graph neighborhood of an entity leading to more accurate KG completion.
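A hypothetical sketch of what query-dependent neighborhood attention could look like; the bilinear scoring function and all shapes are assumptions for illustration, not the paper's exact parameterization.

```python
import numpy as np

def a2n_entity_repr(query_rel, neighbors, W):
    """Attend over an entity's graph neighborhood, conditioned on the
    query relation, to get an adaptive entity representation.
    query_rel: (d,) embedding of the query relation (assumed)
    neighbors: (n, d) embeddings of (relation, entity) neighbor pairs
    W: (d, d) bilinear attention parameters (assumed)."""
    scores = neighbors @ (W @ query_rel)   # relevance of each neighbor to the query
    attn = np.exp(scores - scores.max())
    attn /= attn.sum()                     # softmax attention weights
    return attn @ neighbors                # query-dependent combination
```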