Dictionary learning, or sparse coding, tries to learn a sparse linear code to represent the
given data succinctly.
Unsupervised learning algo. Images -> edge detection (similar to primary visual cortex)
[1601.03764] Linear Algebraic Structure of Word Senses, with Applications to Polysemy (2016 - revised 2018)(About) > Here it is shown that multiple word senses reside
in linear superposition within the word
embedding and simple sparse coding can recover
vectors that approximately capture the
> Each extracted word sense is accompanied by one of about 2000 “discourse atoms” that gives a succinct description of which other words co-occur with that word sense.
> The success of the approach is mathematically explained using a variant of
the random walk on discourses model
("random walk": a generative model for language). Under the assumptions of this model, there
exists a linear relationship between the vector of a
word w and the vectors of the words in its contexts (It is not the average of the words in w's context, but in a given corpus the matrix of the linear relationship does not depend on w. It can be estimated, and so we can compute the embedding of a word from the contexts it belongs to)
[Related blog post](/doc/?uri=https%3A%2F%2Fwww.offconvex.org%2F2016%2F07%2F10%2Fembeddingspolysemy%2F)
Bay Area Vision Meeting: Unsupervised Feature Learning and Deep Learning - YouTube(About) Lessons from neuroscience: one algorithm for all kinds of learning
Looking for better representations of the input (features)
Feature learning via sparse coding (sparse linear combinations. Edge detection, quantitatively similar to primary visual cortex)
Then learning features hierarchies (several layers. "sparse DBN" "deep belief nets")
Scaling see 25'07 (algos) ; using GPUs
Learning recursive representations. "Generic" hierarchies on text doesn't make sense; learn feature vector that represent sentences
Provable Algorithms for Machine Learning Problems by Rong Ge.(About) from the abstract:
Modern machine learning algorithms can extract useful information from text, images and videos. All these applications involve solving NP-hard problems in average case using heuristics. What properties of the input allow it to be solved effciently? Theoretically analyzing the heuristics is very challenging. Few results were known.
This thesis takes a different approach: we identify natural properties of the input, then design new algorithms that provably works assuming the input has these properties. We are able to give new, provable and sometimes practical algorithms for learning tasks related to text corpus, images and social networks.
...In theory, the assumptions in this thesis help us understand why intractable problems in machine learning can often be solved; in practice, the results suggest inherently new approaches for machine learning.