A generative model in which sets of observations are explained by unobserved groups (topics) that account for why some parts of the data are similar.
Models the intuition that the topics of a document probabilistically influence the author's choice of words when writing it. Documents are interpreted as a mixture of topics (a probability distribution over topics), and each topic as a probability distribution over words.
Encodes the intuition that documents cover a small number of topics and that topics often use a small number of words.
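The mixture view above can be sketched with scikit-learn's `LatentDirichletAllocation` on a hypothetical toy corpus (the documents and topic count here are made up for illustration):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Tiny hypothetical corpus: two themes, animals and finance
docs = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "the stock market fell today",
    "investors sold shares on the market",
]

# Bag-of-words counts: LDA operates on word counts, not raw text
X = CountVectorizer(stop_words="english").fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# doc_topic[d] is document d's mixture over the 2 topics; each row sums to 1
doc_topic = lda.transform(X)

# lda.components_[k] holds topic k's (unnormalized) weights over vocabulary words
topic_word = lda.components_
```

`doc_topic` realizes "a document is a distribution over topics" and `lda.components_` realizes "a topic is a distribution over words" (after normalizing each row).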
LDA is an extension of [LSI/pLSI](latent_semantic_analysis)
Provable Algorithms for Machine Learning Problems by Rong Ge. From the abstract:
Modern machine learning algorithms can extract useful information from text, images and videos. All these applications involve solving NP-hard problems in the average case using heuristics. What properties of the input allow it to be solved efficiently? Theoretically analyzing the heuristics is very challenging. Few results were known.
This thesis takes a different approach: we identify natural properties of the input, then design new algorithms that provably work assuming the input has these properties. We are able to give new, provable and sometimes practical algorithms for learning tasks related to text corpora, images and social networks.
...In theory, the assumptions in this thesis help us understand why intractable problems in machine learning can often be solved; in practice, the results suggest inherently new approaches for machine learning.