Semantic hashing using tags and topic modeling (2013)(About) Proposes Semantic Hashing using Tags and Topic Modeling, which incorporates both the tag information and the similarity information from probabilistic topic modeling. [Comments about the paper](https://sutheeblog.wordpress.com/2016/10/28/paper-reading-semantic-hashing-using-tags-and-topic-modeling-sigir13/). [Code on Github](https://github.com/zhuoxiongzhao/code-for-SHTTM)
Provable Algorithms for Machine Learning Problems by Rong Ge.(About) From the abstract:
Modern machine learning algorithms can extract useful information from text, images and videos. All these applications involve solving NP-hard problems in the average case using heuristics. What properties of the input allow it to be solved efficiently? Theoretically analyzing these heuristics is very challenging, and few results were known.
This thesis takes a different approach: we identify natural properties of the input, then design new algorithms that provably work assuming the input has these properties. We are able to give new, provable and sometimes practical algorithms for learning tasks related to text corpora, images and social networks.
...In theory, the assumptions in this thesis help us understand why intractable problems in machine learning can often be solved; in practice, the results suggest inherently new approaches for machine learning.
Topic modeling with network regularization(About) In this paper, we formally define the problem of topic modeling with network structure (TMN). We propose a novel solution to this problem, which regularizes a statistical topic model with a harmonic regularizer based on a graph structure in the data. The proposed method combines topic modeling and social network analysis, and leverages the power of both statistical topic models and discrete regularization. The output of this model can summarize topics in the text well, map a topic onto the network, and discover topical communities.
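A harmonic regularizer of this kind penalizes neighboring nodes in the graph whose topic distributions differ. As a minimal sketch (not the paper's implementation; the variable names `theta` and `W` are assumptions), it can be written as a sum over edges, which is equivalent to a quadratic form with the graph Laplacian:

```python
import numpy as np

def harmonic_regularizer(theta, W):
    """Sum over edges (u, v) of W[u, v] * ||theta[u] - theta[v]||^2.

    theta: (n_nodes, n_topics) array, one topic distribution per node.
    W:     (n_nodes, n_nodes) symmetric nonnegative edge-weight matrix.
    """
    n = W.shape[0]
    reg = 0.0
    for u in range(n):
        for v in range(u + 1, n):  # each undirected edge counted once
            if W[u, v] > 0:
                diff = theta[u] - theta[v]
                reg += W[u, v] * (diff @ diff)
    return reg

def harmonic_regularizer_laplacian(theta, W):
    """Equivalent closed form via the graph Laplacian L = D - W."""
    L = np.diag(W.sum(axis=1)) - W
    return np.trace(theta.T @ L @ theta)
```

The Laplacian form makes the "harmonic" name concrete: minimizing it smooths the topic distributions over the graph, which is how the model couples topic modeling with the network structure.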
Probabilistic Topic Models(About) The LSA approach makes three claims: that semantic information can be derived from a word-document co-occurrence matrix; that dimensionality reduction is an essential part of this derivation; and that words and documents can be represented as points in Euclidean space. The topic modeling approach is consistent with the first two of these claims but differs on the third, describing a class of statistical models in which the semantic properties of words and documents are expressed in terms of probabilistic topics.
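The first two LSA claims can be sketched in a few lines: build a word-document co-occurrence matrix and reduce its dimensionality with a truncated SVD, so that words become points in a low-dimensional Euclidean space. The toy corpus below is hypothetical, purely for illustration:

```python
import numpy as np

# Toy word-document co-occurrence counts: rows are words, columns are documents.
X = np.array([
    [2, 1, 0, 0],   # "topic"
    [1, 2, 0, 0],   # "model"
    [0, 0, 3, 1],   # "graph"
    [0, 0, 1, 2],   # "network"
    [1, 1, 1, 1],   # "data"
], dtype=float)

# LSA: truncated SVD keeps k latent dimensions of the co-occurrence matrix.
k = 2
U, s, Vt = np.linalg.svd(X, full_matrices=False)
word_vecs = U[:, :k] * s[:k]       # words as points in k-dim Euclidean space
doc_vecs = Vt[:k, :].T * s[:k]     # documents embedded in the same space

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Words that co-occur in the same documents ("topic", "model") end up
# closer together than words from different document clusters ("graph").
```

Topic models replace this geometric picture with a probabilistic one: instead of coordinates in Euclidean space, each document gets a distribution over topics and each topic a distribution over words.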