Semantic Hashing
http://www.semanlink.net/tag/semantic_hashing
Documents tagged with Semantic Hashing

[1810.07150] Subword Semantic Hashing for Intent Classification on Small Datasets
https://arxiv.org/abs/1810.07150
2018-10-22T14:23:00Z

A Theoretical Approach to Semantic Coding and Hashing | Simons Institute for the Theory of Computing (2016)
https://simons.berkeley.edu/talks/sanjeev-arora-2016-11-15
2018-05-26T17:22:33Z

Semantic hashing using tags and topic modeling (2013)
https://www.semanticscholar.org/paper/Semantic-hashing-using-tags-and-topic-modeling-Wang-Zhang/1a0f660f70fd179003edc271694736baaa39dec4
Semantic Hashing using Tags and Topic Modeling (SHTTM): incorporates both tag information and the similarity information from probabilistic topic modeling. [Comments about the paper](https://sutheeblog.wordpress.com/2016/10/28/paper-reading-semantic-hashing-using-tags-and-topic-modeling-sigir13/). [Code on GitHub](https://github.com/zhuoxiongzhao/code-for-SHTTM)
2018-03-22T00:41:03Z

Learning Deep Structured Semantic Models for Web Search using Clickthrough Data - Microsoft Research (2013)
https://www.microsoft.com/en-us/research/publication/learning-deep-structured-semantic-models-for-web-search-using-clickthrough-data/
we strive to develop a series of **new latent semantic models with a deep structure that project queries and documents into a common low-dimensional space** where the relevance of a document given a query is readily computed as the distance between them. The proposed deep structured semantic models are discriminatively trained by maximizing the conditional likelihood of the clicked documents given a query using the clickthrough data. To make our models applicable to large-scale Web search applications, we also use a technique called word hashing
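The word hashing technique mentioned above breaks each word into letter n-grams so that an unbounded vocabulary maps into a small, fixed set of sub-word units. A minimal sketch (the function name is illustrative; the paper uses letter trigrams with boundary markers):

```python
def word_hash(word, n=3):
    """Map a word to its set of letter n-grams, with '#' marking
    word boundaries, e.g. 'good' -> {'#go', 'goo', 'ood', 'od#'}."""
    padded = "#" + word + "#"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

print(sorted(word_hash("good")))  # ['#go', 'goo', 'od#', 'ood']
```

A bag of such n-grams replaces the one-hot word vector as the input layer, which keeps the model size independent of vocabulary size.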
2017-12-30T02:10:49Z

[1712.01208] The Case for Learned Index Structures
https://arxiv.org/abs/1712.01208v1
> we believe that the idea of replacing core components of a data management system through learned models has far reaching implications for future systems designs
>
> Indexes are models: a B-Tree-Index can be seen as a model to map a key to the position of a record within a sorted array, a Hash-Index as a model to map a key to a position of a record within an unsorted array, and a BitMap-Index as a model to indicate if a data record exists or not. In this exploratory research paper, we start from this premise and posit that all existing index structures can be replaced with other types of models, including deep-learning models, which we term learned indexes.
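The B-Tree-as-model idea can be sketched in a few lines: fit a model to the key-to-position mapping of a sorted array, then correct the prediction with a search bounded by the model's maximum error. This toy uses a single linear model (the paper's learned indexes use staged models with per-model error bounds):

```python
import numpy as np

# Sorted keys and their positions: the mapping a B-Tree index encodes.
keys = np.sort(np.random.default_rng(0).integers(0, 1_000_000, size=10_000))
positions = np.arange(len(keys))

# Fit position ~ a * key + b, and record the worst-case prediction error.
a, b = np.polyfit(keys, positions, deg=1)
max_err = int(np.max(np.abs(np.round(a * keys + b) - positions)))

def lookup(key):
    """Predict the position, then search only within the error window."""
    guess = int(round(a * key + b))
    lo = max(0, guess - max_err)
    hi = min(len(keys), guess + max_err + 1)
    i = lo + np.searchsorted(keys[lo:hi], key)
    return int(i) if i < len(keys) and keys[i] == key else -1

assert all(keys[lookup(int(k))] == k for k in keys[:100])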
2017-12-11T19:25:09Z

Semantic Hashing [9 mins] - University of Toronto | Coursera
https://fr.coursera.org/learn/neural-networks/lecture/s7bmT/semantic-hashing-9-mins
2017-11-07T14:40:31Z

Semantic hashing (2008) - Ruslan Salakhutdinov, Geoffrey Hinton
http://www.sciencedirect.com/science/article/pii/S0888613X08001813
> We show how to learn a deep graphical model of the word-count vectors obtained from a large set of documents. The values of the latent variables in the deepest layer are easy to infer and give a much better representation of each document than Latent Semantic Analysis. When the deepest layer is forced to use a small number of binary variables (e.g. 32), the graphical model performs "semantic hashing": documents are mapped to memory addresses in such a way that semantically similar documents are located at nearby addresses. Documents similar to a query document can then be found by simply accessing all the addresses that differ by only a few bits from the address of the query document. This way of extending the efficiency of hash-coding to approximate matching is much faster than locality sensitive hashing, which is the fastest current method. By using semantic hashing to filter the documents given to TF-IDF, we achieve higher accuracy than applying TF-IDF to the entire document set.
Indexing is implemented in the following manner: a document is mapped to a word-count vector, which is passed through a [#Restricted Boltzmann Machine](/tag/restricted_boltzmann_machine) autoencoder and encoded into a 32-bit address.
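"Accessing all the addresses that differ by only a few bits" amounts to enumerating the Hamming ball around a query's code. A minimal sketch (the function name is illustrative):

```python
from itertools import combinations

def hamming_ball(address, radius, n_bits=32):
    """Yield every n_bits-wide address within `radius` bit flips
    of `address` (the addresses to probe for near-duplicate codes)."""
    yield address
    for r in range(1, radius + 1):
        for bits in combinations(range(n_bits), r):
            flipped = address
            for b in bits:
                flipped ^= 1 << b
            yield flipped

# With a table mapping 32-bit code -> doc ids, candidates are the union
# of the buckets at all these addresses.
print(sum(1 for _ in hamming_ball(0b1010, 2)))  # 1 + 32 + C(32,2) = 529
```

The ball grows combinatorially with the radius, which is why the codes must be short (e.g. 32 bits) and the radius small for this lookup to stay fast.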
2017-11-07T11:54:38Z

[1004.5370] Self-Taught Hashing for Fast Similarity Search
https://arxiv.org/pdf/1004.5370.pdf
Emphasizes the following issue in semantic hashing: obtaining the codes for previously unseen documents. Proposes the following approach:

> first find the optimal l-bit binary codes for all documents in the given corpus via unsupervised learning, then train l classifiers via supervised learning to predict the l-bit code for any query document unseen before.

(method summarized [here](https://www.semanticscholar.org/paper/Semantic-hashing-using-tags-and-topic-modeling-Wang-Zhang/1a0f660f70fd179003edc271694736baaa39dec4))
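The two-step recipe above can be sketched as follows. This is a toy stand-in, not the paper's method: step 1 uses a median-thresholded random projection in place of the paper's spectral codes, and step 2 uses per-bit least-squares linear classifiers in place of its SVMs:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))   # document features (e.g. TF-IDF rows)
l = 8                            # code length in bits

# Step 1 (unsupervised): l-bit codes for the corpus, here by thresholding
# a random projection at its per-bit median (balanced bits).
proj = X @ rng.normal(size=(50, l))
codes = (proj > np.median(proj, axis=0)).astype(int)      # shape (200, l)

# Step 2 (supervised): one linear classifier per bit, fit by least squares
# on +/-1 targets, so previously unseen documents can be coded too.
Xb = np.hstack([X, np.ones((len(X), 1))])                 # add bias column
W, *_ = np.linalg.lstsq(Xb, codes * 2 - 1, rcond=None)

def encode(x):
    """Predict l-bit codes for (possibly unseen) document vectors."""
    xb = np.hstack([x, np.ones((len(x), 1))])
    return (xb @ W > 0).astype(int)

# In-sample sanity check: the classifiers should recover most code bits.
acc = (encode(X) == codes).mean()
```

The point of the second step is that encoding a new query is just a matrix product and a threshold, with no need to rerun the unsupervised step over the corpus.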
2017-11-07T11:48:17Z