Semanlink - Geoffrey Hinton

> We show how to learn a deep graphical model of the word-count vectors obtained from a
large set of documents. The values of the latent variables in the deepest layer are easy to
infer and give a much better representation of each document than Latent Semantic Analysis.
When the deepest layer is forced to use a small number of binary variables (e.g. 32),
the graphical model performs “semantic hashing”: Documents are mapped to memory
addresses in such a way that semantically similar documents are located at nearby
addresses. Documents similar to a query document can then be found by simply accessing
all the addresses that differ by only a few bits from the address of the query document. This
way of extending the efficiency of hash-coding to approximate matching is much faster
than locality sensitive hashing, which is the fastest current method. By using semantic
hashing to filter the documents given to TF-IDF, we achieve higher accuracy than applying
TF-IDF to the entire document set.

Indexing works as follows: a document is mapped to a word-count vector, which is then passed through a [#Restricted Boltzmann Machine](/tag/restricted_boltzmann_machine) autoencoder and encoded as a 32-bit address.
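
The indexing and lookup steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the encoder here uses random weights as a placeholder for the trained autoencoder, and the names `make_encoder`, `hamming_ball`, `build_index` and `query` are hypothetical, not taken from the paper.

```python
import itertools
import random
from collections import defaultdict

N_BITS = 32  # code length mentioned in the text


def make_encoder(vocab_size, n_bits=N_BITS, seed=0):
    """Placeholder encoder (assumption): random weights, one unit per bit.

    A real system would use the weights of the trained autoencoder instead.
    """
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0) for _ in range(vocab_size)]
               for _ in range(n_bits)]

    def encode(counts):
        address = 0
        for bit, row in enumerate(weights):
            activation = sum(w * c for w, c in zip(row, counts))
            # A logistic output above 0.5 is equivalent to activation > 0.
            if activation > 0:
                address |= 1 << bit
        return address

    return encode


def hamming_ball(address, radius, n_bits=N_BITS):
    """Yield every address that differs from `address` in at most `radius` bits."""
    for r in range(radius + 1):
        for positions in itertools.combinations(range(n_bits), r):
            flipped = address
            for p in positions:
                flipped ^= 1 << p
            yield flipped


def build_index(docs, encode):
    """Map each 32-bit address to the ids of the documents stored there."""
    index = defaultdict(list)
    for doc_id, counts in docs.items():
        index[encode(counts)].append(doc_id)
    return index


def query(index, encode, counts, radius=2):
    """Collect documents whose address is within `radius` bits of the query's."""
    hits = []
    for addr in hamming_ball(encode(counts), radius):
        hits.extend(index.get(addr, []))
    return hits


# Toy word-count vectors over a 5-word vocabulary (made-up data).
encode = make_encoder(vocab_size=5)
docs = {"a": [1, 0, 2, 0, 0], "b": [0, 3, 0, 1, 0]}
index = build_index(docs, encode)
candidates = query(index, encode, docs["a"], radius=2)
```

With 32-bit codes, a radius of 2 probes 1 + 32 + 496 = 529 addresses regardless of how many documents are indexed. The resulting candidates could then be passed to a finer ranking method such as TF-IDF, as the abstract suggests.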

2017-11-07 About

Heroes of Deep Learning: Andrew Ng interviews Geoffrey Hinton - YouTube

2017-08-16 About

Properties

sl:creationDate : 2017-08-16
sl:creationTime : 2017-08-16T10:29:12Z
sl:describedBy : https://en.wikipedia.org/wiki/Geoffrey_Hinton
rdf:type : sl:Tag
skos:prefLabel : Geoffrey Hinton@fr