[1806.05662] GLoMo: Unsupervisedly Learned Relational Graphs as Transferable Representations(About) Modern deep transfer learning approaches have mainly focused on learning generic feature vectors from one task that are transferable to other tasks, such as word embeddings in language and pretrained convolutional features in vision. However, these approaches usually transfer unary features and largely ignore more structured graphical representations. This work explores the possibility of learning generic latent relational graphs that capture dependencies between pairs of data units (e.g., words or pixels) from large-scale unlabeled data and transferring the graphs to downstream tasks.
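A minimal sketch of the core idea, under simplifying assumptions: here the "graph" is just a row-normalized affinity matrix computed from unit features, and transfer means mixing a downstream task's own embeddings through that matrix. Function names, the dot-product affinity, and all dimensions are illustrative, not the paper's exact architecture.

```python
import numpy as np

def latent_graph(features, temperature=1.0):
    """Build a row-stochastic affinity graph over data units (e.g., words).

    features: (n_units, d) array of unit representations.
    Returns an (n_units, n_units) matrix whose row i gives the dependency
    weights of unit i on every unit.
    """
    scores = features @ features.T / (temperature * np.sqrt(features.shape[1]))
    scores -= scores.max(axis=1, keepdims=True)          # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)  # softmax per row

def transfer(graph, downstream_embeddings):
    """Mix a downstream task's own embeddings through the transferred graph."""
    return graph @ downstream_embeddings

# Toy usage: 5 "words" with 8-dim pretrained features, 16-dim task embeddings.
rng = np.random.default_rng(0)
G = latent_graph(rng.normal(size=(5, 8)))
mixed = transfer(G, rng.normal(size=(5, 16)))  # (5, 16) graph-aware features
```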
Human-level concept learning through probabilistic program induction (2015)(About) > People learning new concepts can often generalize successfully from just a single example, yet machine learning algorithms typically require tens or hundreds of examples to perform with similar accuracy... We present a computational model that captures these human learning abilities for a large class of simple visual concepts: handwritten characters from the world’s alphabets
Semantic hashing (2008) - Ruslan Salakhutdinov, Geoffrey Hinton(About) > We show how to learn a deep graphical model of the word-count vectors obtained from a large set of documents. The values of the latent variables in the deepest layer are easy to infer and give a much better representation of each document than Latent Semantic Analysis. When the deepest layer is forced to use a small number of binary variables (e.g. 32), the graphical model performs “semantic hashing”: Documents are mapped to memory addresses in such a way that semantically similar documents are located at nearby addresses. Documents similar to a query document can then be found by simply accessing all the addresses that differ by only a few bits from the address of the query document. This way of extending the efficiency of hash-coding to approximate matching is much faster than locality-sensitive hashing, which is the fastest current method. By using semantic hashing to filter the documents given to TF-IDF, we achieve higher accuracy than applying TF-IDF to the entire document set.
Indexing is implemented in the following manner: a document is mapped to a word-count vector, and this vector is then passed through a [#Restricted Boltzmann Machine](/tag/restricted_boltzmann_machine) autoencoder and encoded to a 32-bit address.
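A hedged sketch of the retrieval step described in the abstract, assuming documents have already been encoded to 32-bit codes (the RBM encoder itself is omitted); the `codes_to_docs` index and the Hamming radius are illustrative choices, not details from the paper.

```python
from itertools import combinations

def hamming_ball(address, radius, bits=32):
    """Yield every `bits`-bit address within `radius` bit flips of `address`."""
    yield address
    for r in range(1, radius + 1):
        for positions in combinations(range(bits), r):
            flipped = address
            for p in positions:
                flipped ^= 1 << p  # flip one chosen bit
            yield flipped

def semantic_lookup(query_code, codes_to_docs, radius=2):
    """Return documents whose code differs from the query's in <= radius bits."""
    hits = []
    for addr in hamming_ball(query_code, radius):
        hits.extend(codes_to_docs.get(addr, []))
    return hits

# Toy index: 32-bit address -> documents stored at that address.
index = {0b1010: ["doc_a"], 0b1011: ["doc_b"], 0b0010: ["doc_c"]}
print(semantic_lookup(0b1010, index, radius=1))  # ['doc_a', 'doc_b', 'doc_c']
```

Enumerating the Hamming ball costs sum over r of C(32, r) lookups, which stays small for the few-bit radii the abstract describes, so no pairwise comparison against the full document set is needed.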
A Comparative Study of Word Embeddings for Reading Comprehension(About) abstract:
The focus of past machine learning research for Reading Comprehension tasks has been primarily on the design of novel deep learning architectures. Here we show that seemingly minor choices made on
1. the use of pre-trained word embeddings, and
2. the representation of out-of-vocabulary tokens at test time,
can turn out to have a larger impact than architectural choices on the final performance.
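A small sketch of the second choice above, assuming one common recipe: keep pre-trained vectors for known words and give each out-of-vocabulary test token its own fixed random vector rather than a shared zero vector. The `EmbeddingLookup` class, the `pretrained` dictionary, and all dimensions are illustrative, not from the paper.

```python
import numpy as np

class EmbeddingLookup:
    """Pre-trained embeddings with an explicit policy for unseen test tokens."""

    def __init__(self, pretrained, dim, oov_policy="random", seed=0):
        self.pretrained = pretrained     # word -> (dim,) vector
        self.dim = dim
        self.oov_policy = oov_policy     # "random" or "zero"
        self.rng = np.random.default_rng(seed)
        self._oov_cache = {}             # same OOV token -> same vector

    def __call__(self, token):
        if token in self.pretrained:
            return self.pretrained[token]
        if self.oov_policy == "zero":
            return np.zeros(self.dim)
        if token not in self._oov_cache:  # draw once, then reuse
            self._oov_cache[token] = self.rng.normal(scale=0.1, size=self.dim)
        return self._oov_cache[token]

# Toy usage: two in-vocabulary words; "zyzzyva" is OOV at test time.
vecs = {"the": np.ones(4), "cat": np.full(4, 2.0)}
emb = EmbeddingLookup(vecs, dim=4, oov_policy="random")
print(emb("cat"), emb("zyzzyva"))
```

The cache matters: if every occurrence of the same OOV token drew a fresh vector, the model could not even match repeated mentions of an unseen word within one passage, which is exactly the kind of seemingly minor choice the abstract flags.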