The objective of embedding methods is to organize symbolic objects (e.g., words, entities, concepts) so that similarity in the embedding space reflects semantic or functional similarity.
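A minimal sketch of that objective with made-up toy vectors: semantically related symbols should score higher under cosine similarity than unrelated ones.

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity: the standard way to compare embedding vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy embeddings for illustration only; real embeddings are learned.
embeddings = {
    "paris":  np.array([0.9, 0.1, 0.3]),
    "london": np.array([0.8, 0.2, 0.4]),
    "banana": np.array([0.1, 0.9, 0.0]),
}

print(cosine(embeddings["paris"], embeddings["london"]))  # high: related cities
print(cosine(embeddings["paris"], embeddings["banana"]))  # low: unrelated
```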
[1806.05662] GLoMo: Unsupervisedly Learned Relational Graphs as Transferable Representations(About) Modern deep transfer learning approaches have mainly focused on learning generic feature vectors from one task that are transferable to other tasks, such as word embeddings in language and pretrained convolutional features in vision. However, these approaches usually transfer unary features and largely ignore more structured graphical representations. This work explores the possibility of learning generic latent relational graphs that capture dependencies between pairs of data units (e.g., words or pixels) from large-scale unlabeled data and transferring the graphs to downstream tasks.
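As a rough illustration of the object being transferred (not GLoMo's actual architecture, which learns affinity scores with dedicated networks), a latent relational graph over data units can be pictured as a row-normalized pairwise affinity matrix; here the scores are plain dot products for simplicity.

```python
import numpy as np

def affinity_graph(features):
    """Toy latent relational graph: a row-stochastic matrix of pairwise
    affinities between data units (e.g. words or pixels). GLoMo learns
    these scores; dot products stand in here for illustration."""
    scores = features @ features.T                        # (n, n) pairwise scores
    scores = np.exp(scores - scores.max(axis=1, keepdims=True))
    return scores / scores.sum(axis=1, keepdims=True)     # softmax per row

units = np.random.randn(5, 8)    # 5 data units with 8-dim features
G = affinity_graph(units)        # G[i, j]: affinity of unit j for unit i
assert np.allclose(G.sum(axis=1), 1.0)
```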
[1803.11175] Universal Sentence Encoder (2018)(About) Models for encoding sentences into embedding vectors, specifically targeting transfer learning to other NLP tasks.
> With transfer learning via sentence embeddings, we observe surprisingly good performance with minimal amounts of supervised training data for a transfer task
Mixes unsupervised training on a large corpus with the supervised SNLI task, leveraging the [#Transformer](/tag/attention_is_all_you_need) architecture
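A hedged sketch of the transfer setup the paper evaluates: freeze a pretrained sentence encoder and train a tiny classifier on its embeddings with very little labeled data. `encode` below is a placeholder (the real USE models are distributed separately); only the overall pattern is the point.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def encode(sentences):
    # Placeholder standing in for a pretrained sentence encoder:
    # returns random 512-dim vectors. Swap in a real encoder to
    # reproduce the transfer-learning setup.
    rng = np.random.default_rng(0)
    return rng.standard_normal((len(sentences), 512))

# Minimal amount of supervised data for the transfer task.
train_texts = ["great movie", "terrible plot"]
train_labels = [1, 0]

clf = LogisticRegression().fit(encode(train_texts), train_labels)
print(clf.predict(encode(["what a wonderful film"])))
```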
Representations for Language: From Word Embeddings to Sentence Meanings (2017) - YouTube(About) [Slides](/doc/?uri=https%3A%2F%2Fnlp.stanford.edu%2Fmanning%2Ftalks%2FSimons-Institute-Manning-2017.pdf)
What's special about human language? The only hope for explainable intelligence.
Symbols are not just an invention of logic / classical AI.
Meaning: a solution via distributional-similarity-based representations, one of the most successful ideas of modern NLP.
> You shall know a word by the company it keeps (JR Firth 1957)
The BiLSTM hegemony
Neural Bag of words
> "Surprisingly effective for many tasks :-(" [cf "DAN", Deep Averaging Network, Iyyver et al.](/doc/?uri=http%3A%2F%2Fwww.cs.cornell.edu%2Fcourses%2Fcs5740%2F2016sp%2Fresources%2Fdans.pdf)
Weakly-supervised Relation Extraction by Pattern-enhanced Embedding Learning(About) Weakly-supervised relation extraction from text corpora, in a setting with little labeled data describing the relations.
For example, labeled data indicates that the text "Beijing, capital of China" corresponds to the entity relation ("Beijing", "Capital Of", "China"), and we would like to extract the relevant entities and relations from text such as "Paris, France's capital,..."
The paper describes a method combining two modules, one based on automatic pattern extraction (e.g. "[Head], Capital Of [Tail]") and the other on distributional semantics (word-embedding style). The two modules collaborate: the first creates relation instances that enlarge the knowledge base on which the second is trained, and the second helps the first identify informative patterns ("co-training").
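A hedged sketch of what the pattern module does: apply a textual pattern such as "[Head], capital of [Tail]" to raw text to propose new relation instances. The real system learns and weights its patterns; this only shows a single hand-written pattern in action.

```python
import re

# One hand-written pattern; the paper's method discovers these automatically.
pattern = re.compile(r"(\w+), capital of (\w+)")

def apply_pattern(text):
    return [(head, "Capital Of", tail) for head, tail in pattern.findall(text)]

print(apply_pattern("Beijing, capital of China, hosted the games."))
# [('Beijing', 'Capital Of', 'China')]
```

Note that "Paris, France's capital" would not match this pattern, which is exactly why the distributional module is needed to generalize beyond surface forms.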
Combining word and entity embeddings for entity linking (ESWC 2017)(About) The general approach for the entity linking task is to generate, for a given mention, a set of candidate entities from the knowledge base and, in a second step, determine which one is the best. This paper proposes a novel method for the second step, based on the **joint learning of embeddings for the words in the text and the entities in the knowledge base**.
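A minimal sketch of the candidate-ranking step, assuming words and entities already live in one joint space (the paper's contribution is learning that space): score each candidate entity by cosine similarity with the mention's context vector, here simply the average of the context word embeddings. All vectors are toy values.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy joint embedding space shared by words and knowledge-base entities.
word_emb = {"capital": np.array([0.9, 0.1]), "france": np.array([0.8, 0.3])}
entity_emb = {"Paris_(city)": np.array([0.85, 0.2]),
              "Paris_Hilton": np.array([0.1, 0.9])}

# Context vector for the mention "Paris": average of context word embeddings.
context = np.mean([word_emb["capital"], word_emb["france"]], axis=0)
best = max(entity_emb, key=lambda e: cosine(context, entity_emb[e]))
print(best)  # Paris_(city)
```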
Traversing Knowledge Graphs in Vector Space (2015)(About) Knowledge graphs often have missing facts (edges) which disrupts path queries. Recent models for knowledge base completion impute missing facts by embedding knowledge graphs in vector spaces. We show that these models can be recursively applied to answer path queries, but that they suffer from cascading errors. This motivates a new "compositional" training objective, which dramatically improves all models' ability to answer path queries, in some cases more than doubling accuracy.
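A hedged sketch of compositional path querying with TransE-style embeddings (one family of models the paper applies recursively): answer a path query by translating the source entity's vector by the sum of the relation vectors, then returning the nearest entity. Entities, relations, and vectors below are invented for illustration.

```python
import numpy as np

entities = {"tom": np.array([0.0, 0.0]),
            "ann": np.array([1.0, 0.0]),
            "nyc": np.array([1.0, 1.0])}
relations = {"parent_of": np.array([1.0, 0.0]),
             "lives_in": np.array([0.0, 1.0])}

def answer_path(source, path):
    # Compose relations by vector addition, then take the nearest entity.
    target = entities[source] + sum(relations[r] for r in path)
    return min(entities, key=lambda e: np.linalg.norm(entities[e] - target))

# "Where does Tom's parent live?" == tom / parent_of / lives_in
print(answer_path("tom", ["parent_of", "lives_in"]))  # nyc
```

Applying the single-edge model step by step like this is what accumulates cascading errors; the paper's compositional training objective optimizes the whole path at once.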
Bay Area Vision Meeting: Unsupervised Feature Learning and Deep Learning - YouTube(About) Lessons from neuroscience: one algorithm for all kinds of learning
Looking for better representations of the input (features)
Feature learning via sparse coding (sparse linear combinations of basis functions; learned edge detectors quantitatively similar to primary visual cortex); see the sketch after these notes.
Then learning feature hierarchies (several layers: "sparse DBN", "deep belief nets")
Scaling: see 25'07 (algorithms); using GPUs
Learning recursive representations. "Generic" hierarchies on text don't make sense; learn feature vectors that represent sentences
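A minimal sketch of the sparse coding idea from the talk: represent an input x as a sparse linear combination of dictionary atoms by minimizing ||x - Dh||² + λ||h||₁. A few ISTA iterations (gradient step plus soft threshold) stand in for a full solver; the dictionary here is random rather than learned.

```python
import numpy as np

def sparse_code(x, D, lam=0.1, steps=100):
    # ISTA: proximal gradient descent on ||x - D h||^2 + lam * ||h||_1.
    L = np.linalg.norm(D, 2) ** 2                # Lipschitz constant of gradient
    h = np.zeros(D.shape[1])
    for _ in range(steps):
        g = D.T @ (D @ h - x)                    # gradient of squared error
        h = h - g / L                            # gradient step
        h = np.sign(h) * np.maximum(np.abs(h) - lam / L, 0.0)  # soft threshold
    return h

rng = np.random.default_rng(0)
D = rng.standard_normal((16, 32))                # overcomplete dictionary
x = D @ (np.eye(32)[3] * 2.0)                    # signal built from one atom
h = sparse_code(x, D)
print(np.count_nonzero(np.round(h, 3)))          # few active coefficients
```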