2019-08-25 Sébastien Castellion 2019-08-25T19:36:23Z > To kill a man is not to defend a doctrine, it is to kill a man. (Addressed to Calvin, after the condemnation of Servetus) The first to preach tolerance in Europe, before Locke and Hume. 2019-08-09 2019-08-09T10:25:24Z An easy introduction to Pytorch for Neural Networks Neural Models for Information Retrieval (2017) 2019-08-18T23:00:09Z 2019-08-18 physics/0004057 2019-08-15T11:31:33Z The information bottleneck method Naftali Tishby Hebrew University and NEC Research Institute Fernando C. Pereira ATT Shannon Laboratory 2019-08-15 2000-04-24T15:22:30Z Naftali Tishby Hebrew University and NEC Research Institute We define the relevant information in a signal $x \in X$ as being the information that this signal provides about another signal $y \in Y$. Examples include the information that face images provide about the names of the people portrayed, or the information that speech sounds provide about the words spoken. Understanding the signal $x$ requires more than just predicting $y$, it also requires specifying which features of $X$ play a role in the prediction. We formalize this problem as that of finding a short code for $X$ that preserves the maximum information about $Y$. That is, we squeeze the information that $X$ provides about $Y$ through a `bottleneck' formed by a limited set of codewords $\tilde{X}$. This constrained optimization problem can be seen as a generalization of rate distortion theory in which the distortion measure $d(x,\tilde{x})$ emerges from the joint statistics of $X$ and $Y$. This approach yields an exact set of self consistent equations for the coding rules $X \to \tilde{X}$ and $\tilde{X} \to Y$. Solutions to these equations can be found by a convergent re-estimation method that generalizes the Blahut-Arimoto algorithm. Our variational principle provides a surprisingly rich framework for discussing a variety of problems in signal processing and learning, as will be described in detail elsewhere. 2000-04-24T15:22:30Z [physics/0004057] The information bottleneck method William Bialek NEC Research Institute > We define the relevant information in a signal x ∈ X as being the information that this signal provides about another signal y ∈ Y. Examples include the information that face images provide about the names of the people portrayed, or the information that speech sounds provide about the words spoken. **Understanding the signal x requires more than just predicting y, it also requires specifying which features of X play a role in the prediction. We formalize this problem as that of finding a short code for X that preserves the maximum information about Y.** That is, we squeeze the information that X provides about Y through a ‘bottleneck’ formed by a limited set of codewords X̃... This approach yields an exact set of self consistent equations for the coding rules X → X̃ and X̃ → Y. (from the intro): how to define "meaningful / relevant" information? An issue left out of information theory by Shannon (whose focus was the problem of transmitting information rather than judging its value to the recipient) -> this leads to considering statistical and information-theoretic principles as almost irrelevant to the question of meaning. > In contrast, **we argue here that information theory, in particular lossy source compression, provides a natural quantitative approach to the question of “relevant information.”** Specifically, we formulate a **variational principle** for the extraction or efficient representation of relevant information.
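For reference, the variational principle mentioned above is usually written as the following trade-off (the standard form of the information bottleneck Lagrangian, with notation as in the abstract; this rendering is mine, not a quote from the paper):

```latex
% Information bottleneck: choose a stochastic encoding p(\tilde{x}|x) of X that is as
% compressed as possible (small I(X;\tilde{X})) while keeping as much information as
% possible about Y (large I(\tilde{X};Y)); \beta sets the trade-off.
\min_{p(\tilde{x} \mid x)} \; I(X;\tilde{X}) \, - \, \beta \, I(\tilde{X};Y)
```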
benedekrozemberczki/awesome-graph-classification: A collection of important graph embedding, classification and representation learning papers with implementations. 2019-08-05 2019-08-05T23:20:38Z Efficient compression in color naming and its evolution 2019-08-15 2019-08-15T17:39:48Z The Information Bottleneck principle applied to linguistics. >We argue that **languages efficiently compress ideas into words by optimizing the information bottleneck trade-off** between the complexity and accuracy of the lexicon. We test this proposal in the domain of color naming. word meanings may reflect adaptation to pressure for efficient communication— that is, communication that is precise yet requires only minimal cognitive resources. Absurd Creature of the Week: The Parasitic Worm That Turns Snails Into Disco Zombies | WIRED 2019-08-13 2019-08-13T09:10:29Z I’m a journalist. Monsanto built a step-by-step strategy to destroy my reputation | Carey Gillam | Opinion | The Guardian 2019-08-12 2019-08-12T17:43:56Z 2019-08-05 Xu Han Zhiyuan Liu Maosong Sun ERNIE: Enhanced Language Representation with Informative Entities Qun Liu Zhengyan Zhang 2019-08-05T15:40:17Z 2019-05-17T06:24:16Z 1905.07129 [1905.07129] ERNIE: Enhanced Language Representation with Informative Entities > We argue that informative entities in **KGs can enhance language representation with external knowledge**. In this paper, we utilize both large-scale textual corpora and KGs to train an enhanced language representation model (ERNIE), which can take full advantage of lexical, syntactic, and knowledge information simultaneously. > ERNIE achieves significant improvements on various knowledge-driven tasks, and meanwhile is comparable with the state-of-the-art model BERT on other common NLP tasks [GitHub](https://github.com/thunlp/ERNIE) WARNING, there is another ERNIE (by [NLP@Baidu](tag:nlp_baidu)): Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, and Hua Wu. 2019. Ernie: Enhanced representation through knowledge integration. This doesn't happen when you choose François-Paul as the name for your child. Xin Jiang 2019-06-04T11:35:58Z Neural language representation models such as BERT pre-trained on large-scale corpora can well capture rich semantic patterns from plain text, and be fine-tuned to consistently improve the performance of various NLP tasks. However, the existing pre-trained language models rarely consider incorporating knowledge graphs (KGs), which can provide rich structured knowledge facts for better language understanding. We argue that informative entities in KGs can enhance language representation with external knowledge. In this paper, we utilize both large-scale textual corpora and KGs to train an enhanced language representation model (ERNIE), which can take full advantage of lexical, syntactic, and knowledge information simultaneously. The experimental results have demonstrated that ERNIE achieves significant improvements on various knowledge-driven tasks, and meanwhile is comparable with the state-of-the-art model BERT on other common NLP tasks. The source code of this paper can be obtained from https://github.com/thunlp/ERNIE. Zhengyan Zhang Knowledge Graph Reasoning Papers 2019-08-09 2019-08-09T16:59:43Z 2019-08-15T12:39:39Z France is AI 2018: Lenka Zdeborova - Statistical physics modelling of machine learning - YouTube 2019-08-15 > In data science, models are used to fit the data. 
> In physics, models are the main tools for understanding 2019-08-21T22:05:00Z 2019-08-21 Peter Bloem [slides](https://docs.google.com/presentation/d/1fIhGikFPnb7G5kr58OvYC3GN4io7MznnM0aAgadvJfc/edit#slide=id.g5888218f39_177_4) The State of Transfer Learning in NLP (2019) 2019-08-19T16:30:32Z 2019-08-19 2019-08-28T22:47:20Z 2019-08-28 Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT 2019-08-09T16:07:48Z 2019-08-09 How Mosquitoes Helped Shape the Course of Human History | History | Smithsonian The megaliths of Veyre-Monton (Puy-de-Dôme): ... | Inrap 2019-08-29 2019-08-29T20:58:36Z Starlette 2019-08-11 2019-08-11T23:58:35Z lightweight ASGI framework/toolkit, for building asyncio services What is XLNet and why it outperforms BERT - Towards Data Science 2019-08-02T17:46:14Z 2019-08-02 2019-08-21T13:11:32Z 2019-08-21 Transformers from scratch | Peter Bloem The best explanation of the transformer. Code included. > Self-attention is a sequence-to-sequence operation: a sequence of $t$ vectors goes in, and a sequence of $t$ vectors comes out (all vectors of the same size). > > To produce output vector $\mathbf{y}_i$, the self-attention operation simply takes a weighted average over all the input vectors, > > $\mathbf{y}_i = \sum_j w_{ij} \mathbf{x}_j$, > > where the weights sum to one over all $j$. The weight $w_{ij}$ is not a parameter, as in a normal neural net, but it is derived from a function over $\mathbf{x}_i$ and $\mathbf{x}_j$. The simplest option for this function is the dot product. A minimal sketch of this operation is included below. a discussion on the potential of knowledge graphs for end-to-end learning and on the challenges of this approach 2019-08-22 2019-08-22T10:49:49Z The knowledge graph as the default data model for learning on heterogeneous knowledge (2017) 2019-08-05T10:48:06Z 2019-08-05 Lesson 3 - Self Driving Cars 2019-08-23T00:43:51Z 2019-08-23 Graph Transformer | OpenReview 2019-08-11T23:56:22Z 2019-08-11 Web Applications & Frameworks — The Hitchhiker's Guide to Python [1904.02342] Text Generation from Knowledge Graphs with Graph Transformers Hannaneh Hajishirzi 2019-08-23 2019-05-18T01:07:52Z 2019-04-04T04:33:15Z Rik Koncel-Kedziorski 1904.02342 Mirella Lapata Dhanush Bekal 2019-08-23T00:39:46Z Yi Luan Text Generation from Knowledge Graphs with Graph Transformers Generating texts which express complex ideas spanning multiple sentences requires a structured representation of their content (document plan), but these representations are prohibitively expensive to manually produce. In this work, we address the problem of generating coherent multi-sentence texts from the output of an information extraction system, and in particular a knowledge graph. Graphical knowledge representations are ubiquitous in computing, but pose a significant challenge for text generation techniques due to their non-hierarchical nature, collapsing of long-distance dependencies, and structural variety. We introduce a novel graph transforming encoder which can leverage the relational structure of such knowledge graphs without imposing linearization or hierarchical constraints. Incorporated into an encoder-decoder setup, we provide an end-to-end trainable system for graph-to-text generation that we apply to the domain of scientific text. Automatic and human evaluations show that our technique produces more informative texts which exhibit better document structure than competitive encoder-decoder methods. Rik Koncel-Kedziorski
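A minimal numpy sketch of the basic, unparameterized self-attention described in the "Transformers from scratch" entry above (raw dot products plus a softmax, without the query/key/value projections the post adds later; array names are illustrative):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def basic_self_attention(x):
    """x: (t, k) array, a sequence of t vectors of dimension k.
    Returns y of the same shape, with y_i = sum_j w_ij x_j, where the weights
    w_ij are the row-wise softmax of the dot products x_i . x_j."""
    raw_weights = x @ x.T              # (t, t) matrix of dot products
    weights = softmax(raw_weights)     # each row sums to one
    return weights @ x                 # weighted averages of the input vectors

# toy usage: 5 input vectors of dimension 4
y = basic_self_attention(np.random.randn(5, 4))
print(y.shape)  # (5, 4)
```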
Implementation for this [paper](/doc/2019/07/_1907_05242_large_memory_layer) 2019-08-30T13:38:58Z 2019-08-30 Product-Key Memory (PKM) Minimalist implementation of a Product-Key Memory layer 2019-08-12T10:03:41Z 2019-08-12 4th Workshop on Representation Learning for NLP Talks: - Language emergence as representation learning (Marco Baroni) > language emergence among deep neural network agents that have to jointly solve a task. Recent findings suggest that the language-like code developed by such agents both differs from and resembles natural language in interesting ways. For example, the emergent code does not naturally represent general concepts, but rather very specific invariances in the perceptual input - Representations shaped by dialogue interaction (Raquel Fernández) > When we use language to communicate with each other in conversation, we build an internal representation of our evolving common ground. Traditionally, in dialogue systems this is captured by an explicit dialogue state defined a priori. Can we develop dialogue agents that learn their own (joint) representations? - Knowledgeable and Adversarially-Robust Representation Learning (Mohit Bansal) - Modeling Output Spaces in Continuous-Output Language Generation (Yulia Tsvetkov) 2019-08-05 Knowledge graphs in Natural Language Processing @ ACL 2019 - Michael Galkin 2019-08-05T14:23:54Z - Dialogue Systems over KGs - Natural Language Generation of KG facts - Complex QA over KGs - KG Embeddings & Graph Representations 2019-08-03 2019-08-03T09:48:02Z In search of the lioness of Nimrud 2019-08-31T16:59:48Z 2019-08-31 In India, nearly two million citizens, most of them Muslims, stripped of their nationality A dive into spatial search algorithms 2019-08-29 2019-08-29T00:34:39Z Chip Huyen on Twitter: "This thread is a combination of 10 free online courses on machine learning that I find the most helpful" 2019-08-04 2019-08-04T12:17:22Z Neural Knowledge Acquisition via Mutual Attention between Knowledge Graph and Text (2018) 2019-08-23 2019-08-23T00:28:34Z Geology - Normandie - Massif d'Ecouves (NS) - Pleurodictyum constantinopolitatum 2019-08-10T13:13:33Z 2019-08-10 Most state-of-the-art models for named entity recognition (NER) rely on the availability of large amounts of labeled data, making them challenging to extend to new, lower-resourced languages. However, there are now several proposed approaches involving either cross-lingual transfer learning, which learns from other highly resourced languages, or active learning, which efficiently selects effective training data based on model predictions. This paper poses the question: given this recent progress, and limited human annotation, what is the most effective method for efficiently creating high-quality entity recognizers in under-resourced languages? Based on extensive experimentation using both simulated and real human annotation, we find a dual-strategy approach best, starting with a cross-lingual transferred model, then performing targeted annotation of only uncertain entity spans in the target language, minimizing annotator effort. Results demonstrate that cross-lingual transfer is a powerful tool when very little data can be annotated, but an entity-targeted annotation strategy can achieve competitive accuracy quickly, with just one-tenth of training data.
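Not from the paper's code — just a schematic sketch of the entity-targeted annotation step the abstract above describes: after cross-lingual transfer, rank candidate entity spans by the tagger's confidence and send only the least certain ones to human annotators. The names, scores, and budget below are made-up illustrations.

```python
from typing import List, Tuple

def select_uncertain_spans(spans: List[Tuple[str, float]], budget: int) -> List[Tuple[str, float]]:
    """spans: (span_text, model_confidence) pairs proposed by the transferred NER model.
    Returns the `budget` spans the model is least confident about -- the ones most
    worth spending scarce human annotation effort on."""
    return sorted(spans, key=lambda s: s[1])[:budget]

# toy usage: the transferred model is sure about some spans, unsure about others
candidates = [("Kathmandu", 0.97), ("Bagmati", 0.42), ("Nepal Telecom", 0.55)]
print(select_uncertain_spans(candidates, budget=2))  # -> the two least certain spans
```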
Zaid Sheikh 2019-08-23T19:15:07Z Aditi Chaudhary 2019-08-28 1908.08983 A Little Annotation does a Lot of Good: A Study in Bootstrapping Low-resource Named Entity Recognizers 2019-08-23T19:15:07Z Jiateng Xie [1908.08983] A Little Annotation does a Lot of Good: A Study in Bootstrapping Low-resource Named Entity Recognizers Aditi Chaudhary 2019-08-28T22:57:43Z Graham Neubig Jaime G. Carbonell 2019-08-21T08:35:18Z 2019-08-21 Blackstone Concept Extractor — ICLR&D Learning Text Similarity with Siamese Recurrent Networks (2016) 2019-08-07T02:01:44Z 2019-08-07 A deep architecture for **learning a similarity metric** on variable-length character sequences. The model combines a stack of character-level bidirectional LSTMs with a Siamese architecture. It learns to project variable-length strings into a fixed-dimensional embedding space **by using only information about the similarity between pairs of strings**. This model is applied to the task of job title normalization based on a manually annotated taxonomy. A small data set is incrementally expanded and augmented with new sources of variance. From the conclusion: the experiment shows that the explicit use of prior knowledge to add these sources of invariance to the system was crucial in learning. Without this knowledge, extra words and synonyms will negatively affect the performance of the system. The HSIC Bottleneck: Deep Learning without Back-Propagation 2019-08-05T12:23:24Z [1908.01580] The HSIC Bottleneck: Deep Learning without Back-Propagation 2019-12-05T09:24:24Z Wan-Duo Kurt Ma J. P. Lewis We introduce the HSIC (Hilbert-Schmidt independence criterion) bottleneck for training deep neural networks. The HSIC bottleneck is an alternative to the conventional cross-entropy loss and backpropagation that has a number of distinct advantages. It mitigates exploding and vanishing gradients, resulting in the ability to learn very deep networks without skip connections. There is no requirement for symmetric feedback or update locking. We find that the HSIC bottleneck provides performance on MNIST/FashionMNIST/CIFAR10 classification comparable to backpropagation with a cross-entropy target, even when the system is not encouraged to make the output resemble the classification labels. Appending a single layer trained with SGD (without backpropagation) to reformat the information further improves performance. 2019-08-15T17:13:21Z W. Bastiaan Kleijn 1908.01580 Wan-Duo Kurt Ma 2019-08-15 > we show that it is possible to learn classification tasks at near competitive accuracy **without backpropagation**, by **maximizing a surrogate of the mutual information between hidden representations and labels** and simultaneously **minimizing the mutual dependency between hidden representations and the inputs**... the hidden units of a network trained in this way form useful representations. Specifically, fully competitive accuracy can be obtained by freezing the network trained without backpropagation and appending and training a one-layer network using conventional SGD to convert the representation to the desired format. The training method uses an approximation of the [#information bottleneck](/tag/information_bottleneck_method). Advantages: > - The method facilitates parallel processing and requires significantly fewer operations. > - It does not suffer from exploding or vanishing gradients. > - It is biologically more plausible than Backpropagation
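Not the paper's implementation — just a minimal numpy sketch of the quantity the method is built on: the (biased) empirical HSIC estimator with Gaussian kernels, which the HSIC bottleneck uses as a differentiable surrogate for the mutual-information terms quoted above. The kernel width `sigma` is a free parameter chosen arbitrarily here.

```python
import numpy as np

def gaussian_kernel(x, sigma=1.0):
    """Pairwise Gaussian (RBF) kernel matrix for the rows of x: (n, d) -> (n, n)."""
    sq_dists = np.sum(x**2, axis=1, keepdims=True) - 2 * x @ x.T + np.sum(x**2, axis=1)
    return np.exp(-sq_dists / (2 * sigma**2))

def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC between samples x (n, d) and y (n, p):
    trace(K H L H) / (n - 1)^2, where H is the centering matrix.
    Larger values indicate stronger statistical dependence."""
    n = x.shape[0]
    K, L = gaussian_kernel(x, sigma), gaussian_kernel(y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

# toy usage: dependent pairs score higher than independent ones
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 1))
print(hsic(z, z + 0.1 * rng.normal(size=(200, 1))))  # strong dependence -> larger value
print(hsic(z, rng.normal(size=(200, 1))))            # ~independent -> close to zero
```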
- Reducing Bias - NLP Applications Galore - Pretrain then Finetune: A New Paradigm for NLP - Infusing Knowledge into NLP Architectures - Interpretability of Models - Rethinking Evaluation and Assumptions of Natural Language Generation - Going Beyond the Pretrain-Finetune Paradigm Trends in Natural Language Processing: ACL 2019 In Review - Mihail Eric 2019-08-05 2019-08-05T15:49:34Z Knowledge Graphs and Natural Language Processing. The Year of the Graph Newsletter, July/August 2019 | Linked Data Orchestration 2019-08-29T14:11:34Z 2019-08-29 2019-08-07T08:33:32Z 2019-08-07 Make Delegation Work in Python · fast.ai 2019-08-05 2019-08-05T09:31:44Z rakuten-nlp/category2vec (2015) 2019-08-23T00:32:38Z 2019-08-23 Watch Your Step: Learning Node Embeddings via Graph Attention 2019-08-31T11:44:34Z 2019-08-31 Brazilian soy: Elisabeth Borne's mistake about the GMOs that "are not authorized in Europe" Bloom filter 2019-08-02 2019-08-02T18:02:22Z To test whether an element is a member of a set. False positives are possible, but false negatives are not (a query returns either "possibly in set" or "definitely not in set"). A minimal sketch appears further below. 2019-08-07 2019-08-07T22:58:34Z AI system 'should be recognised as inventor' - BBC News Exploring DNA with Deep Learning 2019-08-17 2019-08-17T00:16:46Z 2019-08-12 2019-08-12T11:24:06Z Could an AI duet be the next chart-topper? | Financial Times Musician Holly Herndon taught an electronic ‘collaborator’ to sing using the call-and-response hymns of her childhood. [Arte.tv "tracks" on youtube](https://www.youtube.com/watch?v=8oQdJqVOky4) 2019-08-07T01:34:08Z Active Learning | Synthesis Lectures on Artificial Intelligence and Machine Learning (2012) 2019-08-07
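The sketch promised in the Bloom filter entry above — a toy, illustrative implementation (fixed-size bit array plus k seeded hashes), not a production library: membership queries may return false positives but never false negatives.

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: m bits and k hash functions derived from seeded SHA-256."""
    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = [False] * m

    def _positions(self, item):
        for seed in range(self.k):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def __contains__(self, item):
        # True  -> "possibly in set" (false positives possible)
        # False -> "definitely not in set" (no false negatives)
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
bf.add("apple")
print("apple" in bf)    # True: possibly in set
print("banana" in bf)   # almost certainly False: definitely not in set
```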
Xiang Wang 2019-08-23T00:33:53Z [1905.07854] KGAT: Knowledge Graph Attention Network for Recommendation 2019-08-23 To provide more accurate, diverse, and explainable recommendation, it is compulsory to go beyond modeling user-item interactions and take side information into account. Traditional methods like factorization machine (FM) cast it as a supervised learning problem, which assumes each interaction as an independent instance with side information encoded. Due to the overlook of the relations among instances or items (e.g., the director of a movie is also an actor of another movie), these methods are insufficient to distill the collaborative signal from the collective behaviors of users. In this work, we investigate the utility of knowledge graph (KG), which breaks down the independent interaction assumption by linking items with their attributes. We argue that in such a hybrid structure of KG and user-item graph, high-order relations --- which connect two items with one or multiple linked attributes --- are an essential factor for successful recommendation. We propose a new method named Knowledge Graph Attention Network (KGAT) which explicitly models the high-order connectivities in KG in an end-to-end fashion. It recursively propagates the embeddings from a node's neighbors (which can be users, items, or attributes) to refine the node's embedding, and employs an attention mechanism to discriminate the importance of the neighbors. Our KGAT is conceptually advantageous to existing KG-based recommendation methods, which either exploit high-order relations by extracting paths or implicitly modeling them with regularization. Empirical results on three public benchmarks show that KGAT significantly outperforms state-of-the-art methods like Neural FM and RippleNet. Further studies verify the efficacy of embedding propagation for high-order relation modeling and the interpretability benefits brought by the attention mechanism. 1905.07854 2019-05-20T03:08:11Z Yixin Cao Tat-Seng Chua Meng Liu Xiangnan He KGAT: Knowledge Graph Attention Network for Recommendation 2019-06-08T02:49:37Z Danielle Akini, the Cameroonian woman who talks to computers 2019-08-01 2019-08-01T22:28:42Z > Sentence-BERT (SBERT), a modification of the pretrained BERT network that uses siamese and triplet network structures to derive **semantically meaningful sentence embeddings** that can be compared using cosine-similarity. Important because - BERT is unsuitable for semantic similarity search as well as for unsupervised tasks like clustering. - simple methods such as using the CLS token give low-quality sentence embeddings However, the purpose of SBERT sentence embeddings is **not to be used for transfer learning for other tasks**. [Related blog post](/doc/2020/01/richer_sentence_embeddings_usin); [Github](https://github.com/UKPLab/sentence-transformers) [1908.10084] Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks 1908.10084 Iryna Gurevych Nils Reimers 2019-08-28T22:41:55Z 2019-08-27T08:50:17Z 2019-08-27T08:50:17Z 2019-08-28 Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks Nils Reimers BERT (Devlin et al., 2018) and RoBERTa (Liu et al., 2019) have set a new state-of-the-art performance on sentence-pair regression tasks like semantic textual similarity (STS). However, it requires that both sentences are fed into the network, which causes a massive computational overhead: Finding the most similar pair in a collection of 10,000 sentences requires about 50 million inference computations (~65 hours) with BERT. The construction of BERT makes it unsuitable for semantic similarity search as well as for unsupervised tasks like clustering. In this publication, we present Sentence-BERT (SBERT), a modification of the pretrained BERT network that uses siamese and triplet network structures to derive semantically meaningful sentence embeddings that can be compared using cosine-similarity. This reduces the effort for finding the most similar pair from 65 hours with BERT / RoBERTa to about 5 seconds with SBERT, while maintaining the accuracy from BERT. We evaluate SBERT and SRoBERTa on common STS tasks and transfer learning tasks, where it outperforms other state-of-the-art sentence embeddings methods.
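A short usage sketch based on the sentence-transformers repository linked above; the model name is only an example of a pretrained SBERT checkpoint, and the exact names/API may have changed since.

```python
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

# Example checkpoint name -- any pretrained SBERT model would do here.
model = SentenceTransformer("bert-base-nli-mean-tokens")

sentences = ["A man is eating food.",
             "A man is eating a piece of bread.",
             "The girl is carrying a baby."]
embeddings = model.encode(sentences)   # one fixed-size vector per sentence

# Semantically close sentences get a high cosine similarity.
print(cosine_similarity([embeddings[0]], embeddings[1:]))
```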
Naftali Tishby 2015-03-09T09:39:41Z 2019-08-15 Deep Neural Networks (DNNs) are analyzed via the theoretical framework of the information bottleneck (IB) principle. We first show that any DNN can be quantified by the mutual information between the layers and the input and output variables. Using this representation we can calculate the optimal information theoretic limits of the DNN and obtain finite sample generalization bounds. The advantage of getting closer to the theoretical limit is quantifiable both by the generalization bound and by the network's simplicity. We argue that both the optimal architecture, number of layers and features/connections at each layer, are related to the bifurcation points of the information bottleneck tradeoff, namely, relevant compression of the input layer with respect to the output layer. The hierarchical representations at the layered network naturally correspond to the structural phase transitions along the information curve. We believe that this new insight can lead to new optimality bounds and deep learning algorithms. Deep Learning and the Information Bottleneck Principle 2015-03-09T09:39:41Z Noga Zaslavsky [1503.02406] Deep Learning and the Information Bottleneck Principle > Deep Neural Networks (DNNs) are analyzed via the theoretical framework of the information bottleneck (IB) principle. We first show that any DNN can be quantified by the mutual information between the layers and the input and output variables. Using this representation we can calculate the optimal information theoretic limits of the DNN. 2019-08-15T17:07:31Z 1503.02406 2019-08-13T13:42:28Z 2019-08-13 ACL 2019: Highlights and Trends - Maria Khvalchik - Medium Rick Wierenga's blog posts about fast.ai 2019-08-12T19:20:26Z 2019-08-12 Reasoning With Neural Tensor Networks for Knowledge Base Completion (2013) 2019-08-03 Learning Structured Embeddings of Knowledge Bases (2011) 2019-08-03T21:55:22Z 2019-08-03 **Predicting the likely truth of additional facts based on existing facts in the knowledge base.** > we introduce an expressive neural tensor network suitable for reasoning over relationships between two entities. Most similar work: [Bordes et al.](http://127.0.0.1:8080/semanlink/doc/2019/08/learning_structured_embeddings_) (2011) Contributions: 1. new neural tensor network (**NTN**) suitable for reasoning over relationships between two entities. Generalizes several previous neural network models and provides a more powerful way to model relational information than a standard neural network layer. 2. a new way to represent entities in knowledge bases, as the average of their constituent word vectors, allowing the sharing of statistical strength between the words describing each entity (e.g., Bank of China and China). 3. incorporation of word vectors which are trained on large unlabeled text > We **learn to modify word representations via grounding in world knowledge**. This essentially allows us to analyze word embeddings and query them for specific relations. Furthermore, the resulting vectors could be used in other tasks such as named entity recognition or relation classification in natural language **Makes use of entity name**: NTN first learns word vectors from an auxiliary news corpus, and then initializes the representation of each entity by averaging the vectors of words contained in its name. For example, **the embedding of AlfredHitchcock is initialized by the average word vectors of “alfred” and “hitchcock”**... This kind of method models textual information separately from KG facts, and hence fails to leverage interactions between them. [src](doc:2019/05/knowledge_graph_embedding_a_su)
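For concreteness, a minimal numpy sketch of the NTN scoring function the entry above refers to, with made-up dimensions (an illustration of the formula, not the authors' code):

```python
import numpy as np

def ntn_score(e1, e2, W, V, b, u):
    """Neural Tensor Network score for a triple (e1, R, e2), for one relation R:
       u^T tanh( e1^T W[slice] e2  +  V [e1; e2]  +  b )
    e1, e2 : entity embeddings, shape (d,)   (e.g. averages of word vectors)
    W      : relation-specific tensor, shape (k, d, d)
    V      : relation-specific matrix, shape (k, 2d)
    b, u   : relation-specific vectors, shape (k,)"""
    bilinear = np.einsum('i,kij,j->k', e1, W, e2)   # e1^T W[slice] e2, one value per slice
    linear = V @ np.concatenate([e1, e2])
    return u @ np.tanh(bilinear + linear + b)

# toy usage: 4-dimensional entities, 3 tensor slices
d, k = 4, 3
rng = np.random.default_rng(0)
score = ntn_score(rng.normal(size=d), rng.normal(size=d),
                  rng.normal(size=(k, d, d)), rng.normal(size=(k, 2 * d)),
                  rng.normal(size=k), rng.normal(size=k))
print(score)   # higher score = the triple is judged more plausible
```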
2019-08-03T20:45:54Z 2019-08-27T08:39:41Z 2019-08-27 Open-sourcing hyperparameter autotuning for fastText 2019-08-09T01:42:22Z 2019-08-09 Deep InfoMax: Learning good representations through mutual information maximization - Microsoft Research The structure of the graph model makes natural language processing easier 2019-08-30T21:05:05Z 2019-08-30 Accelerating Towards Natural Language Search with Graphs Sebastian Ruder on Twitter: "In the second part of the NLP and speech processing session @DeepIndaba, @alienelf presents her journey and work on machine translation for African languages with @LauraMartinus #DLIndaba2019" 2019-08-29T23:07:37Z 2019-08-29 Network embedding methods aim at learning low-dimensional latent representation of nodes in a network. These representations can be used as features for a wide range of tasks on graphs such as classification, clustering, link prediction, and visualization. In this survey, we give an overview of network embeddings by summarizing and categorizing recent advancements in this research field. We first discuss the desirable properties of network embeddings and briefly introduce the history of network embedding algorithms. Then, we discuss network embedding methods under different scenarios, such as supervised versus unsupervised learning, learning embeddings for homogeneous networks versus for heterogeneous networks, etc. We further demonstrate the applications of network embeddings, and conclude the survey with future work in this area. 2018-08-08T00:54:01Z Steven Skiena 2019-08-25T02:02:16Z 1808.02590 A Tutorial on Network Embeddings Haochen Chen Bryan Perozzi Rami Al-Rfou 2018-08-08T00:54:01Z [1808.02590] A Tutorial on Network Embeddings 2019-08-25 > "Humans exhibit a strong ability to acquire and recognize new patterns." > we learn image representations via a supervised metric-based approach with siamese neural networks, **then reuse that network’s features for one-shot learning without any retraining**. 2019-08-06 2019-08-06T18:36:48Z Siamese Neural Networks for One-shot Image Recognition (2015)
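Both siamese entries in this section (the character-level LSTM for text similarity and the one-shot image recognition paper) rest on the same idea: two inputs pass through one shared encoder, and the distance between the two embeddings is trained to encode similarity. A minimal PyTorch sketch of that shared-encoder-plus-contrastive-loss setup — the encoder, dimensions, and margin below are placeholders, not either paper's architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseNet(nn.Module):
    """Both inputs go through the SAME encoder (shared weights)."""
    def __init__(self, in_dim=64, emb_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                     nn.Linear(128, emb_dim))

    def forward(self, a, b):
        return self.encoder(a), self.encoder(b)

def contrastive_loss(za, zb, same, margin=1.0):
    """same = 1 for pairs that should be close, 0 for pairs that should end up
    at least `margin` apart (classic contrastive loss)."""
    d = F.pairwise_distance(za, zb)
    return (same * d.pow(2) + (1 - same) * F.relu(margin - d).pow(2)).mean()

# toy training step on random pairs
model = SiameseNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
a, b = torch.randn(8, 64), torch.randn(8, 64)
same = torch.randint(0, 2, (8,)).float()
loss = contrastive_loss(*model(a, b), same)
loss.backward()
opt.step()
print(loss.item())
```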