Constructing a Knowledge Graph from Unstructured Documents without External Alignment (bookmarked 2020-08-21)
Seunghak Yu, Tianxing He, James Glass · arXiv 2008.08995 · submitted 2020-08-20
Abstract: Knowledge graphs (KGs) are relevant to many NLP tasks, but building a reliable domain-specific KG is time-consuming and expensive. A number of methods for constructing KGs with minimal human intervention have been proposed, but they still require a step that aligns the extracted knowledge with a human-annotated knowledge base. To overcome this issue, we propose a novel method to automatically construct a KG from unstructured documents that does not require external alignment, and we explore its use for extracting the desired information. To summarize our approach: we first extract knowledge tuples in their surface form from unstructured documents, encode them using a pre-trained language model, and link the surface entities via these encodings to form the graph structure. We perform experiments with benchmark datasets such as WikiMovies and MetaQA. The experimental results show that our method can successfully create and search a KG of 18K documents and achieve 69.7% hits@10 (close to an oracle model) on a query retrieval task.
Note: building a virtual KG from unstructured documents. A hedged sketch of the extract/encode/link idea appears below, after the probing example.

srdjan ostojic on Twitter: "During my physics undergrad, I have never heard of Singular Value Decomposition (SVD). Why?..." (bookmarked 2020-08-21)

Graph Representation Learning Book - Will Hamilton (bookmarked 2020-08-22)
> This book is my attempt to provide a brief but comprehensive introduction to graph representation learning, including methods for embedding graph data, graph neural networks, and deep generative models of graphs.

Cory Doctorow #BLM on Twitter: "Late last June, Google bought out "North,"..." (bookmarked 2020-08-01)
> Historically, companies that were bad at something would lose to companies that were good at it. But in the new Gilded Age, where we no longer enforce antitrust laws, companies that are bad at things can buy up companies that are good at them, a monopolistic tactic.

A survey of hierarchical classification across different application domains (2011) (bookmarked 2020-08-15)

Why You Should Do NLP Beyond English (bookmarked 2020-08-01)
> Only a few hundred languages are represented on the web and speakers of minority languages are severely limited in the information available to them.

Facebook apologizes to users, businesses for Apple's monstrous efforts to protect its customers' privacy • The Register (bookmarked 2020-08-27)
> Facebook has apologized to its users and advertisers for being forced to respect people’s privacy in an upcoming update to Apple’s mobile operating system – and promised it will do its best to invade their privacy on other platforms.

The lost stelae of Ethiopia | CNRS Le journal (bookmarked 2020-08-10)

What do you learn from context? Probing for sentence structure in contextualized word representations (bookmarked 2020-08-02)
Ian Tenney, Patrick Xia, Berlin Chen, Alex Wang, Adam Poliak, R. Thomas McCoy, Najoung Kim, Benjamin Van Durme, Samuel R. Bowman, Dipanjan Das, Ellie Pavlick · arXiv 1905.06316 · submitted 2019-05-15
Abstract: Contextualized representation models such as ELMo (Peters et al., 2018a) and BERT (Devlin et al., 2018) have recently achieved state-of-the-art results on a diverse array of downstream NLP tasks. Building on recent token-level probing work, we introduce a novel edge probing task design and construct a broad suite of sub-sentence tasks derived from the traditional structured NLP pipeline. We probe word-level contextual representations from four recent models and investigate how they encode sentence structure across a range of syntactic, semantic, local, and long-range phenomena. We find that existing models trained on language modeling and translation produce strong representations for syntactic phenomena, but only offer comparably small improvements on semantic tasks over a non-contextual baseline.
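In the spirit of the probing paper above, a minimal, hedged sketch of the basic recipe (not the paper's edge-probing suite or its data): freeze a pre-trained encoder, then fit a linear classifier on its token representations for a toy POS-style task. The model name, toy sentences, and tags are illustrative assumptions.

```python
# Minimal probing sketch (illustrative; NOT the paper's edge-probing suite):
# train a linear classifier on frozen BERT token representations.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
encoder.eval()

# Toy word-level task (POS-like tags); real probing uses annotated corpora.
sentences = [("the cat sleeps", ["DET", "NOUN", "VERB"]),
             ("a dog barks", ["DET", "NOUN", "VERB"])]

X, y = [], []
for text, tags in sentences:
    enc = tok(text, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**enc).last_hidden_state[0]  # (seq_len, hidden_size)
    # word_ids() maps each wordpiece back to its source word (fast tokenizers).
    for pos, wid in enumerate(enc.word_ids()):
        if wid is not None:                 # skip [CLS] / [SEP]
            X.append(hidden[pos].numpy())
            y.append(tags[wid])

# The probe is deliberately linear: if it succeeds, the information is
# (roughly) linearly decodable from the frozen representations.
probe = LogisticRegression(max_iter=1000).fit(X, y)
print(probe.score(X, y))
```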
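And the sketch promised in the KG-construction entry that opens this list: a rough illustration, under loose assumptions, of the encode-and-link outline from the abstract. The tuples, the `all-MiniLM-L6-v2` encoder, and the 0.8 similarity threshold are illustrative stand-ins, not the authors' choices, and the tuple-extraction step itself is elided.

```python
# Hedged sketch of the encode-and-link steps (surface-form tuples are assumed
# to be already extracted; extraction itself is out of scope here).
import itertools
import networkx as nx
from sentence_transformers import SentenceTransformer, util

tuples = [("Tom Hanks", "starred_in", "Forrest Gump"),
          ("Hanks, Tom", "born_in", "Concord"),
          ("Forrest Gump", "directed_by", "Robert Zemeckis")]

encoder = SentenceTransformer("all-MiniLM-L6-v2")   # illustrative encoder choice
entities = sorted({e for s, _, o in tuples for e in (s, o)})
emb = encoder.encode(entities, convert_to_tensor=True)

g = nx.MultiDiGraph()
for s, r, o in tuples:
    g.add_edge(s, o, relation=r)

# Link surface variants of (what is likely) the same entity via embedding
# similarity; 0.8 is an arbitrary threshold, not the paper's.
for (i, a), (j, b) in itertools.combinations(enumerate(entities), 2):
    if util.cos_sim(emb[i], emb[j]).item() > 0.8:
        g.add_edge(a, b, relation="same_as")
        g.add_edge(b, a, relation="same_as")

print(list(g.edges(data=True)))
```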
The centenary of the international recognition of Kurdistan - Un si Proche Orient (bookmarked 2020-08-09)

The Extreme Classification Repository (bookmarked 2020-08-12)
Benchmark datasets, metrics, results and code that can be used for evaluating the performance of extreme multi-label algorithms. [Related blog post](doc:2020/08/everything_you_always_wanted_to)

Everything you always wanted to know about extreme classification (but were afraid to ask) - Microsoft Research - 2019 (bookmarked 2020-08-12)

A Study of multilabel text classification and the effect of label hierarchy (2015) (bookmarked 2020-08-15)
They provide an implementation of a multi-label classification algorithm for tree- and DAG-structured label hierarchies ([GitHub](https://github.com/sushobhannayak/cssag)).

Niger: the "Great River" of the Sahel - France Culture - Ep. 3/5 - "Chansons d'eau douce" (bookmarked 2020-08-01)

The philosopher Bernard Stiegler has died at the age of 68 (bookmarked 2020-08-07)
> "A singular and strong voice, an uncommon thinker of technics and of the contemporary, who sought to invent a new language and new subversions"

Hierarchical Multi-Label Classification Networks (2018) (bookmarked 2020-08-31)
> architecture for HMC called HMCN, capable of simultaneously optimizing local and global loss functions for **discovering local hierarchical class-relationships and global information from the entire class hierarchy** while penalizing hierarchical violation

Triple Classification Using Regions and Fine-Grained Entity Typing (AAAI 2019) (bookmarked 2020-08-30)

LNEMLC: Label Network Embeddings for Multi-Label Classification (bookmarked 2020-08-12)
Piotr Szymański, Tomasz Kajdanowicz, Nitesh Chawla · arXiv 1812.02956 · submitted 2018-12-07
Abstract: Multi-label classification aims to classify instances with discrete non-exclusive labels. Most approaches to multi-label classification focus on effective adaptation or transformation of existing binary and multi-class learning approaches, but fail to model the joint probability of labels or do not preserve generalization abilities for unseen label combinations. To address these issues we propose a new multi-label classification scheme, LNEMLC - Label Network Embedding for Multi-Label Classification, which embeds the label network and uses it to extend the input space in learning and inference of any base multi-label classifier. The approach allows capturing the labels' joint probability at low computational complexity, providing results comparable to the best methods reported in the literature. We demonstrate how the method reveals statistically significant improvements over the simple kNN baseline classifier. We also provide hints for selecting the robust configuration that works satisfactorily across data domains.
> low-complexity approach to multi-label classification built on top of two intuitions: that **embedding a label space** may improve classification quality and that **label networks are a viable source of information** in multi-label problems
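A rough sketch of the LNEMLC recipe under simplifying assumptions: build a label co-occurrence network, embed it (a plain Laplacian eigenmap here, where the paper evaluates node2vec-style embeddings), and extend each instance's input space with the embedding of its label set. All data below is toy.

```python
# Toy sketch of LNEMLC's input-space extension (not the authors' code).
import numpy as np

Y = np.array([[1, 1, 0],      # toy label matrix: 3 instances, 3 labels
              [0, 1, 1],
              [1, 0, 1]])
X = np.random.rand(3, 5)      # toy input features

# Label network: co-occurrence counts, diagonal removed.
adj = (Y.T @ Y).astype(float)
np.fill_diagonal(adj, 0.0)

# Embed the label network with a Laplacian eigenmap (a stand-in for the
# node2vec-style embeddings evaluated in the paper).
lap = np.diag(adj.sum(axis=1)) - adj
vals, vecs = np.linalg.eigh(lap)
label_emb = vecs[:, 1:3]      # skip the trivial constant eigenvector

# Extend each instance's features with the mean embedding of its active
# labels; a base multi-label classifier is then trained on X_extended.
extension = np.array([label_emb[row.astype(bool)].mean(axis=0) for row in Y])
X_extended = np.hstack([X, extension])
print(X_extended.shape)       # (3, 7)
```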
node2vec: Scalable Feature Learning for Networks (bookmarked 2020-08-08)
Aditya Grover, Jure Leskovec · arXiv 1607.00653 · submitted 2016-07-03
Abstract: Prediction tasks over nodes and edges in networks require careful effort in engineering features used by learning algorithms. Recent research in the broader field of representation learning has led to significant progress in automating prediction by learning the features themselves. However, present feature learning approaches are not expressive enough to capture the diversity of connectivity patterns observed in networks. Here we propose node2vec, an algorithmic framework for learning continuous feature representations for nodes in networks. In node2vec, we learn a mapping of nodes to a low-dimensional space of features that maximizes the likelihood of preserving network neighborhoods of nodes. We define a flexible notion of a node's network neighborhood and design a biased random walk procedure, which efficiently explores diverse neighborhoods. Our algorithm generalizes prior work which is based on rigid notions of network neighborhoods, and we argue that the added flexibility in exploring neighborhoods is the key to learning richer representations. We demonstrate the efficacy of node2vec over existing state-of-the-art techniques on multi-label classification and link prediction in several real-world networks from diverse domains. Taken together, our work represents a new way for efficiently learning state-of-the-art task-independent representations in complex networks.
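A compact, hand-rolled sketch of node2vec's second-order biased walk (the released implementations use alias sampling for speed; the p and q values below are arbitrary), with the walks fed to gensim's skip-gram Word2Vec as in the paper.

```python
# Biased random walks + skip-gram, in the spirit of node2vec (illustrative).
import random
import networkx as nx
from gensim.models import Word2Vec

def biased_walk(g, start, length, p=1.0, q=0.5):
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = list(g.neighbors(cur))
        if not nbrs:
            break
        if len(walk) == 1:
            walk.append(random.choice(nbrs))
            continue
        prev = walk[-2]
        # Unnormalized weights: 1/p to return to prev, 1 to move to a common
        # neighbor of prev (BFS-like), 1/q to explore outward (DFS-like).
        w = [1 / p if n == prev else (1.0 if g.has_edge(n, prev) else 1 / q)
             for n in nbrs]
        walk.append(random.choices(nbrs, weights=w)[0])
    return [str(n) for n in walk]

g = nx.karate_club_graph()
walks = [biased_walk(g, n, length=20) for n in g.nodes() for _ in range(10)]
model = Word2Vec(walks, vector_size=32, window=5, min_count=1, sg=1, epochs=5)
print(model.wv.most_similar("0"))   # nodes embedded closest to node 0
```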
"The madmen of Allah tore them from us": Niger in shock after the death of the aid workers (bookmarked 2020-08-11)
> Abdou Kadri, who chaired the Kouré giraffe guides' association and was accompanying the Acted aid workers, is the other Nigerien who died on Sunday. May the earth rest lightly on him.

A novel multi-label classification algorithm based on K-nearest neighbor and random walk - Zhen-Wu Wang, Si-Kai Wang, Ben-Ting Wan, William Wei Song, 2020 (bookmarked 2020-08-11)

The RNA World and the Origins of Life - Molecular Biology of the Cell (bookmarked 2020-08-28)
> To fully understand the processes occurring in present-day living cells, we need to consider how they arose in evolution...

SLEEC: Sparse Local Embeddings for Extreme Multi-label Classification (2015) (bookmarked 2020-08-11)
An embedding-style algorithm that clusters the training points and applies the learning procedure within each cluster separately.
> The main technical contribution in SLEEC is a formulation for learning a small ensemble of local distance preserving embeddings which can accurately predict infrequently occurring (tail) labels. This allows SLEEC to break free of the traditional low-rank assumption and boost classification accuracy by **learning embeddings which preserve pairwise distances between only the nearest label vectors**.
On "tail labels":
> The critical assumption made by embedding methods, that the training label matrix is low-rank, is violated in almost all real world applications.
[Python implementation](https://github.com/xiaohan2012/sleec_python)

The DNA of an unknown, millennia-old people discovered in Cameroon - Geo.fr (bookmarked 2020-08-21)

Mulan: A Java library for multi-label learning (bookmarked 2020-08-30)
[paper](https://www.jmlr.org/papers/volume12/tsoumakas11a/tsoumakas11a.pdf)

Hugging Face on Twitter: "No labeled data? No problem. The 🤗 Transformers master branch now includes a built-in pipeline for zero-shot text classification..." (bookmarked 2020-08-12) A minimal usage sketch appears below, after the REALM entry.

Hierarchical Multi-label Classification of Text with Capsule Networks (2019) (bookmarked 2020-08-15)
> Our results confirm the hypothesis that capsule networks are especially advantageous for rare events and structurally diverse categories, which we attribute to their ability to combine latent encoded information.
> For each category in the hierarchy, an associated capsule outputs latent information of the category in form of a vector, as opposed to a single scalar value used in traditional neural networks

The government will reintroduce the "bee-killing" insecticides (bookmarked 2020-08-06)
> beet grown under the organic farming specifications seems, for its part, little or not at all affected by yellows disease

Bringing traditional ML to your Neo4j Graph with node2vec | Dave Voutila (bookmarked 2020-08-06)
New in the Neo4j Graph Data Science library (v1.3): [Graph Embeddings](tag:graph_embeddings).

Google AI Blog: REALM: Integrating Retrieval into Language Representation Models (bookmarked 2020-08-13)
> a new open-source method for language model pre-training that uses a supplemental knowledge retriever that enables it to perform well on knowledge-intensive tasks without billions of parameters.
> The key intuition of REALM is that a retrieval system should improve the model's ability to fill in missing words
[Paper: REALM: Retrieval-Augmented Language Model Pre-Training](doc:2020/12/2002_08909_realm_retrieval_a)
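This is not REALM itself (REALM trains the retriever and the reader jointly during pre-training); it is only a sketch of the intuition quoted above: retrieve a passage, then let a masked language model fill in the missing word with that passage as context. Model names and documents are illustrative assumptions.

```python
# Retrieve-then-read sketch of the REALM intuition (not the REALM model).
from sentence_transformers import SentenceTransformer, util
from transformers import pipeline

docs = ["The Eiffel Tower is located in Paris.",
        "Mount Everest lies on the border of Nepal and China."]

retriever = SentenceTransformer("all-MiniLM-L6-v2")   # illustrative retriever
doc_emb = retriever.encode(docs, convert_to_tensor=True)

query = "The Eiffel Tower can be found in [MASK]."
q_emb = retriever.encode(query, convert_to_tensor=True)
best_doc = docs[util.cos_sim(q_emb, doc_emb).argmax().item()]

# Conditioning the reader on the retrieved passage is what grounds the
# prediction in retrieved knowledge rather than in parameters alone.
reader = pipeline("fill-mask", model="bert-base-uncased")
print(reader(f"{best_doc} {query}")[0]["token_str"])   # expected: "paris"
```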
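And the usage sketch promised in the Hugging Face entry above: the zero-shot classification pipeline as it shipped in Transformers around that time. The model and candidate labels are illustrative choices.

```python
# Zero-shot text classification with the built-in Transformers pipeline.
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")  # NLI-based backbone
result = classifier(
    "The team released embeddings for every node in the citation graph.",
    candidate_labels=["machine learning", "sports", "politics"],
)
print(result["labels"][0], result["scores"][0])  # top label and its score
```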
Amit Chaudhary on Twitter: "How to learn transformers:..." (bookmarked 2020-08-23)

MAGNET: Multi-Label Text Classification using Attention-based Graph Neural Network (bookmarked 2020-08-14)
Ankit Pal, Muru Selvakumar, Malaikannan Sankarasubbu · arXiv 2003.11644 · submitted 2020-03-22
Abstract: In Multi-Label Text Classification (MLTC), one sample can belong to more than one class. It is observed that in most MLTC tasks there are dependencies or correlations among labels. Existing methods tend to ignore the relationship among labels. In this paper, a graph attention network-based model is proposed to capture the attentive dependency structure among the labels. The graph attention network uses a feature matrix and a correlation matrix to capture and explore the crucial dependencies between the labels and to generate classifiers for the task. The generated classifiers are applied to sentence feature vectors obtained from the text feature extraction network (BiLSTM) to enable end-to-end training. Attention allows the system to assign different weights to neighbor nodes per label, thus allowing it to learn the dependencies among labels implicitly. The results of the proposed model are validated on five real-world MLTC datasets, and it achieves similar or better performance compared to the previous state-of-the-art models.
> **Existing methods tend to ignore the relationship among labels**. This model employs [Graph Attention Networks](tag:graph_attention_networks) (GAT) to find the correlation between labels. The generated classifiers are applied to sentence feature vectors obtained from the text feature extraction network (BiLSTM) to enable end-to-end training.
> GAT network takes the node features and adjacency matrix that represents the graph data as inputs. The adjacency matrix is constructed based on the samples. **In our case, we do not have a graph dataset. Instead, we learn the adjacency matrix**, hoping that the model will determine the graph, thereby learning the correlation of the labels.
> Our intuition is that by modeling the correlation among labels as a weighted graph, we force the GAT network to learn such that the adjacency matrix and the attention weights together represent the correlation.
TODO: compare with [this](doc:2019/06/_1905_10070_label_aware_docume). A simplified sketch of the learned-adjacency idea appears at the end of this list.

Designing and Interpreting Probes · John Hewitt (bookmarked 2020-08-02)
> Probing turns supervised tasks into tools for interpreting representations. But the use of supervision leads to the question, did I interpret the representation? Or did my probe just learn the task itself?

scikit-multilearn/scikit-multilearn: A scikit-learn based module for multi-label et al. classification (bookmarked 2020-08-12)

Marcel Fröhlich on Twitter: "Biology / information theory / physics question: How do encodings emerge?" (bookmarked 2020-08-30)

Junk food and Covid-19, the deadly Mexican cocktail (bookmarked 2020-08-21)
> Coca-colonisation
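The simplified sketch promised in the MAGNET entry above: a single-head, label-level graph attention layer whose adjacency matrix is a learned parameter rather than a given graph. The paper's model is multi-head and couples this with a BiLSTM sentence encoder; both are omitted here, and all dimensions are toy.

```python
# Single-head sketch of MAGNET's learned-adjacency label attention (toy).
import torch
import torch.nn as nn
import torch.nn.functional as F

class LabelGAT(nn.Module):
    def __init__(self, n_labels, emb_dim, hid_dim):
        super().__init__()
        self.label_emb = nn.Parameter(torch.randn(n_labels, emb_dim))
        self.adj = nn.Parameter(torch.randn(n_labels, n_labels))  # learned, not given
        self.W = nn.Linear(emb_dim, hid_dim, bias=False)
        self.a = nn.Linear(2 * hid_dim, 1, bias=False)

    def forward(self, sent_vec):                    # sent_vec: (batch, hid_dim)
        h = self.W(self.label_emb)                  # (n_labels, hid_dim)
        n = h.size(0)
        pairs = torch.cat([h.unsqueeze(1).expand(n, n, -1),
                           h.unsqueeze(0).expand(n, n, -1)], dim=-1)
        e = F.leaky_relu(self.a(pairs).squeeze(-1)) # pairwise attention logits
        att = F.softmax(e + self.adj, dim=-1)       # learned adjacency biases attention
        label_feats = att @ h                       # (n_labels, hid_dim)
        return sent_vec @ label_feats.t()           # (batch, n_labels) logits

model = LabelGAT(n_labels=5, emb_dim=16, hid_dim=32)
print(model(torch.randn(2, 32)).shape)  # torch.Size([2, 5]); train with BCE loss
```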