Semanlink - [1909.01380] The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives

> we propose to
leverage the factual knowledge from KGs to enhance LLMs,
while still benefiting from circumventing the burdensome
training expenses by using pre-trained LLMs

> Graph Neural Prompting
(GNP), a plug-and-play method to assist pre-trained
LLMs in learning beneficial knowledge from KGs
>
> GNP
encodes the pertinent grounded knowledge and complex
structural information to derive Graph Neural Prompt, an
embedding vector that can be sent into LLMs to provide
guidance and instructions

> - GNP first utilizes
a GNN to capture and encode the
intricate graph knowledge into **entity/node embeddings**. 
> - Then,
a cross-modality pooling module is present to determine
the **most relevant node embeddings in relation to the text
input**, and consolidate these node embeddings into **a holistic
graph-level embedding**.
> - After that, GNP encompasses a
**domain projector** to bridge the inherent disparities between
the graph and text domains.
> - Finally, a **self-supervised link
prediction objective** is introduced to enhance the model
comprehension of relationships between entities and capture
graph knowledge in a self-supervised manner.
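
The pooling and projection steps quoted above can be pictured with a toy sketch. It is not the paper's implementation: the node embeddings stand in for GNN outputs, dot-product attention is an assumed simplification of the cross-modality pooling module, the projector is a plain linear map, the link prediction objective is left out, and every name and number is hypothetical.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_modality_pool(node_embs, text_emb):
    """Weight each node by its relevance to the text, then take the weighted average."""
    weights = softmax([dot(n, text_emb) for n in node_embs])
    dim = len(node_embs[0])
    pooled = [sum(w * n[i] for w, n in zip(weights, node_embs)) for i in range(dim)]
    return pooled, weights

def domain_projector(graph_emb, matrix):
    """Linear map from the graph space to the (larger) text embedding space."""
    return [dot(row, graph_emb) for row in matrix]

# Hypothetical stand-ins for GNN node embeddings and an encoded text input.
node_embs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
text_emb = [2.0, 0.0]

graph_emb, weights = cross_modality_pool(node_embs, text_emb)
projection = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # maps 2-d graph space to a 3-d "LLM" space
graph_neural_prompt = domain_projector(graph_emb, projection)
```

Here `graph_neural_prompt` plays the role of the single embedding vector that would be passed to the LLM alongside the text input.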

2023-09-28 About

[2309.12307] LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models

Tags:

2023-09-26 About

[2308.13418] Nougat: Neural Optical Understanding for Academic Documents

Tags:

2023-09-17 About

[2306.04640] ModuleFormer: Modularity Emerges from Mixture-of-Experts

Tags:

2023-09-16 About

[2309.06131] Annotating Data for Fine-Tuning a Neural Ranker? Current Active Learning Strategies are not Better than Random Selection

Tags:

2023-09-14 About

[1907.10529] SpanBERT: Improving Pre-training by Representing and Predicting Spans

Tags:

2023-08-29 About

[2002.06275] TwinBERT: Distilling Knowledge to Twin-Structured BERT Models for Efficient Retrieval

Tags:

2023-08-27 About

[2302.06600] Task-Specific Skill Localization in Fine-tuned Language Models

Tags:

2023-08-25 About

[2307.13269] LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition

Tags:

2023-08-08 About

[2308.00081] Towards Semantically Enriched Embeddings for Knowledge Graph Completion

Tags:

2023-08-02 About

[2109.10086] SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval

Tags:

2023-07-26 About

[2307.08621] Retentive Network: A Successor to Transformer for Large Language Models

Tags:

2023-07-20 About

[2305.14128] Dr.ICL: Demonstration-Retrieved In-context Learning

Tags:

2023-07-14 About

[2307.02486] LongNet: Scaling Transformers to 1,000,000,000 Tokens

Tags:

2023-07-06 About

[2305.07185] MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers

Tags:

2023-07-01 About

[2212.14024] Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP

Tags:

2023-06-23 About

[2306.08302] Unifying Large Language Models and Knowledge Graphs: A Roadmap

Tags:

2023-06-18 About

[2305.12517] Retrieving Texts based on Abstract Descriptions

Tags:

2023-06-15 About

[2306.07536] TART: A plug-and-play Transformer module for task-agnostic reasoning

Tags:

2023-06-15 About

[2306.07174] Augmenting Language Models with Long-Term Memory

Tags:

2023-06-13 About

[2305.14788] Adapting Language Models to Compress Contexts

Tags:

2023-06-04 About

[2305.15294] Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy

Tags:

2023-05-26 About

[2305.11778] Cross-Lingual Supervision improves Large Language Models Pre-training

Tags:

2023-05-22 About

[2107.05720] SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking

Tags:

2023-05-18 About

[2103.15348] LayoutParser: A Unified Toolkit for Deep Learning Based Document Image Analysis

Tags:

2023-05-18 About

[2305.06897] AfriQA: Cross-lingual Open-Retrieval Question Answering for African Languages

Tags:

2023-05-15 About

[2303.16839] MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks

Tags:

2023-04-25 About

[2202.08904] SGPT: GPT Sentence Embeddings for Semantic Search

Tags:

2023-04-25 About

[2304.09848] Evaluating Verifiability in Generative Search Engines

Tags:

2023-04-23 About

[2304.02711] Structured prompt interrogation and recursive extraction of semantics (SPIRES): A method for populating knowledge bases using zero-shot learning

Tags:

2023-04-07 About

[2211.01267] Multi-Vector Retrieval as Sparse Alignment

Tags:

2023-04-07 About

[2009.13013] SPARTA: Efficient Open-Domain Question Answering via Sparse Transformer Matching Retrieval

Tags:

2023-04-06 About

[2304.01982] Rethinking the Role of Token Retrieval in Multi-Vector Retrieval

Tags:

2023-04-05 About

[2303.17651] Self-Refine: Iterative Refinement with Self-Feedback

Tags:

2023-04-03 About

[2303.14177] Scaling Expert Language Models with Unsupervised Domain Discovery

Tags:

2023-03-27 About

[2106.09685] LoRA: Low-Rank Adaptation of Large Language Models

Tags:

2023-03-21 About

[2104.07186] COIL: Revisit Exact Lexical Match in Information Retrieval with Contextualized Inverted List

Tags:

2023-03-08 About

[2112.05682] Self-attention Does Not Need O(n^2) Memory

Tags:

2023-02-27 About

[2302.11529] Modular Deep Learning

Tags:

2023-02-23 About

[2302.10724] ChatGPT: Jack of all trades, master of none

Tags:

2023-02-22 About

[2108.08877] Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models

Tags:

2023-02-17 About

[2212.09741] One Embedder, Any Task: Instruction-Finetuned Text Embeddings

Tags:

2023-02-17 About

[2302.08091] Do We Still Need Clinical Language Models?

Tags:

2023-02-17 About

[2302.05019] A Comprehensive Survey on Automatic Knowledge Graph Construction

Tags:

2023-02-15 About

[2111.15664] OCR-free Document Understanding Transformer

Tags:

2023-02-13 About

[2302.04761] Toolformer: Language Models Can Teach Themselves to Use Tools

Tags:

> Toolformer, **a model
trained to decide which APIs to call, when to
call them, what arguments to pass, and how to
best incorporate the results into future token
prediction**.

> fulfills the
following desiderata:
> - The use of tools should be learned in a
self-supervised way without requiring large
amounts of human annotations
> - The LM should be able to decide for itself when
and how to use which tool.

> Approach based
on the recent idea of using large LMs with in-context
learning (Brown et al., 2020) to generate
entire datasets from scratch.
>
> Given just a handful of human-written examples
of how an API can be used, 
> - we let a LM annotate
a huge language modeling dataset with potential
API calls. 
> - We then use a self-supervised loss to
determine which of these API calls actually help
the model in predicting future tokens. 
> - Finally, we
finetune the LM itself on the API calls that it considers
useful.
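
The filtering step in the second bullet can be sketched as a simple threshold on loss reduction. This is a minimal illustration, assuming a candidate call is kept when it lowers the language modeling loss on the following tokens by at least some margin. The loss values, the threshold, and the helper name `keep_api_call` are hypothetical.

```python
def keep_api_call(loss_without_call, loss_with_call, threshold):
    """Keep a candidate API call only if it reduces the LM loss by at least `threshold`."""
    return (loss_without_call - loss_with_call) >= threshold

# Hypothetical candidates with made-up losses on the tokens that follow each call.
candidates = [
    {"call": "Calculator(400/1400)", "loss_without": 2.9, "loss_with": 1.1},
    {"call": "Calendar()", "loss_without": 1.5, "loss_with": 1.45},
    {"call": "QA(Who wrote Hamlet?)", "loss_without": 3.2, "loss_with": 3.6},
]
THRESHOLD = 1.0  # assumed value, for illustration only

kept = [
    c["call"]
    for c in candidates
    if keep_api_call(c["loss_without"], c["loss_with"], THRESHOLD)
]
```

Only the calls in `kept` would end up in the fine-tuning data; a call that barely helps or that increases the loss is dropped.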

[Jay Hack @mathemagic1an on Twitter](https://twitter.com/mathemagic1an/status/1624870248221663232):

> from a small seed set of human inputs (essentially demonstrating usage of APIs), the training set for this behavior is generated by the LLM itself.
> 
> So what does this mean? We've found a promising way to tightly integrate arbitrary APIs with our best-performing models.

2023-02-13 About

[2302.04907] Binarized Neural Machine Translation

Tags:

2023-02-13 About

[2302.04870] Offsite-Tuning: Transfer Learning without Full Model

Tags:

2023-02-11 About

[2302.01398] The unreasonable effectiveness of few-shot learning for machine translation

Tags:

2023-02-07 About

[2203.14465] STaR: Bootstrapping Reasoning With Reasoning

Tags:

2023-02-07 About

[2301.07014] Dataset Distillation: A Comprehensive Review

Tags:

2023-01-23 About

[2301.08210] Everything is Connected: Graph Neural Networks

Tags:

2023-01-21 About

[2206.02743] A Neural Corpus Indexer for Document Retrieval

Tags:

2023-01-18 About

[2301.04709] Causal Abstraction for Faithful Model Interpretation

Tags:

2023-01-14 About

[1904.02817] Unsupervised Domain Adaptation of Contextualized Embeddings for Sequence Labeling

Tags:

2023-01-12 About

[2002.01808] K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters

Tags:

2023-01-12 About

[2212.10380] What Are You Token About? Dense Retrieval as Distributions Over the Vocabulary

Tags:

2022-12-21 About

[2205.12410] AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning

Tags:

2022-12-16 About

[2205.05638] Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning

Tags:

2022-12-15 About

[2210.16773] An Efficient Memory-Augmented Transformer for Knowledge-Intensive NLP Tasks

Tags:

2022-12-08 About

[2212.02623] Unifying Vision, Text, and Layout for Universal Document Processing

Tags:

2022-12-07 About

[2211.09110] Holistic Evaluation of Language Models

Tags:

2022-12-06 About

[2212.01340] Moving Beyond Downstream Task Accuracy for Information Retrieval Benchmarking

Tags:

2022-12-06 About

[1810.02840] Training Complex Models with Multi-Task Weak Supervision

Tags:

2022-12-05 About

[1605.07723] Data Programming: Creating Large Training Sets, Quickly

Tags:

2022-12-04 About

[2210.16637] Beyond Prompting: Making Pre-trained Language Models Better Zero-shot Learners by Clustering Representations

Tags:

2022-11-25 About

[2211.03318] Fixing Model Bugs with Natural Language Patches

Tags:

2022-11-20 About

[2210.13952] KnowGL: Knowledge Generation and Linking from Text

Tags:

2022-11-13 About

[2104.11882] Incremental Few-shot Text Classification with Multi-round New Classes: Formulation, Dataset and System

Tags:

2022-10-25 About

[2202.06991] Transformer Memory as a Differentiable Search Index

Tags:

2022-10-25 About

[2210.09338] Deep Bidirectional Language-Knowledge Graph Pretraining

Tags:

2022-10-23 About

[2210.07316] MTEB: Massive Text Embedding Benchmark

Tags:

2022-10-17 About

[2104.08821] SimCSE: Simple Contrastive Learning of Sentence Embeddings

Tags:

2022-10-17 About

[1912.13318] LayoutLM: Pre-training of Text and Layout for Document Image Understanding

Tags:

2022-10-04 About

[2205.11498] Domain Adaptation for Memory-Efficient Dense Retrieval

Tags:

2022-09-26 About

[2209.11055] Efficient Few-Shot Learning Without Prompts

Tags:

2022-09-23 About

[2008.09093] PARADE: Passage Representation Aggregation for Document Reranking

Tags:

2022-09-21 About

[2208.01066] What Can Transformers Learn In-Context? A Case Study of Simple Function Classes

Tags:

2022-09-17 About

[2104.09224] Multi-Modal Fusion Transformer for End-to-End Autonomous Driving

Tags:

2022-09-16 About

[2201.04337] PromptBERT: Improving BERT Sentence Embeddings with Prompts

Tags:

2022-09-16 About

[2207.05221] Language Models (Mostly) Know What They Know

Tags:

2022-09-15 About

[2203.09435] Expanding Pretrained Models to Thousands More Languages via Lexicon-based Adaptation

Tags:

2022-09-08 About

[2011.06225] A Review of Uncertainty Quantification in Deep Learning: Techniques, Applications and Challenges

Tags:

2022-09-08 About

[2010.00711] A Survey of the State of Explainable AI for Natural Language Processing

Tags:

2022-09-08 About

[2209.01975] Selective Annotation Makes Language Models Better Few-Shot Learners

Tags:

2022-09-07 About

[2008.07267] A Survey of Active Learning for Text Classification using Deep Neural Networks

Tags:

2022-09-06 About

[2009.00236] A Survey of Deep Active Learning

Tags:

2022-09-06 About

[2209.00099] Efficient Methods for Natural Language Processing: A Survey

Tags:

2022-09-04 About

[2010.07835] Fine-Tuning Pre-trained Language Model with Weak Supervision: A Contrastive-Regularized Self-Training Approach

Tags:

2022-09-02 About

[2106.10199] BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models

Tags:

2022-09-01 About

[1904.04458] Knowledge-Augmented Language Model and its Application to Unsupervised Named-Entity Recognition

Tags:

Knowledge Augmented
Language Model (KALM)

a language
model with access to information available in a
KB, with no assumptions
about the availability of additional components
(such as Named Entity Taggers) or annotations

> While classes of
named entities (e.g., person or location) occur frequently,
each individual name (e.g., Atherton or
Zhouzhuang) may be observed infrequently even
in a very large corpus of text. As a result language
models learn to represent accurately only the most
popular named entities

> knowing that Alice is a name
used to refer to a person should give ample information
about the context in which the word may
occur (e.g., Bob visited Alice).

> ---

> extends a traditional **RNN LM**

> we enhance a traditional LM with a
gating mechanism that controls whether a particular
word is modeled as a general word or as a reference
to an entity
>
> We train the model end-to-end
with only the traditional predictive language modeling
perplexity objective
>
> KALM is trained end-to-end using
a predictive objective on large corpus of text.
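
One way to read the gating mechanism is as a mixture over latent word types: the probability of the next word is the sum, over types, of the gate's probability for that type times the word's probability within that type. The sketch below assumes that reading. The type inventory and all probabilities are hypothetical.

```python
def mixture_word_probability(word, type_probs, word_probs_by_type):
    """p(word) = sum over types t of p(t) * p(word | t)."""
    return sum(
        type_probs[t] * word_probs_by_type[t].get(word, 0.0)
        for t in type_probs
    )

# Hypothetical gate output for the slot after "Bob visited ...".
type_probs = {"general": 0.2, "person": 0.7, "location": 0.1}

# Hypothetical per-type vocabularies (entity types would be backed by the KB).
word_probs_by_type = {
    "general": {"visited": 0.5, "the": 0.5},
    "person": {"Alice": 0.6, "Bob": 0.4},
    "location": {"Zhouzhuang": 1.0},
}

p_alice = mixture_word_probability("Alice", type_probs, word_probs_by_type)

# Reading the gate directly gives an entity type for the position, with no NER labels.
predicted_type = max(type_probs, key=type_probs.get)
```

Because the gate is trained only through the language modeling objective, reading off `predicted_type` is what makes the approach usable as unsupervised NER.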

> To the best of our knowledge, KALM is the first
unsupervised neural NER approach.

> KALM extends a traditional, RNN-based neural
LM.

2022-08-31 About

[2006.10713] Zero-Shot Learning with Common Sense Knowledge Graphs

Tags:

2022-08-29 About

[2112.07708] Learning to Retrieve Passages without Supervision

Tags:

2022-08-28 About

[2208.05388] ATLAS: Universal Function Approximator for Memory Retention

Tags:

2022-08-28 About

[2208.11857] Shortcut Learning of Large Language Models in Natural Language Understanding: A Survey

Tags:

2022-08-27 About

[2208.11663] PEER: A Collaborative Language Model

Tags:

2022-08-26 About

[2208.09982] GRETEL: Graph Contrastive Topic Enhanced Language Model for Long Document Extractive Summarization

Tags:

2022-08-24 About

[1805.09906] Diffusion Maps for Textual Network Embedding

Tags:

2022-08-19 About

[2102.12627] How to represent part-whole hierarchies in a neural network

Tags:

2022-08-16 About

[2012.15156] A Memory Efficient Baseline for Open Domain Question Answering

Tags:

2022-08-08 About

[2208.03299] Few-shot Learning with Retrieval Augmented Language Model

Tags:

2022-08-08 About

[2208.01815] Effidit: Your AI Writing Assistant

Tags:

2022-08-06 About

[2208.00635] DictBERT: Dictionary Description Knowledge Enhanced Language Model Pre-training via Contrastive Learning

Tags:

2022-08-02 About

[2207.09980] ReFactorGNNs: Revisiting Factorisation-based Models from a Message-Passing Perspective

Tags:

2022-07-23 About

[2201.12431] Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval

Tags:

Arxiv Doc

2022-07-21 About

[1807.00745] Training a Neural Network in a Low-Resource Setting on Automatically Annotated Noisy Data

Tags:

2022-07-18 About

[2207.06300] Re2G: Retrieve, Rerank, Generate

Tags:

2022-07-14 About

[2006.01969] REL: An Entity Linker Standing on the Shoulders of Giants

Tags:

2022-07-12 About

[2205.00820] Entity-aware Transformers for Entity Search

Tags:

2022-07-12 About

[1902.06006] Contextual Word Representations: A Contextual Introduction

Tags:

2022-07-08 About

[2206.06520] Memory-Based Model Editing at Scale

Tags:

2022-07-07 About

[2205.08012] CascadER: Cross-Modal Cascading for Knowledge Graph Link Prediction

Tags:

2022-07-07 About

[2206.10658] Questions Are All You Need to Train a Dense Passage Retriever

Tags:

> **approach for training dense retrieval models that does not require any labeled training data**. Dense retrieval is a central challenge for open-domain tasks, such as Open QA, where state-of-the-art methods typically require large supervised datasets with custom hard-negative mining and denoising of positive examples.
>
> ART, in contrast, only requires access to unpaired inputs and outputs (e.g. questions and potential answer documents).
>
> It uses a new document-retrieval autoencoding scheme, where
> 1. an input question is used to retrieve a set of evidence documents, and
> 2. the documents are then used to compute the probability of reconstructing the original question.
>
> Training for retrieval based on question reconstruction enables effective unsupervised learning of both document and question encoders, which can be later incorporated into complete Open QA systems without any further finetuning.

[Tweet](doc:2022/07/devendra_singh_sachan_sur_twitt)

> Given an
input question, ART first retrieves a small set
of possible evidence documents. It then reconstructs
the original question by attending to these
documents
>
> The
key idea in ART is to consider the retrieved documents
as a noisy representation of the original
question and question reconstruction probability
as a way of denoising that provides soft-labels for
how likely each document is to have been the correct
result
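
The soft-label idea can be sketched numerically. The example assumes the retriever is trained by minimizing a KL divergence between the soft labels (a softmax over question reconstruction log-likelihoods) and the retriever's own distribution over the retrieved documents. All scores are hypothetical.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same documents."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical question-document similarity scores from the dense retriever.
retriever_scores = [2.0, 1.0, 0.5]

# Hypothetical log p(question | document) from a language model reading each document.
reconstruction_log_likelihoods = [-12.0, -9.0, -15.0]

retriever_dist = softmax(retriever_scores)
soft_labels = softmax(reconstruction_log_likelihoods)

# Training signal: pull the retriever's distribution toward the soft labels.
loss = kl_divergence(soft_labels, retriever_dist)
```

In this toy case the retriever prefers the first document while reconstruction favors the second, so the loss is positive and a gradient step would shift retrieval probability toward the second document.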

Refers to [[IZACARD 2012.04584] Distilling Knowledge from Reader to Retriever for Question Answering](doc:2020/12/2012_04584_distilling_knowled)

2022-07-06 About

[2008.12813] HittER: Hierarchical Transformers for Knowledge Graph Embeddings

Tags:

2022-06-30 About

[2201.00042] Avoiding Catastrophe: Active Dendrites Enable Multi-Task Learning in Dynamic Environments

Tags:

2022-06-26 About

[2205.15952] Knowledge Graph -- Deep Learning: A Case Study in Question Answering in Aviation Safety Domain

Tags:

2022-06-11 About

[2205.08184] SKILL: Structured Knowledge Infusion for Large Language Models

Tags:

2022-05-18 About

[2205.05131] Unifying Language Learning Paradigms

Tags:

2022-05-12 About

[2204.08173] TABi: Type-Aware Bi-Encoders for Open-Domain Entity Retrieval

Tags:

2022-05-11 About

[2012.12624] Learning Dense Representations of Phrases at Scale

Tags:

2022-05-11 About

[2205.04260] EASE: Entity-Aware Contrastive Learning of Sentence Embedding

Tags:

2022-05-11 About

[2205.03983] Building Machine Translation Systems for the Next Thousand Languages

Tags:

2022-05-10 About

[2203.08913] Memorizing Transformers

Tags:

2022-05-07 About

[2202.10054] Fine-Tuning can Distort Pretrained Features and Underperform Out-of-Distribution

Tags:

2022-05-01 About

[2204.11428] Personal Research Knowledge Graphs

Tags:

2022-04-30 About

[2008.09470] Top2Vec: Distributed Representations of Topics

Tags:

2022-04-28 About

[2204.08491] Active Learning Helps Pretrained Models Learn the Intended Task

Tags:

2022-04-20 About

[1909.00426] Global Entity Disambiguation with BERT

Tags:

2022-04-18 About

[2110.08151] mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models

Tags:

[Ikuya Yamada on Twitter: "Is entity representation effective to improve multilingual language models?..."](doc:2022/04/ikuya_yamada_sur_twitter_is_)

> Recent studies have shown that multilingual pretrained language models can be effectively improved with cross-lingual alignment information from Wikipedia entities. However, **existing methods only exploit entity information in pretraining and do not explicitly use entities in downstream tasks**. In this study, we explore the **effectiveness of leveraging entity representations for downstream cross-lingual tasks**.
>
> the key insight is that incorporating entity representations into the input allows us to extract more language-agnostic features.

[Github](https://github.com/studio-ousia/luke)

> Entity representations are known to enhance
language models in mono-lingual settings
(Zhang et al., 2019: [ERNIE](tag:ernie.html); Peters et al., 2019:  [[1909.04164] Knowledge Enhanced Contextual Word Representations](doc:2020/05/1909_04164_knowledge_enhanced); Wang et al.,
2021 [[1911.06136] KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation](doc:2020/11/1911_06136_kepler_a_unified_); Xiong et al., 2020; Yamada et al., 2020: [[2010.01057] LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention](doc:2020/11/2010_01057_luke_deep_context))
presumably by introducing real-world knowledge.
We show that using entity representations facilitates
cross-lingual transfer by providing language-independent
features.
>
> Multilingual extension of LUKE. The model is trained with the multilingual
masked language modeling (MLM) task as well
as the masked entity prediction (MEP) task with
Wikipedia entity embeddings

> We investigate two ways of using the entity representations
in cross-lingual transfer tasks:
> 1. perform
entity linking for the input text, and append
the detected entity tokens to the input sequence.
The entity tokens are expected to provide language-independent
features to the model
> 2. use the entity
[MASK] token from the MEP task as a language-independent
feature extractor.

2022-04-17 About

[2109.06270] STraTA: Self-Training with Task Augmentation for Better Few-shot Learning

Tags:

2022-04-14 About

[2203.10581] Cluster & Tune: Boost Cold Start Performance in Text Classification

Tags:

[Leshem Choshen on Twitter: "Labelled data is scarce, what can we do?..."](doc:2022/04/leshem_choshen_sur_twitter_l)

> **One-sentence Summary**: we suggest adding an unsupervised intermediate classification step, before fine-tuning and after pretraining BERT, and show it improves performance for data-constrained cases.

> for text classification cold start (when labeled
data is scarce), **add an intermediate unsupervised
classification task**, between the pretraining
and fine-tuning phases:
> perform clustering and
train the pre-trained model on predicting the
cluster labels.

> this additional
classification phase can significantly improve
performance, mainly for **topical classification**
tasks

> we use an efficient clustering technique,
that relies on simple Bag Of Words (BOW)
representations, to partition the unlabeled training
data into relatively homogeneous clusters of text
instances.
>
> Next, we treat these clusters as labeled
data for an intermediate text classification task, and
train the pre-trained model – with or without additional
MLM pretraining – with respect to this
multi-class problem, prior to the final fine-tuning
over the actual target-task labels
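
The clustering step can be illustrated with a toy example. The sketch swaps in a small k-means-style loop with cosine similarity over bag-of-words counts as a stand-in for the efficient clustering technique the quote mentions. The texts, the seed choice, and the function names are hypothetical.

```python
from collections import Counter

def bow(text):
    """Bag-of-words representation: lowercase token counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    num = sum(a[w] * b[w] for w in a if w in b)
    den = (sum(v * v for v in a.values()) ** 0.5) * (sum(v * v for v in b.values()) ** 0.5)
    return num / den if den else 0.0

def cluster(texts, seed_indices, iterations=5):
    """Assign each text to the most similar centroid, then rebuild the centroids."""
    vectors = [bow(t) for t in texts]
    centroids = [vectors[i] for i in seed_indices]
    labels = [0] * len(texts)
    for _ in range(iterations):
        labels = [
            max(range(len(centroids)), key=lambda k: cosine(v, centroids[k]))
            for v in vectors
        ]
        for k in range(len(centroids)):
            members = [v for v, label in zip(vectors, labels) if label == k]
            if members:
                # Summed counts work as a centroid because cosine ignores scale.
                centroids[k] = sum(members, Counter())
    return labels

unlabeled_texts = [
    "the match ended with a late goal",
    "the striker scored a goal in the match",
    "the bank raised interest rates",
    "interest rates and bank loans fell",
]
cluster_labels = cluster(unlabeled_texts, seed_indices=[0, 2])

# Pseudo-labeled data for the intermediate classification task.
intermediate_training_set = list(zip(unlabeled_texts, cluster_labels))
```

The pre-trained model would first be trained to predict `cluster_labels`, and only then fine-tuned on the few real target-task labels.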

> The underlying
intuition is that inter-training the model
over a related text classification task would be more
beneficial compared to MLM inter-training, which
focuses on different textual entities, namely predicting
the identity of a single token.

2022-04-06 About

[2008.11228] A simple method for domain adaptation of sentence embeddings

Tags:

2022-04-01 About

[1910.06294] Training Compact Models for Low Resource Entity Tagging using Pre-trained Language Models

Tags:

2022-03-31 About

[2004.05119] Beyond Fine-tuning: Few-Sample Sentence Embedding Transfer

Tags:

2022-03-31 About

[2203.14655] Few-Shot Learning with Siamese Networks and Label Tuning

Tags:

> the problem of building text classifiers with little or no training data.
>
> In recent years, an approach based on neural textual entailment models has been found to give strong results on a diverse range of tasks.

(cf. #[NLI](tag:nli), using the input text as the premise and the text representing the label as the hypothesis)

> In this work, we show that **with proper pre-training, Siamese Networks that embed texts and labels** offer a competitive alternative.
>
> We introduce **label tuning: fine-tuning the label embeddings only**. While giving lower performance than model fine-tuning (which updates all params of the model), this approach has the architectural advantage that a single encoder can be shared by many different tasks (we only fine-tune the label embeddings)
> The drop in quality can
be compensated by using a variant of **[Knowledge distillation](tag:knowledge_distillation)**
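
Label tuning can be sketched as gradient descent in which only the label embeddings move while the text embeddings stay fixed. The example assumes dot-product scoring with a softmax cross-entropy loss. The embeddings, learning rate, and function names are hypothetical, and the knowledge distillation variant is not shown.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def tune_label_embeddings(text_embs, targets, label_embs, lr=0.5, epochs=50):
    """Update only the label embeddings; the text embeddings (shared encoder) are never changed."""
    label_embs = [list(e) for e in label_embs]  # work on a copy
    for _ in range(epochs):
        for x, y in zip(text_embs, targets):
            probs = softmax([dot(x, label) for label in label_embs])
            for k, label in enumerate(label_embs):
                grad_scale = probs[k] - (1.0 if k == y else 0.0)
                for i in range(len(label)):
                    label[i] -= lr * grad_scale * x[i]
    return label_embs

def predict(x, label_embs):
    scores = [dot(x, label) for label in label_embs]
    return scores.index(max(scores))

# Hypothetical outputs of a frozen, shared text encoder.
text_embs = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]]
targets = [0, 0, 1, 1]

# Deliberately poor initial label embeddings (e.g. from uninformative label names).
initial_label_embs = [[0.0, 1.0], [1.0, 0.0]]

tuned_label_embs = tune_label_embeddings(text_embs, targets, initial_label_embs)
predictions = [predict(x, tuned_label_embs) for x in text_embs]
```

Since each task only stores its own small set of label embeddings, many tasks can share one encoder, which is the architectural advantage the excerpt points to.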

[Github](https://tinyurl.com/label-tuning), [Tweet](doc:2022/03/thomas_muller_sur_twitter_pa)

2022-03-30 About

[2105.00828] Memorisation versus Generalisation in Pre-trained Language Models

Tags:

2022-03-30 About

[2006.00632] Neural Unsupervised Domain Adaptation in NLP---A Survey

Tags:

2022-03-30 About

[2203.13088] Introducing Neural Bag of Whole-Words with ColBERTer: Contextualized Late Interactions using Enhanced Reduction

Tags:

2022-03-30 About

[2203.06169] LaPraDoR: Unsupervised Pretrained Dense Retriever for Zero-Shot Text Retrieval

Tags:

2022-03-29 About

[2101.12294] Combining pre-trained language models and structured knowledge

Tags:

2022-03-25 About

[2006.05987] Revisiting Few-sample BERT Fine-tuning

Tags:

2022-03-21 About

[2004.09813] Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation

Tags:

2022-03-18 About

[2110.10778] Contrastive Document Representation Learning with Graph Attention Networks

Tags:

2022-03-10 About

[2202.14037] Understanding Contrastive Learning Requires Incorporating Inductive Biases

Tags:

2022-03-05 About

[2109.06304] Phrase-BERT: Improved Phrase Embeddings from BERT with an Application to Corpus Exploration

Tags:

2022-02-25 About

[2004.11892] Template-Based Question Generation from Retrieved Sentences for Improved Unsupervised Question Answering

Tags:

2022-02-11 About

[2004.07180] SPECTER: Document-level Representation Learning using Citation-informed Transformers

Tags:

2022-01-29 About

[2009.02252] KILT: a Benchmark for Knowledge Intensive Language Tasks

Tags:

2022-01-23 About

[2108.13934] Robust Retrieval Augmented Generation for Zero-shot Slot Filling

Tags:

2022-01-19 About

[2005.11401] Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

Tags:

2022-01-19 About

[2004.12832] ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT

Tags:

2022-01-12 About

[1906.00300] Latent Retrieval for Weakly Supervised Open Domain Question Answering

Tags:

2022-01-11 About

[2007.00814] Relevance-guided Supervision for OpenQA with ColBERT

Tags:

2022-01-07 About

[1904.08375] Document Expansion by Query Prediction

Tags:

2022-01-05 About

[2112.09118] Towards Unsupervised Dense Information Retrieval with Contrastive Learning

Tags:

2021-12-21 About

[2010.02666] Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation

Tags:

2021-12-16 About

[2112.07577] GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval

Tags:

An unsupervised domain adaptation technique for dense retrieval models

1. synthetic queries
are generated for each passage from the target corpus (using an existing pre-trained [T5](tag:text_to_text_transfer_transformer)
encoder-decoder)
2. the generated queries are used for mining negative
passages (retrieving the most similar
paragraphs using an existing dense retrieval
model == hard negatives!)
3. the query-passage pairs are labeled by a cross-encoder and used to train the domain-adapted
dense retriever (using method described in [Hofstätter et al.,
2020](doc:2021/12/2010_02666_improving_efficien))
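
Step 3 can be sketched as a margin-based distillation loss, under the assumption that the cited method trains the dense retriever to match the cross-encoder's score margin between a positive and a negative passage. All scores are hypothetical.

```python
def margin_mse(student_pos, student_neg, teacher_pos, teacher_neg):
    """Mean squared error between student and teacher (positive - negative) score margins."""
    losses = [
        ((sp - sn) - (tp - tn)) ** 2
        for sp, sn, tp, tn in zip(student_pos, student_neg, teacher_pos, teacher_neg)
    ]
    return sum(losses) / len(losses)

# Hypothetical scores for two (generated query, positive passage, mined negative) triples.
student_pos = [0.8, 0.6]   # dense retriever score for the positive passage
student_neg = [0.3, 0.5]   # dense retriever score for the mined hard negative
teacher_pos = [9.0, 4.0]   # cross-encoder score for the positive passage
teacher_neg = [1.0, 3.5]   # cross-encoder score for the mined hard negative

loss = margin_mse(student_pos, student_neg, teacher_pos, teacher_neg)
```

In the second triple the cross-encoder margin is small, so the loss does not push that mined negative far away from the query. Soft margins like this are one way to tolerate noisy generated queries and false negatives.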

[Nils Reimers on Twitter](doc:2021/12/nils_reimers_sur_twitter_do_), [GitHub](https://github.com/UKPLab/gpl), by the author of [TSDAE](doc:2021/09/2104_06979_tsdae_using_trans)

Claims to improve on "Doc2Query" ([Document Expansion by Query Prediction](doc:2022/01/1904_08375_document_expansion)) ([src](https://twitter.com/KexinWang2049/status/1471435779415150598)):

> - GPL: Uses doc2query to construct synthetic data and does knowledge distillation (i.e. training) on that data.
> - Doc2query: Generates queries to extend the documents and use BM25 on top of them w/o training.

2021-12-15 About

[1909.06356] Addressing Semantic Drift in Question Generation for Semi-Supervised Question Answering

Tags:

2021-12-08 About

[1906.04980] Unsupervised Question Answering by Cloze Translation

Tags:

2021-12-08 About

[2112.01488] ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction

Tags:

2021-12-05 About

[1911.02655] Towards Domain Adaptation from Limited Data for Question Answering Using Deep Neural Networks

Tags:

2021-11-19 About

[2108.13854] Contrastive Domain Adaptation for Question Answering using Limited Text Corpora

Tags:

2021-11-19 About

[1706.03610] Neural Domain Adaptation for Biomedical Question Answering

Tags:

2021-11-19 About

[2106.13474] Adapt-and-Distill: Developing Small, Fast and Effective Pretrained Language Models for Domains

Tags:

2021-10-21 About

[1908.11860] Adapt or Get Left Behind: Domain Adaptation through BERT Language Model Finetuning for Aspect-Target Sentiment Classification

Tags:

2021-10-21 About

[2110.08207] Multitask Prompted Training Enables Zero-Shot Task Generalization

Tags:

2021-10-18 About

[1712.05972] Train Once, Test Anywhere: Zero-Shot Learning for Text Classification

Tags:

2021-10-16 About

[2010.07245] Text Classification Using Label Names Only: A Language Model Self-Training Approach

Tags:

2021-10-16 About

[2110.06176] Mention Memory: incorporating textual knowledge into Transformers through entity mention attention

Tags:

2021-10-13 About

[2104.12016] Learning Passage Impacts for Inverted Indexes

Tags:

2021-10-08 About

[2004.09095] The State and Fate of Linguistic Diversity and Inclusion in the NLP World

Tags:

2021-10-03 About

[2109.08133] Phrase Retrieval Learns Passage Retrieval, Too

Tags:

2021-09-30 About

[2106.04647] Compacter: Efficient Low-Rank Hypercomplex Adapter Layers

Tags:

2021-09-29 About

[2010.12566] DICT-MLM: Improved Multilingual Pre-Training using Bilingual Dictionaries

Tags:

2021-09-06 About

[2104.06979] TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning

Tags:

> The most
successful previous approaches like InferSent (Conneau
et al., 2017), Universal Sentence Encoder
(USE) (Cer et al., 2018) and SBERT (Reimers and
Gurevych, 2019) heavily relied on labeled data to
train sentence embedding models.
>
> TSDAE can
achieve up to 93.1% of the performance of in-domain
supervised approaches. Further, we
show that TSDAE is **a strong domain adaptation
and pre-training method for sentence
embeddings**, significantly outperforming other
approaches like Masked Language Model.

> During training, TSDAE
encodes corrupted sentences into fixed-sized
vectors and requires the decoder to reconstruct the
original sentences from this sentence embedding.
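
The excerpt does not say how sentences are corrupted. The sketch below assumes random token deletion as the noise function, with a hypothetical deletion ratio, and shows only how a training pair (corrupted input, original target) would be built. The encoder and decoder themselves are not shown.

```python
import random

def delete_noise(tokens, deletion_ratio, rng):
    """Drop each token independently with probability `deletion_ratio`, keeping at least one."""
    kept = [t for t in tokens if rng.random() >= deletion_ratio]
    if not kept:
        kept = [rng.choice(tokens)]
    return kept

rng = random.Random(0)  # seeded for repeatability
original = "tsdae learns sentence embeddings from unlabeled domain text".split()

corrupted = delete_noise(original, deletion_ratio=0.6, rng=rng)

# Denoising pair: the encoder sees `corrupted`, the decoder must reproduce `original`.
training_pair = (corrupted, original)
```

Because the decoder only has access to the fixed-size sentence vector, the encoder is pushed to pack the meaning of the whole sentence into that vector.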

- <https://www.sbert.net/examples/unsupervised_learning/TSDAE/README.html>
- [github](https://github.com/UKPLab/sentence-transformers/tree/master/examples/unsupervised_learning/TSDAE)
- [UKPLab/sentence-transformers: Sentence Embeddings with BERT & XLNet](doc:2020/07/ukplab_sentence_transformers_s)
- [twitter](https://twitter.com/KexinWang2049/status/1433361957579538432):

> **TSDAE can learn domain-specific sentence embeddings with unlabeled sentences**
>
> Most importantly, instead of STS (Semantic Textual Similarity), **we suggest evaluating unsupervised sentence embeddings on the domain-specific tasks & datasets, which is the real use case for them**. Actually, STS scores do not correlate with performance on specific tasks.

2021-09-01 About

[2010.02353] Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages

Tags:

2021-08-25 About

[2107.12708] QA Dataset Explosion: A Taxonomy of NLP Resources for Question Answering and Reading Comprehension

Tags:

2021-08-06 About

[1911.02116] Unsupervised Cross-lingual Representation Learning at Scale

Tags:

2021-07-29 About

[2102.11107] Towards Causal Representation Learning

Tags:

2021-07-15 About

[2107.00676] A Primer on Pretrained Multilingual Language Models

Tags:

2021-07-13 About

[2010.06467] Pretrained Transformers for Text Ranking: BERT and Beyond

Tags:

2021-07-09 About

[2104.08663] BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models

Tags:

2021-07-09 About

[2103.11811] MasakhaNER: Named Entity Recognition for African Languages

Tags:

2021-07-06 About

[2010.12309] A Survey on Recent Approaches for Natural Language Processing in Low-Resource Scenarios

Tags:

2021-07-06 About

[2006.07264] Low-resource Languages: A Review of Past Work and Future Challenges

Tags:

2021-07-06 About

[1906.05685] A Focus on Neural Machine Translation for African Languages

Tags:

2021-06-30 About

[1405.5893] Computerization of African languages-French dictionaries

Tags:

2021-06-30 About

[2106.04612] Neural Extractive Search

Tags:

how to extend a
search paradigm we call “**extractive search**” with
neural similarity techniques.

> some information needs require extracting
and aggregating sub-sentence information
(words, phrases, or entities) from multiple documents
(e.g. a list of all the risk factors for a specific
disease and their number of mentions, or a comprehensive
table of startups and CEOs).

> extractive search combines
document selection with information extraction. **The query is extended with capture slots**:
these are **search terms that act as variables, whose
values should be extracted**.
> The user
is then presented with the matched documents, each
annotated with the corresponding captured spans,
as well as aggregate information over the captured
spans

Conclusion:

> We presented a system for neural extractive search.
While we found our system to be useful for scientific
search, it also has clear limitations and areas
for improvement, both in terms of accuracy (only
72.2% of the returned results are relevant, both the
alignment and similarity models generalize well to
some relations but not to others), and in terms of
scale

[Video of demo](https://www.youtube.com/watch?v=TtqWi2GgB5A&t=1832s)

2021-06-23 About

[2001.03765] Learning Cross-Context Entity Representations from Text

Tags:

2021-06-22 About

[2101.00345] Modeling Fine-Grained Entity Types with Box Embeddings

Tags:

2021-06-22 About

[1807.04905] Ultra-Fine Entity Typing

Tags:

2021-06-22 About

[2102.07043] Reasoning Over Virtual Knowledge Bases With Open Predicate Relations

Tags:

> a method for constructing **a virtual KB (VKB) trained entirely from text**

Open Predicate Query Language (OPQL): constructing a virtual knowledge base (VKB) that supports KB reasoning & open-domain QA, tackling the incompleteness of knowledge bases by constructing a virtual KB only from text

> OPQL constructs
a VKB by **encoding and indexing a set of
relation mentions** in a way that naturally enables
reasoning and can be trained without any structured
supervision.

> can be used
as an **external memory integrated into a language
model**

cf. this earlier paper [[2002.10640] Differentiable Reasoning over a Virtual Knowledge Base](doc:2020/07/2002_10640_differentiable_rea). Unlike that earlier work, OPQL does not require an initial structured KB for distant supervision.

> The key idea in constructing the OPQL VKB is to use a
dual-encoder pre-training process, similar to 
[[1906.03158] Matching the Blanks: Distributional Similarity for Relation Learning](doc:2021/05/1906_03158_matching_the_blank)

Related work section refers to [[1909.04164] Knowledge Enhanced Contextual Word Representations](doc:2020/05/1909_04164_knowledge_enhanced). Also refers to [[2007.00849] Facts as Experts: Adaptable and Interpretable Neural Memory over Symbolic Knowledge](doc:2020/07/2007_00849_facts_as_experts_) (some authors in common)

2021-06-20 About

[2106.04098] Ultra-Fine Entity Typing with Weak Supervision from a Masked Language Model

Tags:

2021-06-16 About

[1410.5859] Towards a Model Theory for Distributed Representations

Tags:

2021-06-10 About

[2106.00882] Efficient Passage Retrieval with Hashing for Open-domain Question Answering

Tags:

2021-06-03 About

[2004.04906] Dense Passage Retrieval for Open-Domain Question Answering

Tags:

2021-06-03 About

[2104.10809] Provable Limitations of Acquiring Meaning from Ungrounded Form: What will Future Language Models Understand?

Tags:

2021-05-23 About

[2001.11631] Enhancement of Short Text Clustering by Iterative Classification

Tags:

2021-05-20 About

[2103.12953] Supporting Clustering with Contrastive Learning

Tags:

2021-05-20 About

[2009.12030] AutoETER: Automated Entity Type Representation for Knowledge Graph Embedding

Tags:

2021-05-17 About

[1911.09419] Learning Hierarchy-Aware Knowledge Graph Embeddings for Link Prediction

Tags:

2021-05-17 About

[1906.03158] Matching the Blanks: Distributional Similarity for Relation Learning

Tags:

2021-05-13 About

[2104.14690] Entailment as Few-Shot Learner

Tags:

2021-05-03 About

[1909.10506] Learning Dense Representations for Entity Retrieval

Tags:

> We show that it is feasible to perform **entity
linking by training a dual encoder (two-tower)
model that encodes mentions and entities in
the same dense vector space**, where candidate
entities are retrieved by approximate nearest
neighbor search. Unlike prior work, **this setup
does not rely on an alias table followed by a
re-ranker, and is thus the first fully learned entity
retrieval model**.

Contributions:

> -  a dual encoder architecture for
learning entity and mention encodings suitable for
retrieval. A key feature of the architecture is that it
employs a modular **hierarchy of sub-encoders that
capture different aspects of mentions and entities**
> - a simple, fully unsupervised **hard negative
mining** strategy that produces massive gains
in retrieval performance, compared to using only
random negatives
> - high
quality candidate entities very efficiently using approximate nearest neighbor search
> - outperforms discrete retrieval
baselines like an alias table or BM25

> strong retrieval
performance across all 5.7 million Wikipedia entities in
around 3ms per mention

> since we are using a two-tower or dual
encoder architecture, **our model cannot use any kind of attention over
both mentions and entities at once**, nor feature-wise
comparisons as done by Francis-Landau et al. (2016).
This is a fairly severe constraint – for example, **we cannot
directly compare the mention span to the entity title**
– but it permits retrieval with nearest neighbor search
for the entire context against a single, all encompassing
representation for each entity
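
A toy sketch of retrieval with a dual encoder: entities are encoded once, and a mention is matched to them by a single vector comparison. The vectors, entity names, and the `retrieve` helper are hypothetical, and the exhaustive search here only stands in for the approximate nearest neighbor search mentioned in the paper.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def retrieve(mention_vec, entity_vecs, k=2):
    """Return the k entity ids with the highest dot-product score.

    Exhaustive search; a real system would use an approximate index.
    """
    scored = sorted(
        entity_vecs.items(),
        key=lambda item: dot(mention_vec, item[1]),
        reverse=True,
    )
    return [entity_id for entity_id, _ in scored[:k]]

# Toy entity encodings (hypothetical values), computed once and indexed offline.
entity_vecs = {
    "Paris_(France)": [0.9, 0.1, 0.0],
    "Paris_Hilton": [0.1, 0.9, 0.0],
    "Paris_(Texas)": [0.6, 0.0, 0.5],
}
# Toy encoding of the mention "Paris" in a context about European capitals.
mention_vec = [1.0, 0.0, 0.1]
top = retrieve(mention_vec, entity_vecs, k=2)
```

The mention and the entity never interact before the final dot product, which is the constraint the authors describe: no attention across both sides, but each entity needs only one precomputed vector.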

2021-05-01 About

[2011.05864] On the Sentence Embeddings from Pre-trained Language Models

Tags:

2021-04-19 About

[2007.12603] IR-BERT: Leveraging BERT for Semantic Search in Background Linking for News Articles

Tags:

2021-04-12 About

[2007.15779] Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing

Tags:

2021-04-11 About

[1902.00751] Parameter-Efficient Transfer Learning for NLP

Tags:

2021-04-11 About

[2012.02558] Pre-trained language models as knowledge bases for Automotive Complaint Analysis

Tags:

2021-04-11 About

[1910.02227] Making sense of sensory input

Tags:

2021-04-10 About

[2010.12321] BARThez: a Skilled Pretrained French Sequence-to-Sequence Model

Tags:

2021-03-31 About

[2103.12876] Complex Factoid Question Answering with a Free-Text Knowledge Graph

Tags:

2021-03-30 About

[1901.04085] Passage Re-ranking with BERT

Tags:

2021-03-26 About

[2010.02194] Self-training Improves Pre-training for Natural Language Understanding

Tags:

2021-03-12 About

[1911.03876] Dynamic Neuro-Symbolic Knowledge Graph Construction for Zero-shot Commonsense Question Answering

Tags:

2021-02-08 About

[2010.00904] Autoregressive Entity Retrieval

Tags:

2021-01-14 About

[1911.03681] E-BERT: Efficient-Yet-Effective Entity Embeddings for BERT

Tags:

> way of **injecting factual knowledge about entities into the pretrained BERT model**.

(Feeding entity vectors
into BERT as if they
were wordpiece vectors without additional encoder
pretraining)

>
> **We align [Wikipedia2Vec](tag:wikipedia2vec) entity vectors (Yamada et al., 2016) with BERT's native wordpiece vector space and use the aligned entity vectors as if they were wordpiece vectors**. The resulting entity-enhanced version of BERT (called E-BERT) is similar in spirit to [ERNIE](tag:ernie) (Zhang et al., 2019) and [KnowBert](tag:knowbert) (Peters et al., 2019), but it **requires no expensive further pretraining of the BERT encoder**.
>
> Our vector space alignment strategy is inspired by
cross-lingual word vector alignment
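
A minimal sketch of the alignment idea, assuming it is a linear map fitted on words that exist in both vocabularies and then applied to entity vectors. The toy vectors, the gradient-descent fit, and the names `fit_linear_map` and `matvec` are illustrative assumptions, not the authors' code.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

def fit_linear_map(sources, targets, lr=0.1, steps=500):
    """Fit W minimizing sum ||W s - t||^2 by plain gradient descent."""
    d_out, d_in = len(targets[0]), len(sources[0])
    W = [[0.0] * d_in for _ in range(d_out)]
    for _ in range(steps):
        grad = [[0.0] * d_in for _ in range(d_out)]
        for s, t in zip(sources, targets):
            residual = [p - q for p, q in zip(matvec(W, s), t)]
            for i in range(d_out):
                for j in range(d_in):
                    grad[i][j] += 2.0 * residual[i] * s[j]
        for i in range(d_out):
            for j in range(d_in):
                W[i][j] -= lr * grad[i][j]
    return W

# Toy vectors for words present in both vocabularies (hypothetical values).
word_vecs_source = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
word_vecs_bert = [[2.0, 0.0], [0.0, 3.0], [2.0, 3.0]]
W = fit_linear_map(word_vecs_source, word_vecs_bert)

# An entity vector from the source space, projected into the wordpiece space.
entity_vec = [0.5, 0.5]
aligned_entity_vec = matvec(W, entity_vec)
```

Only the small matrix `W` is learned here; the encoder itself is untouched, which matches the claim that no further pretraining of BERT is needed.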

Related work on Entity-enhanced BERT:

> ([ERNIE](doc:2019/08/_1905_07129_ernie_enhanced_la) and [Knowbert](doc:2020/05/1909_04164_knowledge_enhanced)) are based on the design principle
that BERT be adapted to entity vectors. They introduce
new encoder layers to feed pretrained entity
vectors into the Transformer, and they require additional
pretraining to integrate the new parameters.
In contrast, E-BERT’s design principle is that entity
vectors be adapted to BERT.
>
> Two other knowledge-enhanced MLMs are [KEPLER](doc:2020/11/1911_06136_kepler_a_unified_)
(Wang et al., 2019c) and K-Adapter (Wang
et al., 2020)... Their factual knowledge
does not stem from entity vectors – instead, they
are trained in a multi-task setting on relation classification
and knowledge base completion.

Not to be confused with [[2009.02835] E-BERT: A Phrase and Product Knowledge Enhanced Language Model for E-commerce](doc:2020/12/2009_02835_e_bert_a_phrase_a)

2021-01-12 About

[2012.04740] River: machine learning for streaming data in Python

Tags:

2021-01-05 About

[2012.15723] Making Pre-trained Language Models Better Few-shot Learners

Tags:

2021-01-02 About

[2009.02835] E-BERT: A Phrase and Product Knowledge Enhanced Language Model for E-commerce

Tags:

2020-12-14 About

[2002.08909] REALM: Retrieval-Augmented Language Model Pre-Training

Tags:

**Augment language model pre-training with a retriever module**, which
is trained using the masked language modeling objective.

> To capture knowledge in a more modular and interpretable way, we augment language model pre-training with a latent knowledge retriever, which allows the model to retrieve and attend over documents from a large corpus such as Wikipedia, used during pre-training, fine-tuning and inference. **For the first time, we show how to pre-train such a knowledge retriever in an unsupervised manner**, using masked language modeling as the learning signal and backpropagating through a retrieval step that considers millions of documents
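
A small sketch of why the masked language modeling signal can reach the retriever, assuming the retrieve-then-predict factorization p(y|x) = Σ_z p(z|x) · p(y|z,x) over retrieved documents z. The scores and probabilities below are made-up numbers and the function names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def marginal_likelihood(retrieval_scores, answer_probs):
    """p(y|x) = sum over documents z of p(z|x) * p(y|z,x)."""
    p_z = softmax(retrieval_scores)
    return sum(pz * py for pz, py in zip(p_z, answer_probs))

# Relevance scores of three candidate documents for one masked input (made up).
retrieval_scores = [2.0, 0.0, 0.0]
# Probability of the correct masked token when conditioning on each document (made up).
answer_probs = [0.9, 0.1, 0.1]
p_y = marginal_likelihood(retrieval_scores, answer_probs)
```

Since the retrieval scores enter the likelihood through the softmax, raising the score of a helpful document raises `p_y`, so the prediction loss gives a gradient to the retriever.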

Hum, #TODO: parallel to be drawn with techniques in [KG-augmented Language Models](tag:knowledge_graph_augmented_language_models) which focus "on the problem of capturing declarative knowledge in the learned parameters of a language model."

[Google AI Blog Post](doc:2020/08/google_ai_blog_realm_integrat)

[Summary](https://joeddav.github.io/blog/2020/03/03/REALM.html) for the [Hugging Face awesome-papers reading group](doc:2021/03/huggingface_awesome_papers_pap)

2020-12-12 About

[2012.04584] Distilling Knowledge from Reader to Retriever for Question Answering

Tags:

> a method to train an information retrieval module for downstream tasks, **without using pairs of queries and documents as annotations**.

Uses two models (standard pipeline for open-domain QA):

- the first one retrieves documents from a large source of knowledge (the retriever)
- the second one processes the support documents to solve the task (the reader).

> First the retriever selects support passages in a large knowledge
source. Then these passages are processed by the reader, along with the question, to generate an
answer

Inspired by knowledge distillation: the reader model is the teacher and the retriever is the student.

> More precisely, we use a sequence-to-sequence model as the reader, and use
the attention activations over the input documents as synthetic labels to train the retriever. 
> (**train the retriever by learning to approximate the attention score of the reader**)
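
A minimal sketch of the distillation signal, assuming a KL divergence between the normalized reader attention (teacher) and a softmax over the retriever scores (student). KL is one plausible choice of loss here, and the numbers and function names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(retriever_scores, reader_attention):
    """KL(target || predicted), with target = normalized reader attention."""
    total = sum(reader_attention)
    target = [a / total for a in reader_attention]
    predicted = softmax(retriever_scores)
    return sum(t * math.log(t / p) for t, p in zip(target, predicted) if t > 0)

# Aggregated attention the reader paid to each of three passages (made up).
reader_attention = [0.6, 0.3, 0.1]

# A retriever whose scores reproduce the attention distribution exactly.
good = distillation_loss([math.log(a) for a in reader_attention], reader_attention)
# A retriever that prefers the passage the reader mostly ignored.
bad = distillation_loss([0.0, 0.0, 5.0], reader_attention)
```

No query-document relevance labels appear anywhere: the only supervision is the reader's own attention, which is the point of the method.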

Refers to:

- [REALM: Retrieval-Augmented Language Model Pre-Training](doc:2020/12/2002_08909_realm_retrieval_a)
- [Dehghani: Neural Ranking Models with Weak Supervision](doc:?uri=https%3A%2F%2Farxiv.org%2Fabs%2F1704.08803)

2020-12-11 About

[2004.10964] Don't Stop Pretraining: Adapt Language Models to Domains and Tasks

Tags:

2020-12-01 About

[2011.06993] FLERT: Document-Level Features for Named Entity Recognition

Tags:

2020-12-01 About

[2010.01057] LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention

Tags:

2020-11-26 About

[2011.02260] Graph Neural Networks in Recommender Systems: A Survey

Tags:

2020-11-11 About

[1911.06136] KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation

Tags:

2020-11-03 About

[2010.03496] Inductive Entity Representations from Text via Link Prediction

Tags:

BLP "BERT for Link Prediction". Central idea: **training an entity encoder with a
link prediction objective** (using the textual descriptions of entities when computing entity representations - hence not failing with entities unknown in training)

> a method for **learning representations
of entities**, that uses a **pre-trained Transformer** based
architecture as an entity encoder, and
**link prediction training on a knowledge graph
with textual entity descriptions**.

> using entity descriptions,
an entity encoder is trained for link prediction in
a knowledge graph. The encoder can then be used
without fine-tuning to obtain features for entity classification
and information retrieval
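
A toy sketch of a link prediction objective over entity vectors, assuming a TransE-style score and a margin ranking loss. This is one common scoring function used as an illustration, and in the described approach the entity vectors would come from encoding textual descriptions with a Transformer rather than being hand-written as they are here.

```python
def transe_score(head, relation, tail):
    """Higher is better: negative L1 distance between head + relation and tail."""
    return -sum(abs(h + r - t) for h, r, t in zip(head, relation, tail))

def margin_loss(pos_score, neg_score, margin=1.0):
    """Zero once the true triple outscores the corrupted one by the margin."""
    return max(0.0, margin - pos_score + neg_score)

# Hypothetical entity vectors; an encoder over entity descriptions would produce these.
berlin = [1.0, 0.0]
germany = [1.0, 1.0]
france = [-1.0, 1.0]
# Hypothetical relation vector.
capital_of = [0.0, 1.0]

pos = transe_score(berlin, capital_of, germany)  # true triple
neg = transe_score(berlin, capital_of, france)   # triple with a corrupted tail
loss = margin_loss(pos, neg)
```

Because the entity vector is a function of the description text, an entity unseen during training can still be scored as long as it has a description.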

Cites [Xie et al](doc:2020/10/representation_learning_of_know) and [Kepler](doc:2020/11/1911_06136_kepler_a_unified_). They claim that their
objective, which targets link prediction exclusively (rather than combining language modeling
and link prediction as Kepler does),
performs better than Kepler's more complex one.

2020-11-03 About

[2010.11967] Language Models are Open Knowledge Graphs

Tags:

2020-10-26 About

[2010.11882] Learning Invariances in Neural Networks

Tags:

2020-10-25 About

[2010.05234] A Practical Guide to Graph Neural Networks

Tags:

2020-10-15 About

[1904.09078] EmbraceNet: A robust deep learning architecture for multimodal classification

Tags:

2020-10-14 About

[1911.11506] Word-Class Embeddings for Multiclass Text Classification

Tags:

> In supervised tasks such as multiclass
text classification (the focus of this article) it seems appealing to enhance word representations
with ad-hoc embeddings that encode task-specific information. We propose (supervised) word-class
embeddings (WCEs), and show that, when concatenated to (unsupervised) pre-trained word embeddings,
they substantially facilitate the training of deep-learning models in multiclass classification by
topic.
>
> A differentiating aspect of our method is that it keeps the modelling of word-class interactions separate from the
original word embedding. Word-class correlations are confined in a dedicated vector space, whose vectors enhance
(by concatenation) the unsupervised representations. The net effect is an embedding matrix that is better suited to
classification, and imposes no restriction to the network architecture using it.
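
A minimal sketch of the concatenation idea, assuming a deliberately simple word-class statistic: the fraction of labeled documents containing a word that belong to each class. The paper's actual correlation function may differ, and the corpus, the pretrained vectors, and the name `word_class_embeddings` are hypothetical.

```python
def word_class_embeddings(docs, labels, n_classes):
    """Map each word to P(class | word), estimated from labeled documents.

    A simplified stand-in for a supervised word-class correlation matrix.
    """
    counts = {}
    for tokens, label in zip(docs, labels):
        for word in set(tokens):
            counts.setdefault(word, [0] * n_classes)[label] += 1
    return {w: [c / sum(cs) for c in cs] for w, cs in counts.items()}

# Toy labeled corpus: class 0 = sport, class 1 = politics.
docs = [["goal", "match"], ["match", "vote"], ["vote", "law"]]
labels = [0, 0, 1]
wce = word_class_embeddings(docs, labels, n_classes=2)

# Hypothetical unsupervised pretrained vectors.
pretrained = {"goal": [0.1, 0.2, 0.3], "vote": [0.4, 0.5, 0.6]}
# The enhanced embedding is the plain concatenation of both parts.
enhanced = {w: vec + wce[w] for w, vec in pretrained.items()}
```

The pretrained part is left unchanged and the class information lives in its own extra dimensions, so any architecture that accepts an embedding matrix can use the result.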

[github](https://github.com/AlexMoreo/word-class-embeddings). Refers to [LEAM](doc:2020/02/joint_embedding_of_words_and_la) :

> [in LEAM] Once words and labels are embedded in a common vector space, word-label
compatibility is measured via cosine similarity. Our method instead models these compatibilities directly, without
generating intermediate embeddings for words or labels.

2020-10-11 About

[2004.03705] Deep Learning Based Text Classification: A Comprehensive Review

Tags:

2020-10-11 About

[2005.03675] Machine Learning on Graphs: A Model and Comprehensive Taxonomy

Tags:

2020-10-03 About

[2010.00402] From Trees to Continuous Embeddings and Back: Hyperbolic Hierarchical Clustering

Tags:

2020-10-03 About

[1802.05930] Learning beyond datasets: Knowledge Graph Augmented Neural Networks for Natural language Processing

Tags:

2020-10-02 About

[2001.08053] Contextualized Embeddings in Named-Entity Recognition: An Empirical Study on Generalization

Tags:

2020-10-01 About

[1911.02685] A Comprehensive Survey on Transfer Learning

Tags:

2020-09-24 About

[2009.07938] Type-augmented Relation Prediction in Knowledge Graphs

Tags:

2020-09-19 About

[1806.06478] Co-training Embeddings of Knowledge Graphs and Entity Descriptions for Cross-lingual Entity Alignment

Tags:

2020-09-06 About

[1609.02521] DiSMEC - Distributed Sparse Machines for Extreme Multi-label Classification

Tags:

2020-09-06 About

[1803.07828] Expeditious Generation of Knowledge Graph Embeddings

Tags:

2020-09-02 About

[2009.00318] More is not Always Better: The Negative Impact of A-box Materialization on RDF2vec Knowledge Graph Embeddings

Tags:

2020-09-02 About

[1909.01259] Neural Attentive Bag-of-Entities Model for Text Classification

Tags:

A model that performs **text classification using entities in a knowledge base**.

> Entities provide unambiguous and relevant semantic signals that are beneficial for capturing semantics in texts. We combine **simple high-recall entity detection based on a dictionary** (word->list of entities), to detect entities in a document, with a novel neural **attention mechanism that enables the model to focus on a small number of unambiguous and relevant entities**.

2 steps:

1. Entity detection
2. Classification using the detected entities (+text) as inputs

Regarding entity linking, a local model which uses cosine
similarity between the embedding of the target
entity and the word-based representation of
the document to capture the relevance of an entity
given a document.

Embeddings from the KB: computed using [#Wikipedia2Vec](tag:wikipedia2vec) (similar words and entities
close to one another in a unified vector space)

Model using attention, with 2 features:

- cosine similarity between the
embedding of the entity and the word based
representation of the document
- the probability that the entity
name refers to the entity in KB.
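
A toy sketch of attention over detected entities driven by the two features listed above. The feature weights `w_cos` and `w_prior` would be learned in a real model and are fixed here as an assumption; the vectors and probabilities are made up.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(entity_vecs, doc_vec, link_probs, w_cos=1.0, w_prior=1.0):
    """Weight entities by cosine-to-document and link probability, then pool them."""
    logits = [
        w_cos * cosine(e, doc_vec) + w_prior * p
        for e, p in zip(entity_vecs, link_probs)
    ]
    weights = softmax(logits)
    pooled = [
        sum(w * e[i] for w, e in zip(weights, entity_vecs))
        for i in range(len(doc_vec))
    ]
    return weights, pooled

entity_vecs = [[1.0, 0.0], [0.0, 1.0]]  # candidate entity embeddings (made up)
doc_vec = [1.0, 0.2]                    # word-based document representation (made up)
link_probs = [0.8, 0.3]                 # P(entity | entity name) from the dictionary (made up)
weights, pooled = attend(entity_vecs, doc_vec, link_probs)
```

The first entity is both closer to the document and more likely given its name, so it dominates the pooled representation that would feed the classifier.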

Somewhat [related](doc:2020/01/investigating_entity_knowledge_)

### Conclusion:

> a neural
network model that performs text classification using
entities in Wikipedia. We combined simple
dictionary-based entity detection with a neural attention
mechanism to enable the model to focus
on a small number of unambiguous and relevant
entities in a document.

2020-09-02 About

[1812.06280] Wikipedia2Vec: An Efficient Toolkit for Learning and Visualizing the Embeddings of Words and Entities from Wikipedia

Tags:

2020-09-02 About

[1306.6802] Evaluation Measures for Hierarchical Classification: a unified view and novel approaches

Tags:

2020-09-01 About

[2008.08995] Constructing a Knowledge Graph from Unstructured Documents without External Alignment

Tags:

2020-08-21 About

[2003.11644] MAGNET: Multi-Label Text Classification using Attention-based Graph Neural Network

Tags:

2020-08-14 About

[1812.02956] LNEMLC: Label Network Embeddings for Multi-Label Classification

Tags:

2020-08-12 About

[1607.00653] node2vec: Scalable Feature Learning for Networks

Tags:

2020-08-08 About

[1905.06316] What do you learn from context? Probing for sentence structure in contextualized word representations

Tags:

2020-08-02 About

[1911.03903] A Re-evaluation of Knowledge Graph Completion Methods

Tags:

2020-07-28 About

[2004.07202] Entities as Experts: Sparse Memory Access with Entity Supervision

Tags:

2020-07-11 About

[2002.10640] Differentiable Reasoning over a Virtual Knowledge Base

Tags:

2020-07-11 About

[2007.04612] Concept Bottleneck Models

Tags:

2020-07-10 About

[2007.00849] Facts as Experts: Adaptable and Interpretable Neural Memory over Symbolic Knowledge

Tags:

> a neural language model that includes **an explicit interface between symbolically interpretable factual information and subsymbolic neural knowledge.**... **The model can be updated without re-training by manipulating its symbolic representations**. In particular this model allows us to add new facts and overwrite existing ones.

> a **neural language model which learns to access information
in a symbolic knowledge graph.**

> This
model builds on the recently-proposed [Entities as
Experts](doc:2020/07/2004_07202_entities_as_expert) (EaE) language model (Févry et al., 2020),
which extends the same transformer (Vaswani
et al., 2017) architecture of BERT (Devlin et al., 2019) with an additional external memory for entities.
>
> After training EaE, the embedding associated
with an entity will (ideally) capture information
about the textual context in which that
entity appears, and by inference, the entity’s semantic
properties
>
> we include an additional
memory called a fact memory, which encodes
triples from a symbolic KB.
>
> This combination results in a
neural language model which learns to access information
in a symbolic knowledge graph.

TODO:

- read again IBM's [Span Selection Pre-training for Question Answering](doc:2019/09/_1909_04120_span_selection_pre) ("an effort to avoid encoding general knowledge in the transformer network itself")
- compare with [[1907.05242] Large Memory Layers with Product Keys](doc:2019/07/_1907_05242_large_memory_layer)
- how does it relate with [[2002.08909] REALM: Retrieval-Augmented Language Model Pre-Training](doc:2020/12/2002_08909_realm_retrieval_a)?

2020-07-09 About

Tags:

2020-07-02 About

[2006.15020] Pre-training via Paraphrasing

Tags:

2020-06-30 About

[2006.09462] Selective Question Answering under Domain Shift

Tags:

2020-06-30 About

[2001.04451] Reformer: The Efficient Transformer

Tags:

2020-06-29 About

[2002.06504] Differentiable Top-k Operator with Optimal Transport

Tags:

2020-06-29 About

[1910.00163] Specializing Word Embeddings (for Parsing) by Information Bottleneck

Tags:

2020-06-29 About

[2006.13365] Bringing Light Into the Dark: A Large-scale Evaluation of Knowledge Graph Embedding Models Under a Unified Framework

Tags:

2020-06-26 About

[1903.11279] Graph Convolution for Multimodal Information Extraction from Visually Rich Documents

Tags:

2020-06-16 About

[1910.01348] On the Efficacy of Knowledge Distillation

Tags:

2020-06-06 About

[1804.03235] Large scale distributed neural network training through online distillation

Tags:

2020-06-06 About

[1511.03643] Unifying distillation and privileged information

Tags:

2020-05-31 About

[1709.03933] Hash Embeddings for Efficient Word Representations

Tags:

2020-05-19 About

[2003.08001] Realistic Re-evaluation of Knowledge Graph Completion Methods: An Experimental Study

Tags:

2020-05-15 About

[1909.04164] Knowledge Enhanced Contextual Word Representations

Tags:

2020-05-13 About

[1907.04829] BAM! Born-Again Multi-Task Networks for Natural Language Understanding

Tags:

2020-05-12 About

[1912.08422] Distilling Structured Knowledge into Embeddings for Explainable and Accurate Recommendation

Tags:

2020-05-12 About

[1807.08447] LinkNBed: Multi-Graph Representation Learning with Entity Linkage

Tags:

2020-05-11 About

[1706.00384] Deep Mutual Learning

Tags:

2020-05-11 About

[1906.07241] Barack's Wife Hillary: Using Knowledge-Graphs for Fact-Aware Language Modeling

Tags:

2020-05-11 About

[2003.08505] A Metric Learning Reality Check

Tags:

2020-05-10 About

[1910.12507] A Survey on Knowledge Graph Embeddings with Literals: Which model links better Literal-ly?

Tags:

2020-05-04 About

[2004.14843] Knowledge Graph Embeddings and Explainable AI

Tags:

2020-05-04 About

[2004.14958] A Call for More Rigor in Unsupervised Cross-lingual Learning

Tags:

2020-05-02 About

[1911.03814] Scalable Zero-shot Entity Linking with Dense Entity Retrieval

Tags:

2020-05-02 About

[2004.14545] Explainable Deep Learning: A Field Guide for the Uninitiated

Tags:

2020-05-01 About

[1906.01195] Learning Attention-based Embeddings for Relation Prediction in Knowledge Graphs

Tags:

2020-04-30 About

[cmp-lg/9511007] Using Information Content to Evaluate Semantic Similarity in a Taxonomy (1995)

Tags:

2020-04-27 About

[2001.09522] TaxoExpan: Self-supervised Taxonomy Expansion with Position-Enhanced Graph Neural Network

Tags:

how to add a set of new concepts to an existing taxonomy.

[Tweet](https://twitter.com/mickeyjs6/status/1253772146142216194?s=20) [GitHub](https://github.com/mickeystroller/TaxoExpan)

> we study the taxonomy expansion task: given an
existing taxonomy and a set of new emerging concepts, we aim
to automatically expand the taxonomy to incorporate these new
concepts (without changing the existing relations in the given taxonomy).

> To the best of our knowledge, this is the first study on **how to
expand an existing directed acyclic graph (as we model a taxonomy
as a DAG) using self-supervised learning**.

Self-supervised framework, the existing taxonomy being used as training data: it learns a model to predict whether a query concept is the direct hyponym of an anchor concept.

> 2 techniques:
>
> 1. a **position-enhanced graph neural network that encodes the local structure of an anchor concept** in the existing taxonomy,
> 2. a noise-robust training objective that enables the learned model to be insensitive to the label noise in the self-supervision data.

Regarding 1: uses [GNN](/tag/graph_neural_networks.html) to model the "ego network" of concepts (potential “siblings”
and “grand parents” of the query concept).

> Regular
GNNs fail to distinguish nodes with different relative positions to
the query (i.e., some nodes are grand parents of the query while
the others are siblings of the query). To address this limitation, we
present a simple but effective enhancement to inject such position
information into GNNs using position embedding. We show that
such embedding can be easily integrated with existing GNN architectures
(e.g., [GCN](/tag/graph_convolutional_networks) and GAT) and significantly boosts the
prediction performance

Regarding point 2: uses InfoNCE loss, cf. [Contrastive Predictive Coding](/doc/?uri=https%3A%2F%2Farxiv.org%2Fabs%2F1807.03748)

> Instead of predicting
whether each individual ⟨query concept, anchor concept⟩ pair
is positive or not, we first group all pairs sharing the same query
concept into a single training instance and learn a model to select
the positive pair among other negative ones from the group.
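
A minimal sketch of the grouped objective described above: all candidate anchors for one query concept form a single instance, and the loss is the negative log-probability of the true parent under a softmax over the group. The matching scores are made up and the function name is hypothetical.

```python
import math

def info_nce_loss(scores, positive_index):
    """Negative log-probability of the positive anchor among all candidates."""
    m = max(scores)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum - scores[positive_index]

# One training instance: a query concept scored against 1 true parent and 3 negatives.
scores = [4.0, 0.5, -1.0, 0.0]
loss_if_first_is_parent = info_nce_loss(scores, positive_index=0)
loss_if_third_is_parent = info_nce_loss(scores, positive_index=2)
```

The loss depends only on how the positive ranks relative to the other anchors in its group, which is what makes it less sensitive to individual noisy pairs than a per-pair binary loss.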

(Hmm, this reminds me of something)

> assume each concept (in existing taxonomy + set of new concepts) has an initial embedding
vector learned from some text associated with this concept.

To keep things tractable, only attempts to find a single parent node of each new concept.

2020-04-25 About

[2004.10151] Experience Grounds Language

Tags:

2020-04-22 About

[2004.06842] Layered Graph Embedding for Entity Recommendation using Wikipedia in the Yahoo! Knowledge Graph

Tags:

2020-04-17 About

[1503.02531] Distilling the Knowledge in a Neural Network

Tags:

2020-04-16 About

[1903.04197] Structured Knowledge Distillation for Dense Prediction

Tags:

2020-04-16 About

[2004.05150] Longformer: The Long-Document Transformer

Tags:

2020-04-13 About

[1904.01947] Extracting Tables from Documents using Conditional Generative Adversarial Networks and Genetic Algorithms

Tags:

2020-04-02 About

[1909.03193] KG-BERT: BERT for Knowledge Graph Completion

Tags:

2020-03-22 About

[1911.02168] CoKE: Contextualized Knowledge Graph Embedding

Tags:

2020-03-22 About

[2003.08271] Pre-trained Models for Natural Language Processing: A Survey

Tags:

2020-03-19 About

[2003.03384] AutoML-Zero: Evolving Machine Learning Algorithms From Scratch

Tags:

2020-03-17 About

[1905.06088] Neural-Symbolic Computing: An Effective Methodology for Principled Integration of Machine Learning and Reasoning

Tags:

2020-03-15 About

[2003.00330] Graph Neural Networks Meet Neural-Symbolic Computing: A Survey and Perspective

Tags:

2020-03-15 About

[1909.07606] K-BERT: Enabling Language Representation with Knowledge Graph

Tags:

2020-03-08 About

[2003.02320] Knowledge Graphs

Tags:

2020-03-07 About

[1902.10197] RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space

Tags:

2020-03-03 About

[2002.12327] A Primer in BERTology: What we know about how BERT works

Tags:

2020-02-28 About

[2002.11402] Detecting Potential Topics In News Using BERT, CRF and Wikipedia

Tags:

2020-02-27 About

[1910.04126] Scalable Nearest Neighbor Search for Optimal Transport

Tags:

2020-02-20 About

[1802.01528] The Matrix Calculus You Need For Deep Learning

Tags:

2020-02-19 About

[1805.04174] Joint Embedding of Words and Labels for Text Classification (ACL Anthology 2018)

Tags:

2020-02-18 About

[1503.08677] Label-Embedding for Image Classification

Tags:

2020-02-18 About

[2002.05867] Transformers as Soft Reasoners over Language

Tags:

2020-02-17 About

[2002.04688] fastai: A Layered API for Deep Learning

Tags:

2020-02-13 About

[1911.05507] Compressive Transformers for Long-Range Sequence Modelling

Tags:

2020-02-11 About

[2002.02925] BERT-of-Theseus: Compressing BERT by Progressive Module Replacing

Tags:

2020-02-10 About

[1703.07464] No Fuss Distance Metric Learning using Proxies

Tags:

> We address the problem of distance metric learning (DML), defined as learning a distance consistent with a notion of semantic similarity...
> Traditionally, supervision is expressed in the form of sets of points that follow
an ordinal relationship – an anchor point x is similar to
a set of positive points Y , and dissimilar to a set of negative
points Z, and a loss defined over these distances is minimized.
> Triplet-Based methods are challenging to optimize (a main issue is the need for finding informative triplets).
>
> We propose to **optimize the triplet loss on a different space of triplets, consisting of an anchor data point and similar and dissimilar proxy points which are learned as well**. These proxies approximate the original data points, so that a triplet loss over the proxies is a tight upper bound of the original loss.

Mentioned in this [blog post](/doc/2020/01/training_a_speaker_embedding_fr):

> "**Proxy based triplet learning**": instead of generating triplets, we learn an embedding for each class and use the learnt embedding as a proxy for triplets as part of the training. In other words, we can train end to end without the computationally expensive step of resampling triplets after each network update.

Near the conclusion:

> Our formulation of Proxy-NCA loss produces a loss very
similar to the standard cross-entropy loss used in classification.
However, we arrive at our formulation from a different
direction: we are not interested in the actual classifier and
indeed discard the proxies once the model has been trained.
Instead, the proxies are auxiliary variables, enabling more
effective optimization of the embedding model parameters.
**As such, our formulation not only enables us to surpass the
state of the art in zero-shot learning, but also offers an explanation
to the effectiveness of the standard trick of training
a classifier, and using its penultimate layer’s output as the
embedding.**
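
A toy sketch of a proxy-based loss in the spirit of the quotes above, assuming squared Euclidean distance and one proxy per class: the embedding is pulled toward the proxy of its own class and pushed away from the proxies of the other classes. The proxies and points are made-up values; in training the proxies would be learned jointly with the embedding model.

```python
import math

def sq_dist(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def proxy_nca_loss(x, label, proxies):
    """-log( exp(-d(x, p_label)) / sum over other classes z of exp(-d(x, p_z)) )."""
    positive_distance = sq_dist(x, proxies[label])
    negative_terms = [
        math.exp(-sq_dist(x, proxy))
        for cls, proxy in proxies.items()
        if cls != label
    ]
    return positive_distance + math.log(sum(negative_terms))

# One proxy per class (hypothetical values).
proxies = {"cat": [1.0, 0.0], "dog": [0.0, 1.0], "bird": [-1.0, 0.0]}

near = proxy_nca_loss([0.9, 0.1], "cat", proxies)  # embedding close to its own proxy
far = proxy_nca_loss([0.1, 0.9], "cat", proxies)   # embedding close to the wrong proxy
```

No triplets are sampled: each data point is compared with a handful of proxies, which is where the resemblance to a softmax classifier comes from.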

2020-02-09 About

[1503.03832] FaceNet: A Unified Embedding for Face Recognition and Clustering

Tags:

2020-01-25 About

[2001.07685] FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence

Tags:

2020-01-22 About

[1912.12510] Detecting Out-of-Distribution Examples with In-distribution Examples and Gram Matrices

Tags:

2020-01-15 About

[1711.00046] Replace or Retrieve Keywords In Documents at Scale

Tags:

2020-01-09 About

[2003.05473] Investigating Entity Knowledge in BERT with Simple Neural End-To-End Entity Linking (CoNLL 2019)

Tags:

Trains BERT-base-uncased on English Wikipedia, then fine-tunes and evaluates it
on an entity linking (EL) benchmark (EL implemented as token classification over the entity vocabulary)

> BERT+Entity is a straightforward extension on top
of BERT, i.e. we initialize BERT with the publicly
available weights from the BERT-base-uncased
model and add an output classification layer on
top of the architecture. Given a contextualized token,
the classifier computes the probability of an
entity link for each entry in the entity vocabulary.

Can BERT’s architecture learn all entity
linking steps jointly? To answer:

> an extreme
simplification of the **entity linking setup that
works surprisingly well**: simply cast it as **a
per token classification over the entire entity
vocabulary** (over 700K classes in our case).

> the model
is the first that performs entity linking without any
pipeline or any heuristics, compared to all prior
approaches. We found that with our approach we
can learn additional entity knowledge in BERT that
helps in entity linking. **However, we also found
that almost none of the downstream tasks really
required entity knowledge**.
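
A toy sketch of what "per token classification over the entity vocabulary" means. Random vectors stand in for BERT's contextualized token outputs, and the softmax output is an assumption of this sketch (these notes do not say which output nonlinearity or loss the paper uses). The names `entity_link_probs`, `W` and `b` are hypothetical.

```python
import numpy as np

def entity_link_probs(token_states, W, b):
    """Distribution over the entity vocabulary for every token.

    token_states: (seq_len, hidden), W: (hidden, n_entities), b: (n_entities,).
    """
    logits = token_states @ W + b
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
seq_len, hidden, n_entities = 5, 8, 10   # toy sizes; the paper has over 700K classes
token_states = rng.normal(size=(seq_len, hidden))  # stand-in for BERT outputs
W = rng.normal(size=(hidden, n_entities))          # the added classification layer
b = np.zeros(n_entities)

probs = entity_link_probs(token_states, W, b)
predicted_entity_ids = probs.argmax(axis=1)        # one entity id per token
```

Mention detection, candidate generation and disambiguation are all folded into this single per-token decision, which is why no pipeline is needed.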

### Related work

- > [Durrett and Klein (2014)](/doc/2020/01/a_joint_model_for_entity_analys) were the first to propose
jointly modelling Mention detection, Candidate generation and Entity disambiguation in a graphical
model and could show that each of those steps are
interdependent and benefit from a joint objective

This paper uses neural techniques instead of CRF.

- > [Yamada](/showprop.do?pptyuri=http%3A%2F%2Fwww.semanlink.net%2F2001%2F00%2Fsemanlink-schema%23arxiv_author&pptyval=Ikuya%2BYamada) (2016, 2017) was the first to
investigate neural text representations and entity
linking, but their approach is limited to ED.

cf. [#Wikipedia2Vec](tag:wikipedia2vec). Compare with [newer work by Yamada](doc:2020/09/1909_01259_neural_attentive_b)

2020-01-09 About

[2001.01447] Improving Entity Linking by Modeling Latent Entity Type Information

Tags:

2020-01-09 About

[1902.10909] BERT for Joint Intent Classification and Slot Filling

Tags:

2020-01-09 About

[1802.07569] Continual Lifelong Learning with Neural Networks: A Review

Tags:

2020-01-01 About

[1912.08904] Macaw: An Extensible Conversational Information Seeking Platform

Tags:

2020-01-01 About

[1911.00172] Generalization through Memorization: Nearest Neighbor Language Models

Tags:

2019-12-20 About

[1707.00306] Variable Selection Methods for Model-based Clustering

Tags:

2019-12-11 About

[1912.03927] Large deviations for the perceptron model and consequences for active learning

Tags:

2019-12-11 About

[1912.03263] Your Classifier is Secretly an Energy Based Model and You Should Treat it Like One

Tags:

2019-12-09 About

[1912.01412] Deep Learning for Symbolic Mathematics

Tags:

2019-12-09 About

[1905.11852] EDUCE: Explaining model Decisions through Unsupervised Concepts Extraction

Tags:

2019-12-05 About

[1909.02164] TabFact: A Large-scale Dataset for Table-based Fact Verification

Tags:

2019-12-01 About

[1807.00082] Amanuensis: The Programmer's Apprentice

Tags:

**The use of natural language to facilitate communication
between the expert programmer and apprentice AI system.**

> an overview of the material covered in a course taught at Stanford in the spring quarter of 2018. The course draws upon **insight from cognitive and systems neuroscience to implement hybrid connectionist and symbolic reasoning systems** that leverage and extend the state of the art in machine learning **by integrating human and machine intelligence**. As a concrete example we focus on digital assistants that learn from continuous dialog with an expert software engineer while providing initial value as powerful analytical, computational and mathematical savants.

> [#Dehaene](/tag/stanislas_dehaene)'s work extends the [#Global Workspace Theory](/tag/global_workspace_theory) of Bernard Baars. Dehaene’s version of the theory combined with Yoshua Bengio’s concept of a [#consciousness prior](/tag/consciousness_prior.html) and deep reinforcement learning suggest a model for constructing and maintaining the cognitive states that arise and persist during complex problem solving.

2019-11-12 About

[1910.09760] Question Answering over Knowledge Graphs via Structural Query Patterns

Tags:

2019-11-06 About

[1911.01464] Emerging Cross-lingual Structure in Pretrained Language Models

Tags:

2019-11-06 About

[1011.4088] An Introduction to Conditional Random Fields

Tags:

2019-10-13 About

[1802.07044] The Description Length of Deep Learning Models

Tags:

2019-10-11 About

[1910.03524] Beyond Vector Spaces: Compact Data Representation as Differentiable Weighted Graphs

Tags:

2019-10-09 About

[1909.04939] InceptionTime: Finding AlexNet for Time Series Classification

Tags:

2019-09-28 About

[1909.04120] Span Selection Pre-training for Question Answering

Tags:

> a **new pre-training task inspired by reading
comprehension** and an **effort to avoid encoding general knowledge in the transformer network itself**

Current transformer architectures store general knowledge in their parameters, which leads to large models and long pre-training times. It would be better to offload the requirement of general knowledge to a sparsely activated network.

"Span selection" as an additional auxiliary task: the query is a sentence drawn from a corpus
with a term replaced with a special token: [BLANK]. The term replaced by the blank is the answer term. The passage is
relevant as determined by a BM25 search, and answer-bearing (containing the answer
term). Unlike BERT’s cloze task, where the answer must be drawn from the model itself, the answer is found in a passage
using language understanding.
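
The construction of one span-selection instance can be sketched as follows. The function name and the dictionary layout are hypothetical, and the BM25 retrieval step is assumed to have happened already (the candidate passages are simply passed in).

```python
def make_span_selection_instances(sentence, answer_term, passages):
    """Blank out the answer term and keep only answer-bearing passages."""
    if answer_term not in sentence:
        raise ValueError("answer term must occur in the sentence")
    # The query is the sentence with the answer term replaced by [BLANK].
    query = sentence.replace(answer_term, "[BLANK]", 1)
    instances = []
    for passage in passages:
        start = passage.find(answer_term)
        if start == -1:
            continue  # not answer-bearing, so unusable for training
        instances.append({
            "query": query,
            "passage": passage,
            "answer": answer_term,
            "answer_start": start,  # character offset of the span to select
        })
    return instances

instances = make_span_selection_instances(
    "Paris is the capital of France.",
    "Paris",
    ["The Eiffel Tower is in Paris.", "Berlin is in Germany."],
)
```

The model is then trained to point at the answer span in the passage, rather than to produce the term from its own parameters.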

> **We hope to progress to a model of general purpose language modeling that uses an indexed long
term memory to retrieve world knowledge, rather than holding it in the densely activated transformer encoder layers.**

2019-09-18 About

[1909.03186] On Extractive and Abstractive Neural Document Summarization with Transformer Language Models

Tags:

2019-09-11 About

[1909.01066] Language Models as Knowledge Bases?

Tags:

2019-09-05 About

[1908.08983] A Little Annotation does a Lot of Good: A Study in Bootstrapping Low-resource Named Entity Recognizers

Tags:

2019-08-28 About

[1908.10084] Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

Tags:

2019-08-28 About

[1808.02590] A Tutorial on Network Embeddings

Tags:

2019-08-25 About

[1904.02342] Text Generation from Knowledge Graphs with Graph Transformers

Tags:

2019-08-23 About

[1905.07854] KGAT: Knowledge Graph Attention Network for Recommendation

Tags:

2019-08-23 About

[1908.01580] The HSIC Bottleneck: Deep Learning without Back-Propagation

Tags:

2019-08-15 About

[1503.02406] Deep Learning and the Information Bottleneck Principle

Tags:

2019-08-15 About

[physics/0004057] The information bottleneck method

Tags:

> We define the relevant information in a signal x ∈ X as being the information that this signal provides about another signal y ∈ Y. Examples include the information that face images provide about the names of the people portrayed, or the information that speech sounds provide about the words spoken. **Understanding the signal x requires more than just predicting y, it also requires specifying which features of X play a role in the prediction. We formalize this problem as that of finding a short code for X that preserves the maximum information about Y.** That is, we squeeze the information that X provides about Y through a ‘bottleneck’ formed by a limited set of codewords X̃... This approach yields an exact set of self consistent equations for the coding rules X → X̃ and X̃ → Y.

(From the intro.) How should "meaningful" or "relevant" information be defined? Shannon left this issue out of information theory, focusing on the problem of transmitting information rather than judging its value to the recipient. As a result, statistical and information-theoretic principles came to be regarded as almost irrelevant
to the question of meaning.

> In contrast, **we argue here that information theory,
in particular lossy source compression, provides a natural quantitative
approach to the question of “relevant information.”** Specifically, we formulate
a **variational principle** for the extraction or efficient representation of
relevant information.
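
For reference, the variational principle is usually written as the minimization of a functional over the stochastic encoder p(x̃|x). This is the standard form as I recall it, so it is worth checking against the paper; β is a Lagrange multiplier that trades compression of X against the information kept about Y:

$$
\mathcal{L}\big[p(\tilde{x}\mid x)\big] \;=\; I(\tilde{X};X) \;-\; \beta\, I(\tilde{X};Y)
$$

As β → 0 only compression matters and the code can collapse to a single codeword; as β grows, the code is pushed to retain more of the information that X carries about Y.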

2019-08-15 About

[1905.07129] ERNIE: Enhanced Language Representation with Informative Entities

Tags:

2019-08-05 About

[1907.07355] Probing Neural Network Comprehension of Natural Language Arguments

Tags:

2019-07-24 About

[1602.01137] A Dual Embedding Space Model for Document Ranking

Tags:

2019-07-17 About

[1901.00596] A Comprehensive Survey on Graph Neural Networks

Tags:

2019-07-15 About

[1907.05242] Large Memory Layers with Product Keys

Tags:

> **a structured memory which can be easily integrated into a neural network.** The memory is very large by design and therefore significantly increases the capacity of the architecture, by up to a billion parameters with a negligible computational overhead. Its design and access pattern is based on **product keys**, which enable fast and exact nearest neighbor search. The ability to increase the number of parameters while keeping the same computational budget lets the overall system strike a better trade-off between prediction accuracy and computation efficiency both at training and test time.

> a key-value memory layer that can increase model capacity for a negligible computational cost. A 12-layer transformer with a memory outperforms a 24-layer transformer, and is 2x faster!
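
A minimal NumPy sketch of the product-key lookup, under simplifying assumptions: a single head, dot-product scores, and no query network. The function name `product_key_lookup` and all sizes are hypothetical. The query is split in two halves, each half is scored against its own small set of sub-keys, and the best slots among the full n × n product keys are then found among only k × k candidates.

```python
import numpy as np

def product_key_lookup(query, subkeys1, subkeys2, values, k=4):
    """Return the k best memory slots and their softmax-weighted value.

    query: (d,), subkeys1 / subkeys2: (n, d // 2), values: (n * n, d_value).
    Slot i * n + j is addressed by the concatenation of subkeys1[i] and subkeys2[j].
    """
    half = query.shape[0] // 2
    s1 = subkeys1 @ query[:half]          # scores of the first half, (n,)
    s2 = subkeys2 @ query[half:]          # scores of the second half, (n,)
    top1 = np.argsort(-s1)[:k]            # best k sub-keys on each side
    top2 = np.argsort(-s2)[:k]
    # Score of product key (i, j) is s1[i] + s2[j]; only k * k candidates remain.
    cand = s1[top1][:, None] + s2[top2][None, :]
    best = np.argsort(-cand, axis=None)[:k]
    i, j = np.unravel_index(best, cand.shape)
    slots = top1[i] * subkeys2.shape[0] + top2[j]
    scores = cand[i, j]
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return slots, weights @ values[slots]

rng = np.random.default_rng(0)
n, d, d_value, k = 16, 8, 4, 4            # 16 * 16 = 256 memory slots
subkeys1 = rng.normal(size=(n, d // 2))
subkeys2 = rng.normal(size=(n, d // 2))
values = rng.normal(size=(n * n, d_value))
query = rng.normal(size=d)
slots, output = product_key_lookup(query, subkeys1, subkeys2, values, k=k)
```

The search is exact because any product key in the overall top k must have both of its halves in the per-side top k, so the cost scales with n rather than with the n² slots.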

[Implementation](/doc/2019/08/product_key_memory_pkm_minima)

TODO: compare with [[2007.00849] Facts as Experts: Adaptable and Interpretable Neural Memory over Symbolic Knowledge](doc:2020/07/2007_00849_facts_as_experts_)

2019-07-13 About

[1907.03950] Learning by Abstraction: The Neural State Machine

Tags:

2019-07-10 About

[1904.13001] Encoding Categorical Variables with Conjugate Bayesian Models for WeWork Lead Scoring Engine

Tags:

2019-07-04 About

[1810.10531] A mathematical theory of semantic development in deep neural networks

Tags:

2019-06-29 About

[1812.00417] Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale

Tags:

2019-06-28 About

[1810.04882] Towards Understanding Linear Word Analogies

Tags:

2019-06-24 About

[1905.10070] Label-aware Document Representation via Hybrid Attention for Extreme Multi-Label Text Classification

Tags:

2019-06-22 About

[1906.04341] What Does BERT Look At? An Analysis of BERT's Attention

Tags:

2019-06-21 About

[1906.08237] XLNet: Generalized Autoregressive Pretraining for Language Understanding

Tags:

2019-06-21 About

[1812.05944] A Tutorial on Distance Metric Learning: Mathematical Foundations, Algorithms and Experiments

Tags:

2019-06-18 About

[1906.02715] Visualizing and Measuring the Geometry of BERT

Tags:

2019-06-07 About

[1905.12149] SATNet: Bridging deep learning and logical reasoning using a differentiable satisfiability solver

Tags:

2019-05-31 About

[1709.07604] A Comprehensive Survey of Graph Embedding: Problems, Techniques and Applications

Tags:

2019-05-29 About

[1905.05950] BERT Rediscovers the Classical NLP Pipeline

Tags:

2019-05-18 About

[1506.02142] Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning

Tags:

2019-05-13 About

[1810.09164] Named Entity Disambiguation using Deep Learning on Graphs

Tags:

2019-04-26 About

[1802.01021] DeepType: Multilingual Entity Linking by Neural Type System Evolution

Tags:

2019-04-25 About

[1812.09449] A Survey on Deep Learning for Named Entity Recognition

Tags:

2019-04-24 About

[1807.06036] Pangloss: Fast Entity Linking in Noisy Text Environments

Tags:

2019-04-23 About

[1808.07699] End-to-End Neural Entity Linking

Tags:

2019-04-23 About

[1904.08398] DocBERT: BERT for Document Classification

Tags:

2019-04-18 About

[1806.04411] Named Entity Recognition with Extremely Limited Data

Tags:

2019-04-11 About

[1703.02507] Unsupervised Learning of Sentence Embeddings using Compositional n-Gram Features

Tags:

2019-03-25 About

[1803.02893] An efficient framework for learning sentence representations

Tags:

2019-03-20 About

[1902.09229] A Theoretical Analysis of Contrastive Unsupervised Representation Learning

Tags:

2019-03-20 About

[1903.05823] Deep Patent Landscaping Model Using Transformer and Graph Embedding

Tags:

2019-03-18 About

[1903.05872] Interactive Concept Mining on Personal Data -- Bootstrapping Semantic Services

Tags:

2019-03-17 About

[1902.11269] Efficient Contextual Representation Learning Without Softmax Layer

Tags:

**how to accelerate contextual representation learning**.

> Contextual representation models are difficult to train due to the large parameter sizes and high computational complexity

> We find that the softmax layer (the output layer) causes significant inefficiency due to the large vocabulary size.
Therefore, we redesign the learning objective.
> Specifically, the proposed approach bypasses the softmax layer by performing language modeling with dimension reduction, and allows the models to leverage pre-trained word embeddings.
Our framework reduces the time spent on the output layer to a negligible level, eliminates almost all the trainable parameters of the softmax layer and performs language modeling without truncating the vocabulary.
When applied to ELMo, our method achieves a 4 times speedup and eliminates 80% trainable parameters while achieving competitive performance on downstream tasks.

**decouples learning contexts and words**

> Instead of using
a softmax layer to predict the distribution of the
missing word, we utilize and extend the SEMFIT
layer (Kumar and Tsvetkov, 2018) to **predict the
embedding of the missing word**.
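
A toy sketch of the difference: instead of producing a distribution over the vocabulary, the model outputs one vector, which is compared with the pre-trained embedding of the target word. The cosine distance used here is an assumption of this sketch (the notes above do not name the loss), and `cosine_distance` and `nearest_word` are hypothetical helpers.

```python
import numpy as np

def cosine_distance(pred, target):
    """Training signal: 0 when the predicted vector points at the target."""
    cos = pred @ target / (np.linalg.norm(pred) * np.linalg.norm(target))
    return 1.0 - float(cos)

def nearest_word(pred, embeddings, vocab):
    """Decoding: the word whose fixed embedding is closest in cosine terms."""
    norms = np.linalg.norm(embeddings, axis=1) * np.linalg.norm(pred)
    sims = (embeddings @ pred) / norms
    return vocab[int(np.argmax(sims))]

vocab = ["cat", "dog", "car"]
embeddings = np.eye(3)                 # toy stand-in for pre-trained word vectors
pred = np.array([0.1, 0.9, 0.2])       # vector produced by the context encoder

loss_if_dog = cosine_distance(pred, embeddings[1])
loss_if_cat = cosine_distance(pred, embeddings[0])
decoded = nearest_word(pred, embeddings, vocab)
```

The output layer's cost now depends on the embedding dimension rather than on the vocabulary size, and the word embeddings themselves stay fixed, which is where the parameter savings come from.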

2019-03-02 About

[1902.10618] Still a Pain in the Neck: Evaluating Text Representations on Lexical Composition

Tags:

2019-02-28 About

[1511.06335] Unsupervised Deep Embedding for Clustering Analysis

Tags:

2019-02-19 About

[1902.05309] Transfer Learning for Sequence Labeling Using Source Model and Target Data

Tags:

2019-02-18 About

[1902.05196] Categorical Metadata Representation for Customized Text Classification

Tags:

2019-02-18 About

[1901.11504] Multi-Task Deep Neural Networks for Natural Language Understanding

Tags:

2019-02-17 About

[1901.03136] Automating the search for a patent's prior art with a full text similarity search

Tags:

2019-02-15 About

[1711.09677] Binary classification models with "Uncertain" predictions

Tags:

2019-02-02 About

[1704.08803] Neural Ranking Models with Weak Supervision

Tags:

2019-01-27 About

[1601.01343] Joint Learning of the Embedding of Words and Entities for Named Entity Disambiguation

Tags:

2019-01-27 About

[1901.02860] Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

Tags:

2019-01-11 About

[1812.04616] Von Mises-Fisher Loss for Training Sequence to Sequence Models with Continuous Outputs

Tags:

2018-12-14 About

[1811.05370] Unsupervised Transfer Learning for Spoken Language Understanding in Intelligent Agents

Tags:

2018-11-20 About

[1811.06031] A Hierarchical Multi-task Approach for Learning Embeddings from Semantic Tasks

Tags:

2018-11-17 About

[1807.07984] Attention Models in Graphs: A Survey

Tags:

2018-11-14 About

[1605.07427] Hierarchical Memory Networks

Tags:

2018-11-14 About

[1604.00289] Building Machines That Learn and Think Like People

Tags:

2018-10-28 About

[1503.08895] End-To-End Memory Networks

Tags:

2018-10-23 About

[1703.03129] Learning to Remember Rare Events

Tags:

2018-10-23 About

[1810.07150] Subword Semantic Hashing for Intent Classification on Small Datasets

Tags:

2018-10-22 About

[1706.03762] Attention Is All You Need

Tags:

2018-10-12 About

[1810.04805] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Tags:

2018-10-12 About

[1710.06632] Towards a Seamless Integration of Word Senses into Downstream NLP Applications

Tags:

2018-10-09 About

[1810.00438] Parameter-free Sentence Embedding via Orthogonal Basis

Tags:

2018-10-06 About

[1704.05358] Representing Sentences as Low-Rank Subspaces

Tags:

2018-10-06 About

[1602.04938] "Why Should I Trust You?": Explaining the Predictions of Any Classifier

Tags:

2018-09-09 About

[1809.01797] Describing a Knowledge Base

Tags:

2018-09-07 About

[1809.00782] Open Domain Question Answering Using Early Fusion of Knowledge Bases and Text

Tags:

2018-09-06 About

[1601.03764] Linear Algebraic Structure of Word Senses, with Applications to Polysemy

Tags:

> Here it is shown that multiple word senses reside
in linear superposition within the word
embedding and simple sparse coding can recover
vectors that approximately capture the
senses

> Each extracted word sense is accompanied by one of about 2000 “discourse atoms” that gives a succinct description of which other words co-occur with that word sense.

> The success of the approach is mathematically explained using a variant of
the random walk on discourses model

(The "random walk on discourses" is a generative model for language.) Under the assumptions of this model, there
exists a linear relationship between the vector of a
word w and the vectors of the words in its contexts. The word vector is not simply the average of the vectors of the words in w's contexts, but within a given corpus the matrix of the linear relationship does not depend on w. That matrix can be estimated, so the embedding of a word can be computed from the contexts it appears in.
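
To illustrate the "linear superposition" claim, the sketch below builds a word vector as a weighted sum of two atoms and recovers them with orthogonal matching pursuit. Everything here is a toy assumption: the dictionary is fixed in advance (standard basis plus normalized Hadamard rows) rather than learned from embeddings as in the paper, and `omp` is a hypothetical helper, not the authors' sparse coding procedure.

```python
import numpy as np

def omp(x, atoms, n_nonzero):
    """Orthogonal matching pursuit: pick atoms greedily, refit by least squares."""
    residual = x.copy()
    chosen = []
    coefs = np.zeros(0)
    for _ in range(n_nonzero):
        scores = np.abs(atoms @ residual)
        if chosen:
            scores[chosen] = -np.inf          # never pick an atom twice
        chosen.append(int(np.argmax(scores)))
        A = atoms[chosen].T                   # (dim, number of chosen atoms)
        coefs, *_ = np.linalg.lstsq(A, x, rcond=None)
        residual = x - A @ coefs
    return chosen, coefs

# Toy dictionary: 16 standard-basis atoms plus 16 normalized Hadamard atoms.
H = np.array([[1.0]])
for _ in range(4):
    H = np.kron(H, np.array([[1.0, 1.0], [1.0, -1.0]]))
atoms = np.vstack([np.eye(16), H / 4.0])      # 32 unit-norm atoms in 16 dimensions

# A "polysemous" word vector: superposition of two sense atoms.
word_vec = 0.7 * atoms[3] + 0.5 * atoms[21]
chosen, coefs = omp(word_vec, atoms, n_nonzero=2)
```

Here `chosen` identifies the two atoms that were mixed and `coefs` their weights; with real embeddings the recovery is only approximate, as the quote says.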

[Related blog post](/doc/?uri=https%3A%2F%2Fwww.offconvex.org%2F2016%2F07%2F10%2Fembeddingspolysemy%2F)

2018-08-28 About

[1802.04865] Learning Confidence for Out-of-Distribution Detection in Neural Networks

Tags:

2018-08-27 About

[1601.00670] Variational Inference: A Review for Statisticians

Tags:

2018-08-07 About

[1803.01271] An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling

Tags:

2018-08-05 About

[1608.05426] A Strong Baseline for Learning Cross-Lingual Word Embeddings from Sentence Alignments