Semanlink - NLP@Facebook

elvis sur Twitter : "NEW: Meta AI introduces OPT-IML, a large language model (175B) fine-tuned on 2000 NLP tasks. Uses instruction-tuning to improve zero-shot and few-shot generalization abilities...."

Tags:

2022-12-23 About

[1904.04458] Knowledge-Augmented Language Model and its Application to Unsupervised Named-Entity Recognition

Tags:

Knowledge Augmented
Language Model (KALM)

a language
model with access to information available in a
KB, no assumptions
about the availability of additional components
(such as Named Entity Taggers) or annotations

> While classes of
named entities (e.g., person or location) occur frequently,
each individual name (e.g, Atherton or
Zhouzhuang) may be observed infrequently even
in a very large corpus of text. As a result language
models learn to represent accurately only the most
popular named entities

> knowing that Alice is a name
used to refer to a person should give ample information
about the context in which the word may
occur (e.g., Bob visited Alice).

> ---

> extends a traditional **RNN LM**

> we enhance a traditional LM with a
gating mechanism that controls whether a particular
word is modeled as a general word or as a reference
to an entity
>
> We train the model end-to-end
with only the traditional predictive language modeling
perplexity objective
>
> KALM is trained end-to-end using
a predictive objective on large corpus of text.

> To the best of our knowledge, KALM is the first
unsupervised neural NER approach.

> KALM extends a traditional, RNN-based neural
LM.

2022-08-31 About

AllenNLP sur Twitter : "Dataset: training data for @MetaAI 's No Language Left Behind NLLB-200 models!..."

Tags:

2022-08-25 About

Timo Schick sur Twitter : "PEER, a language model trained to incrementally write texts & collaborate w/ humans ..."

Tags:

2022-08-25 About

[2012.15156] A Memory Efficient Baseline for Open Domain Question Answering

Tags:

2022-08-08 About

[2208.03299] Few-shot Learning with Retrieval Augmented Language Model

Tags:

2022-08-08 About

[2206.10658] Questions Are All You Need to Train a Dense Passage Retriever

Tags:

> **approach for training dense retrieval models that does not require any labeled training data**. Dense retrieval is a central challenge for open-domain tasks, such as Open QA, where state-of-the-art methods typically require large supervised datasets with custom hard-negative mining and denoising of positive examples.
>
> ART, in contrast, only requires access to unpaired inputs and outputs (e.g. questions and potential answer documents).
>
> It uses a new document-retrieval autoencoding scheme, where
> 1. an input question is used to retrieve a set of evidence documents, and
> 2. the documents are then used to compute the probability of reconstructing the original question.
>
> Training for retrieval based on question reconstruction enables effective unsupervised learning of both document and question encoders, which can be later incorporated into complete Open QA systems without any further finetuning.

[Tweet](doc:2022/07/devendra_singh_sachan_sur_twitt)

> Given an
input question, ART first retrieves a small set
of possible evidences documents. It then recon
structs
the original question by attending to these
documents
>
> The
key idea in ART is to consider the retrieved documents
as a noisy representation of the original
question and question reconstruction probability
as a way of denoising that provides soft-labels for
how likely each document is to have been the correct
result

Refers to [[IZACARD 2012.04584] Distilling Knowledge from Reader to Retriever for Question Answering](doc:2020/12/2012_04584_distilling_knowled)

2022-07-06 About

No Language Left Behind

Tags:

2022-07-06 About

Devendra Singh Sachan sur Twitter : "...Unsupervised Passage Re-ranker (UPR), an approach to re-rank retrieved passages for information retrieval tasks."

Tags:

2022-04-18 About

Jason Weston sur Twitter : "SeeKeR: An open source search-augmented language model"

Tags:

2022-03-25 About

[2009.02252] KILT: a Benchmark for Knowledge Intensive Language Tasks

Tags:

2022-01-23 About

[2005.11401] Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

Tags:

2022-01-19 About

[2112.09118] Towards Unsupervised Dense Information Retrieval with Contrastive Learning

Tags:

2021-12-21 About

facebookresearch/DrQA: Reading Wikipedia to Answer Open-Domain Questions

Tags:

2021-12-08 About

[1906.04980] Unsupervised Question Answering by Cloze Translation

Tags:

2021-12-08 About

facebookresearch/UnsupervisedQA: Unsupervised Question answering via Cloze Translation

Tags:

2021-12-07 About

[1911.02116] Unsupervised Cross-lingual Representation Learning at Scale

Tags:

2021-07-29 About

[2004.04906] Dense Passage Retrieval for Open-Domain Question Answering

Tags:

2021-06-03 About

[2104.14690] Entailment as Few-Shot Learner

Tags:

2021-05-03 About

[1901.04085] Passage Re-ranking with BERT

Tags:

2021-03-26 About

[2010.02194] Self-training Improves Pre-training for Natural Language Understanding

Tags:

2021-03-12 About

Hugging Face sur Twitter : "Transformers release of the Retrieval-Augmented Generation model in collaboration with @facebookai!"

Tags:

2021-02-23 About

[2010.00904] Autoregressive Entity Retrieval

Tags:

2021-01-14 About

[2012.04584] Distilling Knowledge from Reader to Retriever for Question Answering

Tags:

> a method to train an information retrieval module for downstream tasks, **without using pairs of queries and documents as annotations**.

Uses two models (standard pipeline for open-domain QA):

- the first one retrieves documents from a large source of knowledge (the retriever)
- the second one processes the support documents to solve the task (the reader).

> First the retriever selects support passages in a large knowledge
source. Then these passages are processed by the reader, along with the question, to generate an
answer

Inspired by knowledge distillation: the reader model is the teacher and the retriever is the student.

> More precisely, we use a sequence-to-sequence model as the reader, and use
the attention activations over the input documents as synthetic labels to train the retriever. 
> (**train the retriever by learning to approximate the attention score of the reader**)

Refers to:

- [REALM: Retrieval-Augmented Language Model Pre-Training](doc:2020/12/2002_08909_realm_retrieval_a)
- [Dehghani: Neural Ranking Models with Weak Supervision](doc:?uri=https%3A%2F%2Farxiv.org%2Fabs%2F1704.08803)

2020-12-11 About

Guillaume Lample sur Twitter : "Last year, we showed that you can outperform a 24-layer transformer in language modeling with just...

Tags:

2020-10-10 About

[2006.15020] Pre-training via Paraphrasing

Tags:

2020-06-30 About

[1911.03814] Scalable Zero-shot Entity Linking with Dense Entity Retrieval

Tags:

2020-05-02 About

[1911.00172] Generalization through Memorization: Nearest Neighbor Language Models

Tags:

2019-12-20 About

[1912.01412] Deep Learning for Symbolic Mathematics

Tags:

2019-12-09 About

[1905.11852] EDUCE: Explaining model Decisions through Unsupervised Concepts Extraction

Tags:

2019-12-05 About

[1911.01464] Emerging Cross-lingual Structure in Pretrained Language Models

Tags:

2019-11-06 About

[1909.01066] Language Models as Knowledge Bases?

Tags:

2019-09-05 About

Product-Key Memory (PKM) Minimalist implementation of a Product-Key Memory layer

Tags:

2019-08-30 About

[1907.05242] Large Memory Layers with Product Keys

Tags:

> **a structured memory which can be easily integrated into a neural network.** The memory is very large by design and therefore significantly increases the capacity of the architecture, by up to a billion parameters with a negligible computational overhead. Its design and access pattern is based on **product keys**, which enable fast and exact nearest neighbor search. The ability to increase the number of parameters while keeping the same computational budget lets the overall system strike a better trade-off between prediction accuracy and computation efficiency both at training and test time.

> a key-value memory layer that can increase model capacity for a negligible computational cost. A 12-layer transformer with a memory outperforms a 24-layer transformer, and is 2x faster!

[Implementation](/doc/2019/08/product_key_memory_pkm_minima)

TODO: compare with [[2007.00849] Facts as Experts: Adaptable and Interpretable Neural Memory over Symbolic Knowledge](doc:2020/07/2007_00849_facts_as_experts_)

2019-07-13 About

Combining Knowledge with Deep Convolutional Neural Networks for Short Text Classification (2017)

Tags:

2019-05-20 About

LASER natural language processing toolkit - Facebook Code

Tags: