Few-shot learning
http://www.semanlink.net/tag/few_shot_learning
Documents tagged with Few-shot learning
UKP Lab on X: "a lightweight solution for few-shot domain-specific sentence classification: AdaSent!..."
http://www.semanlink.net/doc/2023/12/ukp_lab_sur_x_need_a_lightwe
AdaSent is an approach to creating domain-specialized sentence encoders for few-shot sentence classification
> Reusable general sentence adapter across domains
> AdaSent decouples DAPT (Domain-Adaptive Pre-Training) & SEPT (Sentence Embedding Pre-Training) **by storing the sentence encoding abilities into an adapter**, which is trained only once in the general domain and plugged into various DAPT-ed PLMs
[Github](https://github.com/UKPLab/AdaSent)
2023-12-09T19:40:21ZNils Reimers on Twitter: "Cross-Lingual Text-Classification just from English Data"
http://www.semanlink.net/doc/2023/07/nils_reimers_sur_twitter_cro
> find counterfactual statements in customer reviews from 8 examples:
> - Fine-tuning: 13% accuracy
> - Embedding based: 61% accuracy
For classification: nearest neighbour < nearest centroid < logistic regression classifier:
> lightweight logistic regression classifier is the fastest and best method, especially with more training data.
[Unlocking the Power of Cross-Lingual Classification in NLP](doc:2023/07/unlocking_the_power_of_cross_li)
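A minimal sketch of the embedding-based recipe described above, under placeholder assumptions (model name and toy data are illustrative, not from the tweet): encode a handful of labeled examples with a multilingual sentence encoder, then fit a lightweight logistic regression on the embeddings, which also gives the cross-lingual transfer for free.

```python
# Sketch: embedding-based few-shot classification.
# Encode a few labeled examples with a sentence encoder, then fit a
# lightweight logistic regression on top. Model name and data are placeholders.
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

encoder = SentenceTransformer("paraphrase-multilingual-mpnet-base-v2")

train_texts = [
    "I wish the battery lasted longer",   # counterfactual-ish
    "If only it came in blue",            # counterfactual-ish
    "The battery lasts all day",          # factual
    "It comes in three colors",           # factual
]
train_labels = [1, 1, 0, 0]

clf = LogisticRegression(max_iter=1000)
clf.fit(encoder.encode(train_texts), train_labels)

# Cross-lingual transfer: the multilingual encoder lets us classify
# non-English reviews even though the training examples were English only.
print(clf.predict(encoder.encode(["Si seulement l'écran était plus grand"])))
```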
2023-07-20T08:33:39ZDaniel Vila Suero sur Twitter : "Data quality is key for LLMs, but we're building Open Source LLMs with data of "unknown" quality... Introducing Alpaca GarbageCollector..."
http://www.semanlink.net/doc/2023/04/daniel_vila_suero_sur_twitter_
> a cross-lingual SetFit model to identify potential bad instructions in Alpaca-like datasets
2023-04-05T18:37:29ZAndrej Karpathy sur Twitter : "Base LLMs (non-finetuned) make very strong few-shot classifiers. Describe task in English, give few examples, read off the label probabilities on test example. No gradient-based optimization necessary. It brings a cannon to a knife fight but is fast, convenient, strong baseline." / Twitter
http://www.semanlink.net/doc/2023/03/andrej_karpathy_sur_twitter_
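A hedged sketch of what the tweet describes: prompt a base (non-finetuned) causal LM with a task description plus a few examples, then read off the probability the model assigns to each candidate label for the test example. The checkpoint here ("gpt2") is just a stand-in for any, much stronger, base model.

```python
# Sketch: few-shot classification by reading label probabilities from a base
# causal LM. No gradient updates, just a prompt. "gpt2" is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2")
lm.eval()

prompt = (
    "Classify the sentiment of each review as positive or negative.\n"
    "Review: Great battery life. Sentiment: positive\n"
    "Review: Broke after two days. Sentiment: negative\n"
    "Review: Works exactly as advertised. Sentiment:"
)

def label_logprob(prompt: str, label: str) -> float:
    """Sum of log-probs of the label tokens given the prompt."""
    prompt_ids = tok(prompt, return_tensors="pt").input_ids
    label_ids = tok(" " + label, return_tensors="pt").input_ids
    ids = torch.cat([prompt_ids, label_ids], dim=-1)
    with torch.no_grad():
        logits = lm(ids).logits
    # log-probs of each label token, predicted from the previous position
    logprobs = torch.log_softmax(logits[0, prompt_ids.size(1) - 1 : -1], dim=-1)
    return logprobs.gather(1, label_ids[0].unsqueeze(1)).sum().item()

scores = {lab: label_logprob(prompt, lab) for lab in ["positive", "negative"]}
print(max(scores, key=scores.get), scores)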
2023-03-19T14:50:11Zexplosion/prodigy-openai-recipes: ✨ Bootstrap annotation with zero- & few-shot learning via OpenAI GPT-3
http://www.semanlink.net/doc/2023/02/explosion_prodigy_openai_recipe
> example code on how to combine zero- and few-shot learning with a small annotation effort
2023-02-11T10:45:36Z[2302.01398] The unreasonable effectiveness of few-shot learning for machine translation
http://www.semanlink.net/doc/2023/02/2302_01398_the_unreasonable_e
> We demonstrate the potential of few-shot translation systems, trained with unpaired language data, for both high and low-resource language pairs. We show that with only 5 examples of high-quality translation data shown at inference, a transformer decoder-only model trained solely with self-supervised learning, is able to match specialized supervised state-of-the-art models as well as more general commercial translation systems
[tweet](https://twitter.com/mr_cheu/status/1622648632867422211?s=20&t=DLVMU-Qrp9DksDse99fkjQ)
2023-02-07T18:49:52ZIBM/zshot: Zero and Few shot named entity & relationships recognition
http://www.semanlink.net/doc/2022/12/ibm_zshot_zero_and_few_shot_na
2022-12-23T01:00:31ZValueError "invalid literal for int() with base 10" in trainer.evaluate (dataset created from pandas) · Issue #228 · huggingface/setfit
http://www.semanlink.net/doc/2022/12/valueerror_invalid_literal_for
see <https://github.com/huggingface/setfit/blob/main/notebooks/zero-shot-classification.ipynb>
> Note: some datasets on the Hugging Face Hub don't have a ClassLabel feature for the label column. In these cases, you should compute the candidate labels manually by first computing the id2label mapping as follows:
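The mapping snippet from the notebook isn't reproduced here; below is a hedged reconstruction of the idea, assuming a 🤗 `datasets` Dataset built from pandas whose `label` column is plain integers (no ClassLabel feature). Column names and labels are placeholders.

```python
# Sketch: build the id2label mapping by hand when the `label` column has no
# ClassLabel feature (e.g. a Dataset created from pandas). Names are placeholders.
import pandas as pd
from datasets import Dataset

df = pd.DataFrame({
    "text": ["stocks rallied", "new vaccine approved", "team wins final"],
    "label_text": ["business", "health", "sports"],
})

# Map label strings to integer ids and keep the inverse mapping around.
label2id = {name: i for i, name in enumerate(sorted(df["label_text"].unique()))}
id2label = {i: name for name, i in label2id.items()}
df["label"] = df["label_text"].map(label2id)

ds = Dataset.from_pandas(df)          # "label" is a plain int column here
candidate_labels = [id2label[i] for i in sorted(id2label)]
print(candidate_labels)               # ['business', 'health', 'sports']
```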
2022-12-13T11:46:14ZFew-Shot Text Classification (Cloudera 2020)
http://www.semanlink.net/doc/2022/11/few_shot_text_classification_c
> Sentence-BERT has been optimized… well, for sentences! It’s reasonable to suspect that SBERT’s representations of single words or short phrases like “Business” or “Science & Technology” won’t be as semantically relevant as representations derived from a word-level method, like word2vec or GloVe
2022-11-24T14:16:39ZNeural representational geometry underlies few-shot concept learning | PNAS
http://www.semanlink.net/doc/2022/10/neural_representational_geometr
2022-10-30T18:11:06Z[2104.11882] Incremental Few-shot Text Classification with Multi-round New Classes: Formulation, Dataset and System
http://www.semanlink.net/doc/2022/10/2104_11882_incremental_few_sh
2022-10-25T11:46:21ZLewis Tunstall on Twitter: "The SetFit library for few-shot learning with Sentence Transformers now supports *multi-label text classification*..."
http://www.semanlink.net/doc/2022/10/lewis_tunstall_sur_twitter_t
Multilabel support [github issue](https://github.com/huggingface/setfit/issues/65)
2022-10-14T15:24:53Zhuggingface/setfit: Efficient few-shot learning with Sentence Transformers
http://www.semanlink.net/doc/2022/10/huggingface_setfit_efficient_f
2022-10-12T23:41:16ZZshot: Zero and Few shot named entity & relationships recognition
http://www.semanlink.net/doc/2022/10/zshot_zero_and_few_shot_named_
2022-10-01T20:13:51Z[2209.11055] Efficient Few-Shot Learning Without Prompts
http://www.semanlink.net/doc/2022/09/2209_11055_efficient_few_shot
[tweet](https://twitter.com/_akhaliq/status/1573109469646561280?s=20&t=RTpK9dh90az0zT1Xg2ohpQ):
> So if I have 4 classes and say 2 labels per class, I would first fine tune an ST on these 4 pairs and then vectorize the 8 total examples for fine-tuning the classifier
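A minimal SetFit sketch along the lines of that quote (4 classes × 2 labeled examples = 8 shots): contrastive fine-tuning of the sentence transformer on pairs built from the few examples, then a classification head fit on the resulting embeddings. Checkpoint and data are placeholders, and the trainer API differs a bit across setfit versions (newer releases use `Trainer` + `TrainingArguments`).

```python
# Sketch: SetFit with 4 classes x 2 labeled examples each.
# Checkpoint and data are placeholders.
from datasets import Dataset
from setfit import SetFitModel, SetFitTrainer

train_ds = Dataset.from_dict({
    "text": [
        "great battery life", "lasts two full days",            # battery
        "the screen is gorgeous", "very crisp display",         # screen
        "support never answered", "waited a week for a reply",  # support
        "cheap for what you get", "excellent value",            # price
    ],
    "label": [0, 0, 1, 1, 2, 2, 3, 3],
})

model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")
trainer = SetFitTrainer(model=model, train_dataset=train_ds, num_iterations=20)
trainer.train()

print(model.predict(["the display has amazing colors"]))  # expected: class 1
```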
2022-09-23T10:26:46Z[2209.01975] Selective Annotation Makes Language Models Better Few-Shot Learners
http://www.semanlink.net/doc/2022/09/2209_01975_selective_annotati
> This work examines the implications of in-context learning for the creation of datasets for new natural language tasks.
>
> Departing from recent in-context learning methods, we formulate an annotation-efficient, two-step framework: selective annotation that chooses a pool of examples to annotate from unlabeled data in advance, followed by prompt retrieval that retrieves task examples from the annotated pool at test time.
The paper introduces an unsupervised, graph-based selective annotation method, vote-k, to select diverse, representative examples to annotate.
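A toy sketch of the two-step framework (not the vote-k algorithm itself — k-means over sentence embeddings stands in for the graph-based selection): pick a small, diverse pool to annotate, then retrieve the nearest annotated examples for each test input to build the prompt. Encoder, data, and the annotation stub are placeholders.

```python
# Toy sketch of the two-step framework: (1) select a diverse pool to annotate
# (k-means is a stand-in for vote-k), (2) retrieve nearest annotated examples
# per test input to build the few-shot prompt. Encoder/data are placeholders.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

enc = SentenceTransformer("all-MiniLM-L6-v2")

# Toy unlabeled pool (in practice: thousands of task inputs).
unlabeled = [
    "the team won the cup", "the striker was injured",
    "shares dropped sharply", "the bank cut rates",
    "a new galaxy was observed", "the rover landed on mars",
]
X = np.asarray(enc.encode(unlabeled))

# Step 1: selective annotation -- annotate one representative per cluster.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
to_annotate = [int(np.argmin(np.linalg.norm(X - c, axis=1))) for c in km.cluster_centers_]
annotated = {i: f"LABEL_FOR[{unlabeled[i]}]" for i in to_annotate}  # stand-in for human labels

# Step 2: prompt retrieval -- nearest annotated examples form the prompt.
q = np.asarray(enc.encode(["profits rose this quarter"]))[0]
nearest = sorted(annotated, key=lambda i: np.linalg.norm(X[i] - q))[:2]
prompt = "\n".join(f"{unlabeled[i]} -> {annotated[i]}" for i in nearest)
print(prompt + "\nprofits rose this quarter ->")
```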
2022-09-07T13:20:58ZExtractive Question Answering application. • Raphael Sourty
http://www.semanlink.net/doc/2022/09/extractive_question_answering_a
2022-09-07T08:25:09ZWhat Makes a Good Classification Example?
http://www.semanlink.net/doc/2022/08/what_makes_a_good_classificatio
> With Large Language Models, we only need a few examples to train a Classifier. What makes a good example? Find out here.
2022-08-16T22:36:20Z[2208.03299] Few-shot Learning with Retrieval Augmented Language Model
http://www.semanlink.net/doc/2022/08/2208_03299_few_shot_learning_
> Atlas, a retrieval-augmented language model capable of strong few-shot learning, despite having lower parameter counts than other powerful recent few-shot learners.
[tweet](https://twitter.com/davisblalock/status/1564148889996836864?s=20&t=BnLM_O1HkTp7qJILF0DW8g)
2022-08-08T11:32:33Z[2109.06270] STraTA: Self-Training with Task Augmentation for Better Few-shot Learning
http://www.semanlink.net/doc/2022/04/2109_06270_strata_self_train
[Tu Vu on Twitter](doc:2022/04/tu_vu_sur_twitter_enormous_l)
2022-04-14T19:26:35ZTu Vu on Twitter: "Enormous LMs like GPT-3 exhibit impressive few-shot performance, but w/ self-training a BERT base sized model can achieve much better results!"
http://www.semanlink.net/doc/2022/04/tu_vu_sur_twitter_enormous_l
> [[2109.06270] STraTA: Self-Training with Task Augmentation for Better Few-shot Learning](doc:2022/04/2109_06270_strata_self_train)
[Github](https://github.com/google-research/google-research/tree/master/STraTA) [at HuggingFace](https://github.com/huggingface/transformers/tree/main/examples/research_projects/self-training-text-classification)
--
Remark: like [[2203.10581] Cluster & Tune: Boost Cold Start Performance in Text Classification](doc:2022/04/2203_10581_cluster_tune_bo), this adds an intermediate fine-tuning step // TODO: compare
2022-04-13T13:37:58ZSentence Transformer Fine-Tuning (SetFit): Outperforming GPT-3 on few-shot Text-Classification while being 1600 times smaller | by Moshe Wasserblat (2021-12)
http://www.semanlink.net/doc/2022/03/sentence_transformer_fine_tunin
Fine-tuning an SBERT on a classification task (which, in the end, produces an SBERT)
> **Few-shot text classification based on fine-tuning a Sentence Transformer with task-specific data** that can easily be implemented with the sentence-transformers library
> Surprisingly, we did not find any work that performed an end-to-end ST fine-tuning for text classification in a Siamese manner.
[COLAB](https://colab.research.google.com/github/MosheWasserb/SetFit/blob/main/SetFit_SST_2.ipynb)
[Nils Reimers on Twitter](doc:2022/03/nils_reimers_sur_twitter_gre)
2022-03-31T10:49:48ZNils Reimers on Twitter: "Great post on SetFit"
http://www.semanlink.net/doc/2022/03/nils_reimers_sur_twitter_gre
About [Sentence Transformer Fine-Tuning (SetFit): Outperforming GPT-3 on few-shot Text-Classification while being 1600 times smaller | by Moshe Wasserblat](doc:2022/03/sentence_transformer_fine_tunin)
> - Outperforms GPT-3 in few-shot text-classification (50 labeled examples, secret test set)
> - 1600 times smaller
> - Can be run on your CPU
> - No limitation on the number of training examples
> - Just few lines of code needed
2022-03-31T10:48:50Z[2203.14655] Few-Shot Learning with Siamese Networks and Label Tuning
http://www.semanlink.net/doc/2022/03/2203_14655_few_shot_learning_
> the problem of building text classifiers with little or no training data.
>
> In recent years, an approach based on neural textual entailment models has been found to give strong results on a diverse range of tasks.
(cf. #[NLI](tag:nli), using the input text as the premise and the text representing the label as the hypothesis)
> In this work, we show that **with proper pre-training, Siamese Networks that embed texts and labels** offer a competitive alternative.
>
> We introduce **label tuning: fine-tuning the label embeddings only**. While giving lower performance than model fine-tuning (which updates all params of the model), this approach has the architectural advantage that a single encoder can be shared by many different tasks (we only fine-tune the label embeddings)
> The drop in quality can be compensated by using a variant of **[Knowledge distillation](tag:knowledge_distillation)**
[Github](https://tinyurl.com/label-tuning), [Tweet](doc:2022/03/thomas_muller_sur_twitter_pa)
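A rough PyTorch sketch of the label-tuning idea described above, under stated assumptions (a frozen sentence encoder, label embeddings initialised from the label descriptions, and only the label-embedding matrix updated on the few labeled examples); an illustration of the concept, not the authors' code. Model name, data, and the temperature value are placeholders.

```python
# Sketch of label tuning: freeze the text encoder, initialise one embedding per
# class from its label description, and fine-tune ONLY those label embeddings.
import torch
import torch.nn.functional as F
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")   # stays frozen
labels = ["positive user sentiment", "negative user sentiment"]

texts = ["love it", "total waste of money", "works great", "very disappointed"]
targets = torch.tensor([0, 1, 0, 1])

with torch.no_grad():
    X = torch.tensor(encoder.encode(texts))           # (N, d), frozen text embeddings
    init = torch.tensor(encoder.encode(labels))        # (C, d), initial label embeddings

label_emb = torch.nn.Parameter(init)                   # the only trainable parameters
opt = torch.optim.Adam([label_emb], lr=1e-2)

for _ in range(100):
    # score = cosine similarity between text and label embeddings
    logits = F.normalize(X, dim=-1) @ F.normalize(label_emb, dim=-1).T
    loss = F.cross_entropy(logits * 10.0, targets)      # 10.0: assumed temperature
    opt.zero_grad()
    loss.backward()
    opt.step()

with torch.no_grad():
    final = F.normalize(X, dim=-1) @ F.normalize(label_emb, dim=-1).T
print(final.argmax(dim=-1))   # predicted classes for the training texts
```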
2022-03-30T16:14:44ZThomas Müller on Twitter: "paper & code of a novel light-weight few-shot model based on sentence embeddings..."
http://www.semanlink.net/doc/2022/03/thomas_muller_sur_twitter_pa
> The idea is simple: It's well known that you can use sentence embedding models to build zero-shot models by encoding the input text and a label description. You can improve quality by fine-tuning the encoder. Instead of tuning the entire encoder **you can just tune the label embeddings**.
[Paper](doc:2022/03/2203_14655_few_shot_learning_)
2022-03-30T15:48:13Z[1712.05972] Train Once, Test Anywhere: Zero-Shot Learning for Text Classification
http://www.semanlink.net/doc/2021/10/1712_05972_train_once_test_a
> The model learns to predict whether a given sentence is related to a tag or not; unlike other classifiers that learn to classify the sentence as one of the possible classes
input: concatenation of the embedding of the text and the embedding of the tag; output: related / not related (binary classifier)
> We can say that this technique learns the concept of relatedness between a sentence and a word that can be extended beyond datasets. That said, the levels of accuracy leave a lot of scope for future work.
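A small sketch of the setup described above, with assumed building blocks (a sentence encoder in place of the paper's embeddings, toy text/tag pairs): concatenate a sentence embedding with a tag embedding and train a binary "related / not related" classifier; at test time any tag, even an unseen one, can be scored against a sentence.

```python
# Sketch: zero-shot tagging as a binary "related / not related" classifier over
# [sentence embedding ; tag embedding]. Encoder and data are placeholders.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

enc = SentenceTransformer("all-MiniLM-L6-v2")

pairs = [
    ("the team scored in the last minute", "sports", 1),
    ("the team scored in the last minute", "finance", 0),
    ("the central bank raised interest rates", "finance", 1),
    ("the central bank raised interest rates", "sports", 0),
]
X = np.hstack([enc.encode([s for s, t, y in pairs]),
               enc.encode([t for s, t, y in pairs])])
y = [y for s, t, y in pairs]

clf = LogisticRegression(max_iter=1000).fit(X, y)

# Unseen tag at test time: the classifier scores relatedness, not a fixed class.
test = np.hstack([enc.encode(["new vaccine approved by regulators"]),
                  enc.encode(["health"])])
print(clf.predict_proba(test)[0, 1])   # probability that text and tag are related
```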
2021-10-16T13:59:40Z[2010.07245] Text Classification Using Label Names Only: A Language Model Self-Training Approach
http://www.semanlink.net/doc/2021/10/2010_07245_text_classificatio
> In this paper, we explore the potential of only **using the label name of each class** to train classification models on unlabeled data, **without using any labeled documents**. We use pre-trained neural language models both as general linguistic knowledge sources for category understanding and as representation learning models for document classification. Our method
> 1. associates semantically related words with the label names,
> 2. finds category-indicative words and trains the model to predict their implied categories, and
> 3. generalizes the model via self-training.
2021-10-16T13:48:25Z[2104.14690] Entailment as Few-Shot Learner
http://www.semanlink.net/doc/2021/05/2104_14690_entailment_as_few_
> a new approach, named as EFL, that can turn small LMs into better few-shot learners. The key idea of this approach is to reformulate potential NLP task into an entailment one, and then fine-tune the model with as little as 8 examples
>
> For instance, we can reformulate a sentiment classification task as a textual entailment one with an input sentence S1 as x_in = [CLS] S1 [SEP] S2 [EOS], where S2 = "This indicates positive user sentiment", and let the language model M determine whether the input sentence S1 entails the label description S2.
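A hedged illustration of the entailment reformulation using an off-the-shelf MNLI cross-encoder (the EFL paper additionally fine-tunes such a model on as few as 8 reformulated examples; only the scoring step is shown here). Checkpoint name and label descriptions are placeholders.

```python
# Sketch: score "S1 entails S2" with an MNLI model, where S2 verbalises a label.
# The EFL paper also fine-tunes this model on ~8 reformulated examples.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

ckpt = "roberta-large-mnli"          # placeholder NLI checkpoint
tok = AutoTokenizer.from_pretrained(ckpt)
nli = AutoModelForSequenceClassification.from_pretrained(ckpt).eval()

s1 = "The battery died after one hour."
descriptions = {
    "positive": "This indicates positive user sentiment.",
    "negative": "This indicates negative user sentiment.",
}

def entailment_prob(premise: str, hypothesis: str) -> float:
    inputs = tok(premise, hypothesis, return_tensors="pt")
    with torch.no_grad():
        probs = torch.softmax(nli(**inputs).logits, dim=-1)[0]
    # roberta-large-mnli label order: contradiction, neutral, entailment
    return probs[2].item()

scores = {lab: entailment_prob(s1, desc) for lab, desc in descriptions.items()}
print(max(scores, key=scores.get), scores)
```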
2021-05-03T23:05:39ZDistill our @huggingface zero-shot classifier with your specified class names
http://www.semanlink.net/doc/2021/02/distill_our_huggingface_zero_s
You can now train an efficient classifier with unlabeled data. This new script lets you **distill our @huggingface zero-shot classifier with your specified class names, speeding up inference by 100x or more**
[Zero-shot classifier distillation at master · huggingface/transformers](doc:2021/02/zero_shot_classifier_distillati)
2021-02-23T13:57:46ZZero-shot classifier distillation at master · huggingface/transformers
http://www.semanlink.net/doc/2021/02/zero_shot_classifier_distillati
This script provides a way to improve the speed and memory performance of a zero-shot classifier by training a more efficient student model from the zero-shot teacher's predictions over an unlabeled dataset.
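A condensed sketch of the distillation idea, not the actual script: use the zero-shot NLI pipeline as a teacher to pseudo-label an unlabeled corpus for your class names, then fit a much smaller student on those pseudo-labels. Checkpoints and data are placeholders; the real script trains a transformer student on the teacher's soft label distributions rather than the tiny logistic-regression student used here.

```python
# Sketch of zero-shot distillation: teacher = NLI zero-shot pipeline,
# student = tiny classifier trained on the teacher's pseudo-labels.
from transformers import pipeline
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

class_names = ["sports", "business", "science"]
unlabeled = [
    "the striker signed a new contract",
    "quarterly profits beat expectations",
    "the probe sent back images of the comet",
]

teacher = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
pseudo = [teacher(t, class_names)["labels"][0] for t in unlabeled]  # top teacher label

enc = SentenceTransformer("all-MiniLM-L6-v2")
student = LogisticRegression(max_iter=1000).fit(enc.encode(unlabeled), pseudo)

print(student.predict(enc.encode(["the index fell two percent"])))
```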
2021-02-23T13:54:22ZZero-Shot Learning in Modern NLP | Joe Davison Blog (2020-05)
http://www.semanlink.net/doc/2021/02/zero_shot_learning_in_modern_nl
> state-of-the-art NLP models for sequence classification without large annotated training sets.
Simple idea: use a single model (e.g. [Sentence-BERT](tag:sbert)) to embed both the text data and the class names into the same space.
Problem: Sentence-BERT is designed to learn effective sentence-level representations, not single- or multi-word representations like our class names -> the label embeddings may not be as semantically salient as word-level embedding methods (e.g. word2vec).
Solution 1: learn a projection from sentence-level embeddings of words to word2vec embeddings, and use it for encoding when learning the classifier. Can be adapted to few-shot learning.
Solution 2: "Classification as [#Natural Language Inference](tag:nli)".
> A method which not only embeds sequences and labels into the same latent space where their distance can be measured, but that can actually tell us something about the compatibility of two distinct sequences out of the box.
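A tiny sketch of the "simple idea" above, with a placeholder encoder and labels: embed the text and the class names with the same sentence encoder and pick the nearest label by cosine similarity, with no training at all. The blog then discusses why word-level projections or the NLI formulation work better.

```python
# Sketch: zero-shot classification by embedding text and class names into the
# same space and taking the nearest label by cosine similarity.
from sentence_transformers import SentenceTransformer, util

enc = SentenceTransformer("all-MiniLM-L6-v2")     # placeholder encoder
labels = ["business", "sports", "science & technology"]

text = "The chip maker unveiled a new 3nm processor."
sims = util.cos_sim(enc.encode(text, convert_to_tensor=True),
                    enc.encode(labels, convert_to_tensor=True))[0]
print(labels[int(sims.argmax())])                 # most similar class name
```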
2021-02-23T13:44:34Z[2012.15723] Making Pre-trained Language Models Better Few-shot Learners
http://www.semanlink.net/doc/2021/01/2012_15723
> a suite of simple and complementary techniques for fine-tuning language models on a small number of annotated examples
[Tweet](https://twitter.com/adamjfisch/status/1345185238276861953)
2021-01-02T22:42:12ZTowards Unsupervised Text Classification Leveraging Experts and Word Embeddings - (ACL 2019)
http://www.semanlink.net/doc/2020/10/towards_unsupervised_text_class
Unsupervised approach to classify documents into categories simply described by a label.
> The proposed method... draws on textual similarity between the most relevant words in each document and a dictionary of keywords for each category reflecting its semantics and lexical field. The novelty of our method hinges on the enrichment of the category labels through a combination of human expertise and language models, both generic and domain specific.
> models the task as a **text similarity problem between two sets of words: One containing the most relevant words in the document and another containing keywords derived from the label of the target category**. While the key advantage of this approach is its simplicity, its success hinges on the good definition of a dictionary of words for each category.
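A rough sketch of the word-set similarity idea, under assumptions not taken from the paper: generic GloVe vectors loaded through gensim stand in for the paper's enriched dictionaries, and the document/category keyword lists are toy placeholders.

```python
# Sketch: unsupervised classification as similarity between two word sets --
# the document's most relevant words vs. a keyword dictionary per category.
import gensim.downloader as api

wv = api.load("glove-wiki-gigaword-100")   # stand-in word vectors

category_keywords = {
    "sports":  ["match", "team", "coach", "goal", "league"],
    "finance": ["market", "stock", "bank", "profit", "investor"],
}
doc_keywords = ["striker", "goal", "referee", "stadium"]   # e.g. from tf-idf

def set_similarity(words_a, words_b):
    # mean pairwise cosine similarity between the two word sets
    pairs = [wv.similarity(a, b) for a in words_a for b in words_b
             if a in wv and b in wv]
    return sum(pairs) / len(pairs)

scores = {c: set_similarity(doc_keywords, kws) for c, kws in category_keywords.items()}
print(max(scores, key=scores.get), scores)
```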
2020-10-05T00:28:20ZUnsupervised text classification with word embeddings - Max Halford
http://www.semanlink.net/doc/2020/10/classifying_documents_without_a
Title was "Classifying documents without any training data". Mentions this [paper](doc:2020/10/towards_unsupervised_text_class)
2020-10-05T00:09:59ZLearning to Tag OOV Tokens by Integrating Contextual Representation and Background Knowledge (ACL Anthology 2020)
http://www.semanlink.net/doc/2020/07/learning_to_tag_oov_tokens_by_i
Aims to leverage both the contextual representation of the input text (deep LMs) and knowledge derived from curated KBs ([Wordnet](tag:wordnet)) to improve [slot tagging](tag:slot_tagging) in the presence of [out-of-vocab](tag:oov) words ([few-shot scenario](tag:few_shot_learning)).
Method:
1. retrieve potentially relevant KB entities and encode them into distributed representations that describe global graph-structured information
2. a BERT encoder layer to capture context-aware representations of the sequence and attend to the KB embeddings using multi-level graph attention
3. integrate the BERT embeddings and the KB embeddings to predict the slot type
Contributions:
1. feasibility of applying a lexical ontology to facilitate recognizing OOV words; first to consider large-scale background knowledge for enhancing context-aware slot tagging models
2. a knowledge integration mechanism that uses multi-level graph attention to model explicit lexical relations
3. experiments on two benchmark datasets
> our method makes a notable difference in a scenario where samples are linguistically diverse, and a large vocab exists.
(Better improvements when using RNN than BERT, because BERT already contains a lot of background knowledge)
2020-07-04T11:34:35ZOne Shot learning, Siamese networks and Triplet Loss with Keras
http://www.semanlink.net/doc/2019/10/one_shot_learning_siamese_netw
2019-10-13T19:00:46ZSiamese Neural Networks for One-shot Image Recognition (2015)
http://www.semanlink.net/doc/2019/08/siamese_neural_networks_for_one
> "Humans exhibit a strong ability to acquire and recognize
new patterns."
> we learn image representations via a supervised
metric-based approach with siamese neural networks, **then
reuse that network’s features for one-shot learning without
any retraining**.
2019-08-06T18:36:48ZOne Shot Learning with Siamese Networks using Keras
http://www.semanlink.net/doc/2019/06/one_shot_learning_with_siamese_
the network is learning a **similarity function**, which takes two images as input and expresses how similar they are.
> Assume that we want to build face recognition system for a small organization with only 10 employees...
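A bare-bones PyTorch sketch of that idea (the blog post itself uses Keras): one shared encoder maps both inputs to embeddings and a distance between them acts as the similarity function; at one-shot time a query is matched against a single stored reference per class. The toy MLP encoder, shapes, and random data are placeholders.

```python
# Sketch: a Siamese similarity function -- one shared encoder, two inputs,
# similarity = negative distance between their embeddings. Toy encoder/data.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU(),
                        nn.Linear(128, 64))

def similarity(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Higher = more similar; both inputs go through the SAME encoder."""
    return -torch.pairwise_distance(encoder(a), encoder(b))

# One-shot use: compare a query against one stored example per employee.
references = {name: torch.randn(1, 1, 28, 28) for name in ["alice", "bob"]}
query = torch.randn(1, 1, 28, 28)
scores = {name: similarity(query, ref).item() for name, ref in references.items()}
print(max(scores, key=scores.get), scores)
```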
2019-06-28T19:00:27ZKnowledge-Based Short Text Categorization Using Entity and Category Embedding | Springer for Research & Development (2019)
http://www.semanlink.net/doc/2019/05/knowledge_based_short_text_cate
> we propose a novel probabilistic model for Knowledge-Based Short Text Categorization (KBSTC), **which does not require any labeled training data to classify a short text**. This is achieved by leveraging **entities and categories from large knowledge bases**, which are further embedded into a common vector space, for which we propose a new entity and category embedding model. **Given a short text, its category (e.g. Business, Sports, etc.) can then be derived based on the entities mentioned in the text by exploiting semantic similarity between entities and categories**
2019-05-30T11:38:19ZAdvances in few-shot learning: reproducing results in PyTorch
https://towardsdatascience.com/advances-in-few-shot-learning-reproducing-results-in-pytorch-aba70dee541d
2018-12-02T10:21:44ZAdvances in few-shot learning: a guided tour – Towards Data Science
https://towardsdatascience.com/advances-in-few-shot-learning-a-guided-tour-36bc10a68b77
2018-12-02T10:16:29Z[1703.03129] Learning to Remember Rare Events
https://arxiv.org/abs/1703.03129
> a large-scale life-long memory module for use in deep learning. The module exploits fast nearest-neighbor algorithms for efficiency and thus scales to large memory sizes. Except for the nearest-neighbor query, the module is fully differentiable and trained end-to-end with no extra supervision. It operates in a life-long manner, i.e., without the need to reset it during training.
> Our memory module can be easily added to any part of a supervised neural network
2018-10-23T12:36:58ZHuman-level concept learning through probabilistic program induction (2015)
https://cims.nyu.edu/~brenden/LakeEtAl2015Science.pdf
> People learning new concepts can often generalize successfully from just a single example, yet machine learning algorithms typically require tens or hundreds of examples to perform with similar accuracy... We present a computational model that captures these human learning abilities for a large class of simple visual concepts: handwritten characters from the world’s alphabets
2018-01-04T14:56:17Z[1603.05106] One-Shot Generalization in Deep Generative Models
http://arxiv.org/abs/1603.05106v1
2016-03-18T00:02:19Z