About This Document
- sl:arxiv_author :
- sl:arxiv_firstAuthor : Eyal Shnarch
- sl:arxiv_num : 2203.10581
- sl:arxiv_published : 2022-03-20T15:29:34Z
- sl:arxiv_summary : In real-world scenarios, a text classification task often begins with a cold start, when labeled data is scarce. In such cases, the common practice of fine-tuning pre-trained models, such as BERT, for a target classification task is prone to produce poor performance. We suggest a method to boost the performance of such models by adding an intermediate unsupervised classification task between the pre-training and fine-tuning phases. As such an intermediate task, we perform clustering and train the pre-trained model on predicting the cluster labels. We test this hypothesis on various data sets, and show that this additional classification phase can significantly improve performance, mainly for topical classification tasks, when the number of labeled instances available for fine-tuning is only a couple of dozen to a few hundred.@en
- sl:arxiv_title : Cluster & Tune: Boost Cold Start Performance in Text Classification@en
- sl:arxiv_updated : 2022-03-20T15:29:34Z
- sl:bookmarkOf : https://arxiv.org/abs/2203.10581
- sl:creationDate : 2022-04-06
- sl:creationTime : 2022-04-06T01:22:32Z
- sl:relatedDoc : http://www.semanlink.net/doc/2022/04/leshem_choshen_sur_twitter_l
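The summary above describes the method's core loop: cluster the unlabeled data, train the pre-trained model to predict the cluster labels as an intermediate task, then fine-tune on the few available target labels. Below is a minimal sketch of that idea, assuming TF-IDF plus KMeans for the clustering step and bert-base-uncased as the encoder; the record does not specify the paper's actual clustering algorithm or hyperparameters, so everything in the sketch (including the loader functions) is illustrative.

```python
# Minimal sketch of the "cluster & tune" idea from the abstract above.
# Assumptions (not taken from this record): TF-IDF + KMeans for the
# unsupervised clustering step, bert-base-uncased as the pre-trained model,
# and a bare-bones training loop standing in for a real fine-tuning setup.
import torch
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def cluster_pseudo_labels(texts, n_clusters=50):
    """Unsupervised step: assign every unlabeled text a cluster id."""
    vectors = TfidfVectorizer(max_features=20000).fit_transform(texts)
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(vectors)

def fine_tune(checkpoint, texts, labels, num_labels, epochs=1, lr=2e-5):
    """One loop used twice: once on cluster ids, once on the real labels."""
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint,
        num_labels=num_labels,
        ignore_mismatched_sizes=True,  # re-init the head when the label count changes
    )
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for text, label in zip(texts, labels):
            batch = tokenizer(text, truncation=True, return_tensors="pt")
            loss = model(**batch, labels=torch.tensor([int(label)])).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
    return model

# Phase 1 (intermediate task): predict cluster ids over the unlabeled pool.
unlabeled_texts = load_unlabeled_corpus()        # hypothetical loader
pseudo = cluster_pseudo_labels(unlabeled_texts)
inter = fine_tune("bert-base-uncased", unlabeled_texts, pseudo, num_labels=50)
inter.save_pretrained("bert-cluster-tuned")

# Phase 2 (cold-start fine-tuning): only a couple of dozen labeled examples.
few_texts, few_labels = load_labeled_examples()  # hypothetical loader
final = fine_tune("bert-cluster-tuned", few_texts, few_labels, num_labels=2)
```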