About This Document
- sl:arxiv_author :
- sl:arxiv_firstAuthor : Yu Meng
- sl:arxiv_num : 2010.07245
- sl:arxiv_published : 2020-10-14T17:06:41Z
- sl:arxiv_summary : Current text classification methods typically require a good number of
human-labeled documents as training data, which can be costly and difficult to
obtain in real applications. Humans can perform classification without seeing
any labeled examples but only based on a small set of words describing the
categories to be classified. In this paper, we explore the potential of only
using the label name of each class to train classification models on unlabeled
data, without using any labeled documents. We use pre-trained neural language
models both as general linguistic knowledge sources for category understanding
and as representation learning models for document classification. Our method
(1) associates semantically related words with the label names, (2) finds
category-indicative words and trains the model to predict their implied
categories, and (3) generalizes the model via self-training. We show that our
model achieves around 90% accuracy on four benchmark datasets including topic
and sentiment classification without using any labeled documents but learning
from unlabeled data supervised by at most 3 words (1 in most cases) per class
as the label name.@en
- sl:arxiv_title : Text Classification Using Label Names Only: A Language Model Self-Training Approach@en
- sl:arxiv_updated : 2020-10-14T17:06:41Z
- sl:bookmarkOf : https://arxiv.org/abs/2010.07245
- sl:creationDate : 2021-10-16
- sl:creationTime : 2021-10-16T13:48:25Z
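The abstract's three-step recipe can be illustrated with a toy, self-contained sketch. Note the assumptions: the actual paper expands label names and scores words with a pre-trained masked language model, whereas here the category vocabularies are hard-coded, the corpus is invented, and "self-training" is reduced to a naive bag-of-words scorer fit on the pseudo-labels; this is a conceptual illustration, not the paper's method.

```python
from collections import Counter, defaultdict

# Step 1: associate semantically related words with each label name.
# (In the paper this expansion comes from a masked language model;
# here it is hard-coded as an assumption.)
category_vocab = {
    "sports": {"sports", "game", "team", "player"},
    "politics": {"politics", "election", "senate", "vote"},
}

# Invented toy corpus of unlabeled documents.
unlabeled_docs = [
    "the team won the game last night",
    "the senate vote passed after the election",
    "a star player joined the team",
    "lawmakers debate the new bill",  # contains no seed word
]

# Step 2: pseudo-label documents containing category-indicative words.
def pseudo_label(doc):
    tokens = set(doc.split())
    hits = {c: len(tokens & vocab) for c, vocab in category_vocab.items()}
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else None  # None = no indicative word

labeled = [(d, pseudo_label(d)) for d in unlabeled_docs]

# Step 3 (toy "self-training"): fit per-class word counts on the
# pseudo-labeled documents, then classify every document with the
# learned counts, including ones the seed words missed in step 2.
class_counts = defaultdict(Counter)
for doc, label in labeled:
    if label is not None:
        class_counts[label].update(doc.split())

def classify(doc):
    scores = {c: sum(cnt[t] for t in doc.split())
              for c, cnt in class_counts.items()}
    return max(scores, key=scores.get)

predictions = {doc: classify(doc) for doc in unlabeled_docs}
```

The point of the third step is generalization: the bag-of-words model trained on pseudo-labels can score documents that share no word with the seed vocabularies, which is the role self-training plays for the language model in the paper.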