Semanlink - [2001.11631] Enhancement of Short Text Clustering by Iterative Classification

Tags:

About This Document

sl:arxiv_author :
sl:arxiv_firstAuthor : Md Rashadul Hasan Rakib
sl:arxiv_num : 2001.11631
sl:arxiv_published : 2020-01-31T02:12:05Z
sl:arxiv_summary : Short text clustering is a challenging task due to the lack of signal contained in such short texts. In this work, we propose iterative classification as a method to boost the clustering quality (e.g., accuracy) of short texts. Given a clustering of short texts obtained using an arbitrary clustering algorithm, iterative classification applies outlier removal to obtain outlier-free clusters. Then it trains a classification algorithm using the non-outliers based on their cluster distributions. Using the trained classification model, iterative classification reclassifies the outliers to obtain a new set of clusters. By repeating this several times, we obtain a much improved clustering of texts. Our experimental results show that the proposed clustering enhancement method not only improves the clustering quality of different clustering methods (e.g., k-means, k-means--, and hierarchical clustering) but also outperforms the state-of-the-art short text clustering methods on several short text datasets by a statistically significant margin.@en
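The loop described in the summary (remove outliers, train a classifier on the non-outliers, reclassify the outliers, repeat) can be sketched as below. This is a minimal illustration, not the paper's implementation. It assumes the texts are already embedded as numeric vectors. The outlier rule (distance to the cluster centroid greater than `ratio` times the cluster's median distance) and the nearest-centroid classifier are stand-ins chosen for illustration and may differ from the components used by the authors. All function names here are hypothetical.

```python
import math
import statistics
from collections import defaultdict


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def dist(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def split_outliers(points, labels, ratio=2.0):
    """Assumed outlier rule (illustrative only): a point is an outlier if its
    distance to its cluster centroid exceeds `ratio` times the cluster's
    median distance. With ratio >= 1, every cluster keeps at least one inlier."""
    groups = defaultdict(list)
    for i, lab in enumerate(labels):
        groups[lab].append(i)
    inliers, outliers = [], []
    for idxs in groups.values():
        c = centroid([points[i] for i in idxs])
        d = {i: dist(points[i], c) for i in idxs}
        cutoff = ratio * statistics.median(list(d.values()))
        for i in idxs:
            (outliers if d[i] > cutoff else inliers).append(i)
    return inliers, outliers


def train_nearest_centroid(points, labels, idxs):
    """Stand-in classifier: one centroid per label, fit on the inliers only."""
    groups = defaultdict(list)
    for i in idxs:
        groups[labels[i]].append(points[i])
    return {lab: centroid(vs) for lab, vs in groups.items()}


def predict(model, p):
    """Assign p to the label whose centroid is nearest."""
    return min(model, key=lambda lab: dist(p, model[lab]))


def iterative_classification(points, labels, n_iter=5, ratio=2.0):
    """Repeat: remove outliers, train on inliers, reclassify the outliers."""
    labels = list(labels)
    for _ in range(n_iter):
        inliers, outliers = split_outliers(points, labels, ratio)
        if not outliers:
            break  # nothing left to reclassify
        model = train_nearest_centroid(points, labels, inliers)
        new_labels = list(labels)
        for i in outliers:
            new_labels[i] = predict(model, points[i])
        if new_labels == labels:
            break  # clustering has stabilised
        labels = new_labels
    return labels


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
initial = [0, 0, 0, 0, 1, 1, 1, 0]  # (11, 11) starts in the wrong cluster
print(iterative_classification(points, initial))  # [0, 0, 0, 0, 1, 1, 1, 1]
```

In this toy run, the misassigned point (11, 11) is flagged as an outlier of cluster 0, reclassified by the model trained on the remaining points, and moved to cluster 1. The second pass then finds no outliers and stops. The median-based cutoff is used here only because it guarantees every cluster keeps at least one training point.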
sl:arxiv_title : Enhancement of Short Text Clustering by Iterative Classification@en
sl:arxiv_updated : 2020-01-31T02:12:05Z
sl:bookmarkOf : https://arxiv.org/abs/2001.11631
sl:creationDate : 2021-05-20
sl:creationTime : 2021-05-20T17:59:46Z

