About This Document
- sl:arxiv_author :
- sl:arxiv_firstAuthor : Kexin Wang
- sl:arxiv_num : 2104.06979
- sl:arxiv_published : 2021-04-14T17:02:18Z
- sl:arxiv_summary : Learning sentence embeddings often requires a large amount of labeled data.
However, for most tasks and domains, labeled data is seldom available and
creating it is expensive. In this work, we present a new state-of-the-art
unsupervised method based on pre-trained Transformers and Sequential Denoising
Auto-Encoder (TSDAE) which outperforms previous approaches by up to 6.4 points.
It can achieve up to 93.1% of the performance of in-domain supervised
approaches. Further, we show that TSDAE is a strong domain adaptation and
pre-training method for sentence embeddings, significantly outperforming other
approaches like Masked Language Model.
A crucial shortcoming of previous studies is the narrow evaluation: Most work
mainly evaluates on the single task of Semantic Textual Similarity (STS), which
does not require any domain knowledge. It is unclear if these proposed methods
generalize to other domains and tasks. We fill this gap and evaluate TSDAE and
other recent approaches on four different datasets from heterogeneous domains.
- sl:arxiv_title : TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning
- sl:arxiv_updated : 2021-08-30T18:23:40Z
- sl:bookmarkOf : https://arxiv.org/abs/2104.06979
- sl:creationDate : 2021-09-01
- sl:creationTime : 2021-09-01T16:43:01Z
- sl:relatedDoc : http://www.semanlink.net/doc/2020/07/ukplab_sentence_transformers_s
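The related document above points to the UKPLab sentence-transformers repository, which ships the components needed to reproduce TSDAE-style unsupervised training. Below is a minimal sketch of that recipe using the library's legacy `fit` API; the base checkpoint (`bert-base-uncased`), the placeholder corpus, and the hyperparameters are illustrative assumptions, not values taken from this record.

```python
# Minimal TSDAE-style training sketch with sentence-transformers.
# Assumptions (not from this record): base checkpoint, corpus, hyperparameters.
from sentence_transformers import SentenceTransformer, models, datasets, losses
from torch.utils.data import DataLoader

# Encoder: a pre-trained Transformer with CLS pooling on top.
word_embedding_model = models.Transformer("bert-base-uncased")
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension(), "cls")
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])

# Unlabeled, in-domain sentences (placeholder corpus).
train_sentences = ["A first unlabeled sentence.", "Another unlabeled sentence."]

# The dataset applies noise (token deletion) to each sentence; the loss trains
# an encoder-decoder to reconstruct the original sentence from the noisy input,
# with encoder and decoder weights tied.
train_dataset = datasets.DenoisingAutoEncoderDataset(train_sentences)
train_dataloader = DataLoader(train_dataset, batch_size=8, shuffle=True)
train_loss = losses.DenoisingAutoEncoderLoss(model, tie_encoder_decoder=True)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    weight_decay=0,
    scheduler="constantlr",
    optimizer_params={"lr": 3e-5},
    show_progress_bar=True,
)

model.save("output/tsdae-model")
```

After training, `model.encode(sentences)` produces the domain-adapted sentence embeddings; the decoder is only used during training and is discarded afterwards.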