Unsupervised Domain Adaptation (NLP)

Domain Adaptation with Generative Pseudo-Labeling (GPL) | Pinecone

2023-04-09T10:30:34Z

[1904.02817] Unsupervised Domain Adaptation of Contextualized Embeddings for Sequence Labeling

2023-01-12T16:29:04Z

[2205.11498] Domain Adaptation for Memory-Efficient Dense Retrieval

2022-09-26T17:46:39Z

Refers to [Binary Passage Retriever (BPR)](doc:2021/06/2106_00882_efficient_passage_)

Unsupervised Learning — Sentence-Transformers documentation

2022-08-20T01:16:16Z

> In our paper TSDAE we compare approaches for sentence embedding tasks, and in GPL we compare them for semantic search tasks (given a query, find relevant passages). While the unsupervised approaches achieve acceptable performance for sentence embedding tasks, they perform poorly for semantic search tasks.

Domain transfer with GGPL: German Generative Pseudo Labeling 🥨 | by Matthias Richter | Jun, 2022 | ML6team

2022-06-02T13:55:12Z

Nils Reimers on Twitter: "GPL goes multi-lingual..."

2022-06-01T17:45:24Z

[Domain transfer with GGPL: German Generative Pseudo Labeling](doc:2022/06/domain_transfer_with_ggpl_germ)

Ramsri Goutham Golla on Twitter: "Hi @Nils_Reimers For GPL you used "msmarco-distilbert-base-tas-b" model and ..."

2022-04-27T22:17:10Z

Domain Adaptation — Sentence-Transformers documentation

2022-03-31T08:59:25Z

[2006.00632] Neural Unsupervised Domain Adaptation in NLP---A Survey

2022-03-30T01:13:03Z

Unsupervised Training of Retrievers Using GenQ (The Art of Asking Questions with GenQ) | Pinecone

2022-03-09T10:56:30Z

NAVER LABS Europe: "@Nils_Reimers of @huggingface on 'Unsupervised domain adaptation for neural search'"

2022-03-09T10:53:24Z

[2112.09118] Towards Unsupervised Dense Information Retrieval with Contrastive Learning

2021-12-21T11:26:40Z

> we explore the limits of contrastive learning as a way to train unsupervised dense retrievers, and show that it leads to strong retrieval performance. [openreview](https://openreview.net/forum?id=jKN1pXi7b0)
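
The contrastive idea in the quote can be illustrated with a minimal plain-Python sketch of an InfoNCE-style loss with in-batch negatives. It assumes, as one label-free option the paper explores, that positive pairs are two random crops of the same document. The helper names (`random_crop`, `info_nce_loss`), the temperature of 0.05 and the toy 2-d vectors are illustrative assumptions, not the paper's code or hyperparameters; a real setup would encode the crops with a transformer and back-propagate the loss.

```python
import math
import random
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def random_crop(tokens: List[str], length: int, rng: random.Random) -> List[str]:
    """Return a random contiguous span of `length` tokens (the whole list if shorter)."""
    if len(tokens) <= length:
        return list(tokens)
    start = rng.randrange(len(tokens) - length + 1)
    return tokens[start:start + length]


def info_nce_loss(queries: List[Sequence[float]], keys: List[Sequence[float]],
                  temperature: float = 0.05) -> float:
    """Mean InfoNCE loss with in-batch negatives.

    queries[i] and keys[i] form a positive pair (e.g. two crops of one document);
    every keys[j] with j != i serves as a negative for queries[i].
    """
    assert queries and len(queries) == len(keys)
    total = 0.0
    for i, q in enumerate(queries):
        logits = [dot(q, k) / temperature for k in keys]
        # log-sum-exp with max subtraction for numerical stability
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]  # = -log softmax probability of the positive key
    return total / len(queries)


if __name__ == "__main__":
    rng = random.Random(0)
    doc = "dense retrievers can be trained without labels using contrastive learning".split()
    print(random_crop(doc, 5, rng), random_crop(doc, 5, rng))
    # Toy "embeddings": aligned positives should give a lower loss than swapped ones.
    aligned = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
    swapped = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
    print(aligned < swapped)
```

With a single pair there are no negatives, so the loss is zero; the signal comes entirely from pushing each query away from the other documents in the batch.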

[2112.07577] GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval

2021-12-15T18:23:28Z

An unsupervised domain adaptation technique for dense retrieval models:

1. Synthetic queries are generated for each passage of the target corpus (using an existing pre-trained [T5](tag:text_to_text_transfer_transformer) encoder-decoder).
2. The generated queries are used to mine negative passages: the most similar passages are retrieved with an existing dense retrieval model (== hard negatives!).
3. Each (query, positive passage, negative passage) triple is pseudo-labeled with a cross-encoder's score margin, and these labels are used to train the domain-adapted dense retriever with the Margin-MSE method of [Hofstätter et al., 2020](doc:2021/12/2010_02666_improving_efficien).

[Nils Reimers on Twitter](doc:2021/12/nils_reimers_sur_twitter_do_), [GitHub](https://github.com/UKPLab/gpl). By the author of [TSDAE](doc:2021/09/2104_06979_tsdae_using_trans).

Claims to improve on "Doc2Query" ([Document Expansion by Query Prediction](doc:2022/01/1904_08375_document_expansion)). The difference ([src](https://twitter.com/KexinWang2049/status/1471435779415150598)):

> - GPL: Uses doc2query to construct synthetic data and does knowledge distillation (i.e. training) on that data.
> - Doc2query: Generates queries to extend the documents and uses BM25 on top of them w/o training.
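
Steps 2 and 3 and the Margin-MSE objective can be sketched in plain Python, assuming step 1 has already produced a synthetic query for passage `p1`. The passages, the 2-d embeddings and `toy_cross_encoder` (which merely counts shared words) are hypothetical stand-ins for a real dense retriever and cross-encoder; this is an illustration of the technique, not the API of the UKPLab/gpl package.

```python
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner-product similarity, as used by many dense retrievers."""
    return sum(a * b for a, b in zip(u, v))


def mine_hard_negative(query_vec: Vector, passage_vecs: Dict[str, Vector], positive_id: str) -> str:
    """Step 2: the most similar passage that is NOT the one the query was generated from."""
    scored = [(dot(query_vec, vec), pid) for pid, vec in passage_vecs.items() if pid != positive_id]
    return max(scored)[1]


def margin_label(cross_encoder: Callable[[str, str], float], query: str, pos: str, neg: str) -> float:
    """Step 3: the pseudo-label is the teacher's margin CE(q, p+) - CE(q, p-)."""
    return cross_encoder(query, pos) - cross_encoder(query, neg)


def margin_mse(student_pos: List[float], student_neg: List[float], teacher_margins: List[float]) -> float:
    """Margin-MSE: mean squared gap between the student's and the teacher's score margins."""
    gaps = [((sp - sn) - tm) ** 2 for sp, sn, tm in zip(student_pos, student_neg, teacher_margins)]
    return sum(gaps) / len(gaps)


def toy_cross_encoder(query: str, passage: str) -> float:
    """HYPOTHETICAL stand-in for a real cross-encoder: counts words shared by query and passage."""
    return float(len(set(query.lower().split()) & set(passage.lower().split())))


if __name__ == "__main__":
    passages = {
        "p1": "gpl adapts dense retrievers with pseudo labels",
        "p2": "dense retrievers encode passages into vectors",
        "p3": "bm25 is a sparse lexical ranking function",
    }
    # Hypothetical embeddings from an existing (not yet adapted) dense retriever.
    passage_vecs = {"p1": [0.9, 0.1], "p2": [0.8, 0.3], "p3": [0.1, 0.9]}
    # Step 1 (assumed done): a synthetic query generated from passage p1.
    query, query_vec, positive = "how does gpl adapt dense retrievers", [1.0, 0.2], "p1"

    neg = mine_hard_negative(query_vec, passage_vecs, positive)
    label = margin_label(toy_cross_encoder, query, passages[positive], passages[neg])
    # Student scores come from the retriever being adapted (the same toy vectors here).
    s_pos = dot(query_vec, passage_vecs[positive])
    s_neg = dot(query_vec, passage_vecs[neg])
    print(neg, label, margin_mse([s_pos], [s_neg], [label]))
```

Training on margins rather than on hard 0/1 labels means a mined "negative" that is actually relevant gets a small margin instead of being punished as fully irrelevant.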

Nils Reimers on Twitter: "Do dense retrieval models work out-of-the-box for your specific domain? Often the answer was No😢..."

2021-12-15T18:06:51Z

Unsupervised Training for Sentence Transformers | Pinecone

2021-11-24T21:03:44Z

Blog post about [[2104.06979] TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning](doc:2021/09/2104_06979_tsdae_using_trans)

> Fine-tuning with TSDAE simply cannot compete in terms of performance against supervised methods. However, **the point and value of TSDAE is that it allows us to fine-tune models for use-cases where we have no data**. Specific domains with unique terminology or low resource languages.

[2104.06979] TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning

2021-09-01T16:43:01Z

> The most successful previous approaches like InferSent (Conneau et al., 2017), Universal Sentence Encoder (USE) (Cer et al., 2018) and SBERT (Reimers and Gurevych, 2019) heavily relied on labeled data to train sentence embedding models.
>
> TSDAE can achieve up to 93.1% of the performance of in-domain supervised approaches. Further, we show that TSDAE is **a strong domain adaptation and pre-training method for sentence embeddings**, significantly outperforming other approaches like Masked Language Model.
>
> During training, TSDAE encodes corrupted sentences into fixed-sized vectors and requires the decoder to reconstruct the original sentences from this sentence embedding.

- [github](https://github.com/UKPLab/sentence-transformers/tree/master/examples/unsupervised_learning/TSDAE)
- [UKPLab/sentence-transformers: Sentence Embeddings with BERT & XLNet](doc:2020/07/ukplab_sentence_transformers_s)
- [twitter](https://twitter.com/KexinWang2049/status/1433361957579538432):

> **TSDAE can learn domain-specific sentence embeddings with unlabeled sentences**
>
> Most importantly, instead of STS (Semantic Textual Similarity), **we suggest evaluating unsupervised sentence embeddings on the domain-specific tasks & datasets, which is the real use case for them**. Actually, STS scores do not correlate with performance on specific tasks.
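
The data side of TSDAE (corrupt a sentence, keep the original as the reconstruction target) fits in a few lines. This sketch assumes token deletion as the noise, with a deletion ratio of 0.6 (the setting the paper reports as working best), and uses whitespace tokenization for simplicity; `delete_noise` and `make_tsdae_pairs` are hypothetical helpers. The encoder that pools each noisy sentence into a fixed-size vector and the decoder that reconstructs the original from it are not shown.

```python
import random
from typing import List, Optional, Tuple


def delete_noise(sentence: str, del_ratio: float = 0.6, rng: Optional[random.Random] = None) -> str:
    """Corrupt a sentence by deleting each token independently with probability del_ratio.

    At least one token is always kept, so the encoder never receives an empty input.
    """
    rng = rng or random.Random()
    tokens = sentence.split()  # whitespace tokenization: a simplifying assumption
    if not tokens:
        return sentence
    kept = [tok for tok in tokens if rng.random() > del_ratio]
    if not kept:
        kept = [rng.choice(tokens)]
    return " ".join(kept)


def make_tsdae_pairs(sentences: List[str], del_ratio: float = 0.6, seed: int = 0) -> List[Tuple[str, str]]:
    """Build (corrupted encoder input, original decoder target) pairs from unlabeled sentences."""
    rng = random.Random(seed)
    return [(delete_noise(s, del_ratio, rng), s) for s in sentences]


if __name__ == "__main__":
    corpus = [
        "the patient was prescribed a beta blocker after the infarction",
        "dense retrieval maps queries and passages into one vector space",
    ]
    for noisy, target in make_tsdae_pairs(corpus):
        print(f"{noisy!r} -> {target!r}")
```

Only unlabeled in-domain sentences are needed, which is why the approach suits the "no data" use cases quoted above.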