About This Document
- sl:arxiv_author :
- sl:arxiv_firstAuthor : Xiaochuang Han
- sl:arxiv_num : 1904.02817
- sl:arxiv_published : 2019-04-04T23:05:45Z
- sl:arxiv_summary : Contextualized word embeddings such as ELMo and BERT provide a foundation for
strong performance across a wide range of natural language processing tasks by
pretraining on large corpora of unlabeled text. However, the applicability of
this approach is unknown when the target domain varies substantially from the
pretraining corpus. We are specifically interested in the scenario in which
labeled data is available only in a canonical source domain such as news text,
and the target domain is distinct from both the labeled and pretraining texts.
To address this scenario, we propose domain-adaptive fine-tuning, in which the
contextualized embeddings are adapted by masked language modeling on text from
the target domain. We test this approach on sequence labeling in two
challenging domains: Early Modern English and Twitter. Both domains differ
substantially from existing pretraining corpora, and domain-adaptive
fine-tuning yields substantial improvements over strong BERT baselines, with
particularly impressive results on out-of-vocabulary words. We conclude that
domain-adaptive fine-tuning offers a simple and effective approach for the
unsupervised adaptation of sequence labeling to difficult new domains.@en
- sl:arxiv_title : Unsupervised Domain Adaptation of Contextualized Embeddings for Sequence Labeling@en
- sl:arxiv_updated : 2019-09-05T00:18:25Z
- sl:bookmarkOf : https://arxiv.org/abs/1904.02817
- sl:creationDate : 2023-01-12
- sl:creationTime : 2023-01-12T16:29:04Z
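The summary above describes domain-adaptive fine-tuning: continuing masked language modeling (MLM) on unlabeled target-domain text before using the encoder for sequence labeling. A minimal sketch of that adaptation step, assuming the HuggingFace Transformers and Datasets libraries, a hypothetical plain-text file `target_domain.txt` of unlabeled target-domain sentences, and illustrative hyperparameters (not the paper's exact setup):

```python
# Sketch: adapt a pretrained BERT encoder to a new domain via masked language modeling.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-cased")

# Unlabeled target-domain text (e.g. Early Modern English or tweets), one sentence per line.
dataset = load_dataset("text", data_files={"train": "target_domain.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

# Randomly mask tokens; the model is trained to reconstruct them from context.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="bert-domain-adapted",
    num_train_epochs=3,                # illustrative values, not from the paper
    per_device_train_batch_size=16,
    learning_rate=5e-5,
)

Trainer(model=model, args=args, train_dataset=tokenized, data_collator=collator).train()

# The adapted encoder is then fine-tuned (or used as a frozen feature extractor)
# for sequence labeling with source-domain labels, e.g. via AutoModelForTokenClassification.
model.save_pretrained("bert-domain-adapted")
tokenizer.save_pretrained("bert-domain-adapted")
```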