BERT + Sentence Embeddings
http://www.semanlink.net/tag/bert_and_sentence_embeddings
Documents tagged with BERT + Sentence Embeddings
UKP Lab on X: "a lightweight solution for few-shot domain-specific sentence classification: AdaSent!..."
http://www.semanlink.net/doc/2023/12/ukp_lab_sur_x_need_a_lightwe
AdaSent is an approach to creating domain-specialized sentence encoders for few-shot sentence classification
> Reusable general sentence adapter across domains
> AdaSent decouples DAPT (Domain-Adaptive Pre-Training) & SEPT (Sentence Embedding Pre-Training) **by storing the sentence encoding abilities into an adapter**, which is trained only once in the general domain and plugged into various DAPT-ed PLMs
[Github](https://github.com/UKPLab/AdaSent)
2023-12-09T19:40:21Z
skeskinen/bert.cpp: ggml implementation of BERT
http://www.semanlink.net/doc/2023/05/skeskinen_bert_cpp_ggml_implem
> ggml inference of BERT neural net architecture with pooling and normalization from SentenceTransformers (sbert.net). High quality sentence embeddings in pure C++ (with C API).
>
> The main goal of bert.cpp is to run the BERT model using **4-bit integer quantization on CPU**
2023-05-09T00:29:27Z
Domain Adaptation with Generative Pseudo-Labeling (GPL) | Pinecone
http://www.semanlink.net/doc/2023/04/domain_adaptation_with_generati
2023-04-09T10:30:34Z
Daniel Vila Suero on Twitter: "Data quality is key for LLMs, but we're building Open Source LLMs with data of "unknown" quality... Introducing Alpaca GarbageCollector..."
http://www.semanlink.net/doc/2023/04/daniel_vila_suero_sur_twitter_
> a cross-lingual SetFit model to identify potentially bad instructions in Alpaca-like datasets
2023-04-05T18:37:29Z
Multilingual Sentence Transformers | Pinecone
http://www.semanlink.net/doc/2023/01/multilingual_sentence_transform
Focus on **Multilingual Knowledge Distillation**
> recent method introduced by Nils Reimers and Iryna Gurevych in 2020
> The teacher model is an already fine-tuned sentence transformer used for creating embeddings in a single language (most likely English). The student model is a transformer that has been pretrained on a multilingual corpus.
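A minimal training sketch of this distillation setup with the sentence-transformers library, following its multilingual training example; model names and the TSV path are placeholders:
```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, models, losses
from sentence_transformers.datasets import ParallelSentencesDataset

teacher = SentenceTransformer("paraphrase-distilroberta-base-v1")  # monolingual teacher
word_emb = models.Transformer("xlm-roberta-base", max_seq_length=128)
pooling = models.Pooling(word_emb.get_word_embedding_dimension())
student = SentenceTransformer(modules=[word_emb, pooling])  # multilingual student

# Tab-separated file: source_sentence \t translated_sentence
data = ParallelSentencesDataset(student_model=student, teacher_model=teacher)
data.load_data("parallel-sentences.tsv")
loader = DataLoader(data, batch_size=32, shuffle=True)

# MSE pulls the student's embeddings (in both languages) towards the teacher's
loss = losses.MSELoss(model=student)
student.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=100)
```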
2023-01-13T01:45:12Z
ValueError "invalid literal for int() with base 10" in trainer.evaluate (dataset created from pandas) · Issue #228 · huggingface/setfit
http://www.semanlink.net/doc/2022/12/valueerror_invalid_literal_for
see <https://github.com/huggingface/setfit/blob/main/notebooks/zero-shot-classification.ipynb>
> Note: some datasets on the Hugging Face Hub don't have a ClassLabel feature for the label column. In these cases, you should compute the candidate labels manually by first computing the id2label mapping as follows:
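A hypothetical sketch of that manual computation; the file and column names are assumptions:
```python
from datasets import load_dataset

dataset = load_dataset("csv", data_files="train.csv", split="train")  # placeholder file
labels = sorted(set(dataset["label"]))  # raw label values, no ClassLabel feature
id2label = {i: label for i, label in enumerate(labels)}
candidate_labels = list(id2label.values())
```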
2022-12-13T11:46:14Z
Few-Shot Text Classification (Cloudera 2020)
http://www.semanlink.net/doc/2022/11/few_shot_text_classification_c
> Sentence-BERT has been optimized… well, for sentences! It’s reasonable to suspect that SBERT’s representations of single words or short phrases like “Business” or “Science & Technology” won’t be as semantically relevant as representations derived from a word-level method, like word2vec or GloVe
2022-11-24T14:16:39Z
Lewis Tunstall on Twitter: "The SetFit library for few-shot learning with Sentence Transformers now supports *multi-label text classification*..."
http://www.semanlink.net/doc/2022/10/lewis_tunstall_sur_twitter_t
Multilabel support [github issue](https://github.com/huggingface/setfit/issues/65)
2022-10-14T15:24:53Z
huggingface/setfit: Efficient few-shot learning with Sentence Transformers
http://www.semanlink.net/doc/2022/10/huggingface_setfit_efficient_f
2022-10-12T23:41:16Z
[2205.11498] Domain Adaptation for Memory-Efficient Dense Retrieval
http://www.semanlink.net/doc/2022/09/2205_11498_domain_adaptation_
Refers to [Binary Passage Retriever (BPR)](doc:2021/06/2106_00882_efficient_passage_)
2022-09-26T17:46:39Z
[2209.11055] Efficient Few-Shot Learning Without Prompts
http://www.semanlink.net/doc/2022/09/2209_11055_efficient_few_shot
[tweet](https://twitter.com/_akhaliq/status/1573109469646561280?s=20&t=RTpK9dh90az0zT1Xg2ohpQ):
> So if I have 4 classes and say 2 labels per class, I would first fine tune an ST on these 4 pairs and then vectorize the 8 total examples for fine-tuning the classifier
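A minimal SetFit sketch of that exact scenario (4 classes, 2 examples each), using the setfit API as of its 2022 releases; the texts and model name are placeholders:
```python
from datasets import Dataset
from setfit import SetFitModel, SetFitTrainer

train_ds = Dataset.from_dict({
    "text": ["great acting", "dull plot", "stocks fell", "markets rallied",
             "new GPU released", "chip shortage eases", "team wins final", "coach resigns"],
    "label": [0, 0, 1, 1, 2, 2, 3, 3],  # 4 classes, 2 examples per class
})

model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")
trainer = SetFitTrainer(model=model, train_dataset=train_ds, num_iterations=20)
trainer.train()  # 1) contrastive ST fine-tuning on pairs, 2) classifier head fit
preds = model.predict(["the defense collapsed in the second half"])
```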
2022-09-23T10:26:46Z
PromptBERT improving BERT sentence embeddings with prompts - Ethan Kim
http://www.semanlink.net/doc/2022/09/promptbert_improving_bert_sente
2022-09-16T10:31:11Z
[2201.04337] PromptBERT: Improving BERT Sentence Embeddings with Prompts
http://www.semanlink.net/doc/2022/09/2201_04337_promptbert_improv
[PromptBERT improving BERT sentence embeddings with prompts - Ethan Kim](doc:2022/09/promptbert_improving_bert_sente)
2022-09-16T10:06:59Z
Unsupervised Learning — Sentence-Transformers documentation
http://www.semanlink.net/doc/2022/08/unsupervised_learning_sentenc
> In our paper TSDAE we compare approaches for sentence embedding tasks, and in GPL we compare them for semantic search tasks (given a query, find relevant passages). While the unsupervised approaches achieve acceptable performance for sentence embedding tasks, they perform poorly for semantic search tasks.
2022-08-20T01:16:16Z
Train and Fine-Tune Sentence Transformers Models
http://www.semanlink.net/doc/2022/08/train_and_fine_tune_sentence_tr
2022-08-13T09:49:57Z
sentence bert model in onnx format · Issue #46 · UKPLab/sentence-transformers
http://www.semanlink.net/doc/2022/06/sentence_bert_model_in_onnx_for
2022-06-13T12:38:47Z
[2205.15952] Knowledge Graph -- Deep Learning: A Case Study in Question Answering in Aviation Safety Domain
http://www.semanlink.net/doc/2022/06/2205_15952_knowledge_graph_
2022-06-11T01:48:52Z
Domain transfer with GGPL: German Generative Pseudo Labeling 🥨 | by Matthias Richter | Jun, 2022 | ML6team
http://www.semanlink.net/doc/2022/06/domain_transfer_with_ggpl_germ
2022-06-02T13:55:12Z
Nils Reimers on Twitter: "GPL goes multi-lingual..."
http://www.semanlink.net/doc/2022/06/nils_reimers_sur_twitter_gpl
[Domain transfer with GGPL: German Generative Pseudo Labeling](doc:2022/06/domain_transfer_with_ggpl_germ)
2022-06-01T17:45:24Z
[2205.04260] EASE: Entity-Aware Contrastive Learning of Sentence Embedding
http://www.semanlink.net/doc/2022/05/2205_04260_ease_entity_aware
> we explore a type of supervision that has been under-explored in the literature: entity hyperlink annotations from Wikipedia.
>
> entities have been shown to be a strong indicator of text semantics
>
> a method for mining hard negatives based on the entity type
Uses wikipedia2vec
> the reliance on Wikipedia for training data may limit the application of the models to specific domains (e.g., general or encyclopedia domains). To apply EASE to other domains, one may need to annotate text from the domain either manually or automatically.
2022-05-11T01:25:12Z
Ramsri Goutham Golla on Twitter: "Hi @Nils_Reimers For GPL you used "msmarco-distilbert-base-tas-b" model and ..."
http://www.semanlink.net/doc/2022/04/ramsri_goutham_golla_sur_twitte
2022-04-27T22:17:10Z
[2008.11228] A simple method for domain adaptation of sentence embeddings
http://www.semanlink.net/doc/2022/04/2008_11228_a_simple_method_fo
2022-04-01T14:07:28Z
[2004.05119] Beyond Fine-tuning: Few-Sample Sentence Embedding Transfer
http://www.semanlink.net/doc/2022/03/2004_05119_beyond_fine_tuning
> Fine-tuning (FT) pre-trained sentence embedding models on small datasets has been shown to have limitations. In this paper we show that concatenating the embeddings from the pre-trained model with those from a simple sentence embedding model trained only on the target data, can improve over the performance of FT for few-sample tasks
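A minimal sketch of the concatenation idea, with TF-IDF standing in for the simple target-data model (the paper's actual simple model may differ):
```python
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["cheap flights to Paris", "refund my ticket", "book a hotel room", "cancel my booking"]
labels = [0, 1, 0, 1]

sbert = SentenceTransformer("all-MiniLM-L6-v2")   # pre-trained embeddings
pretrained_emb = sbert.encode(texts)

tfidf = TfidfVectorizer().fit(texts)              # "simple model" trained only on target data
target_emb = tfidf.transform(texts).toarray()

combined = np.concatenate([pretrained_emb, target_emb], axis=1)
clf = LogisticRegression(max_iter=1000).fit(combined, labels)  # downstream classifier
```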
2022-03-31T21:04:02Z
Sentence Transformer Fine-Tuning (SetFit): Outperforming GPT-3 on few-shot Text-Classification while being 1600 times smaller | by Moshe Wasserblat (2021-12)
http://www.semanlink.net/doc/2022/03/sentence_transformer_fine_tunin
Fine-tuning an SBERT on a classification task (which, in the end, produces an SBERT)
> **Few-shot text classification based on fine-tuning a Sentence Transformer with task-specific data** that can easily be implemented with the sentence-transformers library
> Surprisingly, we did not find any work that performed an end-to-end ST fine-tuning for text classification in a Siamese manner.
[COLAB](https://colab.research.google.com/github/MosheWasserb/SetFit/blob/main/SetFit_SST_2.ipynb)
[Nils Reimers on Twitter](doc:2022/03/nils_reimers_sur_twitter_gre)
2022-03-31T10:49:48Z
Nils Reimers on Twitter: "Great post on SetFit"
http://www.semanlink.net/doc/2022/03/nils_reimers_sur_twitter_gre
About [Sentence Transformer Fine-Tuning (SetFit): Outperforming GPT-3 on few-shot Text-Classification while being 1600 times smaller | by Moshe Wasserblat](doc:2022/03/sentence_transformer_fine_tunin)
> - Outperforms GPT-3 in few-shot text-classification (50 labeled examples, secret test set)
> - 1600 times smaller
> - Can be run on your CPU
> - No limitation on the number of training examples
> - Just few lines of code needed
2022-03-31T10:48:50Z
Sentence Embedding Fine-tuning for the French Language | by La Javaness R&D | Feb, 2022 | Medium
http://www.semanlink.net/doc/2022/03/sentence_embedding_fine_tuning_
2022-03-31T10:06:14Z
Domain Adaptation — Sentence-Transformers documentation
http://www.semanlink.net/doc/2022/03/domain_adaptation_sentence_tr
2022-03-31T08:59:25Z
[2203.14655] Few-Shot Learning with Siamese Networks and Label Tuning
http://www.semanlink.net/doc/2022/03/2203_14655_few_shot_learning_
> the problem of building text classifiers with little or no training data.
>
> In recent years, an approach based on neural textual entailment models has been found to give strong results on a diverse range of tasks.
(cf. #[NLI](tag:nli), using the input text as the premise and the text representing the label as the hypothesis)
> In this work, we show that **with proper pre-training, Siamese Networks that embed texts and labels** offer a competitive alternative.
>
> We introduce **label tuning: fine-tuning the label embeddings only**. While giving lower performance than model fine-tuning (which updates all params of the model), this approach has the architectural advantage that a single encoder can be shared by many different tasks (we only fine-tune the label embeddings)
> The drop in quality can be compensated for by using a variant of **[Knowledge distillation](tag:knowledge_distillation)**
[Github](https://tinyurl.com/label-tuning), [Tweet](doc:2022/03/thomas_muller_sur_twitter_pa)
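A minimal sketch of the label-tuning idea, not the paper's implementation: freeze the text encoder, initialize label embeddings from the label names, and optimize only those embeddings (the model name and data are placeholders):
```python
import torch
import torch.nn.functional as F
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed SBERT-style encoder
labels = ["positive review", "negative review"]
texts, y = ["a delightful film", "a waste of two hours"], torch.tensor([0, 1])

label_emb = torch.nn.Parameter(torch.tensor(encoder.encode(labels)))  # init from label text
text_emb = torch.tensor(encoder.encode(texts))                        # frozen encoder output
opt = torch.optim.Adam([label_emb], lr=1e-3)  # only the label embeddings get gradients

for _ in range(50):
    logits = F.normalize(text_emb, dim=-1) @ F.normalize(label_emb, dim=-1).T / 0.1
    loss = F.cross_entropy(logits, y)
    opt.zero_grad(); loss.backward(); opt.step()
```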
2022-03-30T16:14:44Z
Document Matching for Job Descriptions | Semantic Scholar (2021)
http://www.semanlink.net/doc/2022/03/document_matching_for_job_descr
> We train a document encoder to match online job descriptions to one of many standardized job roles from Singapore’s Skills Framework. The encoder generates semantically meaningful document encodings from textual descriptions of job roles, which are then compared using Cosine Similarity to determine matching. During training, we implement the methodology used by Sentence-BERT, fine tuning pre-trained BERT models using a siamese network architecture on labelled document pairs.
2022-03-09T18:18:50Z
NAVER LABS Europe: "@Nils_Reimers of @huggingface on 'Unsupervised domain adaptation for neural search'"
http://www.semanlink.net/doc/2022/03/naver_labs_europe_nils_reim
2022-03-09T10:53:24Z
Nils Reimers on Twitter: "Creating intent classes for chatbots is challenging. This tutorial shows how to use sentence-transformers to find potentially overlapping intent classes and how to improve your data annotation work."
http://www.semanlink.net/doc/2022/02/nils_reimers_sur_twitter_cre
2022-02-19T22:55:07Z
Nils Reimers on Twitter: "how to use the fast clustering algorithm from sentence-transformers..."
http://www.semanlink.net/doc/2022/02/nils_reimers_sur_twitter_how
Clustering millions of sentences to optimize the ML-workflow
2022-02-19T10:37:15Z
sentence-transformers/fast_clustering.py at master · UKPLab/sentence-transformers
http://www.semanlink.net/doc/2022/02/sentence_transformers_fast_clus
> This is a more complex example of performing clustering on a large-scale dataset. This example finds local communities in a large set of sentences, i.e., groups of sentences that are highly similar. You can freely configure the threshold for what is considered similar. A high threshold will only find extremely similar sentences; a lower threshold will find more sentences that are less similar. A second parameter is 'min_community_size': only communities with at least a certain number of sentences will be returned. The method for finding the communities is extremely fast: clustering 50k sentences requires only 5 seconds (plus embedding computation). In this example, we download a large set of questions from Quora and then find similar questions in this set.
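A minimal sketch using `util.community_detection`, the function behind `fast_clustering.py`; the sentences are placeholders:
```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
sentences = ["How do I reset my password?", "Password reset steps?",
             "Best pizza in town", "Where to eat pizza?"] * 100
embeddings = model.encode(sentences, convert_to_tensor=True)

clusters = util.community_detection(embeddings, threshold=0.75, min_community_size=10)
for i, cluster in enumerate(clusters):
    print(f"Community {i}: {len(cluster)} sentences, e.g. {sentences[cluster[0]]!r}")
```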
2022-02-18T14:45:22Z
gsarti/scibert-nli · Hugging Face
http://www.semanlink.net/doc/2022/01/gsarti_scibert_nli_%C2%B7_hugging_fa
SciBERT fine-tuned on the SNLI and the MultiNLI datasets using the sentence-transformers library to produce universal sentence embeddings
2022-01-29T15:52:08Z
Semantic Search — Sentence-Transformers documentation
http://www.semanlink.net/doc/2022/01/semantic_search_sentence_tran
**symmetric** semantic search vs **asymmetric** semantic search
> - Suitable models for symmetric semantic search: Pre-Trained Sentence Embedding
> - Suitable models for asymmetric semantic search: Pre-Trained MS MARCO Models
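A minimal asymmetric-search sketch with one of the pre-trained MS MARCO models; TAS-B is tuned for dot-product scoring, hence `score_function=util.dot_score`:
```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("msmarco-distilbert-base-tas-b")
corpus = ["SBERT derives sentence embeddings with siamese networks.",
          "BM25 is a classic lexical ranking function."]
corpus_emb = model.encode(corpus, convert_to_tensor=True)

query_emb = model.encode("how are sentence embeddings trained?", convert_to_tensor=True)
hits = util.semantic_search(query_emb, corpus_emb, top_k=2, score_function=util.dot_score)[0]
for hit in hits:
    print(hit["score"], corpus[hit["corpus_id"]])
```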
2022-01-29T15:28:25Z
Using pretrained SBERT model in cross-encoder · Issue #726 · UKPLab/sentence-transformers
http://www.semanlink.net/doc/2021/12/using_pretrained_sbert_model_in
> so would it be a good idea to finetune a SBERT model on a cross-encoder task?
>
> The SBERT models are regular transformer models and hence can be used as a base for cross-encoders. Sometimes it could be helpful, otherwise it is better to use the original models. ([Nils Reimers](tag:nils_reimers))
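For contrast, a minimal CrossEncoder sketch: it scores each (query, passage) pair jointly instead of comparing precomputed embeddings (the checkpoint is one of the published MS MARCO cross-encoders):
```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = model.predict([
    ("how to fine-tune SBERT", "A tutorial on fine-tuning sentence-transformers."),
    ("how to fine-tune SBERT", "Pizza recipes for beginners."),
])
print(scores)  # higher score = more relevant pair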
2021-12-17T00:41:33Z
Advance BERT model via transferring knowledge from Cross-Encoders to Bi-Encoders | by Chien Vu | Towards Data Science
http://www.semanlink.net/doc/2021/12/advance_bert_model_via_transfer
Data Augmentation Method to improve SBERT Bi-Encoders for Pairwise Sentence Scoring Tasks (Semantic sentence tasks)
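A minimal sketch of that recipe: score unlabeled pairs with a cross-encoder, then fit the bi-encoder on the resulting silver labels (the pairs and model names are placeholders):
```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, CrossEncoder, InputExample, losses

pairs = [("A man is eating food.", "Someone is eating."),
         ("Kids are playing outside.", "The stock market crashed.")]

teacher = CrossEncoder("cross-encoder/stsb-roberta-base")
silver_scores = teacher.predict(pairs)  # soft similarity labels

student = SentenceTransformer("all-MiniLM-L6-v2")
examples = [InputExample(texts=list(p), label=float(s)) for p, s in zip(pairs, silver_scores)]
loader = DataLoader(examples, batch_size=16, shuffle=True)
student.fit(train_objectives=[(loader, losses.CosineSimilarityLoss(student))], epochs=1)
```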
2021-12-17T00:26:39Z
[2112.07577] GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval
http://www.semanlink.net/doc/2021/12/2112_07577_gpl_generative_ps
An unsupervised domain adaptation technique for dense retrieval models:
1. synthetic queries are generated for each passage from the target corpus (using an existing pre-trained [T5](tag:text_to_text_transfer_transformer) encoder-decoder)
2. the generated queries are used for mining negative passages (retrieving the most similar paragraphs using an existing dense retrieval model == hard negatives!)
3. the query-passage pairs are labeled by a cross-encoder and used to train the domain-adapted dense retriever (using the method described in [Hofstätter et al., 2020](doc:2021/12/2010_02666_improving_efficien))
[Nils Reimers on Twitter](doc:2021/12/nils_reimers_sur_twitter_do_), [GitHub](https://github.com/UKPLab/gpl), by the author of [TSDAE](doc:2021/09/2104_06979_tsdae_using_trans)
Claims to improve "Doc2Query" [Document Expansion by Query Prediction](doc:2022/01/1904_08375_document_expansion): ([src](https://twitter.com/KexinWang2049/status/1471435779415150598))
> - GPL: Uses doc2query to construct synthetic data and does knowledge distillation (i.e. training) on that data.
> - Doc2query: Generates queries to extend the documents and use BM25 on top of them w/o training.
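A hedged sketch of GPL's first step only, generating synthetic queries for a target-corpus passage with a doc2query-style T5 model; the checkpoint name is an assumption:
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

name = "BeIR/query-gen-msmarco-t5-base-v1"  # assumed doc2query checkpoint
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSeq2SeqLM.from_pretrained(name)

passage = "Sentence embeddings map texts to dense vectors comparable with cosine similarity."
inputs = tok(passage, return_tensors="pt", truncation=True)
out = model.generate(**inputs, do_sample=True, top_p=0.95,
                     num_return_sequences=3, max_length=64)
queries = [tok.decode(o, skip_special_tokens=True) for o in out]  # synthetic queries
```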
2021-12-15T18:23:28Z
Semantic search through a vectorized Wikipedia (SentenceBERT) with the Weaviate vector search engine
http://www.semanlink.net/doc/2021/12/semantic_search_through_a_vecto
2021-12-05T10:48:53Z
Unsupervised_Extractive_Summarization - a Hugging Face Space by Hellisotherpeople
http://www.semanlink.net/doc/2021/12/unsupervised_extractive_summari
Unsupervised Extractive Text Summarization and Semantic Search
[Github](https://github.com/Hellisotherpeople/CX_DB8)
2021-12-03T09:28:38Z
Unsupervised Training for Sentence Transformers | Pinecone
http://www.semanlink.net/doc/2021/11/unsupervised_training_for_sente
Blog post about [[2104.06979] TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning](doc:2021/09/2104_06979_tsdae_using_trans)
> Fine-tuning with TSDAE simply cannot compete in terms of performance against supervised methods. However, **the point and value of TSDAE is that it allows us to fine-tune models for use-cases where we have no data**: specific domains with unique terminology, or low-resource languages.
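A minimal TSDAE training sketch following the sentence-transformers unsupervised example; the base checkpoint and sentences are placeholders:
```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, models, losses
from sentence_transformers.datasets import DenoisingAutoEncoderDataset

word_emb = models.Transformer("bert-base-uncased")
pooling = models.Pooling(word_emb.get_word_embedding_dimension(), "cls")
model = SentenceTransformer(modules=[word_emb, pooling])

sentences = ["Unlabeled sentence one.", "Unlabeled sentence two."] * 100
dataset = DenoisingAutoEncoderDataset(sentences)  # adds deletion noise on the fly
loader = DataLoader(dataset, batch_size=8, shuffle=True)
loss = losses.DenoisingAutoEncoderLoss(model, tie_encoder_decoder=True)

model.fit(train_objectives=[(loader, loss)], epochs=1,
          scheduler="constantlr", optimizer_params={"lr": 3e-5}, weight_decay=0)
```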
2021-11-24T21:03:44Z
How to Fine-Tune Sentence-BERT for Question Answering | Capital One
http://www.semanlink.net/doc/2021/11/how_to_fine_tune_sentence_bert_
> tutorial on using the sentence-transformers library to fine-tune Sentence-BERT for question matching
2021-11-21T12:38:13Z
Multilingual Sentence Transformers | Pinecone
http://www.semanlink.net/doc/2021/11/multilingual_sentence_transform
How to make a text encoder multilingual using sentence transformers and multilingual knowledge distillation.
2021-11-04T23:09:34Z
Next-Gen Sentence Embeddings with Multiple Negatives Ranking Loss | Pinecone
http://www.semanlink.net/doc/2021/10/next_gen_sentence_embeddings_wi
> the world of sentence embeddings was ignited with the introduction of SBERT in 2019. Since then, many more sentence transformers have been introduced. These models quickly made the original SBERT obsolete. How did these newer sentence transformers manage to outperform SBERT so quickly? The answer is **multiple negatives ranking (MNR) loss**.
> In short; **fine-tune your models with MNR loss, and do it with the [sentence-transformers](tag:sbert) library**.
(mentioned in a [tweet](https://twitter.com/Nils_Reimers/status/1453001422400856086) by [Nils Reimers](tag:nils_reimers))
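A minimal MNR-loss sketch: each (anchor, positive) pair treats the other in-batch positives as its negatives (the pairs are placeholders):
```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("all-MiniLM-L6-v2")
train_examples = [
    InputExample(texts=["what is a sentence embedding?",
                        "Sentence embeddings are dense vectors representing texts."]),
    InputExample(texts=["who introduced SBERT?",
                        "SBERT was introduced by Reimers and Gurevych."]),
]
loader = DataLoader(train_examples, batch_size=2, shuffle=True)
loss = losses.MultipleNegativesRankingLoss(model)
model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=10)
```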
2021-10-27T01:24:49Z
Sentence Embeddings and Transformers | Pinecone
http://www.semanlink.net/doc/2021/10/sentence_embeddings_and_transfo
2021-10-23T01:04:37Z
Sahajtomar/french_semantic · Hugging Face
http://www.semanlink.net/doc/2021/10/sahajtomar_french_semantic_%C2%B7_hu
2021-10-14T16:08:39Z
Semantic Search with S-BERT is all you need
http://www.semanlink.net/doc/2021/06/semantic_search_with_s_bert_is_
> SentenceTransformers is designed in such a way that fine-tuning your own sentence/text embedding models is easy.
2021-06-05T16:02:26Z
Nils Reimers on Twitter: "SBERT Release v1.1.0"
http://www.semanlink.net/doc/2021/04/nils_reimers_sur_twitter_sbe
2021-04-22T19:35:49Z
[2011.05864] On the Sentence Embeddings from Pre-trained Language Models
http://www.semanlink.net/doc/2021/04/2011_05864_on_the_sentence_em
> **the sentence embeddings from the pre-trained language models without fine-tuning have been found to poorly capture semantic meaning of sentences.**
>
> We find that **BERT always induces a non-smooth anisotropic semantic space of sentences**, which harms its performance of semantic similarity. To address this issue, we propose to transform the anisotropic sentence embedding distribution to a smooth and isotropic Gaussian distribution through normalizing flows that are learned with an unsupervised objective
> normalizing flows (Dinh et al., 2015): invertible functions parameterized by neural networks.
> **During training, only the flow network is optimized while the BERT parameters remain unchanged**
> When combined with external supervision from natural language inference tasks (Bowman et al., 2015; Williams et al., 2018), our method outperforms the [Sentence-BERT](tag:sbert) embeddings
[GitHub](https://github.com/bohanli/BERT-flow)
2021-04-19T01:13:25Z
SimCSE: Simple Contrastive Learning of Sentence Embeddings
http://www.semanlink.net/doc/2021/04/simcse_simple_contrastive_lear
(by one of the authors of [KEPLER](doc:2020/11/1911_06136_kepler_a_unified_))
A contrastive sentence embedding framework which can be used to produce sentence embeddings from either unlabeled or labeled data:
> 1. **an unsupervised approach, which takes an input sentence and predicts itself in a contrastive objective, with only standard dropout** used as noise
> 2. we draw inspiration from the recent success of learning sentence embeddings from natural language inference (NLI) datasets and incorporate annotated pairs from NLI datasets into contrastive learning by using “entailment” pairs as positives and “contradiction” pairs as hard negatives
Cites [[2011.05864] On the Sentence Embeddings from Pre-trained Language Models](doc:2021/04/2011_05864_on_the_sentence_em) (question of the anisotropic semantic space of BERT's sentences)
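A minimal sketch of the unsupervised objective, not the authors' implementation: encode the same batch twice with dropout active, and treat the two views of each sentence as the positive pair:
```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = AutoModel.from_pretrained("bert-base-uncased")
enc.train()  # keep dropout on: two forward passes give two different "views"

sentences = ["A cat sits on the mat.", "The market rallied today.", "He plays guitar."]
batch = tok(sentences, padding=True, return_tensors="pt")
z1 = enc(**batch).last_hidden_state[:, 0]  # [CLS], first dropout mask
z2 = enc(**batch).last_hidden_state[:, 0]  # [CLS], second dropout mask

sim = F.cosine_similarity(z1.unsqueeze(1), z2.unsqueeze(0), dim=-1) / 0.05  # temperature
loss = F.cross_entropy(sim, torch.arange(len(sentences)))  # diagonal pairs are positives
loss.backward()
```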
2021-04-18T18:28:29Z
Nils Reimers on Twitter: "New models for Neural Information Retrieval..."
http://www.semanlink.net/doc/2021/04/nils_reimers_sur_twitter_new
2021-04-17T10:07:14Z
SentenceTransformers Documentation
http://www.semanlink.net/doc/2021/03/sentencetransformers_documentat
2021-03-25T19:05:01Z
Zero-Shot Learning in Modern NLP | Joe Davison Blog (2020-05)
http://www.semanlink.net/doc/2021/02/zero_shot_learning_in_modern_nl
> state-of-the-art NLP models for sequence classification without large annotated training sets.
Simple idea: use a single model (e.g. [Sentence-BERT](tag:sbert)) to embed both the text data and the class names into the same space.
Problem: Sentence-BERT is designed to learn effective sentence-level representations, not single- or multi-word representations like our class names -> the label embeddings may not be as semantically salient as word-level embedding methods (i.e. word2vec).
Solution 1: learn a projection from sentence-level embeddings of words to word2vec embeddings, and use it for encoding when learning the classifier. Can be adapted to few-shot learning.
Solution 2: "Classification as [#Natural Language Inference](tag:nli)".
> A method which not only embeds sequences and labels into the same latent space where their distance can be measured, but that can actually tell us something about the compatibility of two distinct sequences out of the box.
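A minimal sketch of the shared-embedding-space idea; the encoder name is an assumption:
```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
labels = ["Business", "Science & Technology", "Sports"]
label_emb = model.encode(labels, convert_to_tensor=True)

text_emb = model.encode("NASA unveils a new space telescope", convert_to_tensor=True)
scores = util.cos_sim(text_emb, label_emb)[0]
print(labels[int(scores.argmax())])  # predicted class = nearest label embedding
```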
2021-02-23T13:44:34Z
UKPLab/sentence-transformers: Sentence Embeddings with BERT & XLNet
http://www.semanlink.net/doc/2020/07/ukplab_sentence_transformers_s
[paper](doc:2019/08/_1908_10084_sentence_bert_sen)
2020-07-14T19:08:40Z
How to use BERT for finding similar sentences or similar news? · Issue #876 · huggingface/transformers
http://www.semanlink.net/doc/2020/07/how_to_use_bert_for_finding_sim
links to [UKPLab/sentence-transformers](doc:2020/07/ukplab_sentence_transformers_s)
[Another answer](https://github.com/huggingface/transformers/issues/2986)
2020-07-12T15:26:41Z
Richer Sentence Embeddings using Sentence-BERT — Part I
http://www.semanlink.net/doc/2020/01/richer_sentence_embeddings_usin
Commonly used methods for deriving sentence embeddings from BERT, such as averaging the word vectors or using the \[CLS\] special vector (start of sequence), are too simplistic to give good results.
[About this paper](/doc/2019/08/_1908_10084_sentence_bert_sen)
2020-01-06T01:48:12Z
[1908.10084] Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
http://www.semanlink.net/doc/2019/08/_1908_10084_sentence_bert_sen
> Sentence-BERT (SBERT), a modification of the pretrained BERT network that uses siamese and triplet network structures to derive **semantically meaningful sentence embeddings** that can be compared using cosine similarity.
Important because:
- BERT is unsuitable for semantic similarity search as well as for unsupervised tasks like clustering.
- simple methods such as using the CLS token give low-quality sentence embeddings
However, SBERT sentence embeddings are **not meant to be used for transfer learning for other tasks**.
[Related blog post](/doc/2020/01/richer_sentence_embeddings_usin); [Github](https://github.com/UKPLab/sentence-transformers)
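A minimal usage sketch with the accompanying sentence-transformers library; the model name is a later checkpoint, used here as an assumption:
```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
emb = model.encode(["The cat sits outside.", "A man is playing guitar."],
                   convert_to_tensor=True)
print(util.cos_sim(emb[0], emb[1]))  # cosine similarity of the two sentence embeddings
```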
2019-08-28T22:41:55Z