Labeling data
http://www.semanlink.net/tag/labeling_data
Documents tagged with Labeling data

[2309.06131] Annotating Data for Fine-Tuning a Neural Ranker? Current Active Learning Strategies are not Better than Random Selection
http://www.semanlink.net/doc/2023/09/2309_06131_annotating_data_fo
Compares Sentence Transformers, cross-encoders, and ColBERT in a low-resource setting
> "optimal" subsets of training data that provide high effectiveness at low annotation cost do exist, but current mainstream AL strategies applied to PLM rankers are not capable of identifying them.
2023-09-14T00:47:05Z

Improve OCR quality for receipt processing with Tesseract and Label Studio
http://www.semanlink.net/doc/2023/03/improve_ocr_quality_for_receipt
2023-03-13T16:24:50Z

explosion/prodigy-openai-recipes: ✨ Bootstrap annotation with zero- & few-shot learning via OpenAI GPT-3
http://www.semanlink.net/doc/2023/02/explosion_prodigy_openai_recipe
> example code on how to combine zero- and few-shot learning with a small annotation effort
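The recipe idea, gather a few labeled examples, format them into a prompt, and ask the model to propose a label for the next text, can be sketched as prompt assembly. This is a minimal illustration; the function name and prompt layout are my own, not Prodigy's actual recipe code, and the API call itself is omitted:

```python
def build_annotation_prompt(task, labeled_examples, text):
    """Assemble a few-shot prompt asking the model to propose a label for `text`.

    Illustrative only: real Prodigy recipes send a prompt like this to the
    OpenAI API and parse the completion into a pre-filled annotation.
    """
    parts = [task]
    for ex_text, ex_label in labeled_examples:
        parts.append(f"Text: {ex_text}\nLabel: {ex_label}")
    parts.append(f"Text: {text}\nLabel:")  # the model completes this line
    return "\n\n".join(parts)
```

The annotator then only accepts or corrects the model's suggestion instead of labeling from scratch.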
2023-02-11T10:45:36Z

Daniel Vila Suero on Twitter: "Data annotation powered by vector search and @CohereAI embeddings..."
http://www.semanlink.net/doc/2023/01/daniel_vila_suero_sur_twitter_
> Instead of labeling examples one by one, find and bulk-label dozens of similar examples in a row
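The bulk-labeling idea can be sketched as a cosine-similarity filter over embedding vectors, a toy stand-in for vector search; the names and threshold here are illustrative, not from the thread:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def bulk_label_candidates(query_vec, pool_vecs, threshold=0.9):
    """Indices of pool examples similar enough to review and label in one pass."""
    return [i for i, v in enumerate(pool_vecs) if cosine(query_vec, v) >= threshold]
```

In practice the pool embeddings come from a model such as Cohere's and the search runs against a vector index rather than a Python list.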
2023-01-23T16:25:24Z

tagtog · AI-enabled Text Annotation Tool | PDF, Markdown, CSV, html, tweets, & many more Document types
http://www.semanlink.net/doc/2023/01/tagtog_%C2%B7_ai_enabled_text_annota
2023-01-09T14:00:08Z

[Tutorial] How to Train LayoutLM on a Custom Dataset with Hugging Face
http://www.semanlink.net/doc/2023/01/tutorial_how_to_train_layoutl
> This guide is intended to walk you through the process of training LayoutLM on your own custom documents.
2023-01-09T13:55:46Z

NLP Annotation Tools | UBIAI
http://www.semanlink.net/doc/2022/12/nlp_annotation_tools_%7C_ubiai
2022-12-21T13:46:24Z

Matthew Honnibal on Twitter: "We've been working on new prodi.gy workflows that let you use the @OpenAI API to kickstart your annotations, via zero- or few-shot learning. ..."
http://www.semanlink.net/doc/2022/12/matthew_honnibal_sur_twitter_
2022-12-20T00:03:04Z

Labelling Data Using Snorkel - KDnuggets (2020)
http://www.semanlink.net/doc/2022/11/labelling_data_using_snorkel_
2022-11-25T01:42:05Z

[2209.01975] Selective Annotation Makes Language Models Better Few-Shot Learners
http://www.semanlink.net/doc/2022/09/2209_01975_selective_annotati
> This work examines the implications of in-context learning for the creation of datasets for new natural language tasks.
>
> Departing from recent in-context learning methods, we formulate an annotation-efficient, two-step framework: selective annotation that chooses a pool of examples to annotate from unlabeled data in advance, followed by prompt retrieval that retrieves task examples from the annotated pool at test time.

an unsupervised, graph-based selective annotation method, vote-k, to select diverse, representative examples to annotate
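A much-simplified sketch of vote-k-style selection over a kNN graph: each example is scored by votes from its neighbors, and votes from the neighborhood of already-selected examples are discounted to promote diversity. The real method in the paper builds the graph from embeddings and uses a different vote/discount scheme; this toy version only illustrates the diversity mechanism:

```python
def vote_k(neighbors, budget, discount=10):
    """Select `budget` diverse, representative node ids.

    neighbors: dict mapping each example id to the ids of its k nearest
    neighbors. Simplified, illustrative variant of vote-k (arXiv:2209.01975).
    """
    selected = []
    penalty = {u: 0 for u in neighbors}
    while len(selected) < budget:
        scores = {}
        for u in neighbors:
            if u in selected:
                continue
            # each neighbor casts a vote, geometrically discounted once it
            # falls inside the neighborhood of an already-selected example
            scores[u] = sum(discount ** -penalty[v] for v in neighbors[u])
        best = max(scores, key=scores.get)
        selected.append(best)
        # discount future votes from the chosen example and its neighborhood
        for v in [best] + list(neighbors[best]):
            penalty[v] += 1
    return selected
```

With two tight clusters and a budget of two, the discount pushes the second pick into the cluster not yet covered.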
2022-09-07T13:20:58Z

[2008.07267] A Survey of Active Learning for Text Classification using Deep Neural Networks
http://www.semanlink.net/doc/2022/09/2008_07267_a_survey_of_active
> investigates (D)NN-based AL for text classification and inspects factors obstructing its adoption
>
> - (a) the inability of NNs to provide reliable uncertainty estimates, on which the most commonly used query strategies rely, and
> - (b) the challenge of training DNNs on small data.
includes a taxonomy of query strategies
**AL and DNN, Contrasting Paradigms**:
> DNNs are known to excel particularly at large-scale datasets; often, having large amounts of data available is a strict requirement for them to perform well at all. AL, on the other hand, tries to minimize the labeled data.
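Point (a) above refers to uncertainty-based query strategies such as least confidence, which select the examples whose top predicted class probability is lowest. A few-line illustrative sketch, not taken from the survey:

```python
def least_confidence_query(probs, n):
    """Pick the n unlabeled examples the model is least confident about.

    probs: per-example class-probability lists from the current model.
    Least confidence = lowest top-class probability.
    """
    ranked = sorted(range(len(probs)), key=lambda i: max(probs[i]))
    return ranked[:n]
```

The survey's point is that DNN probability estimates are often poorly calibrated, which undermines exactly this kind of ranking.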
2022-09-06T18:43:54Z

[2009.00236] A Survey of Deep Active Learning
http://www.semanlink.net/doc/2022/09/2009_00236_a_survey_of_deep_a
2022-09-06T18:40:19Z

Active Learning: A Survey (C. Aggarwal 2014)
http://www.semanlink.net/doc/2022/09/active_learning_a_survey_c_a
2022-09-06T18:33:24Z

Active Learning with AutoNLP and Prodigy
http://www.semanlink.net/doc/2022/09/active_learning_with_autonlp_an
2022-09-06T18:07:58Z

Active Learning for BERT: An Empirical Study - ACL Anthology
http://www.semanlink.net/doc/2022/09/active_learning_for_bert_an_em
> The use of Active Learning (AL) with deep pre-trained models has so far received little consideration.
>
> We study the potential of (i) various AL strategies, (ii) in conjunction with BERT, (iii) within a highly challenging – yet common – real-world scenario of class imbalance and scarce labeled data.
focused on binary classification
> AL can boost BERT performance, especially in the most realistic scenario in which the initial set of labeled examples is created using keyword-based queries, resulting in a biased sample of the minority class.
[Github](https://github.com/IBM/low-resource-text-classification-framework)
2022-09-02T16:08:49Z

A framework for designing document processing solutions
http://www.semanlink.net/doc/2022/09/a_framework_for_designing_docum
2022-09-02T10:25:44Z

[2204.08491] Active Learning Helps Pretrained Models Learn the Intended Task
http://www.semanlink.net/doc/2022/04/2204_08491_active_learning_he
2022-04-20T08:08:47Z

Haystack Annotation Tool
http://www.semanlink.net/doc/2022/01/haystack_annotation_tool
2022-01-27T00:21:46Z

Explosion 💥 on Twitter: "...annotation tool and AutoNLP to train state-of-the-art NLP models!"
http://www.semanlink.net/doc/2021/12/explosion_%F0%9F%92%A5_sur_twitter_
[Active Learning with AutoNLP and Prodigy](doc:2022/09/active_learning_with_autonlp_an)
2021-12-30T17:47:27Z

[2007.00077] Similarity Search for Efficient Active Learning and Search of Rare Concepts
http://www.semanlink.net/doc/2020/07/2007_00077_similarity_search_
> Similarity search for Efficient Active Learning and Search (SEALS)
In [Active Learning](tag:active_learning): instead of searching globally for the optimal examples to label, leverage the fact that data is often heavily skewed and expand the candidate pool with the nearest neighbors of the labeled set.
> Our work attacks **both the labeling and computational costs of machine learning**... SEALS dramatically reduces the barrier to machine learning, enabling small teams or individuals to build accurate classifiers. **SEALS does, however, introduce another system component, a similarity search index, which adds some additional engineering complexity** to build, tune, and maintain. Fortunately, several highly optimized implementations like Annoy and [Faiss](doc:2020/06/facebookresearch_faiss_a_libra) work reasonably well out of the box.
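The core SEALS trick, restricting each round's candidate pool to the nearest neighbors of the labeled set instead of scanning the whole unlabeled pool, can be sketched with a brute-force kNN search standing in for the similarity index (Annoy or Faiss in the paper; all names here are illustrative):

```python
import math

def knn(vectors, query, k):
    """Brute-force k-nearest-neighbor search (stand-in for an Annoy/Faiss index)."""
    order = sorted(range(len(vectors)), key=lambda i: math.dist(query, vectors[i]))
    return order[:k]

def expand_candidate_pool(vectors, labeled_ids, k):
    """Candidate pool for the next AL round, per the SEALS idea:
    the union of each labeled example's k nearest neighbors."""
    pool = set()
    for i in labeled_ids:
        pool.update(knn(vectors, vectors[i], k))
    return pool - set(labeled_ids)
```

Because rare concepts are locally clustered in embedding space, this small neighbor pool usually contains the informative examples a global scan would have found, at a fraction of the compute.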
2020-07-02T15:31:34Z

NER algo benchmark: spaCy, Flair, m-BERT and camemBERT on anonymizing French commercial legal cases
http://www.semanlink.net/doc/2019/12/ner_algo_benchmark_spacy_flai
Second post, [First part: Why we switched from Spacy to Flair to anonymize French case law](doc:2021/02/why_we_switched_from_spacy_to_f)
> It has been the most striking aspect of this project, each effort we put on the **annotation quality** has been translated to score improvement, even the smallest ones.
2019-12-17T14:46:24Z

[1912.03927] Large deviations for the perceptron model and consequences for active learning
http://www.semanlink.net/doc/2019/12/_1912_03927_large_deviations_f
the task of choosing the subset of samples to be labeled from a fixed finite pool of samples
2019-12-11T02:26:25Z

Active Learning | Synthesis Lectures on Artificial Intelligence and Machine Learning (2012)
http://www.semanlink.net/doc/2019/08/active_learning_%7C_synthesis_lec
2019-08-07T01:34:08Z

Active learning literature survey (2010)
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.167.4245
The key idea behind active learning is that a machine learning algorithm can achieve greater accuracy with fewer labeled training instances if it is allowed to choose the data from which it learns.
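That idea is usually realized as a pool-based loop: train on what is labeled so far, query the examples the current model is least sure about, ask an annotator, repeat. A generic, illustrative sketch; the model, uncertainty function, and oracle are caller-supplied toys, not from the survey:

```python
def active_learning_loop(pool, oracle, uncertainty, train, rounds, batch=1):
    """Generic pool-based active learning loop.

    pool: unlabeled examples; oracle: the human annotator;
    uncertainty(model, x): how unsure the model is about x;
    train(labeled): fits and returns a model from the labeled dict.
    """
    labeled = {}
    for _ in range(rounds):
        model = train(labeled)
        candidates = [x for x in pool if x not in labeled]
        # query step: label the examples the current model is least sure about
        candidates.sort(key=lambda x: uncertainty(model, x), reverse=True)
        for x in candidates[:batch]:
            labeled[x] = oracle(x)  # ask the human annotator
    return labeled
```

With a toy uncertainty of "distance to the nearest labeled point", the loop spreads its label budget across the pool instead of sampling at random.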
2019-04-19T16:57:41Z

Weak Supervision: A New Programming Paradigm for Machine Learning | SAIL Blog
https://ai.stanford.edu/blog/weak-supervision/
[Newer version](doc:2019/06/weak_supervision_the_new_progr) of more or less the same thing
2019-03-12T13:33:27Z