About This Document
- sl:arxiv_author :
- sl:arxiv_firstAuthor : Stephen H. Bach
- sl:arxiv_num : 1812.00417
- sl:arxiv_published : 2018-12-02T16:23:36Z
- sl:arxiv_summary : Labeling training data is one of the most costly bottlenecks in developing
machine learning-based applications. We present a first-of-its-kind study
showing how existing knowledge resources from across an organization can be
used as weak supervision in order to bring development time and cost down by an
order of magnitude, and introduce Snorkel DryBell, a new weak supervision
management system for this setting. Snorkel DryBell builds on the Snorkel
framework, extending it in three critical aspects: flexible, template-based
ingestion of diverse organizational knowledge, cross-feature production
serving, and scalable, sampling-free execution. On three classification tasks
at Google, we find that Snorkel DryBell creates classifiers of comparable
quality to ones trained with tens of thousands of hand-labeled examples,
converts non-servable organizational resources to servable models for an
average 52% performance improvement, and executes over millions of data points
in tens of minutes.@en
- sl:arxiv_title : Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale@en
- sl:arxiv_updated : 2019-06-03T22:52:25Z
- sl:bookmarkOf : https://arxiv.org/abs/1812.00417
- sl:creationDate : 2019-06-28
- sl:creationTime : 2019-06-28T00:31:17Z
- sl:relatedDoc : http://www.semanlink.net/doc/2019/06/google_ai_blog_harnessing_orga
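The summary describes using existing organizational knowledge resources as weak supervision. The sketch below is a minimal illustration of that general idea. It makes several assumptions that do not come from the paper. Each hypothetical labeling function stands in for one knowledge resource and either votes for a label or abstains. The noisy votes are then combined by plain majority vote.

Snorkel-style systems instead learn a model of each source's accuracy, so this majority vote is a simplified stand-in rather than Snorkel DryBell's method. The function names and the spam-detection heuristics are illustrative only.

```python
from collections import Counter
from typing import Callable, List, Optional

ABSTAIN = None  # a labeling function returns None when it has no opinion

# Hypothetical labeling functions: each stands in for one noisy
# organizational resource (a keyword list, a trusted-domain list, a heuristic).
def lf_keyword(text: str) -> Optional[int]:
    return 1 if "free money" in text.lower() else ABSTAIN  # 1 = spam

def lf_trusted_domain(text: str) -> Optional[int]:
    return 0 if "@example.com" in text else ABSTAIN  # 0 = not spam

def lf_many_links(text: str) -> Optional[int]:
    return 1 if text.count("http") >= 3 else ABSTAIN

LFS: List[Callable[[str], Optional[int]]] = [lf_keyword, lf_trusted_domain, lf_many_links]

def majority_vote(text: str, lfs: List[Callable[[str], Optional[int]]]) -> Optional[int]:
    """Combine non-abstaining votes; abstain on no votes or a tie.

    Simplified stand-in for a learned label model: every source is
    weighted equally instead of by an estimated accuracy.
    """
    votes = [v for v in (lf(text) for lf in lfs) if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # tied vote: no confident label
    return ranked[0][0]

if __name__ == "__main__":
    for doc in ["Get free money now", "hello from bob@example.com", "nothing here"]:
        print(doc, "->", majority_vote(doc, LFS))
```

In a weak supervision pipeline, the labels produced this way (or, in Snorkel, the probabilistic labels from a learned model) would then be used to train an ordinary discriminative classifier in place of hand-labeled examples.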