About This Document
- sl:arxiv_author : Swapnil Ashok Jadhav
- sl:arxiv_firstAuthor : Swapnil Ashok Jadhav
- sl:arxiv_num : 2002.11402
- sl:arxiv_published : 2020-02-26T10:48:53Z
- sl:arxiv_summary : For a news content distribution platform like Dailyhunt, Named Entity
Recognition is a pivotal task for building better user recommendation and
notification algorithms. Apart from identifying names, locations, organisations
from the news for 13+ Indian languages and use them in algorithms, we also need
to identify n-grams which do not necessarily fit in the definition of
Named-Entity, yet they are important. For example, "me too movement", "beef
ban", "alwar mob lynching". In this exercise, given an English language text,
we are trying to detect case-less n-grams which convey important information
and can be used as topics and/or hashtags for a news. Model is built using
Wikipedia titles data, private English news corpus and BERT-Multilingual
pre-trained model, Bi-GRU and CRF architecture. It shows promising results when
compared with industry best Flair, Spacy and Stanford-caseless-NER in terms of
F1 and especially Recall.@en
- sl:arxiv_title : Detecting Potential Topics In News Using BERT, CRF and Wikipedia@en
- sl:arxiv_updated : 2020-02-28T18:44:07Z
- sl:bookmarkOf : https://arxiv.org/abs/2002.11402
- sl:creationDate : 2020-02-27
- sl:creationTime : 2020-02-27T23:36:54Z