Semanlink - [2002.11402] Detecting Potential Topics In News Using BERT, CRF and Wikipedia

About This Document

sl:arxiv_author : Swapnil Ashok Jadhav
sl:arxiv_firstAuthor : Swapnil Ashok Jadhav
sl:arxiv_num : 2002.11402
sl:arxiv_published : 2020-02-26T10:48:53Z
sl:arxiv_summary : For a news content distribution platform like Dailyhunt, Named Entity Recognition is a pivotal task for building better user recommendation and notification algorithms. Apart from identifying names, locations, and organisations in news for 13+ Indian languages and using them in algorithms, we also need to identify n-grams which do not necessarily fit the definition of a named entity, yet are important, for example "me too movement", "beef ban", "alwar mob lynching". In this exercise, given an English-language text, we try to detect case-less n-grams which convey important information and can be used as topics and/or hashtags for a news article. The model is built using Wikipedia titles data, a private English news corpus, the BERT-Multilingual pre-trained model, and a Bi-GRU and CRF architecture. It shows promising results when compared with industry-best Flair, Spacy and Stanford-caseless-NER in terms of F1 and especially Recall.@en
sl:arxiv_title : Detecting Potential Topics In News Using BERT, CRF and Wikipedia@en
sl:arxiv_updated : 2020-02-28T18:44:07Z
sl:bookmarkOf : https://arxiv.org/abs/2002.11402
sl:creationDate : 2020-02-27
sl:creationTime : 2020-02-27T23:36:54Z
