sequence labelling tasks where the goal is to identify
the names of entities in a sentence. Named entities can
be proper nouns (locations, people, organizations...), or can be much more
domain-specific, such as diseases or genes in …

[1806.04411] Named Entity Recognition with Extremely Limited Data (2018) **"Named Entity Search (NES)"**
> We propose exploring **named entity recognition as a search task**, where the named entity class of interest is a query, and entities of that class are the relevant "documents". What should that query look like? Can we even perform NER-style labeling with tens of labels? This study presents an exploration of CRF-based NER models with handcrafted features and of how we might transform them into search queries.
> We do not propose this as a replacement
for NER, but as something to be used for an ephemeral or contextual
class of entity, when it does not make sense to label hundreds or
thousands of instances to learn a classifier
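
The search framing can be illustrated with a toy sketch. It is not the paper's method: `token_features`, `build_query` and `search` are hypothetical helpers, the features are a small made-up set of CRF-style handcrafted features, and scoring is plain weighted feature overlap. A few example mentions of the class of interest become the query, and every token in the collection is scored as a "document".

```python
from collections import Counter


def token_features(tokens, i):
    """Hypothetical CRF-style handcrafted features for tokens[i]."""
    w = tokens[i]
    feats = {"lower=" + w.lower(), "suffix3=" + w[-3:].lower()}
    if w.istitle():
        feats.add("is_title")
    if w.isupper():
        feats.add("is_upper")
    if w.isdigit():
        feats.add("is_digit")
    if i > 0:
        feats.add("prev=" + tokens[i - 1].lower())
    if i + 1 < len(tokens):
        feats.add("next=" + tokens[i + 1].lower())
    return feats


def build_query(examples):
    """Turn a few (tokens, index) examples of the entity class into a weighted feature query."""
    query = Counter()
    for tokens, i in examples:
        query.update(token_features(tokens, i))
    return query


def search(query, sentences, k=3):
    """Score every token (a "document") by weighted feature overlap with the query; return the top k."""
    scored = []
    for tokens in sentences:
        for i, w in enumerate(tokens):
            score = sum(query[f] for f in token_features(tokens, i))
            scored.append((score, w))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical usage: two labeled drug mentions act as the query for an ad-hoc "drug" class.
examples = [("patients took aspirin daily".split(), 2),
            ("patients took ibuprofen daily".split(), 2)]
query = build_query(examples)
results = search(query, ["doctors said patients took paracetamol daily".split()], k=1)
print(results)
```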

[1601.01343] Joint Learning of the Embedding of Words and Entities for Named Entity Disambiguation (2016)
> An embedding method specifically designed for NED that jointly **maps words and entities into the same continuous vector space**.
> We extend the skip-gram model by using two models. The KB graph model learns the relatedness of entities using the link structure of the KB, whereas the anchor context model aims to align vectors such that similar words and entities occur close to one another in the vector space by leveraging KB anchors and their context words
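
A minimal sketch of the shared-space idea, assuming a toy vocabulary in which entity tokens (prefixed `ENTITY/`, a hypothetical convention) sit in the same embedding table as words. It only loosely mimics the anchor context model with skip-gram negative sampling: an entity is trained to predict words that appear around its anchors, just as a word predicts its neighbours. The KB graph model is omitted, and the data, helper names and hyperparameters are assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# One vocabulary for words AND entities, so both end up in the same vector space.
vocab = ["the", "of", "capital", "france", "paris", "ENTITY/Paris"]
idx = {tok: i for i, tok in enumerate(vocab)}
dim = 8
target = rng.normal(scale=0.1, size=(len(vocab), dim))   # vectors being learned
context = rng.normal(scale=0.1, size=(len(vocab), dim))  # "output" (context) vectors


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def sgns_step(center, ctx, negatives, lr=0.1):
    """One skip-gram negative-sampling SGD step; returns the loss before the update."""
    c, p = idx[center], idx[ctx]
    neg_ids = [idx[n] for n in negatives]
    v = target[c].copy()
    pos_vec = context[p].copy()
    neg_vecs = context[neg_ids].copy()
    pos_s = sigmoid(v @ pos_vec)
    neg_s = sigmoid(neg_vecs @ v)
    loss = -np.log(pos_s) - np.sum(np.log(1.0 - neg_s))
    # d loss / d dot-product: (sigma - 1) for the positive pair, sigma for each negative.
    g_pos, g_neg = pos_s - 1.0, neg_s
    target[c] -= lr * (g_pos * pos_vec + g_neg @ neg_vecs)
    context[p] -= lr * g_pos * v
    np.add.at(context, neg_ids, -lr * np.outer(g_neg, v))
    return loss


# Hypothetical training pairs: the word "paris" and the entity ENTITY/Paris share contexts
# (the entity's pairs stand in for context words around its anchors in KB text).
pairs = [("paris", "capital"), ("paris", "france"),
         ("ENTITY/Paris", "capital"), ("ENTITY/Paris", "france")]
negatives = ["the", "of"]  # fixed negatives, for reproducibility

epoch_losses = []
for epoch in range(200):
    epoch_losses.append(sum(sgns_step(w, c, negatives) for w, c in pairs))


def cosine(a, b):
    va, vb = target[idx[a]], target[idx[b]]
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))


print(round(float(epoch_losses[0]), 3), round(float(epoch_losses[-1]), 3))
print("cos(paris, ENTITY/Paris) =", round(cosine("paris", "ENTITY/Paris"), 3))
```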

Named Entity Recognition and the Road to Deep Learning (2017)
> the old
and the new-style NLP are not diametrically
opposed: just as it is possible (and useful!) to
incorporate neural-network features into a CRF,
CRFs have influenced some of the best deep
learning models for sequence labelling

This blog post goes through the ways of doing NER, starting with CRFs:
- When you develop a CRF, a lot of time goes into finding feature functions (Does a word start with a capital letter? Is it uppercase? Is it a digit? ...).
- Problem: such features treat every word as an unrelated symbol, so they do not capture semantic similarity between words.
- Gazetteers: lists with names of people, locations and organizations that are known in advance.
- Feed word embeddings to a CRF: one way is to cluster a set of word embeddings by distributional similarity, and provide the CRF with the cluster IDs of a token and its …
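- Use word- and character-based embeddings.
- A left-to-right LSTM is not enough, because it only sees the words before a token -> use a biLSTM, which also reads the sentence from right to left.
- **A biLSTM predicts all labels independently of each other -> add a CRF layer**, which learns a matrix of transition scores between pairs of tags; dynamic programming (the Viterbi algorithm) then finds the optimal tag sequence for the sentence.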
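
The handcrafted CRF feature functions and the gazetteer lookup can be sketched as a dictionary of features per token, a representation commonly fed to CRF libraries (check what your toolkit expects). The `word2features` and `sent2features` helpers, the feature names and the tiny gazetteer are hypothetical illustrations, not taken from the blog post.

```python
# Hypothetical toy gazetteer: lowercase single-token names known in advance.
GAZETTEER = {"paris", "london", "google", "curie"}


def word2features(tokens, i, gazetteer=GAZETTEER):
    """Handcrafted feature functions for tokens[i], returned as a feature dict."""
    w = tokens[i]
    feats = {
        "bias": 1.0,
        "word.lower": w.lower(),
        "word.istitle": w.istitle(),  # starts with a capital letter?
        "word.isupper": w.isupper(),  # all uppercase?
        "word.isdigit": w.isdigit(),  # a number?
        "word.suffix3": w[-3:].lower(),
        "in_gazetteer": w.lower() in gazetteer,
    }
    # Context features from neighbouring tokens, with sentence-boundary markers.
    if i > 0:
        feats["-1:word.lower"] = tokens[i - 1].lower()
        feats["-1:word.istitle"] = tokens[i - 1].istitle()
    else:
        feats["BOS"] = True
    if i + 1 < len(tokens):
        feats["+1:word.lower"] = tokens[i + 1].lower()
        feats["+1:word.istitle"] = tokens[i + 1].istitle()
    else:
        feats["EOS"] = True
    return feats


def sent2features(tokens):
    return [word2features(tokens, i) for i in range(len(tokens))]


sentence = ["Marie", "Curie", "moved", "to", "Paris"]
for tok, f in zip(sentence, sent2features(sentence)):
    print(tok, f["word.istitle"], f["in_gazetteer"])
```

Note that `word.lower` is a one-hot identity feature: a model that learned "paris" as a location knows nothing about an unseen city such as "lyon", which is the semantic-similarity problem the gazetteers and embeddings try to address.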
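
Feeding embeddings to a CRF through cluster IDs can be sketched as follows, assuming toy 2-dimensional vectors in place of real pretrained embeddings and a small hand-written k-means (the notes above do not name a clustering algorithm; k-means is our choice). The `embeddings` table and the `kmeans` and `cluster_features` helpers are hypothetical, and adding the neighbours' cluster IDs is also an assumption. Words that are close in embedding space share a cluster ID, so the CRF can generalise across words with different surface strings.

```python
import numpy as np

# Hypothetical toy "embeddings": cities in one region of the space, verbs in another.
embeddings = {
    "paris": [10.0, 0.0], "london": [10.0, 1.0], "berlin": [9.0, 0.0],
    "eat": [0.0, 10.0], "run": [1.0, 10.0], "sleep": [0.0, 9.0],
}


def kmeans(X, k, n_iter=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    centroids = [X[0]]
    for _ in range(1, k):
        # Squared distance of every point to its nearest already-chosen centroid.
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids, dtype=float)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels


words = list(embeddings)
labels = kmeans(np.array([embeddings[w] for w in words]), k=2)
cluster_of = {w: int(c) for w, c in zip(words, labels)}


def cluster_features(tokens, i, unk="UNK"):
    """Cluster-ID features for tokens[i] and its neighbours (the neighbours are our assumption)."""
    def cid(tok):
        return str(cluster_of.get(tok.lower(), unk))

    feats = {"cluster": cid(tokens[i])}
    if i > 0:
        feats["-1:cluster"] = cid(tokens[i - 1])
    if i + 1 < len(tokens):
        feats["+1:cluster"] = cid(tokens[i + 1])
    return feats


print(cluster_of)
print(cluster_features(["I", "love", "Berlin"], 2))
```

These cluster features would simply be merged into each token's handcrafted feature dictionary.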
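
The decoding step of the CRF layer can be sketched with the Viterbi algorithm in NumPy. Here the per-token emission scores (which a biLSTM would produce) and the tag-transition matrix are made-up numbers, and `viterbi` is a hypothetical helper; training the CRF layer, which also needs the forward algorithm to normalise over all tag sequences, is not shown. The numbers are chosen so that picking each label independently gives the invalid sequence `O I-PER O`, while joint decoding with the transition scores recovers `B-PER I-PER O`.

```python
import numpy as np

TAGS = ["O", "B-PER", "I-PER"]


def viterbi(emissions, transitions):
    """Best tag sequence given emissions[n_tokens, n_tags] and transitions[prev_tag, cur_tag]."""
    n_tokens, n_tags = emissions.shape
    score = emissions[0].copy()  # best score of any path ending in each tag at token 0
    backptr = np.zeros((n_tokens, n_tags), dtype=int)
    for i in range(1, n_tokens):
        # total[prev, cur] = best path ending in prev + transition prev->cur + emission of cur
        total = score[:, None] + transitions + emissions[i][None, :]
        backptr[i] = total.argmax(axis=0)
        score = total.max(axis=0)
    path = [int(score.argmax())]
    for i in range(n_tokens - 1, 0, -1):  # follow back-pointers from the last token
        path.append(int(backptr[i, path[-1]]))
    path.reverse()
    return [TAGS[t] for t in path], float(score.max())


# Made-up emission scores for "Marie Curie won" (rows: tokens, columns: TAGS).
emissions = np.array([[1.0, 0.8, 0.0],
                      [0.1, 0.2, 1.5],
                      [2.0, 0.0, 0.0]])
# Made-up transition scores; O -> I-PER is strongly penalised.
transitions = np.array([[0.0, 0.0, -100.0],
                        [0.0, 0.0, 0.5],
                        [0.0, 0.0, 0.5]])

independent = [TAGS[t] for t in emissions.argmax(axis=1)]  # per-token argmax
best, best_score = viterbi(emissions, transitions)
print(independent)
print(best, best_score)
```

The dynamic program costs O(n_tokens × n_tags²), instead of enumerating all n_tags^n_tokens sequences.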