Named Entity Recognition and the Road to Deep Learning (2017)
> the old and the new-style NLP are not diametrically opposed: just as it is possible (and useful!) to incorporate neural-network features into a CRF, CRFs have influenced some of the best deep learning models for sequence labelling
This blog post goes through the main approaches to NER, starting with CRFs:
- When you develop a CRF, a lot of time goes into engineering feature functions (Does a word start with a capital letter? Is it all uppercase? Is it a digit? ...); see the sketch below.
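  A minimal sketch of such hand-written feature functions, using the feature-dict convention of sklearn-crfsuite; the exact feature set here is illustrative, not the post's:

  ```python
  def word_features(sentence, i):
      """Hand-crafted features for the token at position i."""
      word = sentence[i]
      features = {
          "word.lower": word.lower(),
          "word.istitle": word.istitle(),  # starts with a capital?
          "word.isupper": word.isupper(),  # all uppercase?
          "word.isdigit": word.isdigit(),  # a digit?
          "word.suffix3": word[-3:],       # crude morphology
      }
      if i > 0:
          features["prev.lower"] = sentence[i - 1].lower()
      else:
          features["BOS"] = True           # beginning of sentence
      return features
  ```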
- Problem: hand-crafted features like these cannot capture semantic similarity between words.
- Gazetteers: lists of names of people, locations and organizations that are known in advance.
- Feed word embeddings to a CRF: one way is to cluster a set of word embeddings by distributional similarity and provide the CRF with the cluster IDs of a token and its neighbours, as sketched below.
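  A hedged sketch of that clustering idea, assuming pre-trained vectors and scikit-learn's KMeans; the vocabulary, vectors and cluster count are toy stand-ins:

  ```python
  import numpy as np
  from sklearn.cluster import KMeans

  vocab = ["paris", "london", "monday", "tuesday"]  # toy vocabulary
  vectors = np.random.rand(len(vocab), 50)          # stand-in for real embeddings
  kmeans = KMeans(n_clusters=2, n_init=10).fit(vectors)
  cluster_id = dict(zip(vocab, kmeans.labels_))

  def cluster_features(sentence, i):
      """Cluster IDs of a token and its left neighbour, as CRF features."""
      feats = {"cluster": cluster_id.get(sentence[i].lower(), -1)}
      if i > 0:
          feats["prev.cluster"] = cluster_id.get(sentence[i - 1].lower(), -1)
      return feats
  ```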
- Use word- and character-based embeddings, e.g. as in the sketch below.
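  One possible reading, sketched in PyTorch (the dimensions and the char-biLSTM choice are assumptions): concatenate a token's word embedding with a character-level embedding of the same token:

  ```python
  import torch
  import torch.nn as nn

  word_emb = nn.Embedding(10_000, 100)  # word ids -> 100-d vectors
  char_emb = nn.Embedding(80, 25)       # char ids -> 25-d vectors
  char_lstm = nn.LSTM(25, 25, bidirectional=True, batch_first=True)

  def token_vector(word_id, char_ids):
      """Word embedding + final states of a char-biLSTM, concatenated."""
      w = word_emb(word_id)                            # (100,)
      _, (h, _) = char_lstm(char_emb(char_ids)[None])  # h: (2, 1, 25)
      c = torch.cat([h[0, 0], h[1, 0]])                # fwd + bwd -> (50,)
      return torch.cat([w, c])                         # (150,) token vector
  ```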
- A plain LSTM only sees the left context of each token, which is not good enough -> use a biLSTM that reads the sentence in both directions.
- **A biLSTM still predicts all labels independently of each other -> add a CRF layer**, which learns a matrix of transition scores between consecutive labels: dynamic programming (Viterbi, sketched below) then finds the optimal tag sequence for the sentence.
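  A sketch of the decoding step this enables, assuming the biLSTM has produced per-token label scores (emissions) and the CRF layer holds a learned label-transition matrix:

  ```python
  import numpy as np

  def viterbi(emissions, transitions):
      """emissions: (seq_len, n_tags) scores; transitions: (n_tags, n_tags)."""
      seq_len, n_tags = emissions.shape
      score = emissions[0].copy()
      backpointers = np.zeros((seq_len, n_tags), dtype=int)
      for t in range(1, seq_len):
          # score of reaching tag j at step t via each possible previous tag i
          candidate = score[:, None] + transitions + emissions[t][None, :]
          backpointers[t] = candidate.argmax(axis=0)
          score = candidate.max(axis=0)
      best = [int(score.argmax())]         # best final tag
      for t in range(seq_len - 1, 0, -1):  # walk the backpointers
          best.append(int(backpointers[t, best[-1]]))
      return best[::-1]
  ```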
Natural Language Processing with Small Feed-Forward Networks, by Google researchers:
> We show that small and shallow feed-forward neural networks can achieve near state-of-the-art results on a range of unstructured and structured language processing tasks while being considerably cheaper in memory and computational requirements than deep recurrent models. Motivated by resource-constrained environments like mobile phones, we showcase simple techniques for obtaining such small neural network models, and investigate different tradeoffs when deciding how to allocate a small memory budget.
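A hedged sketch of the general recipe the paper motivates, not its exact architecture: a tiny feed-forward tagger over hashed character n-gram features, with every size (hash buckets, embedding and hidden dims, tag count) an assumption:

```python
import torch
import torch.nn as nn

N_BUCKETS, EMB_DIM, HIDDEN, N_TAGS = 5_000, 16, 64, 9  # assumed sizes

class TinyTagger(nn.Module):
    def __init__(self):
        super().__init__()
        # averaged hashed-n-gram embeddings, then one small hidden layer
        self.emb = nn.EmbeddingBag(N_BUCKETS, EMB_DIM, mode="mean")
        self.mlp = nn.Sequential(nn.Linear(EMB_DIM, HIDDEN), nn.ReLU(),
                                 nn.Linear(HIDDEN, N_TAGS))

    def forward(self, ngram_ids):  # (batch, n_ngrams) bucket indices
        return self.mlp(self.emb(ngram_ids))

def char_ngram_ids(word, n=3):
    """Hash a word's character trigrams into a fixed number of buckets.
    (Python's built-in hash is process-salted; fine for a sketch only.)"""
    padded = f"^{word}$"
    grams = [padded[i:i + n] for i in range(len(padded) - n + 1)]
    return torch.tensor([[hash(g) % N_BUCKETS for g in grams]])

logits = TinyTagger()(char_ngram_ids("London"))  # (1, N_TAGS) tag scores
```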