Semanlink - ELMo

8 Documents

Lecture 14 – Contextual Vectors | Stanford CS224U: Natural Language Understanding | Spring 2019

2020-01-05 About

[1902.11269] Efficient Contextual Representation Learning Without Softmax Layer

**how to accelerate contextual representation learning**.

> Contextual representation models are difficult to train due to the large parameter sizes and high computational complexity.

> We find that the softmax layer (the output layer) causes significant inefficiency due to the large vocabulary size.
Therefore, we redesign the learning objective.
> Specifically, the proposed approach bypasses the softmax layer by performing language modeling with dimension reduction, and allows the models to leverage pre-trained word embeddings.
Our framework reduces the time spent on the output layer to a negligible level, eliminates almost all the trainable parameters of the softmax layer and performs language modeling without truncating the vocabulary.
When applied to ELMo, our method achieves a 4 times speedup and eliminates 80% trainable parameters while achieving competitive performance on downstream tasks.
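
To see why the output layer dominates the parameter count, the arithmetic can be sketched as below. The vocabulary, hidden, and embedding sizes are hypothetical values picked for illustration and are not taken from the paper; the helper names are likewise made up for this sketch.

```python
def softmax_output_params(vocab_size: int, hidden_dim: int) -> int:
    """Weights plus biases of a dense softmax output layer."""
    return vocab_size * hidden_dim + vocab_size


def embedding_output_params(hidden_dim: int, embed_dim: int) -> int:
    """Weights plus biases of a projection from hidden state to embedding space."""
    return hidden_dim * embed_dim + embed_dim


# Hypothetical sizes, chosen only for illustration (not the paper's settings).
VOCAB_SIZE = 800_000
HIDDEN_DIM = 512
EMBED_DIM = 300

softmax_params = softmax_output_params(VOCAB_SIZE, HIDDEN_DIM)
projection_params = embedding_output_params(HIDDEN_DIM, EMBED_DIM)

print(f"softmax layer:    {softmax_params:,} trainable parameters")
print(f"projection layer: {projection_params:,} trainable parameters")
print(f"ratio:            {softmax_params / projection_params:,.0f}x")
```

The softmax layer grows linearly with the vocabulary, while a projection into a fixed pre-trained embedding space does not depend on vocabulary size at all, provided the embeddings themselves stay frozen.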

**decouples learning contexts and words**

> Instead of using a softmax layer to predict the distribution of the missing word, we utilize and extend the SEMFIT layer (Kumar and Tsvetkov, 2018) to **predict the embedding of the missing word**.
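
A minimal NumPy sketch of the idea of predicting an embedding rather than a distribution follows. It is not the paper's implementation: the cosine-distance loss, the random stand-in embeddings, the toy sizes, and all function names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB_SIZE, EMBED_DIM, HIDDEN_DIM = 1_000, 50, 64  # hypothetical toy sizes

# Stand-in for frozen pre-trained word embeddings (random here, for illustration).
pretrained = rng.normal(size=(VOCAB_SIZE, EMBED_DIM))
pretrained /= np.linalg.norm(pretrained, axis=1, keepdims=True)

# The only trainable output parameters: a projection from context to embedding space.
W = rng.normal(scale=0.1, size=(HIDDEN_DIM, EMBED_DIM))


def predict_embedding(context: np.ndarray) -> np.ndarray:
    """Map context vectors (batch, HIDDEN_DIM) to unit-length predicted embeddings."""
    predicted = context @ W
    return predicted / np.linalg.norm(predicted, axis=1, keepdims=True)


def cosine_loss(context: np.ndarray, target_ids: np.ndarray) -> float:
    """Mean cosine distance between predicted and target embeddings.

    Only the target rows of the embedding table are read, so the cost
    of this step does not grow with VOCAB_SIZE.
    """
    predicted = predict_embedding(context)
    targets = pretrained[target_ids]
    return float(np.mean(1.0 - np.sum(predicted * targets, axis=1)))


def nearest_words(context: np.ndarray) -> np.ndarray:
    """Decode by nearest neighbour over the vocabulary; not needed for training."""
    return np.argmax(predict_embedding(context) @ pretrained.T, axis=1)


context = rng.normal(size=(4, HIDDEN_DIM))  # stand-in for encoder outputs
target_ids = np.array([3, 17, 256, 999])
loss = cosine_loss(context, target_ids)
decoded = nearest_words(context)
print(f"loss = {loss:.3f}, decoded ids = {decoded}")
```

Because the loss compares the prediction with a single target vector, no normalisation over the vocabulary is required during training, which is where a softmax layer spends most of its time.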

2019-03-02 About

The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning) – Jay Alammar
