Better Language Models and Their Implications (About) Our model, called GPT-2 (a successor to GPT), was trained simply to predict the next word in 40GB of Internet text. Due to our concerns about malicious applications of the technology, we are not releasing the trained model.
Improving Language Understanding with Unsupervised Learning (About) > can we develop one model, train it in an unsupervised way on a large amount of data, and then fine-tune the model to achieve good performance on many different tasks? Our results indicate that this approach works surprisingly well; the same core model can be fine-tuned for very different tasks with minimal adaptation.
a scalable, task-agnostic system based on a combination of two existing ideas: transformers and unsupervised pre-training
unsupervised generative pre-training of language models followed by discriminative fine-tuning
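
The "predict the next word" objective behind the generative pre-training step in these notes can be illustrated with a toy example. This is only a minimal sketch under loud assumptions: a smoothed bigram count model stands in for the transformer, the corpus is a made-up sentence, and the names `train_bigram`, `next_word_probs`, and `avg_nll` are hypothetical, not taken from GPT or GPT-2. What carries over is the shape of the objective: assign a probability to each possible next word and score the model by the average negative log-likelihood of the word that actually follows.

```python
import math
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each word follows each previous word."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_word_probs(counts, prev, vocab, alpha=1.0):
    """Add-alpha smoothed estimate of P(next | prev) over the vocabulary."""
    total = sum(counts[prev].values()) + alpha * len(vocab)
    return {w: (counts[prev][w] + alpha) / total for w in vocab}


def avg_nll(counts, tokens, vocab):
    """Average negative log-likelihood of each observed next word."""
    losses = []
    for prev, nxt in zip(tokens, tokens[1:]):
        p = next_word_probs(counts, prev, vocab)[nxt]
        losses.append(-math.log(p))
    return sum(losses) / len(losses)


# Toy stand-in for unlabeled text (an assumption, not real training data).
corpus = "the cat sat on the mat and the cat slept".split()
vocab = sorted(set(corpus))

counts = train_bigram(corpus)
probs = next_word_probs(counts, "the", vocab)
best = max(probs, key=probs.get)      # most likely word after "the"
loss = avg_nll(counts, corpus, vocab)  # the quantity training would minimize

print(best, round(loss, 3))
```

In the approach quoted above, a network trained on this kind of objective would then be fine-tuned on labeled examples for a specific task; the sketch covers only the first, unsupervised stage.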