GPT-2

Generative Pre-trained Transformer 2, or GPT-2, is a large language model built for natural language processing tasks. Announced by OpenAI in February 2019, it can generate many kinds of text, answer questions, and autocomplete code. GPT-2 was trained on WebText, a large corpus of web pages, and its largest version has 1.5 billion parameters. Although it is resource-intensive to run, GPT-2 has been used in applications such as text adventure games and subreddit simulations. Because of concerns about misuse, OpenAI initially withheld the full model and released it in November 2019, once those concerns had not materialized. A smaller model, DistilGPT2, was later developed for settings with limited computing resources. GPT-2's results set the stage for later progress in AI text generation.

GPT-2 (Wikipedia)

Generative Pre-trained Transformer 2 (GPT-2) is a large language model by OpenAI and the second in their foundational series of GPT models. GPT-2 was pre-trained on a dataset of 8 million web pages. It was partially released in February 2019, followed by the full release of the 1.5-billion-parameter model on November 5, 2019.

Generative Pre-trained Transformer 2 (GPT-2)
Original author(s): OpenAI
Initial release: 14 February 2019
Repository: https://github.com/openai/gpt-2
Predecessor: GPT-1
Successor: GPT-3
License: MIT
Website: openai.com/blog/gpt-2-1-5b-release/

GPT-2 was created as a "direct scale-up" of GPT-1, with a ten-fold increase in both its parameter count and the size of its training dataset. It is a general-purpose learner: its ability to perform a variety of tasks follows from its general ability to accurately predict the next item in a sequence. This enabled it to translate texts, answer questions about a topic from a text, summarize passages from a larger text, and generate text that is sometimes indistinguishable from human writing. However, it could become repetitive or nonsensical when generating long passages. It was superseded by the GPT-3 and GPT-4 models, which, unlike GPT-2, are not open source.
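
The idea of generating text by repeatedly predicting the next item in a sequence can be illustrated with a deliberately tiny sketch. The bigram counter below is only a stand-in for illustration: GPT-2 itself is a transformer that conditions on the whole preceding context, not on a single previous word. The function names `train_bigram` and `generate` and the toy corpus are assumptions made for this example.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count, for each token, which tokens follow it in the training text."""
    counts = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    return counts

def generate(counts, prompt, max_new_tokens):
    """Greedy autoregressive decoding: keep appending the most likely next token."""
    output = list(prompt)
    for _ in range(max_new_tokens):
        followers = counts.get(output[-1])
        if not followers:
            break  # this token was never followed by anything in training
        next_token, _count = followers.most_common(1)[0]
        output.append(next_token)  # the new token becomes part of the context
    return output

# Toy corpus (hypothetical); a real model is trained on millions of pages.
corpus = "the cat sat on the mat and the cat sat on the rug".split()
model = train_bigram(corpus)
print(" ".join(generate(model, ["the"], 6)))  # the cat sat on the cat sat
```

Even this toy model shows why always choosing the single most likely continuation can fall into a loop, which is one simple way generated text becomes repetitive over long passages.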

Like its predecessor GPT-1 and its successors GPT-3 and GPT-4, GPT-2 has a generative pre-trained transformer architecture: a deep neural network, specifically a transformer model, that uses attention instead of older recurrence- and convolution-based architectures. Attention mechanisms allow the model to focus selectively on the segments of input text it predicts to be the most relevant. This architecture allows for greatly increased parallelization and outperforms earlier RNN-, CNN-, and LSTM-based models on benchmarks.
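
As a rough illustration of how attention weights positions by relevance, here is a minimal sketch of scaled dot-product attention with a causal mask, written in plain Python. It makes simplifying assumptions: the toy input vectors serve directly as queries, keys, and values, whereas a real transformer such as GPT-2 derives them through learned projections and runs many attention heads in parallel. The function names are chosen for this example only.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention(queries, keys, values):
    """Scaled dot-product attention where position i sees only positions 0..i."""
    dim = len(keys[0])
    outputs, all_weights = [], []
    for i, q in enumerate(queries):
        # Similarity of this query to each key at or before position i.
        scores = [
            sum(qx * kx for qx, kx in zip(q, keys[j])) / math.sqrt(dim)
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        # Output is the weighted average of the visible value vectors.
        mixed = [
            sum(w * values[j][d] for j, w in enumerate(weights))
            for d in range(len(values[0]))
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Three toy 2-dimensional token vectors (hypothetical values).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = causal_attention(x, x, x)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of weights shows how strongly one position attends to itself and to earlier positions. Every position's scores are computed independently of the other positions' outputs, which is the property that allows the parallelization mentioned above; the explicit loop here is only for readability.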
