
GPT-1


GPT-1, or Generative Pre-trained Transformer 1, is a machine learning model created by OpenAI and engineered for understanding and generating human language. It uses a 12-layer, decoder-only transformer architecture with twelve masked self-attention heads, each operating on 64-dimensional states. The model was trained with the Adam optimization algorithm[1], using a learning rate that increased linearly during an initial warm-up phase. With 117 million parameters, GPT-1 is an intricate design, yet it requires only minimal adjustments when deployed for different tasks. Its proficiency is particularly evident in natural language inference[2], question answering, commonsense reasoning, and semantic similarity tasks. A key training resource for the model is the BookCorpus dataset, chosen for its long, contiguous passages, which help the model learn to handle long-range dependencies.
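As a rough illustration of the figures above, the sketch below builds one decoder block with those dimensions. It assumes PyTorch; the class name, layer choices, and use of nn.MultiheadAttention are illustrative assumptions rather than OpenAI's original implementation.

```python
# Minimal sketch of a GPT-1-style decoder block (illustrative, not OpenAI's code).
# Dimensions follow the figures quoted above: 12 layers, 12 heads, 64-dimensional
# states per head, i.e. a 768-dimensional model.
import torch
import torch.nn as nn

N_LAYERS, N_HEADS, HEAD_DIM = 12, 12, 64
D_MODEL = N_HEADS * HEAD_DIM  # 768

class DecoderBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.attn = nn.MultiheadAttention(D_MODEL, N_HEADS, batch_first=True)
        self.ln1 = nn.LayerNorm(D_MODEL)
        self.mlp = nn.Sequential(
            nn.Linear(D_MODEL, 4 * D_MODEL), nn.GELU(), nn.Linear(4 * D_MODEL, D_MODEL)
        )
        self.ln2 = nn.LayerNorm(D_MODEL)

    def forward(self, x):
        # Causal mask: each position may attend only to itself and earlier positions.
        t = x.size(1)
        mask = torch.ones(t, t, device=x.device).triu(1).bool()
        attn_out, _ = self.attn(x, x, x, attn_mask=mask)
        x = self.ln1(x + attn_out)          # residual connection + post-layer-norm
        return self.ln2(x + self.mlp(x))

# Stacking 12 such blocks gives the decoder-only backbone described above.
blocks = nn.ModuleList(DecoderBlock() for _ in range(N_LAYERS))
```

Token and position embeddings and the output projection are omitted here, so this fragment alone does not account for the full 117 million parameter total.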

Definitions of terms
1. algorithm. A clearly defined set of instructions or rules that provides a solution to a specific problem or task is known as an algorithm. Algorithms, whose roots go back to ancient civilizations, have evolved over centuries and now play an essential role in modern computing. Techniques such as divide and conquer are used in their design, and their efficiency is evaluated with measures such as big O notation. Algorithms can be represented in various ways, including as pseudocode, flowcharts, or programming languages. To be executed, they are translated into a language that computers can understand, and execution speed is influenced by the instruction set used. Algorithms can be classified in different ways according to their design or implementation paradigm, and their level of efficiency can greatly affect processing time. In fields such as computer science and artificial intelligence, understanding and applying algorithms effectively is vital. (A brief illustrative example follows these definitions.)
2. inference. Inference, a mental process, entails forming conclusions from existing evidence and logical reasoning. It is an integral aspect of critical thinking and problem-solving, with wide-ranging applications in areas such as scientific investigation, literary analysis, and artificial intelligence. Various forms of inference exist, such as deductive, inductive, abductive, statistical, and causal, each with its distinctive method and purpose. For example, deductive inference focuses on reaching specific conclusions from broad principles, whereas inductive inference generates broad conclusions from specific instances. Conversely, abductive inference involves making informed assumptions based on accessible evidence, while statistical and causal inferences revolve around interpreting data to draw conclusions about a group or to establish cause-and-effect connections. Nonetheless, the precision of inferences can be affected by biases, preconceived notions, and misinterpretations. Despite these potential obstacles, enhancing inference skills is achievable through consistent practice, critical thinking activities, and exposure to a variety of reading materials.
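As a brief, self-contained illustration of the divide-and-conquer and big O ideas in definition 1 (unrelated to GPT-1 itself), here is a merge sort sketch in Python:

```python
# An illustrative divide-and-conquer algorithm (merge sort), included only to
# ground the definition above.
def merge_sort(xs):
    """Sort a list in O(n log n) time by splitting, recursing, and merging."""
    if len(xs) <= 1:
        return xs
    mid = len(xs) // 2
    left, right = merge_sort(xs[:mid]), merge_sort(xs[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]

print(merge_sort([5, 2, 9, 1, 7]))  # [1, 2, 5, 7, 9]
```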
GPT-1 (Wikipedia)

Generative Pre-trained Transformer 1 (GPT-1) was the first of OpenAI's large language models, following Google's invention of the transformer architecture in 2017. In June 2018, OpenAI released a paper entitled "Improving Language Understanding by Generative Pre-Training", in which they introduced that initial model along with the general concept of a generative pre-trained transformer.

Generative Pre-trained Transformer 1 (GPT-1)
Original author(s): OpenAI
Initial release: June 2018
Successor: GPT-2
License: MIT
Website: openai.com/blog/language-unsupervised/
[Figure: original GPT architecture]

Up to that point, the best-performing neural NLP models primarily employed supervised learning from large amounts of manually labeled data. This reliance on supervised learning limited their use on datasets that were not well annotated, and made it prohibitively expensive and time-consuming to train extremely large models; many languages (such as Swahili or Haitian Creole) are difficult to translate and interpret using such models due to a lack of available text for corpus-building. In contrast, a GPT's "semi-supervised" approach involved two stages: an unsupervised generative "pre-training" stage in which a language modeling objective was used to set initial parameters, and a supervised discriminative "fine-tuning" stage in which these parameters were adapted to a target task.
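The two-stage recipe can be sketched roughly as below. This assumes PyTorch, and the backbone, heads, and function names are illustrative placeholders rather than OpenAI's actual training code.

```python
# Rough sketch of the two-stage "semi-supervised" approach described above.
# `backbone` stands for a decoder-only transformer that returns hidden states;
# all names here are placeholders, not OpenAI's implementation.
import torch.nn.functional as F

def pretrain_step(backbone, lm_head, tokens, optimizer):
    """Unsupervised stage: next-token (language modeling) objective on raw text."""
    hidden = backbone(tokens[:, :-1])                     # (batch, seq-1, d_model)
    logits = lm_head(hidden)                              # project to vocabulary
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           tokens[:, 1:].reshape(-1))     # predict each next token
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()

def finetune_step(backbone, task_head, tokens, labels, optimizer):
    """Supervised stage: reuse the pre-trained parameters and adapt them,
    together with a small task-specific head, on labeled examples."""
    hidden = backbone(tokens)
    logits = task_head(hidden[:, -1])                     # classify from final state
    loss = F.cross_entropy(logits, labels)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```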

The use of a transformer architecture, as opposed to previous techniques involving attention-augmented RNNs, provided GPT models with a more structured memory than could be achieved through recurrent mechanisms; this resulted in "robust transfer performance across diverse tasks".

" Retour à l'index des glossaires
Suivre les mises à jour
fr_FRFrançais