Models
Generative Pre-trained Transformer (GPT)
A family of decoder-only transformer language models pretrained on vast amounts of text and fine-tuned to follow instructions; the architecture behind ChatGPT and peers.
A Generative Pre-trained Transformer (GPT) is a class of large language models built on a decoder-only transformer and trained in two stages. First, the model is pretrained on a huge corpus of internet text with the simple objective of predicting the next token. Second, it is adapted to follow instructions through supervised fine-tuning on human-written demonstrations and reinforcement learning from human feedback (RLHF).
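The sketch below illustrates the first of those stages, assuming PyTorch: a toy decoder-only transformer with a causal attention mask, trained on the next-token prediction objective with a cross-entropy loss. The layer sizes, vocabulary, and random token batch are illustrative stand-ins, not those of any released GPT model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyGPT(nn.Module):
    """A toy decoder-only transformer; all sizes are illustrative."""
    def __init__(self, vocab_size=1000, d_model=128, n_heads=4, n_layers=2, max_len=256):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        block = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=4 * d_model,
            batch_first=True, norm_first=True)
        # Encoder layers combined with a causal mask behave as a decoder-only stack.
        self.blocks = nn.TransformerEncoder(block, num_layers=n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, idx):
        T = idx.size(1)
        pos = torch.arange(T, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        # Causal mask: each position may attend only to itself and earlier tokens.
        mask = torch.triu(torch.full((T, T), float("-inf"), device=idx.device), diagonal=1)
        x = self.blocks(x, mask=mask)
        return self.lm_head(x)  # (batch, T, vocab_size) logits

# One pretraining step: shift the sequence by one token so the model learns
# to predict token t+1 from tokens 1..t (next-token prediction).
model = TinyGPT()
tokens = torch.randint(0, 1000, (8, 65))         # stand-in for tokenized text
inputs, targets = tokens[:, :-1], tokens[:, 1:]
logits = model(inputs)
loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
loss.backward()                                  # an optimizer step would follow here
```

The instruction-following behavior comes only in the second stage, which reuses the same objective on curated demonstration data and then optimizes a learned reward model with RLHF.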
OpenAI introduced the original GPT in 2018, scaled it through GPT-2 (2019), GPT-3 (2020), GPT-3.5 — which powered the first ChatGPT release in late 2022 — and the GPT-4 / GPT-4o / GPT-5 generations. Each generation grew parameter counts, training data, and context windows, while advances in training and post-training methods improved reasoning, multimodality, and tool use.
The GPT recipe became the dominant template across the industry. Claude, Gemini, Llama, Mistral, DeepSeek, and Qwen are all decoder-only transformers trained with very similar objectives, even when the underlying weights and engineering differ.
In casual usage, “GPT” sometimes refers narrowly to OpenAI’s models and sometimes broadly to the entire decoder-only LLM family — the term is overloaded but unmistakably central to modern AI.