
GPT

GPT stands for Generative Pre-trained Transformer, a family of autoregressive language models developed by OpenAI. Built on the Transformer architecture, GPT models are trained in two stages: pre-training on large corpora to learn language structure, followed by fine-tuning or instruction-following alignment to improve task performance. The models generate coherent text by predicting the next token in a sequence, using a context window that determines how much prior text influences output.
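
In practice, autoregressive generation is a loop: truncate the input to the context window, predict the next token, append it, and repeat. The sketch below is an illustrative toy, with a hard-coded bigram lookup standing in for a real Transformer; it is not OpenAI's implementation.

    CONTEXT_WINDOW = 8  # maximum number of prior tokens the model attends to

    def toy_next_token(tokens):
        # Stand-in for a Transformer forward pass: return a next token
        # based only on the last token seen (a bigram lookup).
        bigrams = {"the": "model", "model": "predicts", "predicts": "the"}
        return bigrams.get(tokens[-1], "token")

    def generate(prompt_tokens, max_new_tokens=5):
        tokens = list(prompt_tokens)
        for _ in range(max_new_tokens):
            context = tokens[-CONTEXT_WINDOW:]      # only the window influences output
            tokens.append(toy_next_token(context))  # append the prediction and repeat
        return tokens

    print(generate(["the"]))  # ['the', 'model', 'predicts', 'the', 'model', 'predicts']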

The first version, GPT, released in 2018, demonstrated the viability of unsupervised pre-training for language tasks. Successive iterations (GPT-2 in 2019, GPT-3 in 2020, and GPT-4 in 2023) scaled model size and data, enabling few-shot and zero-shot learning, in which tasks are performed with minimal or no task-specific training.
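
Few-shot and zero-shot use differ only in how the prompt is constructed. The sketch below, with invented review data, illustrates the distinction; it does not reflect any particular OpenAI prompt format.

    # Zero-shot: the task is described, but no worked examples are given.
    zero_shot = (
        "Classify the sentiment of the following review as positive or negative.\n"
        "Review: The battery dies within an hour.\n"
        "Sentiment:"
    )

    # Few-shot: a handful of in-prompt examples demonstrate the task first.
    examples = [
        ("Great screen and very fast.", "positive"),
        ("Stopped working after two days.", "negative"),
    ]
    few_shot = "Classify the sentiment of each review as positive or negative.\n"
    for review, label in examples:
        few_shot += f"Review: {review}\nSentiment: {label}\n"
    few_shot += "Review: The battery dies within an hour.\nSentiment:"

    print(zero_shot)
    print(few_shot)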

Technical notes: GPT models are trained on diverse datasets drawn from publicly available text and licensed sources. They use byte-pair encoding or similar subword tokenization. GPT-3 and later employ instruction tuning and reinforcement learning from human feedback (RLHF) to improve alignment with user intent, safety, and usefulness. Some versions are multimodal, accepting image inputs in addition to text.
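
Byte-pair encoding builds a subword vocabulary by repeatedly merging the most frequent adjacent symbol pair. The following is a toy trainer over a made-up word-frequency table; production GPT tokenizers operate on bytes with large learned merge lists, so this is only a sketch of the idea.

    from collections import Counter

    def pair_counts(vocab):
        # Count adjacent symbol pairs across all words, weighted by word frequency.
        counts = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                counts[(a, b)] += freq
        return counts

    def merge_pair(pair, vocab):
        # Replace every occurrence of the pair with a single merged symbol.
        merged = {}
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            merged[tuple(out)] = freq
        return merged

    # Words as tuples of characters, mapped to their corpus frequency (toy data).
    vocab = {tuple("lower"): 5, tuple("lowest"): 2, tuple("newer"): 6, tuple("wider"): 3}
    for step in range(4):
        best = pair_counts(vocab).most_common(1)[0][0]
        vocab = merge_pair(best, vocab)
        print(f"merge {step + 1}: {best}")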

Applications: chatbots, content generation, code completion, translation, summarization, research assistance, and more, typically via API access or platform integrations. The models have influenced AI tooling, prompting a shift toward larger, instruction-following systems.
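
API access typically looks like the sketch below, which assumes the openai Python client (v1-style interface), an OPENAI_API_KEY set in the environment, and a model name available to the account; exact model names and client details change over time.

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    response = client.chat.completions.create(
        model="gpt-4o",  # example model name; availability varies by account and date
        messages=[
            {"role": "system", "content": "You are a concise assistant."},
            {"role": "user", "content": "Summarize byte-pair encoding in one sentence."},
        ],
    )
    print(response.choices[0].message.content)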

Limitations and governance: GPT models can produce incorrect or fabricated outputs, known as hallucinations, and their responses depend heavily on how prompts are worded. They may also reflect biases in the training data. OpenAI and others implement safety filters, usage policies, and monitoring; wider concerns include misinformation, manipulation, and energy use. Access to the weights of most GPT models is restricted, and earlier open releases were partial.
