TPUs

Tensor Processing Units (TPUs) are application-specific integrated circuits (ASICs) designed by Google to accelerate machine learning workloads, particularly neural networks. They are built to deliver high throughput for large-scale tensor operations and are tightly integrated with Google's software stack, including TensorFlow and related tooling. TPUs are most commonly accessed as cloud accelerators on Google Cloud Platform, where they are offered as Cloud TPU devices.

The hardware is organized around a large array of matrix-multiply processing units, a design that favors the dense linear algebra used in neural networks. TPUs emphasize high compute throughput and memory bandwidth, often employing low-precision numeric formats such as bfloat16 and int8 to improve performance and efficiency while maintaining model accuracy for many workloads. The devices are paired with on-chip or high-bandwidth off-chip memory and are connected via specialized interconnects so that performance scales across multiple chips in clusters known as TPU Pods.

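As a minimal sketch of how this design is used in practice, the JAX example below (assuming a machine where JAX is installed, ideally a Cloud TPU VM) multiplies two bfloat16 matrices; when run on a TPU, XLA lowers the jnp.dot call onto the matrix-multiply units, while on other backends it simply runs on the local device. The sizes and values are arbitrary.

    import jax
    import jax.numpy as jnp

    # Create two matrices and cast them to bfloat16, the low-precision
    # format commonly used on TPUs to trade a little precision for speed.
    key = jax.random.PRNGKey(0)
    a = jax.random.normal(key, (1024, 1024)).astype(jnp.bfloat16)
    b = jax.random.normal(key, (1024, 1024)).astype(jnp.bfloat16)

    # jax.jit compiles the function with XLA; on a TPU backend the dot
    # product is executed by the matrix-multiply hardware.
    matmul = jax.jit(lambda x, y: jnp.dot(x, y))
    c = matmul(a, b)
    print(c.dtype, c.shape)
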
TPU generations and usage have evolved from inference-focused accelerators to systems capable of training at scale. Early generations provided optimized inference, while later versions added training support and larger-scale deployment options. TPU Pods connect many TPU devices for distributed training, providing substantial acceleration for large models and extensive datasets. Access is primarily through Google Cloud, with tooling to manage data, model compilation, and distributed execution across devices.

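As a small, hedged illustration of what that environment exposes, the snippet below (assuming JAX on a Cloud TPU VM; on other machines it reports CPU devices instead) enumerates the attached accelerator devices, which is typically the first step before distributing work across them.

    import jax

    # On a Cloud TPU VM this lists the attached TPU cores; on a Pod
    # slice, devices from all connected chips appear here.
    print("device count:", jax.device_count())
    for d in jax.devices():
        print(d.platform, d.id)
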
The software ecosystem and workflow center on TensorFlow, XLA, and JAX, with compilers and runtimes that map neural network graphs to TPU hardware. Developers prepare data, compile models for the TPU, and run training or inference jobs in the Cloud TPU environment, often leveraging parallelism across multiple devices.

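A compact sketch of that workflow in JAX is shown below; the model, shapes, and parameters are illustrative placeholders, not a prescribed API. jax.jit compiles the function through XLA for the local backend (a TPU when present), and jax.pmap replicates it across the attached devices for simple data parallelism.

    import jax
    import jax.numpy as jnp

    # A toy "model": one linear layer applied to a batch of inputs.
    def predict(params, x):
        return jnp.dot(x, params["w"]) + params["b"]

    params = {
        "w": jnp.ones((128, 10), dtype=jnp.float32),
        "b": jnp.zeros((10,), dtype=jnp.float32),
    }

    # Compile once with XLA for the local backend.
    fast_predict = jax.jit(predict)
    single_out = fast_predict(params, jnp.ones((32, 128), dtype=jnp.float32))

    # Replicate across all local devices and split the leading batch axis
    # among them (data parallelism); params are broadcast unchanged.
    n = jax.local_device_count()
    sharded_x = jnp.ones((n, 32, 128), dtype=jnp.float32)  # one shard per device
    parallel_predict = jax.pmap(predict, in_axes=(None, 0))
    out = parallel_predict(params, sharded_x)
    print(single_out.shape, out.shape)  # (32, 10) and (n, 32, 10)
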
Edge TPU, a separate product line, targets on-device inference in low-power environments and supports TensorFlow Lite for edge applications.

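In that flow, a trained TensorFlow model is usually converted to a fully integer-quantized TensorFlow Lite model before being compiled for the accelerator. The sketch below shows only the conversion step; the saved-model path, input shape, and representative dataset are placeholders, and a separate Edge TPU compilation step follows.

    import numpy as np
    import tensorflow as tf

    # Representative samples let the converter calibrate int8 quantization;
    # replace these with real preprocessed inputs for the model in question.
    def representative_data():
        for _ in range(100):
            yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

    converter = tf.lite.TFLiteConverter.from_saved_model("path/to/saved_model")
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_data
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8
    converter.inference_output_type = tf.int8

    tflite_model = converter.convert()
    with open("model_int8.tflite", "wb") as f:
        f.write(tflite_model)
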