Microbatching

Microbatching is the practice of dividing a data stream or workload into batches much smaller than a conventional batch and processing each one as a discrete unit. It aims to balance the low latency of real-time, record-at-a-time processing with the throughput benefits of batch computing. Micro-batches are typically bounded by a fixed size or time duration, and the system processes them in a staged pipeline.

In streaming systems, micro-batching is used to transform an unbounded data stream into a sequence of finite work units. Frameworks such as Spark Streaming historically implemented discretized streams by grouping incoming records into short time windows (for example, one-second micro-batches). This yields predictable latency and leverages batch-oriented operators, but it introduces scheduling overhead and requires fault-tolerance mechanisms that cover entire micro-batches.
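
A minimal Python sketch of this discretization is shown below. It groups timestamped records into fixed-duration windows and emits each window as one micro-batch; the in-memory record source and the one-second window are illustrative assumptions, not Spark Streaming's actual implementation.

```python
from typing import Iterable, Iterator, List, Tuple

def microbatch_by_time(records: Iterable[Tuple[float, dict]],
                       window_seconds: float = 1.0) -> Iterator[List[dict]]:
    """Group (timestamp, record) pairs into fixed-duration micro-batches.

    Records are assumed to arrive in timestamp order; each yielded list
    covers one window of `window_seconds` and is processed as a unit.
    """
    batch: List[dict] = []
    window_end = None
    for ts, record in records:
        if window_end is None:
            window_end = ts + window_seconds
        # Close the current window (and any empty ones) before this record.
        while ts >= window_end:
            yield batch
            batch = []
            window_end += window_seconds
        batch.append(record)
    if batch:
        yield batch  # flush the final, possibly partial, window

# Example: four records spread over ~2.5 seconds form three one-second micro-batches.
events = [(0.1, {"id": 1}), (0.4, {"id": 2}), (1.2, {"id": 3}), (2.3, {"id": 4})]
for i, mb in enumerate(microbatch_by_time(events, window_seconds=1.0)):
    print(f"micro-batch {i}: {mb}")
```
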

In machine learning and deployment, micro-batching serves two related roles. For inference serving, requests are grouped into small batches to exploit hardware acceleration (GPUs/TPUs) while aiming to keep latency within acceptable bounds.
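
A minimal sketch of the serving-side pattern follows: requests are drained from a queue until either a maximum batch size is reached or a short timeout expires, so throughput benefits from batched execution while per-request latency stays bounded. The queue, the batch-size and timeout values, and the run_model callback are illustrative assumptions, not the API of any particular serving framework.

```python
import queue
import time
from typing import Any, Callable, List

def collect_microbatch(requests: "queue.Queue[Any]",
                       max_batch_size: int = 8,
                       max_wait_s: float = 0.005) -> List[Any]:
    """Drain up to max_batch_size requests, waiting at most max_wait_s."""
    batch: List[Any] = []
    deadline = time.monotonic() + max_wait_s
    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # timeout reached: flush whatever has arrived
        try:
            batch.append(requests.get(timeout=remaining))
        except queue.Empty:
            break
    return batch

def serving_loop(requests: "queue.Queue[Any]",
                 run_model: Callable[[List[Any]], List[Any]]) -> None:
    """Run the accelerator once per micro-batch instead of once per request."""
    while True:
        batch = collect_microbatch(requests)
        if batch:
            outputs = run_model(batch)  # one batched forward pass
            # ... route each entry of `outputs` back to its caller ...
```
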

For training, micro-batch stochastic gradient descent splits a large training batch into smaller micro-batches that are processed sequentially or pipelined across devices, enabling memory efficiency and asynchronous computation.
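
Below is a minimal PyTorch-style sketch of one common realization of this idea, gradient accumulation: the large batch is split into micro-batches, gradients are accumulated across them, and the optimizer applies a single update. The toy model, data, and sizes are assumptions for illustration; pipelining across multiple devices is not shown.

```python
import torch
from torch import nn

# Toy model and data; sizes are illustrative assumptions.
model = nn.Linear(16, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

x = torch.randn(64, 16)  # one "large" training batch of 64 examples
y = torch.randn(64, 1)
micro_batch_size = 8
num_micro_batches = x.size(0) // micro_batch_size

optimizer.zero_grad()
for xb, yb in zip(x.split(micro_batch_size), y.split(micro_batch_size)):
    # Scale each micro-batch loss so the accumulated gradient equals the
    # gradient of the mean loss over the full 64-example batch.
    loss = loss_fn(model(xb), yb) / num_micro_batches
    loss.backward()  # gradients accumulate in .grad across micro-batches
optimizer.step()     # a single optimizer update per large batch
```
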
Advantages include lower latency than large-batch processing, better resource utilization, and compatibility with existing batch-oriented pipelines.

Drawbacks include added complexity in scheduling and fault tolerance, potential jitter in latency, and the need to choose batch sizes carefully to avoid inefficiencies.