TorchServe

TorchServe is an open-source model serving framework for PyTorch, designed to simplify deploying trained PyTorch models for production-scale inference. It enables multi-model serving, model versioning, dynamic batching, and scalable deployment with minimal additional code. Originally developed by AWS in collaboration with Meta (then Facebook), the project is part of the PyTorch ecosystem and is maintained by the PyTorch community with contributions from industry and research teams. TorchServe is released under the Apache 2.0 license.

Core features include a model-archiver tool to package models into MAR files that bundle the serialized model, a handler (for pre-processing, post-processing, and inference), and metadata. It supports pre-built default handlers for common tasks (image classification, object detection, text classification) as well as user-defined custom handlers, as sketched below.
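
As a hedged illustration, a custom handler typically subclasses TorchServe's BaseHandler and overrides the pre-processing, inference, and post-processing hooks. The import path and method names are TorchServe's actual handler API; the JSON payload layout and tensor shapes are illustrative assumptions, not a definitive implementation.

```python
# Minimal custom-handler sketch. BaseHandler and its hook methods are
# TorchServe's handler API; the payload format is an assumption.
import json

import torch
from ts.torch_handler.base_handler import BaseHandler


class ExampleHandler(BaseHandler):
    def preprocess(self, data):
        # TorchServe passes a batch of requests; each row exposes its
        # payload under the "data" or "body" key.
        rows = []
        for row in data:
            payload = row.get("data") or row.get("body")
            if isinstance(payload, (bytes, bytearray)):
                payload = json.loads(payload)  # assume a JSON list of floats
            rows.append(payload)
        return torch.tensor(rows, dtype=torch.float32)

    def inference(self, batch):
        # self.model is loaded by BaseHandler.initialize() from the MAR file.
        with torch.no_grad():
            return self.model(batch)

    def postprocess(self, output):
        # Return one JSON-serializable result per request in the batch.
        return output.argmax(dim=1).tolist()
```

The handler module is referenced at packaging time via the model archiver, so the server knows how to wrap the serialized model.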
Inference is exposed via RESTful HTTP and gRPC APIs, and metrics can be exported in Prometheus format for monitoring.
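
For illustration, a client can call the default REST endpoints directly; ports 8080 (inference) and 8082 (metrics) are TorchServe's defaults, while the model name `my_model` and the payload shape are placeholders that depend on the deployed handler.

```python
# Client sketch against TorchServe's default ports; the model name and
# payload format are assumptions tied to the handler shown earlier.
import requests

# Inference API (default port 8080): POST /predictions/{model_name}
resp = requests.post(
    "http://localhost:8080/predictions/my_model",
    json=[0.1, 0.2, 0.3],
)
print(resp.json())

# Metrics API (default port 8082) serves Prometheus-format text that a
# Prometheus server can scrape directly.
print(requests.get("http://localhost:8082/metrics").text)
```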
TorchServe supports dynamic batching to improve throughput and routes requests to a pool of model workers, with model versioning and hot-swapping for zero-downtime updates.
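
As a sketch, batching and versioning are controlled through the management API on the default port 8081. The `batch_size`, `max_batch_delay`, and `initial_workers` registration parameters and the `set-default` endpoint are part of TorchServe's management API; the archive names and the concrete values are assumptions.

```python
# Registering a model with dynamic batching via the management API
# (default port 8081); values and archive names are placeholders.
import requests

requests.post(
    "http://localhost:8081/models",
    params={
        "url": "my_model.mar",     # archive in the configured model store
        "batch_size": 8,           # max requests aggregated into one batch
        "max_batch_delay": 50,     # ms to wait while filling a batch
        "initial_workers": 2,      # size of this model's worker pool
    },
)

# Zero-downtime update: register the new version, then promote it.
requests.post("http://localhost:8081/models", params={"url": "my_model_v2.mar"})
requests.put("http://localhost:8081/models/my_model/2.0/set-default")
```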
Deployment can be performed in containers or on Kubernetes, enabling easy integration into existing ML pipelines and cloud environments.
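
As one hedged example, the official `pytorch/torchserve` image on Docker Hub can be run directly; the volume mount below assumes the image's default model-store location.

```sh
# Containerized deployment sketch using the official Docker Hub image.
# Ports 8080/8081/8082 are the default inference, management, and
# metrics ports; the in-container model-store path is an assumption
# based on the image's defaults.
docker run --rm -it \
    -p 8080:8080 -p 8081:8081 -p 8082:8082 \
    -v "$(pwd)/model_store:/home/model-server/model-store" \
    pytorch/torchserve:latest
```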
The framework emphasizes ease of use, with commands to package models, start the server, and configure services and endpoints, as illustrated below.
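
A typical end-to-end workflow might look like the following. The `torch-model-archiver` and `torchserve` commands are the framework's actual CLIs; the file and model names are placeholders.

```sh
# Packaging and serving sketch; file names are placeholders.

# 1. Package the serialized model, handler, and metadata into a MAR file.
torch-model-archiver --model-name my_model \
    --version 1.0 \
    --serialized-file model.pt \
    --handler my_handler.py \
    --export-path model_store

# 2. Start the server and load the archive from the model store.
torchserve --start --model-store model_store --models my_model=my_model.mar

# 3. Stop the server when finished.
torchserve --stop
```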