SLIs

An SLI, or service level indicator, is a carefully defined quantitative measure of a specific aspect of a service's quality. It is used to assess how well a system performs relative to defined expectations, typically from the perspective of users. SLIs are derived from telemetry data collected from real user traffic or synthetic tests and are chosen to be observable, measurable, and meaningful to stakeholders.
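As a rough illustration of how an SLI can be derived from telemetry, the sketch below computes two indicators from a list of request records: the fraction of requests that succeeded, and a latency percentile. The `Request` record, its field names, and the choice to count HTTP 5xx responses as failures are all assumptions made for this example, not part of any particular monitoring tool; real definitions depend on what users of the service actually experience.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical telemetry record for one request."""
    status: int        # HTTP status code
    latency_ms: float  # observed response time in milliseconds


def availability_sli(requests):
    """Fraction of requests that did not end in a server error (5xx).

    Returns None when there is no traffic, since the ratio is undefined.
    """
    if not requests:
        return None
    good = sum(1 for r in requests if r.status < 500)
    return good / len(requests)


def latency_percentile(requests, pct):
    """Nearest-rank percentile of latency, e.g. pct=95 for p95."""
    if not requests:
        return None
    ordered = sorted(r.latency_ms for r in requests)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank
    return ordered[max(rank, 1) - 1]


if __name__ == "__main__":
    sample = [Request(200, 120.0), Request(200, 180.0), Request(503, 900.0)]
    print("availability:", availability_sli(sample))
    print("p95 latency (ms):", latency_percentile(sample, 95))
```

Returning `None` for an empty window is one possible design choice; it keeps "no traffic" distinct from "zero availability", which matters when the value later feeds an alert.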

SLIs are often used within a broader reliability framework that includes service level objectives (SLOs) and service level agreements (SLAs). An SLO is a target value or range for one or more SLIs, such as “99.9% availability” or “p95 latency under 250 ms.” An SLA is a formal contract with consequences if targets are not met, usually specifying remedies or penalties. While SLIs are the metrics, SLOs are the performance goals, and SLAs formalize commitments to customers or partners.

Common SLIs cover availability, latency, error rate, and saturation. Availability measures the proportion of time the service functions as expected; latency tracks response times; error rate monitors the share of failed requests; saturation reflects resource pressure or throughput limits. Measurements may use real-user data, synthetic tests, or a combination, and are often calculated over defined windows (for example, a 30-day rolling window or a 14-day period).

Practically, teams use SLIs to drive decisions through error budgets, alerts, and dashboards. An error budget expresses the permissible level of unreliability within an SLO and informs release and incident-management priorities. Good SLIs are tightly aligned with customer impact, challenging enough to indicate meaningful reliability, and supported by high-quality instrumentation. Common pitfalls include choosing misleading metrics, overloading with too many SLIs, or letting data quality degrade.
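
The error-budget arithmetic can be made concrete with a small sketch. It assumes the common formulation in which the budget is simply the complement of the SLO target (a 99.9% target leaves 0.1% of events, or of time, as budget). The function names are hypothetical and chosen only for this example.

```python
def error_budget(slo_target, total_events):
    """Number of bad events the SLO tolerates over the window.

    slo_target is a fraction, e.g. 0.999 for "99.9% availability".
    """
    return (1 - slo_target) * total_events


def budget_remaining(slo_target, total_events, bad_events):
    """Fraction of the error budget still unspent (negative if overspent)."""
    budget = error_budget(slo_target, total_events)
    if budget <= 0:
        return 0.0  # a 100% target leaves no budget to spend
    return 1 - bad_events / budget


def allowed_downtime_minutes(slo_target, window_days):
    """Time-based view of the same budget for an availability SLO."""
    return (1 - slo_target) * window_days * 24 * 60


if __name__ == "__main__":
    # 99.9% availability over a 30-day window
    print(round(allowed_downtime_minutes(0.999, 30), 1), "minutes of downtime allowed")
    # 1,000,000 requests in the window, 250 of them failed
    print(round(budget_remaining(0.999, 1_000_000, 250), 2), "of the budget remains")
```

Under these assumptions, a 99.9% target over a 30-day window allows roughly 43.2 minutes of downtime. A team might slow releases when the remaining fraction gets low, though the exact policy is a local decision rather than something the arithmetic dictates.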