Load limiting
Load limiting refers to techniques that keep a system from becoming overloaded by actively restricting incoming work as demand approaches capacity. The goal is to maintain acceptable latency and error rates by capping, delaying, or prioritizing work before queues saturate.
Common methods include rate limiting (such as token bucket or leaky bucket), admission control to reject or defer requests before they consume resources, concurrency limits on in-flight work, and load shedding of low-priority traffic.
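As an illustration of the first method above, here is a minimal token-bucket rate limiter sketch. The class and parameter names (`TokenBucket`, `rate`, `capacity`) are illustrative, not from any particular library:

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter.

    Tokens accumulate at `rate` per second up to `capacity`; each
    admitted request spends `cost` tokens. Requests that arrive when
    the bucket is empty are rejected (a caller could instead delay).
    """

    def __init__(self, rate: float, capacity: float):
        self.rate = rate            # refill rate, tokens per second
        self.capacity = capacity    # burst size
        self.tokens = capacity      # start full
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

The leaky bucket is the dual design: instead of admitting bursts up to `capacity`, it drains queued work at a fixed rate, smoothing output rather than input.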
Effective load limiting relies on monitoring system state against explicit performance targets. Thresholds should reflect service level objectives, such as target latency percentiles or error budgets, rather than arbitrary fixed limits.
Applications include web APIs, cloud services, streaming pipelines, and databases, where load limiting preserves responsiveness during traffic spikes and partial failures.
See also: rate limiting, backpressure, circuit breaker, load shedding.