Prediction time

Prediction time, or inference latency, is the time from when input data becomes available to when a deployed model outputs its prediction. It is a key metric for the responsiveness of machine learning systems and is distinct from training time, which measures how long the model development process takes.

Prediction time is usually reported as average per-instance latency and as tail metrics such as p95 or p99 to capture worst-case delays. It may refer to single-sample inference or batch processing, and definitions may include or exclude preprocessing and postprocessing steps.
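
A quick way to report these numbers is to time each prediction call and compute the mean alongside the p95 and p99 percentiles. The sketch below is illustrative only: it assumes a hypothetical single-sample predict_fn and a list of already preprocessed inputs, so it measures just the model call and excludes preprocessing and postprocessing.

```python
import time

import numpy as np


def report_latency(predict_fn, inputs, warmup=10):
    """Time each single-sample prediction and report mean, p95, and p99.

    predict_fn and inputs are placeholders for any deployed model's
    single-sample call and a list of already preprocessed samples.
    """
    # Warm-up so one-time costs (lazy loading, JIT, cache fills) don't skew the tail.
    for x in inputs[:warmup]:
        predict_fn(x)

    latencies_ms = []
    for x in inputs:
        start = time.perf_counter()
        predict_fn(x)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    return {
        "mean_ms": float(np.mean(latencies_ms)),
        "p95_ms": float(np.percentile(latencies_ms, 95)),
        "p99_ms": float(np.percentile(latencies_ms, 99)),
    }
```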

Several factors influence prediction time: model size and architecture; numerical precision; input size and data formatting; preprocessing and feature extraction; hardware (CPU, GPU, TPU, edge accelerators); software stack optimizations; and system load or queuing. In batching scenarios, throughput can improve while per-item latency may rise, as the sketch below illustrates.
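
The batching trade-off can be made concrete by timing the same model at several batch sizes. The sketch below assumes hypothetical predict_batch_fn and make_batch helpers, and it treats an item's latency as the duration of the whole batched call, since the item's result is not available until the batch finishes.

```python
import time

import numpy as np


def batching_tradeoff(predict_batch_fn, make_batch, batch_sizes=(1, 8, 32, 128)):
    """Compare throughput and per-item latency across batch sizes.

    predict_batch_fn and make_batch are placeholders: the first runs the
    model on a whole batch, the second builds a batch of a given size.
    """
    for bs in batch_sizes:
        batch = make_batch(bs)
        predict_batch_fn(batch)  # warm-up call
        timings = []
        for _ in range(20):
            start = time.perf_counter()
            predict_batch_fn(batch)
            timings.append(time.perf_counter() - start)
        batch_s = float(np.median(timings))
        # Each item's result is only available once the whole batch finishes,
        # so per-item latency equals the batch duration, while throughput is
        # items completed per second of compute.
        print(f"batch={bs:4d}  per-item latency={1000 * batch_s:7.2f} ms  "
              f"throughput={bs / batch_s:9.1f} items/s")
```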

Low prediction time is critical for real-time or interactive applications such as live recommendations, fraud detection, robotics, or autonomous systems. Optimization strategies include model compression (quantization, pruning), distillation, selecting efficient architectures, deploying on specialized hardware, optimizing the preprocessing pipeline, and using asynchronous or streaming inference. Trade-offs between accuracy, latency, and throughput must be balanced.
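
As one example of model compression, the sketch below applies PyTorch's dynamic quantization to a small stand-in model, storing Linear weights as int8; this typically reduces CPU prediction time at some cost in accuracy. It assumes PyTorch is available and uses a toy model purely for illustration.

```python
import torch
import torch.nn as nn

# A small stand-in model; any nn.Module with Linear layers would behave similarly.
model = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 10)).eval()

# Dynamic quantization stores Linear weights as int8 and quantizes activations
# on the fly, which usually shrinks the model and lowers CPU prediction time
# at a small accuracy cost.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 256)
with torch.no_grad():
    full_precision_out = model(x)
    quantized_out = quantized(x)  # same interface, faster int8 Linear kernels on CPU
```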
