checkpointing - Infinite Lexicon

checkpointing

Checkpointing is a fault-tolerance technique that saves the state of a computation or system at predefined points so that, after a failure or interruption, execution can resume from the most recent saved state instead of starting over. It is widely used in high-performance computing, long-running scientific simulations, databases, operating systems, and embedded systems to avoid complete recomputation and to shorten recovery time after outages.
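As a rough illustration of the idea, the following Python sketch checkpoints a long-running loop and resumes it after a simulated failure. The function names (`save_checkpoint`, `load_checkpoint`, `run_sum`), the JSON file format, and the `fail_at` parameter are assumptions made for this example, not part of any particular checkpointing system. The checkpoint is written to a temporary file and then renamed, so a crash during the write should not leave a half-written checkpoint in place of the previous one.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write `state` to `path` via a temp file, fsync, and an atomic rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # push the data to stable storage
        os.replace(tmp_path, path)  # replace the old checkpoint in one step
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_checkpoint(path):
    """Return the last saved state, or None if no checkpoint exists."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)


def run_sum(path, n, interval, fail_at=None):
    """Sum 0..n-1, saving a checkpoint every `interval` iterations.

    `fail_at` is a hypothetical hook that simulates a crash at that iteration.
    """
    state = load_checkpoint(path) or {"i": 0, "total": 0}
    for i in range(state["i"], n):
        if fail_at is not None and i == fail_at:
            raise RuntimeError("simulated failure")
        state["total"] += i
        state["i"] = i + 1
        if state["i"] % interval == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state["total"]
```

If a run fails partway through, calling `run_sum` again with the same path picks up at the iteration recorded in the last checkpoint, so only the work done since that checkpoint is repeated.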

A checkpoint typically captures a consistent view of the program or process, including memory contents, processor

During recovery, the system restores from the last checkpoint and resumes execution. In distributed settings, recovery

Common techniques include synchronous (blocking) and asynchronous (background) checkpointing, multi-level checkpointing that uses faster memory-resident images
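The difference between blocking and background checkpointing can be sketched in Python as follows. This is a minimal illustration under assumed names (`AsyncCheckpointer`, `checkpoint`, `wait`) and an assumed pickle file format, not a description of a specific system. The computation is paused only long enough to copy the state in memory, and a background thread then writes the copy to disk while the computation continues.

```python
import copy
import os
import pickle
import threading


class AsyncCheckpointer:
    """Hypothetical helper: snapshot in the foreground, write in the background."""

    def __init__(self, path):
        self.path = path
        self._writer = None

    def checkpoint(self, state):
        # Blocking part: take a private in-memory copy of the state.
        snapshot = copy.deepcopy(state)
        # Allow at most one write in flight at a time.
        self.wait()
        self._writer = threading.Thread(target=self._write, args=(snapshot,))
        self._writer.start()

    def _write(self, snapshot):
        # Background part: write to a temp file, then rename over the old one.
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "wb") as f:
            pickle.dump(snapshot, f)
        os.replace(tmp_path, self.path)

    def wait(self):
        """Block until any pending background write has finished."""
        if self._writer is not None:
            self._writer.join()
            self._writer = None
```

A fully synchronous variant would simply call `_write(state)` directly and return only once the file is on disk. The background version trades extra memory for the snapshot against a shorter pause in the computation.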
