changediffer
changediffer is an open‑source Python library that implements efficient differential data processing algorithms. It was first released in 2019 by the computational research group at Technische Universität München and has since gained popularity in data engineering and scientific computing projects that require incremental updates to large datasets. Its core function is to compute the difference between two datasets or data frames and to apply transformations only to the portions that changed, which reduces I/O and computational overhead. It supports common data structures such as pandas DataFrames, NumPy arrays and PySpark DataFrames, and provides an API that integrates with existing data pipelines.
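The idea of transforming only the changed portions can be illustrated with plain pandas. The sketch below does not use changediffer's API, which is not described here; the function names changed_keys and incremental_apply are hypothetical, and row hashing via pandas.util.hash_pandas_object is an assumed stand-in for whatever change detection the library performs. It also assumes that both frames have unique index labels, identical columns and dtypes, and that the transformation works row by row.

```python
from typing import Callable

import pandas as pd


def changed_keys(old: pd.DataFrame, new: pd.DataFrame) -> pd.Index:
    """Index labels of rows in `new` that are added or modified relative to `old`.

    Hypothetical helper; assumes unique index labels and matching dtypes.
    """
    # Fingerprint each row's values (the index itself is not hashed).
    old_hash = pd.util.hash_pandas_object(old, index=False)
    new_hash = pd.util.hash_pandas_object(new, index=False)

    # Rows present only in `new` are additions.
    added = new_hash.index.difference(old_hash.index)

    # Rows present in both are modified when their fingerprints differ.
    common = new_hash.index.intersection(old_hash.index)
    differs = (new_hash.loc[common] != old_hash.loc[common]).to_numpy()
    modified = common[differs]

    return added.union(modified)


def incremental_apply(
    old: pd.DataFrame,
    new: pd.DataFrame,
    old_result: pd.DataFrame,
    transform: Callable[[pd.DataFrame], pd.DataFrame],
) -> pd.DataFrame:
    """Recompute `transform` only for changed rows and reuse earlier results.

    Hypothetical helper; assumes `transform` is row-wise, so the result for
    a row depends only on that row.
    """
    keys = changed_keys(old, new)
    fresh = transform(new.loc[keys])

    # Keep earlier results for rows that still exist and did not change.
    unchanged = old_result.index.intersection(new.index).difference(keys)
    kept = old_result.loc[unchanged]

    return pd.concat([kept, fresh]).sort_index()
```

With an old frame indexed 1 to 3 and a new frame in which row 2 is modified and row 4 is added, only those two rows pass through the transformation, and rows 1 and 3 are taken from the earlier result. Rows deleted from the new frame drop out because only labels still present in it are kept.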
changediffer’s algorithmic foundation is built on hash‑based change detection and immutable data structures. During a diff
The project is distributed under the MIT license and is maintained on GitHub, where contributors submit pull