tfdistribute
tf.distribute is a TensorFlow module that provides distributed training APIs to scale machine learning workloads across multiple devices and machines. It defines a family of distribution strategies that enable data parallelism and, in some cases, model parallelism, while handling variable synchronization and input distribution in a unified way.
Key strategies include OneDeviceStrategy, MirroredStrategy, MultiWorkerMirroredStrategy, and TPUStrategy. OneDeviceStrategy runs on a single device and is
Usage typically involves creating a strategy and entering strategy.scope() to build your model and related assets.
tf.distribute is designed to work with the TensorFlow 2.x eager execution model and integrates with Keras, tf.data,
Historically, tf.distribute was introduced to unify distributed training in TensorFlow 2.x, building on earlier experimental APIs