horovodrun
horovodrun is a command-line launcher used to start distributed training jobs with Horovod across multiple processes and hosts. It serves as a front end that uses an MPI-compatible launcher to spawn Horovod-enabled training processes on the specified machines, with each process executing a copy of the user’s training script. The launcher coordinates the ranks and initializes the distributed environment so that collective operations such as allreduce can be performed efficiently.
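The allreduce collective can be illustrated without Horovod. The sketch below is a pure-Python simulation of the semantics of an averaging allreduce, the form commonly used to combine gradients: every rank contributes a vector, and every rank receives the element-wise mean of all contributions. The function name allreduce_average is hypothetical. This is not how Horovod implements the operation. Horovod delegates the actual communication to libraries such as NCCL, MPI or Gloo.

```python
from typing import List


def allreduce_average(per_rank_grads: List[List[float]]) -> List[List[float]]:
    """Simulate an averaging allreduce (hypothetical helper, not Horovod's API).

    Each inner list is the tensor contributed by one rank; the return value
    holds one result per rank, all equal to the element-wise mean.
    """
    world_size = len(per_rank_grads)
    if world_size == 0:
        raise ValueError("need at least one rank")
    length = len(per_rank_grads[0])
    if any(len(g) != length for g in per_rank_grads):
        raise ValueError("all ranks must contribute tensors of the same shape")

    # "Reduce" step: element-wise sum across all ranks.
    summed = [sum(g[i] for g in per_rank_grads) for i in range(length)]
    averaged = [s / world_size for s in summed]

    # "All" step: every rank ends up with its own copy of the same result.
    return [list(averaged) for _ in range(world_size)]


if __name__ == "__main__":
    grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    for rank, result in enumerate(allreduce_average(grads)):
        print(rank, result)  # every rank prints [4.0, 5.0]
```

Because every rank finishes with an identical averaged gradient, each copy of the training script can apply the same update and the model replicas stay in sync.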
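To use horovodrun with its MPI-based controller, an MPI implementation (such as Open MPI or MPICH) must be available on the hosts that run the job. Horovod also provides a Gloo-based controller, selected with the --gloo flag, which does not require MPI.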
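Choosing between the MPI and Gloo controllers can be scripted. The sketch below builds a horovodrun argument list and adds --gloo when it is requested explicitly or when no mpirun executable is found on PATH. The helper name build_horovodrun_command is hypothetical. The mpirun-on-PATH check is an assumed heuristic, not horovodrun's own selection logic. Only the -np, -H and --gloo options are used.

```python
import shutil
from typing import List, Optional


def build_horovodrun_command(
    num_proc: int,
    script: List[str],
    hosts: Optional[str] = None,
    force_gloo: bool = False,
) -> List[str]:
    """Build a horovodrun argument list (hypothetical helper).

    Adds --gloo when forced or when mpirun is not on PATH; otherwise the
    controller choice is left to horovodrun's own default.
    """
    if num_proc < 1:
        raise ValueError("num_proc must be positive")
    cmd = ["horovodrun"]
    # Assumption: no mpirun on PATH means MPI is unavailable on this machine.
    if force_gloo or shutil.which("mpirun") is None:
        cmd.append("--gloo")
    cmd += ["-np", str(num_proc)]
    if hosts:
        cmd += ["-H", hosts]
    # The user's training command comes last, unchanged.
    return cmd + script


if __name__ == "__main__":
    cmd = build_horovodrun_command(4, ["python", "train.py"], force_gloo=True)
    print(" ".join(cmd))  # horovodrun --gloo -np 4 python train.py
```

The resulting list could be passed to subprocess.run. Keeping it as a list rather than a single string avoids shell-quoting problems with script arguments.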
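horovodrun supports GPU and multi-host configurations by placing processes on hosts according to the host list given with the -H option (for example, -H server1:4,server2:4). The number after each hostname is the number of slots on that host, typically one per GPU, and the total number of processes is set with -np. horovodrun does not itself assign a GPU to each process. Instead, the training script normally pins each process to one GPU using its local rank (hvd.local_rank()).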
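The placement of processes described by a host list such as server1:2,server2:2 (hostname:slots, as accepted by horovodrun's -H option) can be modeled in a few lines. The sketch below parses such a string and assigns each global rank to a host and a local rank on that host. The helper names parse_hosts and assign_ranks are hypothetical. Two behaviors are assumptions for illustration: hosts are filled in the order listed, and a host given without ":N" has one slot. In a real Horovod script, the local rank is what each process would read from hvd.local_rank() to pick its GPU.

```python
from typing import Dict, List, Tuple


def parse_hosts(spec: str) -> List[Tuple[str, int]]:
    """Parse a host string such as 'server1:4,server2:4' (hypothetical helper).

    Assumption: an entry without ':N' is treated as having one slot.
    """
    hosts = []
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue
        name, sep, slots = entry.partition(":")
        hosts.append((name, int(slots) if sep else 1))
    return hosts


def assign_ranks(spec: str, num_proc: int) -> List[Dict[str, object]]:
    """Map each global rank to (host, local_rank), filling hosts in order (assumed policy)."""
    hosts = parse_hosts(spec)
    capacity = sum(slots for _, slots in hosts)
    if num_proc > capacity:
        raise ValueError(f"requested {num_proc} processes but only {capacity} slots")
    assignments: List[Dict[str, object]] = []
    rank = 0
    for name, slots in hosts:
        for local_rank in range(slots):
            if rank == num_proc:
                return assignments
            assignments.append({"rank": rank, "host": name, "local_rank": local_rank})
            rank += 1
    return assignments


if __name__ == "__main__":
    for a in assign_ranks("server1:2,server2:2", 4):
        # A training script would pin this process to GPU number a["local_rank"].
        print(a)
```

The local rank restarts at 0 on every host, which is why it, rather than the global rank, is the natural index for choosing a GPU on a multi-GPU machine.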
In practice, horovodrun is a primary method for launching Horovod jobs in cluster environments and is compatible