HTCondor

HTCondor is an open-source workload management system designed to manage large-scale compute tasks across clusters and distributed resources. It is used for high-throughput computing, where many jobs with varying runtimes are submitted and executed opportunistically on idle resources. HTCondor supports heterogeneous environments, checkpointing, and fault tolerance, making it suitable for scientific computing, data analysis, and academic research workflows.

Core components include the central manager, which hosts the collector and negotiator and coordinates resource advertisements from execute hosts running the startd daemon. The submit host runs the schedd, which accepts user job submissions via condor_submit, holds them in a queue, and negotiates with the central manager to start jobs on suitable machines. Execute hosts publish their state and capabilities through the startd; the condor_master on each machine supervises the daemons. DAGMan is a separate tool that coordinates workflows as directed acyclic graphs of jobs. The system uses a set of universes (including vanilla, parallel, and local, as well as the legacy standard universe) to define execution environments and supports backfilling and preemption to optimize utilization.

HTCondor provides user-facing tools such as condor_q, condor_status, condor_rm, and condor_submit for job management, monitoring, and control. It supports various authentication mechanisms and policy-based scheduling, and can operate within single sites or across organizations, including integration with grid middleware and GlideinWMS for dynamic resource provisioning. Originating in the late 1980s at the University of Wisconsin–Madison, HTCondor has evolved into a widely used system for high-throughput computing in research and education settings.
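
The matchmaking flow described in this article, in which execute hosts advertise their capabilities and queued jobs are paired with suitable idle machines, can be illustrated with a toy model. The sketch below is not HTCondor code and does not use its ClassAd language or its actual negotiation algorithm; the names MachineAd, JobAd, and negotiate are hypothetical, and the greedy first-fit rule is an assumption chosen only to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class MachineAd:
    """Hypothetical stand-in for the ad an execute host publishes."""
    name: str
    cpus: int
    memory_mb: int
    idle: bool = True


@dataclass
class JobAd:
    """Hypothetical stand-in for a job waiting in a submit host's queue."""
    job_id: str
    request_cpus: int
    request_memory_mb: int


def negotiate(jobs, machines):
    """Greedy first-fit matching; returns {job_id: machine_name}.

    A real negotiator also weighs user priorities, rank, and policy
    expressions, none of which are modeled in this toy version.
    """
    matches = {}
    for job in jobs:
        for machine in machines:
            fits = (machine.idle
                    and machine.cpus >= job.request_cpus
                    and machine.memory_mb >= job.request_memory_mb)
            if fits:
                matches[job.job_id] = machine.name
                machine.idle = False  # one job per machine in this toy model
                break
    return matches


if __name__ == "__main__":
    pool = [MachineAd("exec1", cpus=2, memory_mb=4096),
            MachineAd("exec2", cpus=8, memory_mb=32768)]
    queue = [JobAd("1.0", request_cpus=4, request_memory_mb=8192),
             JobAd("1.1", request_cpus=1, request_memory_mb=1024),
             JobAd("1.2", request_cpus=1, request_memory_mb=1024)]
    # Job 1.2 stays unmatched because both machines are already claimed.
    print(negotiate(queue, pool))
```

In this model a job that finds no idle machine simply stays in the queue, which loosely mirrors how jobs wait until resources become available.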