Oozie

Apache Oozie is an open-source workflow scheduler system for orchestrating Apache Hadoop jobs. It runs as a Java web application with a REST API and a web UI, and it coordinates the execution of Hadoop jobs across a cluster. Oozie stores workflow definitions and job state in a relational database and uses the cluster's resource manager to launch individual tasks.
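
For example, a deployed workflow application can be submitted over the REST API by POSTing an XML configuration document to the jobs endpoint (typically /oozie/v1/jobs). The user name, host, and HDFS path in this sketch are placeholders rather than values from this page:

    <?xml version="1.0" encoding="UTF-8"?>
    <!-- Illustrative submission payload for POST /oozie/v1/jobs -->
    <configuration>
        <property>
            <name>user.name</name>
            <value>analyst</value>
        </property>
        <property>
            <!-- HDFS location of the deployed workflow application -->
            <name>oozie.wf.application.path</name>
            <value>hdfs://namenode:8020/user/analyst/apps/example-wf</value>
        </property>
    </configuration>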

Core concepts include Workflow Jobs, Coordinators, and Bundles. A Workflow Job defines a directed acyclic graph of actions in an XML file. Actions can be Hadoop MapReduce, Pig, Hive, Sqoop, Spark, Java programs, streaming, or DistCp, and may include sub-workflows. Control flow elements such as start, end, decision, fork, and join determine the execution path. Workflows support retries, notifications, and basic error handling.
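
A minimal workflow definition might look like the sketch below: a start node, a single MapReduce action with ok and error transitions, a kill node for failures, and an end node. Names, paths, and properties are illustrative; decision, fork, and join nodes would be added as further control-flow elements.

    <workflow-app name="example-wf" xmlns="uri:oozie:workflow:0.5">
        <start to="mr-step"/>

        <action name="mr-step">
            <map-reduce>
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <configuration>
                    <!-- Mapper/reducer classes and other job settings would also go here. -->
                    <property>
                        <name>mapred.input.dir</name>
                        <value>${inputDir}</value>
                    </property>
                    <property>
                        <name>mapred.output.dir</name>
                        <value>${outputDir}</value>
                    </property>
                </configuration>
            </map-reduce>
            <!-- ok/error transitions provide the basic error handling described above. -->
            <ok to="end"/>
            <error to="fail"/>
        </action>

        <kill name="fail">
            <message>Step failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
        </kill>
        <end name="end"/>
    </workflow-app>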

Coordinator Jobs enable time- and data-driven scheduling. They use datasets to describe input data availability or partitioning and specify frequency, start/end times, and expiration. When their conditions are met, they trigger associated Workflow Jobs.
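
As a sketch, a daily coordinator that waits for a dated input directory and then triggers a workflow might look roughly like this; every name, date, and URI below is a placeholder:

    <coordinator-app name="example-coord" frequency="${coord:days(1)}"
                     start="2021-01-01T00:00Z" end="2021-12-31T00:00Z"
                     timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
        <datasets>
            <!-- One dated directory per day; the coordinator waits until it exists. -->
            <dataset name="daily-input" frequency="${coord:days(1)}"
                     initial-instance="2021-01-01T00:00Z" timezone="UTC">
                <uri-template>${nameNode}/data/input/${YEAR}/${MONTH}/${DAY}</uri-template>
            </dataset>
        </datasets>
        <input-events>
            <data-in name="input" dataset="daily-input">
                <instance>${coord:current(0)}</instance>
            </data-in>
        </input-events>
        <action>
            <workflow>
                <!-- Workflow application to trigger once the input is available. -->
                <app-path>${nameNode}/apps/example-wf</app-path>
            </workflow>
        </action>
    </coordinator-app>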

Bundles provide a higher-level abstraction to group multiple coordinators and workflows for unified management.
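
A bundle definition is essentially a named list of coordinator applications started and managed together; a minimal sketch with placeholder paths:

    <bundle-app name="example-bundle" xmlns="uri:oozie:bundle:0.2">
        <coordinator name="daily-ingest">
            <app-path>${nameNode}/apps/daily-ingest-coord</app-path>
        </coordinator>
        <coordinator name="daily-report">
            <app-path>${nameNode}/apps/daily-report-coord</app-path>
        </coordinator>
    </bundle-app>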

Oozie is designed to work with the Hadoop ecosystem, submitting actions to the ResourceManager and using Hadoop-compatible file systems for storage. It supports multiple runtimes such as Hadoop MapReduce, Tez, Spark, and streaming engines through action types, and it can integrate with external systems for notifications and logging. The server can be deployed on a cluster with high availability.
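
For instance, a Spark job appears as one action type inside a workflow. The sketch below assumes a YARN cluster and an application jar already on HDFS; the class and jar names are made up:

    <action name="spark-step">
        <spark xmlns="uri:oozie:spark-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>yarn</master>
            <mode>cluster</mode>
            <name>example-spark-job</name>
            <class>com.example.ExampleApp</class>
            <jar>${nameNode}/apps/example/lib/example-app.jar</jar>
            <arg>${inputDir}</arg>
            <arg>${outputDir}</arg>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>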

First developed in the late 2000s and later adopted as an Apache project, Oozie remains a core tool for workflow orchestration in many Hadoop deployments, particularly where centralized scheduling and reproducible pipelines are required. It is still in use where Hadoop-native workflows are common.