OpenLineage

OpenLineage is an open standard and ecosystem for data lineage. It provides a platform-agnostic contract for capturing, transmitting, and consuming lineage metadata across data platforms and tools, enabling observability of data pipelines.

The core of OpenLineage is a common data model and event schema that describes how datasets are produced, consumed, and transformed by jobs. Key concepts include datasets, jobs, job runs, and lineage relationships that link input datasets to output datasets along with metadata such as timestamps, run state, and properties.

OpenLineage defines a specification and reference implementations to emit and consume lineage events. Producers (emitters) capture events from data orchestration tools, ETL jobs, and platforms, while consumers (datastores, catalogs, lineage dashboards) ingest and visualize the lineage. The events are typically serialized as JSON payloads following the OpenLineage schema, enabling interoperability across systems.

The project is community-driven and aims to promote interoperability and reduce vendor lock-in in data governance and compliance workflows. It has been adopted by various data platforms and orchestration tools and is compatible with popular environments for data engineering.

OpenLineage is related to data governance, metadata management, and observability in data ecosystems. It complements other metadata standards and integrations with data catalogs, lineage visualization tools, and pipeline schedulers.
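
To make the event model described above more concrete, the following Python sketch builds a simplified lineage event and serializes it as JSON. It uses only the standard library rather than any OpenLineage client. The field names (eventType, eventTime, producer, run, job, inputs, outputs) follow the general shape of a run event in the OpenLineage specification as best understood here. The helper build_run_event, the namespaces, the dataset and job names, and the producer URL are all hypothetical. Fields such as schemaURL and facets are omitted, so a payload like this should be checked against the specification before being sent to a real consumer.

```python
import json
import uuid
from datetime import datetime, timezone


def build_run_event(event_type, run_id, job_namespace, job_name,
                    inputs, outputs, producer):
    """Return a dict shaped like a simplified OpenLineage run event.

    `inputs` and `outputs` are lists of (namespace, name) pairs that
    identify datasets. This helper is illustrative, not an official API.
    """
    def dataset(namespace, name):
        return {"namespace": namespace, "name": name}

    return {
        "eventType": event_type,                       # run state, e.g. START or COMPLETE
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "producer": producer,                          # URI identifying the emitter
        "run": {"runId": run_id},                      # one execution of the job
        "job": {"namespace": job_namespace, "name": job_name},
        "inputs": [dataset(ns, n) for ns, n in inputs],
        "outputs": [dataset(ns, n) for ns, n in outputs],
    }


# Hypothetical job that reads one table and writes another.
run_id = str(uuid.uuid4())
common = dict(
    run_id=run_id,
    job_namespace="example-scheduler",
    job_name="daily_orders_rollup",
    inputs=[("warehouse", "public.orders")],
    outputs=[("warehouse", "public.orders_daily")],
    producer="https://example.com/hypothetical-producer",
)

# The same runId ties the START and COMPLETE events to one job run.
start_event = build_run_event("START", **common)
complete_event = build_run_event("COMPLETE", **common)

# Events are serialized as JSON before being sent to a consumer.
print(json.dumps(complete_event, indent=2))
```

A consumer that receives both events can match them on the shared runId to follow the run from start to completion, and can read the inputs and outputs lists to draw the lineage edge from public.orders to public.orders_daily.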