
Hadoop-compatible

Hadoop-compatible refers to software, storage, or data formats that can integrate with the Apache Hadoop ecosystem by implementing or exposing its standard interfaces. Such components can participate in Hadoop workloads, operate with Hadoop tooling, and exchange data via the Hadoop FileSystem API and related standards.

The term can include storage systems that present a Hadoop-compatible file system (HCFS), enabling Hadoop clients to read and write data as they would to HDFS. It also covers processing engines and runtimes that can submit tasks to YARN or run MapReduce jobs, either directly or through compatible APIs. Data formats commonly used in Hadoop workflows, such as Parquet, ORC, Avro, and SequenceFile, are typically described as Hadoop-compatible when they preserve schema and compression information and remain interoperable across components.
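For example, because an HCFS implementation is addressed through the same org.apache.hadoop.fs.FileSystem abstraction as HDFS, client code like the following minimal Java sketch works unchanged against any compliant backend (the path here is illustrative):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HcfsReadExample {
        public static void main(String[] args) throws Exception {
            // fs.defaultFS in the Configuration decides which backend is
            // used: hdfs://, s3a://, or any other HCFS implementation.
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // Illustrative path; the same call works whether the backend
            // is HDFS or another Hadoop-compatible file system.
            Path path = new Path("/data/example.txt");
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(fs.open(path), StandardCharsets.UTF_8))) {
                reader.lines().forEach(System.out::println);
            }
        }
    }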

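To illustrate the format point, the sketch below writes one of the formats named above, SequenceFile, using Hadoop's own API; the key, value, and output path are made up for the example:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class SequenceFileWriteExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Path path = new Path("/tmp/example.seq"); // hypothetical output path

            // The writer records key/value class names and any compression
            // codec in the file header, which is what lets other Hadoop
            // components read the file back without out-of-band schema info.
            try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                    SequenceFile.Writer.file(path),
                    SequenceFile.Writer.keyClass(Text.class),
                    SequenceFile.Writer.valueClass(IntWritable.class))) {
                writer.append(new Text("clicks"), new IntWritable(42));
            }
        }
    }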
Cloud-based storage and object stores may be marketed as Hadoop-compatible when they provide connectors or adapters (for example, s3a or the gcs-connector) that implement the Hadoop FileSystem interface. Similarly, non-traditional compute engines may claim Hadoop compatibility if they can read and write Hadoop data, support Hadoop data formats, or participate in the Hadoop scheduling and resource management model (via YARN or a compatible runtime).
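As a sketch of how such a connector is wired in: with the hadoop-aws module on the classpath, the s3a connector is selected purely by URI scheme, so the same FileSystem API reaches an object store. The bucket name below is hypothetical:

    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class S3aExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Credentials are resolved by the s3a provider chain
            // (environment variables, IAM roles, or explicit
            // fs.s3a.access.key / fs.s3a.secret.key settings), so
            // nothing needs to be hard-coded here.

            // "example-bucket" is a hypothetical bucket name.
            FileSystem s3 = FileSystem.get(URI.create("s3a://example-bucket/"), conf);
            for (FileStatus status : s3.listStatus(new Path("/"))) {
                System.out.println(status.getPath());
            }
        }
    }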

Compatibility is often version-specific and can be partial. Some features may not be fully supported, and performance, security, and consistency characteristics may differ from a native Hadoop deployment. Users should verify compatibility matrices and test workloads when integrating third-party components.
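One lightweight way to follow that advice is a write-then-read smoke test against the target system before relying on it. The sketch below assumes the backend is reachable via fs.defaultFS and uses a hypothetical scratch path:

    import java.nio.charset.StandardCharsets;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CompatibilitySmokeTest {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path scratch = new Path("/tmp/compat-smoke-test"); // hypothetical scratch path
            byte[] payload = "hello".getBytes(StandardCharsets.UTF_8);

            // Write, then immediately read back: backends with weaker
            // consistency or partial HCFS support can fail this roundtrip.
            try (FSDataOutputStream out = fs.create(scratch, true)) {
                out.write(payload);
            }
            byte[] readBack = new byte[payload.length];
            try (FSDataInputStream in = fs.open(scratch)) {
                in.readFully(readBack);
            }
            fs.delete(scratch, false);

            System.out.println(java.util.Arrays.equals(payload, readBack)
                    ? "roundtrip OK" : "roundtrip MISMATCH");
        }
    }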

See also: Apache Hadoop, HDFS, MapReduce, YARN, Hadoop ecosystem, Hadoop-compatible storage, Hadoop-compatible file system.
