
DataNodes

DataNodes are the storage nodes in a distributed file system such as the Hadoop Distributed File System (HDFS). Each DataNode runs on a server with local storage and stores portions of the system's data blocks. DataNodes serve data to clients and to other DataNodes during replication, while the NameNode maintains the namespace and block mapping.

DataNodes store blocks on local disks and report storage status to the NameNode via heartbeats and block reports. The NameNode uses this information to determine block locations, enforce replication, and detect under-replication. DataNodes also perform local integrity checks with checksums and store blocks according to the configured replication factor.
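
As a concrete illustration, a client can read and change a file's target replication through the Hadoop FileSystem Java API. The sketch below is minimal; the path, class name, and target value are illustrative, not part of any particular deployment.

    // Minimal sketch: inspect and adjust a file's replication factor.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReplicationExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();   // loads cluster configuration from the classpath
            FileSystem fs = FileSystem.get(conf);

            Path file = new Path("/data/example.txt");  // illustrative path
            FileStatus status = fs.getFileStatus(file);
            System.out.println("current replication: " + status.getReplication());

            // Ask the NameNode to change the target replication for this file;
            // the NameNode then schedules DataNodes to add or remove replicas.
            fs.setReplication(file, (short) 3);

            fs.close();
        }
    }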
During reads, clients obtain block locations from the NameNode and fetch data from the appropriate DataNodes.
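
A minimal sketch of that lookup using the same Java API: the client asks the NameNode for a file's block locations and prints the DataNodes holding each block. The path and class name are illustrative; a normal read via fs.open() performs this lookup internally.

    // Minimal sketch: list which DataNodes hold each block of a file.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockLocationExample {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path file = new Path("/data/example.txt");  // illustrative path

            FileStatus status = fs.getFileStatus(file);
            // Ask the NameNode for the locations of every block in the file.
            BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());

            for (BlockLocation block : blocks) {
                System.out.println("offset " + block.getOffset()
                        + ", length " + block.getLength()
                        + ", hosts " + String.join(",", block.getHosts()));
            }
            fs.close();
        }
    }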
Writes proceed through a pipeline of DataNodes coordinated by the NameNode; data is written to the first node and streamed to replicas to satisfy the replication policy.
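
A minimal write sketch with the same Java API; the output stream hides the pipeline, with the client library sending data to the first DataNode, which forwards it to the remaining replicas. The path, contents, and class name are illustrative.

    // Minimal sketch: write a small file; replication happens via the DataNode pipeline.
    import java.nio.charset.StandardCharsets;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class WriteExample {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path file = new Path("/data/pipeline-demo.txt");  // illustrative path

            try (FSDataOutputStream out = fs.create(file, true)) {  // overwrite if present
                out.write("hello, datanodes".getBytes(StandardCharsets.UTF_8));
            }
            fs.close();
        }
    }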
If a DataNode fails, the NameNode marks blocks as under-replicated and triggers replication to other DataNodes to preserve redundancy.
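
Re-replication is driven entirely by the NameNode, but a rough client-side view is possible: compare each block's live locations against the file's target replication. The sketch below assumes the same Java API; the path and class name are illustrative.

    // Rough, illustrative check for under-replicated blocks of a single file;
    // the NameNode performs this bookkeeping cluster-wide on its own.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class UnderReplicationCheck {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path file = new Path("/data/example.txt");  // illustrative path

            FileStatus status = fs.getFileStatus(file);
            short target = status.getReplication();

            for (BlockLocation block : fs.getFileBlockLocations(status, 0, status.getLen())) {
                int live = block.getHosts().length;  // DataNodes currently holding this block
                if (live < target) {
                    System.out.println("block at offset " + block.getOffset()
                            + " has " + live + "/" + target + " replicas");
                }
            }
            fs.close();
        }
    }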
Operational considerations include capacity planning, network bandwidth, and hardware reliability. DataNodes typically run on commodity hardware and may be deployed across racks to improve throughput and fault tolerance. They are a core component of HDFS, distinct from the NameNode, which handles metadata and namespace management.