Home

ABFS

ABFS, short for Azure Blob File System, is a Hadoop-compatible file system driver that enables applications to access data stored in Microsoft Azure Blob Storage using the Hadoop FileSystem API. It provides a bridge between distributed processing frameworks such as Hadoop MapReduce and Apache Spark and Azure Storage, allowing scalable data processing without moving data into a separate shared filesystem.

The ABFS connector is part of the Hadoop ecosystem and is implemented as a FileSystem under the

Authentication is supported via multiple methods, including Azure Active Directory OAuth 2.0 (with service principals or

Use cases include data lakes on Azure, large-scale ETL pipelines, and analytics workloads that require direct

See also: Azure Blob Storage, Azure Data Lake Storage Gen2, Hadoop FileSystem, HDInsight, Apache Spark.

package
org.apache.hadoop.fs.azurebfs.
It
supports
two
URI
schemes:
abfs://
for
Azure
Blob
Storage
accounts
and
abfss://
for
ADLS
Gen2
storage
accounts
with
a
secure,
authentication-enabled
path.
managed
identities),
storage
account
keys,
and
Shared
Access
Signatures.
ABFS
can
be
configured
in
Hadoop
and
Spark
clusters
on
Azure
HDInsight,
Azure
Databricks,
and
other
environments
that
run
Hadoop-compatible
software.
access
to
data
in
Blob
Storage
without
copying
to
HDFS.
Performance
and
semantics
vary
with
storage
tier
and
account
type;
ABFS
emphasizes
compatibility
and
scalability
with
cloud
storage
rather
than
full
POSIX-like
filesystem
behavior.