dvccache

dvccache refers to the data cache used by DVC (Data Version Control). It is a content-addressable storage system that stores data artifacts and pipeline outputs to speed up operations, reduce redundant downloads or computations, and enable efficient sharing of data across projects and environments.

The dvccache is located in a project’s .dvc/cache directory by default, but its location can be changed with the dvc cache dir command, which sets the cache.dir configuration option. The cache is designed to be persistent and shareable across workspaces; multiple projects can point to the same cache location if they are configured to do so. Cache contents are identified by a hash of the data, by default an MD5 digest, so identical content is stored only once. To prevent directory bloat, the cache uses a two-level layout in which the first two characters of the hash form a subdirectory name and the remaining characters form the file name.

When data is added to a DVC project (for example, with dvc add), DVC computes the content hash and stores the corresponding file in the dvccache if it is not already present. The working directory then contains lightweight pointers or links to the cached content rather than a duplicate of the data. DVC can retrieve data from the cache or from remote storage (for example, S3, GCS, Azure, SSH, or HDFS) as needed. Cache behavior, including the linking method (symlink, hardlink, reflink, or copy) and remote configuration, can be customized to suit the operating system and performance goals.

Cache management commands such as dvc gc, which prunes unused cache entries, help control disk usage. The dvccache is central to DVC’s ability to reproduce experiments, share datasets, and optimize data-intensive workflows, but it requires careful monitoring of storage requirements and cache health.
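
The content-addressed, two-level layout and the link-instead-of-copy behavior described above can be illustrated with a short Python sketch. This is not DVC's implementation: the function names (file_digest, cache_path, add_to_cache, link_into_workspace) are hypothetical, MD5 is assumed as the hash, only a hardlink with a copy fallback is shown, and the real on-disk layout may include additional subdirectories depending on the DVC version.

```python
import hashlib
import os
import shutil
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    md5 = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def cache_path(cache_dir: Path, digest: str) -> Path:
    """Map a digest to <cache_dir>/<first two chars>/<remaining chars>."""
    return cache_dir / digest[:2] / digest[2:]


def add_to_cache(src: Path, cache_dir: Path) -> Path:
    """Copy src into the cache unless identical content is already stored."""
    dest = cache_path(cache_dir, file_digest(src))
    if not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)
    return dest


def link_into_workspace(cached: Path, target: Path) -> None:
    """Expose a cached file at target: try a hardlink, fall back to a copy."""
    target.parent.mkdir(parents=True, exist_ok=True)
    if target.exists() or target.is_symlink():
        target.unlink()
    try:
        os.link(cached, target)  # hardlink: no extra disk space used
    except OSError:
        shutil.copy2(cached, target)  # e.g. cache is on another filesystem
```

Calling add_to_cache on two files with identical bytes returns the same cache path, so the content is stored once; link_into_workspace then makes that content available in a working directory without necessarily duplicating it. A fuller sketch would also try reflinks and symlinks, which are omitted here to keep the example within the standard library.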