TFRecord

TFRecord is a binary file format used by TensorFlow to store a sequence of records. Each record is a serialized protocol buffer message, usually a tf.train.Example. An Example contains a Features map that associates feature names with lists of values. The value types supported are int64_list, float_list, and bytes_list, allowing a compact representation of heterogeneous data such as images, labels, and metadata.
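
As a minimal sketch of the structure described above, the following builds one tf.train.Example that uses all three value types and serializes it to bytes. The feature names ("image", "height", "width", "label", "score") and the helper make_example are hypothetical, chosen only for illustration; the placeholder bytes stand in for a real encoded image.

```python
import tensorflow as tf


def make_example(image_bytes, height, width, label, score):
    """Pack one sample into a tf.train.Example (feature names are illustrative)."""
    feature = {
        # bytes_list: raw or encoded binary payloads, such as image bytes
        "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_bytes])),
        # int64_list: integer values such as dimensions and class labels
        "height": tf.train.Feature(int64_list=tf.train.Int64List(value=[height])),
        "width": tf.train.Feature(int64_list=tf.train.Int64List(value=[width])),
        "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
        # float_list: floating-point values, stored as 32-bit floats
        "score": tf.train.Feature(float_list=tf.train.FloatList(value=[score])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))


example = make_example(b"placeholder-image-bytes", 32, 32, 7, 0.5)
serialized = example.SerializeToString()  # bytes, ready to be written as one record
```

Every value is wrapped in a list, even a scalar, because each feature holds a list of values of a single type.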

TFRecord files are designed for efficient storage and streaming of large datasets. Being binary, they are more space-efficient and faster to parse than text formats like CSV. They integrate naturally with the TensorFlow data input pipeline, where files are read by tf.data.TFRecordDataset, optionally using compression (NONE, ZLIB, or GZIP). Each record is parsed into tensors using parsing functions such as tf.io.parse_single_example or tf.io.parse_example.

Common workflows involve converting raw data to tf.train.Example messages and writing them with TFRecordWriter, then reading the files in a pipeline, applying mapping/parsing logic, and batching and prefetching for training. For example, image data is often stored as a bytes_list feature containing the encoded image bytes, with accompanying numeric features for height, width, and label.

Advantages of TFRecord include a compact binary representation and compatibility with efficient data pipelines on large-scale datasets. Limitations include the lack of self-description within the file; the schema must be defined and maintained by the user, and version and compatibility considerations can arise when changing feature schemas.
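
The write-then-read workflow can be sketched as follows, assuming TensorFlow 2.x. The feature names, the FEATURE_SPEC dictionary, the helper functions, and the temporary file path are all hypothetical, and the "images" are placeholder byte strings rather than real encoded images. Because the file carries no schema of its own, the reader has to restate the schema (here FEATURE_SPEC) and keep it in sync with the writer.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical schema: the reader must declare it, since the file does not.
FEATURE_SPEC = {
    "image": tf.io.FixedLenFeature([], tf.string),
    "height": tf.io.FixedLenFeature([], tf.int64),
    "width": tf.io.FixedLenFeature([], tf.int64),
    "label": tf.io.FixedLenFeature([], tf.int64),
}


def to_example(image_bytes, height, width, label):
    """Convert one raw sample into a tf.train.Example."""
    def int64(value):
        return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

    feature = {
        "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_bytes])),
        "height": int64(height),
        "width": int64(width),
        "label": int64(label),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))


def write_records(path, rows):
    """Serialize each row and append it to a GZIP-compressed TFRecord file."""
    options = tf.io.TFRecordOptions(compression_type="GZIP")
    with tf.io.TFRecordWriter(path, options=options) as writer:
        for row in rows:
            writer.write(to_example(*row).SerializeToString())


def read_dataset(path, batch_size):
    """Read, parse, batch, and prefetch records for training."""
    dataset = tf.data.TFRecordDataset(path, compression_type="GZIP")
    dataset = dataset.map(
        lambda record: tf.io.parse_single_example(record, FEATURE_SPEC),
        num_parallel_calls=tf.data.AUTOTUNE,
    )
    return dataset.batch(batch_size).prefetch(tf.data.AUTOTUNE)


path = os.path.join(tempfile.mkdtemp(), "sample.tfrecord.gz")
rows = [(b"fake-image-%d" % i, 4, 4, i % 2) for i in range(5)]
write_records(path, rows)
batches = list(read_dataset(path, batch_size=2))  # each batch is a dict of tensors
```

The compression type passed to the reader must match the one used by the writer; a mismatch typically surfaces as a data-corruption error when the dataset is iterated, not when it is constructed.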