
LRS2 and LRS3

LRS2 and LRS3, collectively referred to as the Lip Reading Sentences series, are large-scale public datasets created to support research in visual speech recognition and lip reading. They provide video data of people speaking along with aligned textual transcriptions, enabling the development and evaluation of models that interpret speech from visual input.

LRS2 (Lip Reading Sentences 2) is built from publicly available video sources such as broadcast programs. The dataset comprises numerous clips, each containing a speaker’s face and mouth region accompanied by a corresponding transcription. The data are organized with time-aligned transcripts and commonly include metadata about speakers and video conditions. LRS2 is widely used as a benchmark for training end-to-end models that map visual mouth movements to text and for comparing lip-reading approaches.
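
A minimal sketch of how such paired data might be traversed, assuming a hypothetical layout in which each clip is an .mp4 file stored next to a same-named .txt transcript; the actual releases define their own directory splits and transcript format:

    # Hypothetical layout: each clip is an .mp4 file with a same-named .txt
    # transcript beside it. The real LRS2/LRS3 releases define their own splits
    # and transcript format, so treat this purely as an illustrative sketch.
    from pathlib import Path

    def iter_clips(root):
        """Yield (video_path, transcript_text) pairs found under root."""
        for video in sorted(Path(root).rglob("*.mp4")):
            transcript = video.with_suffix(".txt")
            if not transcript.exists():
                continue  # skip clips without an aligned transcript
            yield video, transcript.read_text(encoding="utf-8").strip()

    if __name__ == "__main__":
        # "data/lrs2" is a placeholder path used only for illustration.
        for video, text in iter_clips("data/lrs2"):
            print(video.name, "->", text[:60])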

LRS3 (Lip Reading Sentences 3) extends the scope of LRS2 by increasing data diversity and scale. It incorporates a broader range of speakers, speaking styles, and recording conditions, often drawing from additional sources such as online videos and talks. This expansion supports learning more robust and generalizable visual speech representations, and the dataset typically offers longer utterances and more varied linguistic content than its predecessor.

Both datasets are used to train and evaluate audiovisual or visual-only speech recognition systems. Evaluation commonly relies on word error rate (WER) or character error rate (CER) to measure how accurately a model reconstructs spoken content from visual input.
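
WER is typically computed as the word-level Levenshtein (edit) distance between the reference transcript and the model output, divided by the number of reference words; CER applies the same computation at the character level. A minimal self-contained sketch:

    # Word error rate: word-level edit distance between reference and hypothesis,
    # normalized by the number of reference words. CER is the same computation
    # applied to character sequences instead of word sequences.
    def wer(reference: str, hypothesis: str) -> float:
        ref, hyp = reference.split(), hypothesis.split()
        # dp[i][j] = edit distance between ref[:i] and hyp[:j]
        dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            dp[i][0] = i
        for j in range(len(hyp) + 1):
            dp[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                substitution = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
                dp[i][j] = min(substitution, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
        return dp[len(ref)][len(hyp)] / max(len(ref), 1)

    # One deleted word out of six reference words -> WER of about 0.167.
    print(wer("the cat sat on the mat", "the cat sat on mat"))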

Access to these datasets is provided under research-use licenses and is intended to promote reproducibility and progress in the field.

See also related lip-reading datasets and benchmarks in visual speech recognition literature.
