endofsentence

Endofsentence, or the end-of-sentence boundary (EOS), is the point at which a sentence terminates in written or spoken language. In most writing systems, this boundary is signaled by a sentence-ending punctuation mark such as a period (full stop), a question mark, or an exclamation point. Some languages use additional or alternative marks: Spanish also opens questions and exclamations with inverted marks ("¿" and "¡"), and Chinese and Japanese end sentences with the ideographic full stop "。". In scripts that distinguish letter case, capitalization of the following word also helps signal a boundary.

In writing, end-of-sentence boundaries aid readability, rhythm, and prosody, providing cues for where one sentence ends and the next begins. In natural language processing and computational linguistics, end-of-sentence detection (or sentence boundary detection) is a foundational preprocessing step that determines sentence boundaries for tokenization, parsing, and model training. Practical systems must handle edge cases such as abbreviations (Dr., Inc.), decimal numbers, ellipses, and parentheses, which can obscure boundaries and lead to segmentation errors.

Representations for EOS in digital systems include literal punctuation, newline characters, or explicit end-of-sentence tokens in machine learning data. Unicode provides standardized approaches to sentence breaks through text segmentation algorithms, which vary by language and context. Endofsentence thus spans typography, readability, and computational text processing, reflecting how humans and machines partition continuous text into discrete sentences.
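
As an illustration of the edge cases named above, the following Python sketch shows one simple rule-based approach to sentence boundary detection. It is a minimal example under stated assumptions, not a reference implementation: the abbreviation list, the function names split_sentences and mark_eos, and the choice of "</s>" as an explicit end-of-sentence token are all hypothetical and chosen for illustration. Production systems typically rely on larger language-specific resources or trained models.

```python
import re

# Illustrative abbreviation list (an assumption); real systems use
# much larger, language-specific lists.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "ms.", "prof.", "inc.", "ltd.",
                 "etc.", "e.g.", "i.e."}

# Candidate boundary: a run of ., ! or ? plus optional closing quotes or
# brackets, followed by whitespace or end of text; or the ideographic
# full stop, which is not normally followed by a space.
# A period inside a decimal such as 3.50 is followed by a digit, so it
# never becomes a candidate.
CANDIDATE = re.compile(r'(?:[.!?]+[)\]"\']*(?=\s|$))|。')


def split_sentences(text):
    """Split text into sentences using a few hand-written rules."""
    sentences = []
    start = 0
    for match in CANDIDATE.finditer(text):
        end = match.end()
        chunk = text[start:end]
        words = chunk.split()
        # Strip closing quotes/brackets before the abbreviation lookup.
        last_word = words[-1].lower().rstrip(')]"\'') if words else ""
        if last_word in ABBREVIATIONS:
            continue  # "Dr." or "Inc." is treated as a non-boundary
        rest = text[end:].lstrip()
        if rest and rest[0].islower():
            continue  # a lowercase continuation, e.g. after "Well..."
        sentence = chunk.strip()
        if sentence:
            sentences.append(sentence)
        start = end
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)  # trailing text without a terminator
    return sentences


def mark_eos(sentences, eos_token="</s>"):
    """Join sentences, appending an explicit end-of-sentence token to each."""
    return " ".join(f"{s} {eos_token}" for s in sentences)


if __name__ == "__main__":
    sample = ("Dr. Smith paid 3.50 dollars to Acme Inc. yesterday. "
              "Well... it was fine! Was it?")
    for sentence in split_sentences(sample):
        print(sentence)
    print(mark_eos(split_sentences("It rained. We stayed in.")))
```

The rules are deliberately conservative, and they show why segmentation errors occur: an abbreviation that really does end a sentence (for example, a sentence finishing with "Inc.") is not split, and an ellipsis followed by a capitalized word is always treated as a boundary.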