wordbreak

Wordbreak refers to where a line or string of text can be divided into words, and how those divisions are determined in text processing, layout, and computation. The idea covers both how text is physically wrapped on a page or screen and how language processing systems segment text into meaningful units.

In typography and text layout, word breaking governs line breaks and hyphenation. Most languages that use spaces break lines between words, while East Asian languages often require rules to determine whether a break occurs between characters or within a word. Hyphenation and punctuation also influence where breaks can occur, and different typesetting systems apply language- and script-specific conventions to minimize awkward ragged edges and preserve readability.

In natural language processing, word segmentation or tokenization is a related challenge, especially for languages that do not use explicit word boundaries. For example, Chinese, Japanese, and Thai text often requires algorithms that combine dictionaries, statistical models, and contextual clues to determine correct word boundaries. Effective word breaking is foundational for tasks such as search indexing, machine translation, and information extraction.

In web design and computing, the CSS word-break property controls how lines may be broken within words. Values commonly used are normal, break-all, and keep-all, with behavior varying by language and browser. A related property, overflow-wrap (formerly word-wrap), can force breaking of long words or URLs to prevent overflow. Together, these tools influence responsive design and text accessibility.

In computer science, the Word Break problem asks whether a string can be segmented into a sequence of dictionary words. It is typically addressed with dynamic programming and has variants that require enumerating all valid segmentations.
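
The dynamic programming approach to the Word Break problem can be illustrated with a short Python sketch. This is one possible formulation, not a canonical implementation: the function names can_segment and all_segmentations, and the sample dictionary, are hypothetical and chosen only for illustration. The first function answers the yes/no question with a table over string prefixes; the second covers the variant that enumerates every valid segmentation, using memoized recursion over suffixes.

```python
from functools import lru_cache


def can_segment(text: str, dictionary: set[str]) -> bool:
    """Return True if text can be split into a sequence of dictionary words."""
    # reachable[i] is True when the prefix text[:i] can be segmented.
    # The empty prefix is trivially segmentable.
    reachable = [True] + [False] * len(text)
    max_len = max((len(word) for word in dictionary), default=0)

    for end in range(1, len(text) + 1):
        # Only look back as far as the longest dictionary word.
        for start in range(max(0, end - max_len), end):
            if reachable[start] and text[start:end] in dictionary:
                reachable[end] = True
                break
    return reachable[len(text)]


def all_segmentations(text: str, dictionary: set[str]) -> list[str]:
    """Return every split of text into dictionary words, joined by spaces."""
    words = frozenset(dictionary)

    @lru_cache(maxsize=None)
    def split_from(start: int) -> tuple[str, ...]:
        if start == len(text):
            return ("",)  # exactly one way to segment the empty suffix
        results = []
        for end in range(start + 1, len(text) + 1):
            word = text[start:end]
            if word in words:
                for rest in split_from(end):
                    results.append(f"{word} {rest}" if rest else word)
        return tuple(results)

    return list(split_from(0))


# Hypothetical sample dictionary used only for this illustration.
sample_words = {"cat", "cats", "and", "sand", "dog"}

print(can_segment("catsanddog", sample_words))        # True
print(can_segment("catsandog", sample_words))         # False
print(all_segmentations("catsanddog", sample_words))  # ['cat sand dog', 'cats and dog']
```

The decision version fills a table of n + 1 entries and tests at most max_len substrings per position, whereas the enumeration variant can produce exponentially many results in the worst case, so memoization helps only with repeated suffixes and cannot reduce the size of the output itself.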