Home

Rencodings

Rencodings refer to the process of converting text data from one character encoding to another. In computing, this typically involves interpreting a sequence of bytes using a source encoding to obtain a sequence of Unicode code points, and then encoding those code points into a target encoding. Rencoding is commonly used when data moves between systems, applications, or standards that rely on different encodings, such as migrating legacy data to Unicode or delivering content over the web in a standard encoding like UTF-8.

Although the steps are straightforward in principle, rencodings can fail or introduce errors if the source

Common problems include mislabelled encodings, which can produce mojibake when the data is interpreted with the

Rencodings are supported by software libraries and tools such as iconv, Python's codecs, Java's Charset, and

See also character encoding, Unicode, transcoding, mojibake, and text encoding standards.

encoding
is
misidentified
or
if
characters
in
the
source
cannot
be
represented
in
the
target.
Many
workflows
allow
error
handling
modes
such
as
strict
(report
an
error),
replace
(substitute
an
unavailable
character),
or
ignore.
Using
Unicode
as
an
intermediate
representation
helps
preserve
information
when
possible.
wrong
encoding.
The
presence
or
absence
of
a
Byte
Order
Mark
(BOM)
can
affect
decoding
for
some
encodings.
Detecting
the
correct
source
encoding
is
an
area
of
study
and
may
rely
on
heuristics,
metadata,
or
byte
patterns
rather
than
conclusive
proof.
many
text
editors.
A
typical
workflow
is
to
decode
bytes
from
the
source
encoding
to
Unicode,
then
encode
to
the
target
encoding,
optionally
normalizing
with
Unicode
normalization
forms.