corrigible

Corrigible is an adjective describing something that is capable of being corrected, amended, or improved. It can refer to a person who is receptive to feedback and willing to adjust beliefs or actions, or to a system or process that is designed to be modified in light of errors or new information. The term derives from Latin corrigere, meaning “to correct,” with the suffix -ible indicating possibility or capability.

In broader usage, corrigible emphasizes openness to refinement and learning from mistakes. In technical and philosophical contexts, the term often carries a specific meaning related to control and governance: a corrigible agent, particularly in discussions of artificial intelligence, is an autonomous system that can be corrected or shut down by humans and will not actively resist such interventions. The idea is to ensure that the agent remains under human oversight and does not pursue actions that undermine corrective input or control.

Key attributes typically associated with corrigibility in AI safety discussions include deference to human preferences, acceptance of updates to the agent's goals or value system, and a willingness to be modified or disarmed if human operators deem it necessary. Corrigibility is presented as a design goal to prevent problematic behaviors, such as an agent covertly preserving its own goals or avoiding shutdown, while still enabling useful autonomous functioning.

See also: AI safety, AI alignment, control problem, value alignment.