mBERT

mBERT, or multilingual BERT, is a pretrained language representation model based on the BERT architecture that covers many languages with a single model. It was released to enable cross-lingual natural language processing by pairing a shared multilingual vocabulary with joint pretraining on text from many languages.

Architecture and training: mBERT uses the BERT-base configuration, consisting of 12 transformer layers, a hidden size of 768, and 12 attention heads. It is trained with a single WordPiece vocabulary that covers 104 languages and contains about 119,000 tokens. The training objective is masked language modeling, in which a subset of input tokens is replaced with a mask and the model learns to predict the original tokens. The model is trained on multilingual Wikipedia text, which yields a shared representation space across languages.
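
As a quick check on those numbers, the following sketch inspects the configuration and shared WordPiece vocabulary of the publicly released checkpoint and runs a single masked-token prediction. It assumes the Hugging Face transformers library and the bert-base-multilingual-cased model id, neither of which is named in the text above.

```python
# Minimal sketch (assumption: Hugging Face `transformers` is installed and the
# public `bert-base-multilingual-cased` checkpoint is the mBERT in question).
from transformers import AutoConfig, AutoTokenizer, pipeline

config = AutoConfig.from_pretrained("bert-base-multilingual-cased")
print(config.num_hidden_layers)    # 12 transformer layers
print(config.hidden_size)          # hidden size of 768
print(config.num_attention_heads)  # 12 attention heads
print(config.vocab_size)           # roughly 119,000 shared WordPiece tokens

# One tokenizer and one vocabulary serve every covered language.
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
for sentence in ["The cat sleeps.", "Die Katze schläft.", "Кошка спит."]:
    print(tokenizer.tokenize(sentence))

# The masked-language-modeling objective, seen at inference time:
fill = pipeline("fill-mask", model="bert-base-multilingual-cased")
print(fill("Paris is the [MASK] of France.")[0]["token_str"])
```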

Cross-lingual capabilities: A key feature of mBERT is its potential for zero-shot cross-lingual transfer. Because of the shared vocabulary and joint multilingual pretraining, a model fine-tuned on a task in one language can often perform the same task in other languages without language-specific training data. It has been applied to tasks such as named entity recognition, part-of-speech tagging, sentiment analysis, and question answering.
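
The paragraph above describes the typical recipe: fine-tune on labeled data in one language, then evaluate directly in another. The sketch below illustrates that flow for sentiment classification with a tiny inline English training set and Spanish test sentences; the data, hyperparameters, and checkpoint id are illustrative assumptions, and a real experiment would use proper datasets and many more training steps.

```python
# Zero-shot cross-lingual transfer, sketched with toy data (assumptions:
# Hugging Face `transformers`, PyTorch, and the `bert-base-multilingual-cased`
# checkpoint; the sentences below are placeholders, not a benchmark).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# English training examples only (1 = positive, 0 = negative).
train_texts = ["Great movie, I loved it.", "Terrible film, a waste of time."]
train_labels = torch.tensor([1, 0])

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):  # a few gradient steps; real fine-tuning uses far more data
    batch = tokenizer(train_texts, padding=True, truncation=True, return_tensors="pt")
    loss = model(**batch, labels=train_labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Evaluate on Spanish with no Spanish labels: the fine-tuned weights are reused as-is.
model.eval()
test_texts = ["Una película excelente.", "Una película horrible."]
with torch.no_grad():
    batch = tokenizer(test_texts, padding=True, truncation=True, return_tensors="pt")
    predictions = model(**batch).logits.argmax(dim=-1)
print(predictions.tolist())  # ideally [1, 0]; a toy run this small is not reliable
```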

Limitations and considerations: While mBERT enables cross-lingual transfer, performance varies by language and script. Low-resource languages and non-Latin scripts may be underrepresented in the training data, which reduces accuracy for those languages. The approach relies on a large shared vocabulary without explicit cross-lingual alignment objectives, and the model requires substantial computational resources for training and inference.
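
One way this underrepresentation tends to show up in practice is subword fragmentation: text in lower-resource languages is often split into more WordPiece pieces per word, leaving the model with less coherent units. The sketch below measures that ratio for two sentences; the language pair and sentences are assumptions chosen for the example, not measurements from the original text.

```python
# Sketch: compare WordPiece fragmentation (subword tokens per whitespace word)
# across languages. Assumes Hugging Face `transformers` and the public
# `bert-base-multilingual-cased` checkpoint; the sample sentences are illustrative.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

samples = {
    "English": "The children are playing in the garden.",
    "Swahili": "Watoto wadogo wanacheza mpira kwenye bustani.",
}
for language, sentence in samples.items():
    words = sentence.split()
    tokens = tokenizer.tokenize(sentence)
    print(f"{language}: {len(tokens) / len(words):.2f} subword tokens per word")
```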

Impact and use: mBERT has become a widely used baseline for multilingual NLP and has influenced subsequent multilingual models and research into cross-language transfer, serving as a practical starting point for many multilingual tasks.
