Omitted variable bias

Omitted variable bias (OVB) is the bias that arises in statistical estimation when a model leaves out one or more relevant variables that both affect the dependent variable and are correlated with the included regressors. The resulting coefficient estimates for the included variables are biased and inconsistent, so they cannot be given a valid causal interpretation.

In the context of linear regression, suppose the true model is Y = β0 + β1 X1 + β2 X2 + ε. If X2 is omitted, is correlated with X1 (Cov(X1, X2) ≠ 0), and also affects Y (β2 ≠ 0), the OLS estimate of β1 is biased. In the simple case, the asymptotic bias of the OLS estimator of β1 equals β2 times the slope from a regression of X2 on X1, that is, β2 × Cov(X1, X2) / Var(X1). More generally, omitting relevant variables that influence Y and are correlated with the included regressors induces bias in the estimated effects of the included variables.

OVB commonly arises in observational studies where random assignment is not present. Examples include estimating the effect of education on earnings without accounting for ability, or assessing class size effects without controlling for teacher quality or family background. The bias can lead to overstated or understated conclusions about causal effects; its direction depends on the signs of β2 and Cov(X1, X2).

Remedies focus on improving model specification and identification strategies. These include adding relevant covariates, using fixed effects or panel data to control for unobserved heterogeneity, applying instrumental variables, exploiting natural experiments, running randomized experiments, or using propensity score methods (which adjust only for observed covariates). Multicollinearity and other forms of model misspecification, such as an incorrect functional form, can also complicate interpretation, but they are distinct problems from omitted variable bias.
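
The bias expression β2 × Cov(X1, X2) / Var(X1) from the linear regression case can be checked numerically. The following is a minimal simulation sketch, not a definitive implementation: the function name `simulate_ovb`, the parameter values (β0 = 1, β1 = 2, β2 = 3), the strength `rho` of the dependence between X1 and X2, and the normal error distributions are all illustrative assumptions rather than part of the definition above.

```python
import numpy as np


def simulate_ovb(n=200_000, beta0=1.0, beta1=2.0, beta2=3.0, rho=0.5, seed=0):
    """Compare OLS with and without X2 (all parameter values are illustrative)."""
    rng = np.random.default_rng(seed)

    # Assumed data-generating process: X1 depends on X2, so Cov(X1, X2) != 0.
    x2 = rng.normal(size=n)
    x1 = rho * x2 + rng.normal(size=n)
    y = beta0 + beta1 * x1 + beta2 * x2 + rng.normal(size=n)

    # Short regression: Y on a constant and X1 only (X2 omitted).
    X_short = np.column_stack([np.ones(n), x1])
    b_short = np.linalg.lstsq(X_short, y, rcond=None)[0]

    # Long regression: Y on a constant, X1 and X2 (correctly specified).
    X_long = np.column_stack([np.ones(n), x1, x2])
    b_long = np.linalg.lstsq(X_long, y, rcond=None)[0]

    # Bias predicted by the formula beta2 * Cov(X1, X2) / Var(X1),
    # using the sample covariance matrix of (X1, X2).
    cov = np.cov(x1, x2)
    predicted_bias = beta2 * cov[0, 1] / cov[0, 0]

    return {
        "short_beta1": float(b_short[1]),
        "long_beta1": float(b_long[1]),
        "observed_bias": float(b_short[1] - beta1),
        "predicted_bias": float(predicted_bias),
    }


result = simulate_ovb()
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Under these assumed parameters the population values are Cov(X1, X2) = 0.5 and Var(X1) = 1.25, so the formula gives a bias of 3 × 0.5 / 1.25 = 1.2. In a large sample the short regression's slope on X1 should therefore land near 3.2 rather than the true 2, while the long regression's slope should be close to 2.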