pseudosamples

Pseudosamples are artificial datasets that imitate the properties of a population or observed data but are not actual new observations. They are created through resampling or generation based on a model or the original data, with the goal of enabling statistical inference, model assessment, or uncertainty quantification without requiring additional real-world data.

Common forms include bootstrap samples, created by resampling with replacement from an observed dataset; jackknife samples, constructed by systematically leaving out one or more observations; and cross-validation partitions, which split data into training and validation pseudosamples. In time-series or dependent data, methods such as block bootstrap or moving-block bootstrap are used to preserve dependence structure. In predictive modeling, synthetic samples, or pseudosamples, may be generated by fitting a model to data and drawing new observations from the fitted distribution, or by techniques such as SMOTE or generative models to augment imbalanced datasets.

Pseudosamples are used to estimate sampling distributions, construct confidence intervals, assess estimator bias, tune models, and evaluate predictive performance. They provide a practical means of uncertainty quantification when true sampling from the population is impractical or expensive.

Limitations include reliance on assumptions about the data-generating process, potential propagation of biases from the original data, and the risk of overfitting or optimistic evaluation if pseudosamples are not properly independent or if leakage occurs. They are not substitutes for real, external validation data in many contexts.

History: The bootstrap, a principal source of pseudosamples, was introduced by Bradley Efron in 1979, building on earlier resampling ideas such as the jackknife developed by Quenouille and popularized by Tukey.
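
The three resampling schemes named above (bootstrap, jackknife, and moving-block bootstrap) can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names, the sample values, the block length of 3, and the choice of a simple percentile interval are all assumptions made for the example and do not come from the text.

```python
import random
import statistics


def bootstrap_samples(data, n_samples, rng):
    """Yield pseudosamples drawn with replacement, each the same size as data."""
    for _ in range(n_samples):
        yield rng.choices(data, k=len(data))


def jackknife_samples(data):
    """Yield leave-one-out pseudosamples, one per observation."""
    for i in range(len(data)):
        yield data[:i] + data[i + 1:]


def moving_block_bootstrap(series, block_len, rng):
    """Build one pseudosample by concatenating randomly chosen overlapping blocks."""
    n = len(series)
    starts = range(n - block_len + 1)  # every admissible block start index
    out = []
    while len(out) < n:
        s = rng.choice(starts)
        out.extend(series[s:s + block_len])
    return out[:n]  # trim the final block so the length matches the original


def percentile_interval(estimates, level=0.95):
    """Simple percentile interval from a list of pseudosample estimates."""
    ordered = sorted(estimates)
    alpha = (1.0 - level) / 2.0
    lo = ordered[int(alpha * (len(ordered) - 1))]
    hi = ordered[int((1.0 - alpha) * (len(ordered) - 1))]
    return lo, hi


rng = random.Random(0)  # fixed seed so the sketch is reproducible
data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.8, 2.7, 4.9, 3.1, 3.6]  # made-up values

# Bootstrap: approximate the sampling distribution of the mean.
boot_means = [statistics.mean(s) for s in bootstrap_samples(data, 2000, rng)]
ci_low, ci_high = percentile_interval(boot_means)

# Jackknife: standard error of the mean from leave-one-out estimates.
jack_means = [statistics.mean(s) for s in jackknife_samples(data)]
n = len(data)
jack_mean = statistics.mean(jack_means)
jack_se = ((n - 1) / n * sum((m - jack_mean) ** 2 for m in jack_means)) ** 0.5

# Moving-block bootstrap: keeps short runs of neighbouring values together.
block_sample = moving_block_bootstrap(data, block_len=3, rng=rng)

print(f"bootstrap 95% interval for the mean: ({ci_low:.2f}, {ci_high:.2f})")
print(f"jackknife standard error of the mean: {jack_se:.3f}")
print("one moving-block pseudosample:", block_sample)
```

For the sample mean, the jackknife standard error computed this way coincides with the familiar s / sqrt(n), which makes it a convenient sanity check. The block length in the moving-block variant is a tuning choice: longer blocks preserve more of the dependence structure but leave fewer distinct blocks to draw from.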