MMbbld

MMbbld, short for Multi-Modal Balanced Binary Labeling Dataset, is a benchmark dataset proposed for evaluating multi-modal machine learning models on binary classification tasks. It is designed to assess how effectively models fuse information from diverse modalities while preserving balanced class representations across the dataset.

Dataset composition and modalities: Each example includes an image (RGB, 224 × 224), a short text caption, and an optional audio clip of up to three seconds. Labels are binary, indicating positive or negative class membership for the given task. The dataset is constructed to ensure a near-equal distribution of both classes across all modalities, facilitating balanced learning and evaluation.

Annotation and quality control: Labels are produced through a crowd-sourced workflow with multiple independent annotators per sample. A reconciliation step resolves disagreements, and a subset is re-annotated for reliability. Reported inter-annotator agreement typically falls in the moderate to substantial range on standard metrics.

Data creation and licensing: Content is assembled from publicly available sources and carefully scrubbed for privacy. Standardized preprocessing aligns timestamps and feature extraction across modalities. The release uses a non-commercial license that is permissive for research use but restricts commercial redistribution, and it includes terms of use for researchers.

Usage and benchmarks: MMbbld serves as a testbed for multi-modal fusion approaches, including early fusion, late fusion, and cross-modal attention mechanisms. Baselines compare unimodal models against multimodal architectures using metrics such as accuracy, F1 score, and AUROC. The dataset is widely used in education and research to illustrate balanced evaluation and reproducible experiments, with reference implementations provided for PyTorch and TensorFlow.
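
The per-example structure described under "Dataset composition and modalities" can be sketched as a simple container. This is only an illustration of the stated schema (image, caption, optional audio, binary label); the `MMbbldExample` class and its field names are assumptions, not part of any official release.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MMbbldExample:
    """Hypothetical container for one MMbbld sample (names are illustrative)."""
    image: List[List[List[int]]]  # RGB image, 224 x 224 x 3
    caption: str                  # short text caption
    audio: Optional[List[float]]  # optional audio samples, clip up to 3 s
    label: int                    # binary label: 0 (negative) or 1 (positive)

    def __post_init__(self):
        # Enforce the binary labeling described in the dataset definition.
        if self.label not in (0, 1):
            raise ValueError("MMbbld labels are binary (0 or 1)")
```

A loader built on such a record could then batch the image, caption, and audio fields separately, since the audio modality is optional per example.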
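
The annotation workflow (multiple annotators, reconciliation of disagreements, agreement reporting) can be sketched in a few lines. The helpers below are illustrative, not released MMbbld tooling; on the commonly cited Landis and Koch scale, "moderate to substantial" agreement corresponds roughly to Cohen's kappa between 0.41 and 0.80.

```python
from collections import Counter


def majority_vote(votes):
    """Reconcile several annotators' binary votes into one label.

    Ties resolve to the negative class in this sketch; a real pipeline
    might instead escalate ties to an expert annotator.
    """
    return int(sum(votes) > len(votes) / 2)


def cohens_kappa(a, b):
    """Cohen's kappa for two annotators over binary labels (0/1)."""
    n = len(a)
    # Observed agreement: fraction of samples where both annotators match.
    po = sum(x == y for x, y in zip(a, b)) / n
    # Expected agreement from each annotator's marginal label frequencies.
    ca, cb = Counter(a), Counter(b)
    pe = sum((ca[k] / n) * (cb[k] / n) for k in (0, 1))
    if pe == 1.0:  # both annotators used a single label throughout
        return 1.0
    return (po - pe) / (1 - pe)
```

For example, two annotators agreeing on 6 of 8 samples with balanced-ish marginals land near kappa ≈ 0.47, i.e. "moderate" agreement.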
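
The baseline metrics named above (accuracy, F1, AUROC) and a simple late-fusion combiner can be written from scratch as a minimal pure-Python sketch; real evaluations would typically use scikit-learn or torchmetrics instead, and the `late_fusion` weighting here is an assumed illustration, not a prescribed MMbbld baseline.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions matching the true binary labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auroc(y_true, scores):
    """AUROC as the probability a random positive outranks a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def late_fusion(image_prob, text_prob, alpha=0.5):
    """Late fusion: weighted average of per-modality positive-class probabilities."""
    return alpha * image_prob + (1 - alpha) * text_prob
```

Because MMbbld is class-balanced by construction, accuracy is already informative on it, but F1 and AUROC remain useful for comparing how unimodal and fused models trade off precision, recall, and ranking quality.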