DeepDive

DeepDive is an open-source framework for constructing knowledge bases from messy, heterogeneous data sources. It is designed to help researchers and developers extract, integrate, and reason about information with uncertainty. The system lets users specify declarative rules and templates that describe how to identify entities, relationships, and attributes in the input data, how to join and reconcile competing evidence, and how to resolve ambiguities across sources. Based on these specifications, DeepDive performs probabilistic inference to generate a set of facts, each with an associated confidence score.
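As a rough illustration of how weighted evidence can be turned into a confidence score for each candidate fact, the sketch below scores two candidate "spouse" relations with a simple log-linear model. It is a toy example and not DeepDive's actual API or inference engine: the feature names, weights, bias, and entity names are all hypothetical, and each candidate is scored independently, which is a simplification of joint probabilistic inference.

```python
import math

# Hypothetical feature weights. In a real system these would be learned
# from labeled examples rather than written by hand.
WEIGHTS = {
    "phrase_between:married": 2.0,
    "phrase_between:divorced": -1.5,
    "same_sentence": 0.5,
}
BIAS = -1.0  # hypothetical prior: most candidate pairs are not true facts


def confidence(features):
    """Sum the weights of the observed features and squash to (0, 1)."""
    score = BIAS + sum(WEIGHTS.get(name, 0.0) for name in features)
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical candidate facts, each with the features extracted for it.
candidates = {
    ("Ann Lee", "Bo Chen"): ["phrase_between:married", "same_sentence"],
    ("Cy Diaz", "Di Evans"): ["same_sentence"],
}

# Every candidate becomes a fact paired with a confidence score.
facts = {pair: confidence(feats) for pair, feats in candidates.items()}

for pair, prob in sorted(facts.items(), key=lambda item: -item[1]):
    print(f"spouse{pair}: {prob:.2f}")
```

Under these assumed weights, the pair supported by the "married" phrase receives a much higher score than the pair that merely appears in the same sentence. A downstream consumer could then keep only facts above a chosen threshold.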

DeepDive operates on a relational data model and uses a pipeline approach that combines data extraction, feature generation, and data fusion. Users provide feature templates and labeling information, and the system learns to weight evidence from multiple sources to improve accuracy. The output is typically a knowledge base populated with facts about entities and their relations, suitable for querying and downstream analytics. The framework emphasizes handling uncertainty inherent in real-world data and supports manual inspection and refinement of results.

Development of DeepDive originated at Stanford University and has been released as an open-source project. It has been deployed in a variety of domains, including life sciences, journalism, and government analytics, as a means of building scalable knowledge bases from large document collections and databases. The project has influenced subsequent work in probabilistic data extraction and knowledge-base construction, contributing to the broader field of data-driven information extraction.