PrestoDB

PrestoDB is an open-source distributed SQL query engine designed for running fast analytic queries across large data sets. It enables interactive, ad-hoc analysis by executing SQL queries on data that resides in multiple sources, without requiring the data to be moved first. Built on a massively parallel processing (MPP) architecture, PrestoDB plans each query on a central coordinator and distributes its execution across a cluster of worker nodes to deliver low-latency results on big data workloads.
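The coordinator and worker division of labor can be illustrated with a toy scatter-gather aggregation. This is only a conceptual sketch in Python, not how PrestoDB is implemented: the function names (`split_rows`, `worker_partial_count`, `coordinator_count_by_region`) and the sample rows are hypothetical, threads stand in for worker nodes, and the final merge runs in the caller purely for simplicity.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_rows(rows, n_workers):
    """Coordinator step (hypothetical): partition the input into one split per worker."""
    return [rows[i::n_workers] for i in range(n_workers)]


def worker_partial_count(split):
    """Worker step (hypothetical): partial GROUP BY count over this worker's split only."""
    return Counter(region for region, _amount in split)


def coordinator_count_by_region(rows, n_workers=3):
    """Scatter the splits to the workers, then merge their partial counts."""
    splits = split_rows(rows, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(worker_partial_count, splits))
    total = Counter()
    for partial in partials:
        total.update(partial)  # add each worker's partial counts to the running total
    return dict(total)


# Made-up sample data: (region, amount) pairs.
rows = [("eu", 10), ("us", 5), ("eu", 7), ("apac", 3), ("us", 1), ("eu", 2)]
result = coordinator_count_by_region(rows)
print(result)
```

The point of the sketch is that each worker only ever sees its own split, so adding workers shrinks the amount of data any single node has to scan.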

Origin and governance

Presto was developed at Facebook (now Meta) and released as open source in 2013 under the Apache License 2.0. The project is now hosted by the Presto Foundation, part of the Linux Foundation, and the name PrestoDB is commonly used to distinguish this Apache-licensed implementation from its fork. In the late 2010s, the Presto community split into two major lines: PrestoDB and PrestoSQL, a separate community-driven fork that later rebranded as Trino. The two projects have continued to evolve independently, with separate release cycles and feature trajectories.

Architecture and features

PrestoDB follows a master–worker model consisting of a coordinator (the planner) and multiple workers (the executors). It supports ANSI SQL and a range of analytical features, including complex joins, aggregations, and window functions, with performance optimized for large-scale queries. The system uses a catalog and connector framework to access data from diverse sources, enabling queries across data lakes (such as HDFS or object stores like S3 and GCS), relational databases, and NoSQL stores.

Data sources and connectors

A core strength of PrestoDB is its pluggable connectors, which allow querying data in various storage systems without moving it. Connectors exist for Hive/HDFS, cloud storage, relational databases, and several NoSQL and object stores, enabling federated analytics across heterogeneous data environments.

Usage and ecosystem

PrestoDB has been adopted by numerous organizations to enable fast, interactive analytics over data lake and warehouse ecosystems. It remains a foundational technology in the broader Presto ecosystem, alongside forks and successors such as Trino, which offer alternative features and development paths.
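As an illustration of the catalog and connector model described in this article, the sketch below builds a federated query that joins tables from two different catalogs. Presto addresses tables by fully qualified `catalog.schema.table` names; everything else here is an assumption made for the example, including the catalog names `hive` and `mysql`, the schemas and tables, and the helper names `TableRef` and `federated_join_sql`. The sketch only composes SQL text and does not connect to a cluster or validate identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableRef:
    """A table addressed by catalog, schema, and table name (hypothetical helper)."""
    catalog: str
    schema: str
    table: str

    def qualified(self) -> str:
        # Fully qualified name in catalog.schema.table form.
        return f"{self.catalog}.{self.schema}.{self.table}"


def federated_join_sql(left: TableRef, right: TableRef, key: str) -> str:
    """Compose a join across two catalogs; trusted identifiers are assumed."""
    return (
        f"SELECT l.{key}, count(*) AS n\n"
        f"FROM {left.qualified()} AS l\n"
        f"JOIN {right.qualified()} AS r ON l.{key} = r.{key}\n"
        f"GROUP BY l.{key}"
    )


# Hypothetical catalogs: one backed by a data lake, one by a relational database.
orders = TableRef("hive", "sales", "orders")
customers = TableRef("mysql", "crm", "customers")
sql = federated_join_sql(orders, customers, "customer_id")
print(sql)
```

Because the catalog is part of the table name, a single statement can reference both sources, and the engine reads from each through its connector instead of requiring the data to be copied into one store first.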