groupby

Groupby, often written as GROUP BY in SQL and as groupby in programming libraries, is a data processing operation that partitions a dataset into subgroups based on one or more keys and then applies a calculation to each subgroup. The result is typically a table of summary values, with one row per group, reporting metrics such as totals, means, or counts for each category.

In SQL, GROUP BY groups rows by the specified expressions. Each distinct value (or combination of values) of the grouping expressions defines a group, and aggregate functions such as SUM, AVG, COUNT, MIN, and MAX are computed for each group. The result typically includes only the grouped columns and the aggregates; a HAVING clause can then filter groups after aggregation.

In data analysis libraries, groupby constructs manage the grouping logic and support subsequent operations such as aggregation, transformation, or custom functions. Examples include Python’s pandas (df.groupby(keys).agg(...), df.groupby(keys).transform(...)), R’s dplyr (group_by(...) followed by summarise or mutate), and data.table (by=). Grouping can be performed on one or multiple keys and can operate on numeric, string, or date/time data.

Common aggregations include sum, mean, median, count, and standard deviation. Grouping is used for categorical, temporal, or derived keys, enabling cross-group comparisons and subsetting. Performance depends on data size, key cardinality, and memory, with implementations often using hash-based or sort-based strategies and, for very large datasets, streaming or chunked processing.

Related concepts include windowed operations that compute values within a group across a sequence, and the distinction between aggregations (reducing to one value per group) and transforms (producing a value per row within its group).
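
The SQL behavior described above (one result row per group, with HAVING filtering groups after aggregation) can be sketched with Python's built-in sqlite3 module. The sales table, its columns, and the threshold of 40 are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical in-memory table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10.0), ("north", 30.0), ("south", 5.0), ("east", 50.0)],
)

# One output row per distinct region; SUM and COUNT are computed per group.
all_groups = conn.execute(
    "SELECT region, SUM(amount), COUNT(*) FROM sales "
    "GROUP BY region ORDER BY region"
).fetchall()

# HAVING filters whole groups after the aggregates have been computed.
big_groups = conn.execute(
    "SELECT region, SUM(amount) FROM sales "
    "GROUP BY region HAVING SUM(amount) >= 40 ORDER BY region"
).fetchall()

conn.close()
```

Here all_groups holds one tuple per region, while big_groups keeps only the regions whose total meets the threshold. A WHERE clause, by contrast, would filter individual rows before grouping.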
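
The hash-based and sort-based strategies mentioned above can be illustrated with a minimal pure-Python sketch. The function names and sample rows are hypothetical; real libraries and database engines use far more elaborate implementations.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

# Hypothetical (key, value) rows, deliberately unsorted.
rows = [("b", 2), ("a", 1), ("b", 5), ("a", 4), ("c", 3)]


def hash_group_sum(rows):
    """Hash-based: a single pass that accumulates into a dict keyed by group."""
    totals = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    return dict(totals)


def sort_group_sum(rows):
    """Sort-based: sort by key, then reduce each run of equal keys."""
    ordered = sorted(rows, key=itemgetter(0))
    return {
        key: sum(value for _, value in run)
        for key, run in groupby(ordered, key=itemgetter(0))
    }


hash_totals = hash_group_sum(rows)
sort_totals = sort_group_sum(rows)
```

Both functions produce the same totals. The hash-based version needs memory proportional to the number of distinct keys, while the sort-based version pays for the sort but then only needs to hold one group at a time.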
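
The difference between an aggregation and a transform can be made concrete with a small pure-Python sketch. It is loosely analogous to the agg and transform calls named above but does not use any library; the helper names and sample data are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical parallel columns: a grouping key and a numeric value per row.
keys = ["x", "y", "x", "y", "x"]
values = [1.0, 10.0, 2.0, 20.0, 3.0]


def group_values(keys, values):
    """Collect the values belonging to each key."""
    groups = defaultdict(list)
    for k, v in zip(keys, values):
        groups[k].append(v)
    return groups


def aggregate_mean(keys, values):
    """Aggregation: reduces each group to a single value."""
    return {k: mean(vs) for k, vs in group_values(keys, values).items()}


def transform_demean(keys, values):
    """Transform: returns one value per input row, relative to its group mean."""
    means = aggregate_mean(keys, values)
    return [v - means[k] for k, v in zip(keys, values)]


group_means = aggregate_mean(keys, values)
demeaned = transform_demean(keys, values)
```

The aggregation returns as many values as there are groups, whereas the transform returns as many values as there are input rows, in the original row order.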