DGEList

DGEList is a data container used in the edgeR package of Bioconductor to store and organize count-based gene expression data for differential expression analysis. It is designed to hold the essential components of digital gene expression experiments, such as RNA-seq, and to be the input for downstream normalization, dispersion estimation, and statistical testing.
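The following is a minimal sketch of building a DGEList and looking at what it holds. It assumes the Bioconductor package edgeR is installed. The count values, the gene and sample names, the group labels "A" and "B", and the Symbol annotation column are all made up for illustration.

```r
# Assumes edgeR is installed from Bioconductor
library(edgeR)

set.seed(1)
# Simulated counts for illustration only: 100 genes (rows) x 4 samples (columns)
counts <- matrix(rnbinom(400, mu = 50, size = 10), nrow = 100, ncol = 4,
                 dimnames = list(paste0("gene", 1:100), paste0("sample", 1:4)))

# Hypothetical experimental groups, one label per sample (column)
group <- factor(c("A", "A", "B", "B"))

# Hypothetical gene annotation: one row per gene, in the same order as counts
genes <- data.frame(GeneID = rownames(counts), Symbol = paste0("SYM", 1:100))

y <- DGEList(counts = counts, group = group, genes = genes)

dim(y$counts)    # genes x samples
head(y$genes)    # the annotation supplied above
y$samples        # one row per sample: group, lib.size, norm.factors
```

With no further arguments, lib.size is taken from the column sums of the counts and every norm.factors value starts at 1.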

A DGEList object is a list-based S4 class with several components. The counts element is a numeric matrix of read counts (usually integers), with genes as rows and samples as columns. The samples element is a data frame with one row per sample. It holds the group assignment, the library size of each sample (lib.size, which defaults to the column sums of counts), and the normalization factors (norm.factors, initially 1 and later updated by methods such as TMM normalization). It can also carry further sample metadata such as batch information. The genes element is an optional data frame of gene annotations, such as gene identifiers and symbols. The group column of the samples data frame is used to define experimental comparisons.

Creation of a DGEList typically involves the DGEList() constructor, with counts and optional genes or group information supplied by the user. Once created, the object serves as input to core edgeR workflows, including normalization, dispersion estimation, and fitting generalized linear models for differential expression testing. The object is designed for interoperability with edgeR functions, which take a DGEList and return it with updated or added components as the analysis proceeds.

Users can subset or modify a DGEList using standard R operations. Indexing with [ subsets counts, genes and samples consistently. The $ operator retrieves or updates components such as counts, genes and samples, including the lib.size and norm.factors columns of samples. The class provides a structured, consolidated way to manage the data and annotations required for rigorous differential expression analysis within edgeR.
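The sketch below shows how edgeR functions take a DGEList and hand back an updated one. It assumes edgeR is installed and uses simulated counts with no true group effect. The filtering step (filterByExpr()), TMM normalization (calcNormFactors()), and the quasi-likelihood test (glmQLFit() and glmQLFTest()) are one common choice among the GLM routes edgeR offers, not the only one. The group names "ctrl" and "trt" are hypothetical.

```r
# Assumes edgeR is installed from Bioconductor
library(edgeR)

set.seed(42)
# Simulated counts for illustration only: 2000 genes x 6 samples, no real effect
n_genes <- 2000
counts <- matrix(rnbinom(n_genes * 6, mu = 20, size = 5), nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)), paste0("s", 1:6)))
group <- factor(rep(c("ctrl", "trt"), each = 3))  # hypothetical groups
y <- DGEList(counts = counts, group = group)

# Subset genes: filterByExpr() uses y$samples$group when no design is given.
# keep.lib.sizes = FALSE recomputes lib.size from the retained rows.
keep <- filterByExpr(y)
y <- y[keep, , keep.lib.sizes = FALSE]

# TMM normalization: updates the norm.factors column of y$samples
y <- calcNormFactors(y)

# Dispersion estimation adds common, trended and tagwise dispersions to y
design <- model.matrix(~ group)
y <- estimateDisp(y, design)

# Quasi-likelihood GLM fit and test of the group coefficient
fit <- glmQLFit(y, design)
qlf <- glmQLFTest(fit, coef = 2)
topTags(qlf, n = 5)
```

Because these counts contain no real group effect, any top-ranked genes here are chance findings. With real data, the design matrix and the tested coefficient would follow the experiment.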