Scanpy

Scanpy is an open-source Python toolkit for the analysis of single-cell transcriptomics data. It is designed to handle large-scale scRNA-seq experiments and is centered on the AnnData data structure, which stores the expression matrix (cells by genes) along with per-cell and per-gene metadata and analysis results.

The library provides an end-to-end, modular workflow for common tasks in single-cell analysis. Typical preprocessing steps include normalization to account for sequencing depth, log transformation, identification of highly variable genes, and scaling. Core analyses cover dimensionality reduction (principal component analysis), construction of a k-nearest-neighbor graph, clustering (notably the Leiden and Louvain methods), and visualization using nonlinear embeddings such as UMAP or t-SNE. Scanpy also supports differential expression testing between clusters or conditions and can perform trajectory or pseudotime analysis through diffusion-based methods. For RNA velocity analyses, Scanpy can be used in conjunction with dedicated tools such as scVelo.

Scanpy emphasizes scalability through efficient, sparse-matrix representations and integrates with the broader Python scientific stack, including NumPy, SciPy, scikit-learn, and common plotting libraries. Data are stored in AnnData objects, typically saved in .h5ad files, enabling reproducible and shareable workflows. The project is open-source and widely adopted in academia, with a community of contributors and extensive documentation and tutorials. It interoperates with complementary tools for data integration and batch correction, and is often used alongside other Python-based or R-based single-cell analysis resources to build comprehensive analysis pipelines.