VCF

VCF stands for Variant Call Format, a text-based file format used to store genetic variation data identified in high-throughput sequencing experiments. It has become the de facto standard for representing variants such as single nucleotide polymorphisms (SNPs), small insertions and deletions (indels), and, with certain conventions, some larger structural variants.

A VCF file begins with a series of metadata lines that start with two hash marks (##), describing the format, the reference genome, and various filters or annotations. The header line that starts with a single hash and the word CHROM defines the column titles for the data rows. The standard columns are CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT, followed by one or more sample genotype columns. CHROM is the chromosome or contig name, POS is a 1-based coordinate, and REF is the reference allele. ALT holds one or more alternative alleles, separated by commas for multi-allelic sites. QUAL is the Phred-scaled quality score, FILTER indicates whether the variant passed predefined filters, and INFO carries optional key=value fields with additional attributes. FORMAT describes the genotype fields for each sample, and the subsequent sample columns provide per-sample data such as genotype (GT), read depth (DP), and genotype quality (GQ).

VCF files may store multi-allelic variants, phasing information, and annotations. They are frequently compressed with bgzip and indexed with tabix to enable efficient querying. A binary version, BCF, exists for more compact storage. VCF is widely used in variant discovery, population genetics, and downstream analyses with tools such as GATK, bcftools, and vcftools.
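The column layout of a VCF file (## metadata lines, the #CHROM header line, then tab-separated data rows) can be illustrated with a minimal Python sketch that uses only the standard library. It is an illustration under stated assumptions, not a complete or validating parser: the function names `parse_info` and `parse_vcf`, the sample names, the variant ID, and the reference file name are all hypothetical, values are kept as strings unless converted explicitly, and a lone `.` is treated as a missing value. For real data, established tools such as bcftools are the safer choice.

```python
from typing import Dict, Iterable, List, Tuple

# The eight fixed columns that precede FORMAT and the sample columns.
FIXED_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_info(field: str) -> Dict[str, object]:
    """Split the INFO column into a dict; entries without '=' are flags set to True."""
    info: Dict[str, object] = {}
    if field == ".":  # '.' marks a missing value
        return info
    for item in field.split(";"):
        key, sep, value = item.partition("=")
        info[key] = value if sep else True
    return info


def parse_vcf(lines: Iterable[str]) -> Tuple[List[str], List[str], List[dict]]:
    """Return (metadata lines, sample names, records) from uncompressed VCF text."""
    meta: List[str] = []
    samples: List[str] = []
    records: List[dict] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith("##"):
            meta.append(line[2:])  # metadata, e.g. fileformat=VCFv4.2
        elif line.startswith("#CHROM"):
            samples = line[1:].split("\t")[9:]  # sample names follow FORMAT
        else:
            fields = line.split("\t")
            record = dict(zip(FIXED_COLUMNS, fields[:8]))
            record["POS"] = int(record["POS"])  # 1-based coordinate
            record["ALT"] = [] if record["ALT"] == "." else record["ALT"].split(",")
            record["QUAL"] = None if record["QUAL"] == "." else float(record["QUAL"])
            record["INFO"] = parse_info(record["INFO"])
            # FORMAT lists colon-separated keys; each sample column holds matching values.
            keys = fields[8].split(":") if len(fields) > 8 else []
            record["samples"] = {
                name: dict(zip(keys, value.split(":")))
                for name, value in zip(samples, fields[9:])
            }
            records.append(record)
    return meta, samples, records


# Hypothetical two-sample example with one multi-allelic site.
EXAMPLE_VCF = "\n".join([
    "##fileformat=VCFv4.2",
    "##reference=hypothetical_reference.fa",
    "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER",
               "INFO", "FORMAT", "sampleA", "sampleB"]),
    "\t".join(["chr1", "12345", "rs_example", "A", "G,T", "50", "PASS",
               "DP=30;DB", "GT:DP:GQ", "0/1:14:99", "1|2:16:80"]),
])

meta, samples, records = parse_vcf(EXAMPLE_VCF.splitlines())
```

In the example row, `G,T` in ALT is split into two alternative alleles, and the genotype `1|2` for sampleB uses the `|` separator that marks a phased call, whereas `0/1` for sampleA is unphased.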