
nonVCF

NonVCF is a term used in genomics to describe data representations that do not conform to the Variant Call Format (VCF). VCF is a widely adopted, text-based standard for describing small-scale genetic variants, but many data types and complex variants are not easily captured by VCF alone. NonVCF encompasses alternative formats and models used when VCF’s allele-centric encoding proves limiting or insufficient for the research question.
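The following minimal Python sketch shows what "allele-centric" means in practice. It assumes tab-separated VCF data lines and ignores header lines, sample columns, and validation; a real pipeline would use a dedicated VCF library. Each record is a single site with one REF allele and a list of ALT alleles. A symbolic ALT such as `<DEL>` only names the variant class, so its extent has to be carried separately in INFO keys like `END`. The names `VcfSite`, `parse_vcf_line`, and `is_symbolic` are hypothetical helpers for illustration, and the records are made-up examples rather than real data.

```python
from dataclasses import dataclass


@dataclass
class VcfSite:
    chrom: str
    pos: int               # 1-based position of the first REF base
    ref: str               # the single reference allele
    alts: list[str]        # one or more alternate alleles
    info: dict[str, str]   # INFO key/value pairs; flags map to "true"


def parse_vcf_line(line: str) -> VcfSite:
    """Parse the eight fixed columns of a tab-separated VCF data line.

    Sample/genotype columns, header lines, and validation are ignored.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt, _qual, _filter, info = fields[:8]
    info_map: dict[str, str] = {}
    for entry in info.split(";"):
        if entry in ("", "."):
            continue  # "." means the INFO column is empty
        key, sep, value = entry.partition("=")
        info_map[key] = value if sep else "true"  # flags carry no value
    return VcfSite(chrom, int(pos), ref, alt.split(","), info_map)


def is_symbolic(allele: str) -> bool:
    """Symbolic alleles such as <DEL> name a variant class instead of spelling its sequence."""
    return allele.startswith("<") and allele.endswith(">")


# Illustrative records: a multi-allelic SNV site and a symbolic deletion.
snv = parse_vcf_line("chr1\t1000\t.\tA\tG,T\t50\tPASS\tDP=30")
sv = parse_vcf_line("chr1\t5000\t.\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=9000")
print(snv.alts)                 # ['G', 'T']
print(is_symbolic(sv.alts[0]))  # True
print(sv.info["END"])           # 9000
```

The deletion record states that something was deleted between positions 5000 and 9000, but it does not spell out the sequence. Both records are also tied to positions on one linear reference, and this is the constraint that nonVCF representations relax.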

Examples of nonVCF data include raw alignment and read-level evidence (BAM/CRAM), assembly- or graph-based representations, and richer descriptions of variation such as structural variants, copy number changes, and haplotype structure. Formats frequently cited in discussions of nonVCF include Genome Variation Format (GVF), BED and BEDPE for genomic intervals and breakpoints, GFF/GTF for gene models and features, and graph-based formats like Graphical Fragment Assembly (GFA). In some workflows, nonVCF data is stored and exchanged using JSON-based representations or specialized databases that preserve contextual metadata not easily encoded in VCF.

Use cases for nonVCF approaches include complex structural variation, multi-allelic loci, pan-genome analyses, and graph-based references, where variation is represented as paths or graphs rather than as alternatives to a single reference allele. Advantages include richer metadata and more flexible representations. Drawbacks include reduced interoperability and tool support, as well as potential information loss when data is converted to VCF.

Overall, nonVCF describes a family of data formats and models that complement VCF by addressing its limitations in representing genomic variation, especially for complex, non-allelic, or graph-based scenarios.
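The graph-based case can be illustrated with a minimal Python sketch. It assumes GFA 1-style tab-separated lines, where `S` lines define segments (name, sequence) and `P` lines define paths as comma-separated oriented segment names such as `1+,2+,4+`. It also assumes blunt (zero-length) overlaps. The sketch ignores links, overlaps, optional tags, and GFA 2 syntax, so it is not a conforming parser. The function names are hypothetical. In the small "bubble" below, two haplotypes differ at a single segment. The variant exists as two alternative paths through one graph, not as an ALT allele measured against a single reference.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

Segments = dict[str, str]
Path = list[tuple[str, str]]  # (segment name, orientation "+" or "-")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def parse_gfa(text: str) -> tuple[Segments, dict[str, Path]]:
    """Collect segments (S lines) and paths (P lines) from GFA 1 text.

    Header (H) and link (L) lines, overlaps, and optional tags are ignored.
    """
    segments: Segments = {}
    paths: dict[str, Path] = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if fields[0] == "S":
            segments[fields[1]] = fields[2]
        elif fields[0] == "P":
            # Steps look like "1+,2+,4+": a segment name followed by its orientation.
            paths[fields[1]] = [(step[:-1], step[-1]) for step in fields[2].split(",")]
    return segments, paths


def spell_path(segments: Segments, path: Path) -> str:
    """Concatenate oriented segment sequences, assuming blunt (0M) overlaps."""
    return "".join(
        segments[name] if orientation == "+" else reverse_complement(segments[name])
        for name, orientation in path
    )


# A single-nucleotide bubble: segments 2 (A) and 3 (G) are alternatives between 1 and 4.
gfa = "\n".join([
    "H\tVN:Z:1.0",
    "S\t1\tACGT",
    "S\t2\tA",
    "S\t3\tG",
    "S\t4\tTTCA",
    "L\t1\t+\t2\t+\t0M",
    "L\t1\t+\t3\t+\t0M",
    "L\t2\t+\t4\t+\t0M",
    "L\t3\t+\t4\t+\t0M",
    "P\thap1\t1+,2+,4+\t*",
    "P\thap2\t1+,3+,4+\t*",
])
segments, paths = parse_gfa(gfa)
print(spell_path(segments, paths["hap1"]))  # ACGTATTCA
print(spell_path(segments, paths["hap2"]))  # ACGTGTTCA
```

Neither path is privileged as "the reference" in this structure. Projecting the bubble back to VCF would require choosing one path as REF and expressing the other as an ALT, which is where the conversion losses noted above can arise.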