General workflow

VCF.Filter generates variant hitlists from next-generation sequencing data. Filters are applied to the custom textual and numerical annotations provided in VCF (Variant Call Format) files.

VCF format primer

Although VCF files are plain text and can be opened and edited in any text editor, the complexity of the VCF format is often underestimated.

Consider the example from the VCFv4.2 format specification document shown below.

There are two main parts:
1. A set of header lines starting with '##' characters.
2. A set of tab-separated columns holding the variant data, starting from the row beginning with '#CHROM'.
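The two-part layout described above can be illustrated with a minimal Python sketch. The function name split_vcf and the abbreviated two-line header are assumptions made for illustration only; this is not how VCF.Filter itself reads files.

```python
def split_vcf(text):
    """Split VCF text into '##' header lines, column names, and data rows."""
    meta, columns, records = [], [], []
    for line in text.splitlines():
        if line.startswith("##"):
            # Part 1: header (meta-information) lines
            meta.append(line)
        elif line.startswith("#CHROM"):
            # The column header row marks the start of part 2
            columns = line.lstrip("#").split("\t")
        elif line.strip():
            # Part 2: one tab-separated row per variant
            records.append(dict(zip(columns, line.split("\t"))))
    return meta, columns, records


# Abbreviated, illustrative input (not a complete VCF file)
example = (
    "##fileformat=VCFv4.2\n"
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">\n'
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "20\t14370\trs6054257\tG\tA\t29\tPASS\tDP=14\n"
)
meta, columns, records = split_vcf(example)
```

Here meta holds the two '##' lines, columns holds the eight column names, and each entry of records maps a column name to the raw text of that field.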

Variant annotations are stored in the INFO field as key=value pairs separated by semicolons. The value can be textual, numerical, or an array holding values applying to different alleles at a given position.
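As a rough sketch of the key=value structure described above, an INFO string can be split as follows. The helper name parse_info is hypothetical, and values are kept as raw text at this stage.

```python
def parse_info(info):
    """Parse an INFO string into a dict; keys without '=' are treated as flags."""
    result = {}
    if info in ("", "."):
        # '.' marks an empty INFO field
        return result
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        if not sep:
            result[key] = True              # flag: present without a value
        elif "," in value:
            result[key] = value.split(",")  # array: one value per allele
        else:
            result[key] = value             # single value, still text
    return result


info = parse_info("NS=2;DP=10;AF=0.333,0.667;AA=T;DB")
```

In this example AF becomes a two-element list because the site has two alternative alleles, and DB is stored as True because it appears without a value.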

The exact layout of an annotation is specified in a VCF header line. The header line for an annotation has four attributes: ID, Number, Type, and Description. ID holds the annotation name and Description summarises its purpose.
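To make the four attributes concrete, the sketch below extracts them from a single INFO header line with a regular expression. It assumes the attributes appear in the order ID, Number, Type, Description; the names INFO_LINE and parse_info_header are illustrative only.

```python
import re

# Assumes the attribute order ID, Number, Type, Description
INFO_LINE = re.compile(
    r'^##INFO=<ID=(?P<ID>[^,]+),Number=(?P<Number>[^,]+),'
    r'Type=(?P<Type>[^,]+),Description="(?P<Description>[^"]*)"'
)


def parse_info_header(line):
    """Return the four attributes of an INFO header line, or None if it does not match."""
    match = INFO_LINE.match(line)
    return match.groupdict() if match else None


header = parse_info_header(
    '##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">'
)
```

The result is a dictionary with the keys ID, Number, Type, and Description; lines that are not INFO header lines yield None.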

The Number attribute specifies whether the value of the key=value pair is a flag (0), a single value (1), or an array of values (A, R). A-arrays hold values only for the alternative alleles. R-arrays hold values for all alleles including the reference allele.

The Type attribute specifies the data type of the value (String for text, Integer for whole numbers, Float for real numbers, Flag for boolean presence/absence).
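Taken together, Number and Type determine how a raw annotation value should be interpreted. The following sketch shows one possible conversion; the names CASTS and convert are hypothetical, and treating '.' as a missing value (None) is an assumption of this example.

```python
# Map VCF Type names to Python conversion functions
CASTS = {"Integer": int, "Float": float, "String": str}


def convert(raw, number, vcf_type):
    """Convert a raw INFO value according to its Number and Type attributes."""
    if vcf_type == "Flag":
        # Flags carry no value; their presence is the information
        return True
    cast = CASTS[vcf_type]
    # '.' is treated as a missing value in this sketch
    values = [None if item == "." else cast(item) for item in raw.split(",")]
    # Number=1 yields a single value; anything else (e.g. A, R) yields a list
    return values[0] if number == "1" else values


depth = convert("14", "1", "Integer")
frequencies = convert("0.333,0.667", "A", "Float")
in_dbsnp = convert("", "0", "Flag")
```

A single-valued Integer such as DP becomes a Python int, an A-array such as AF becomes a list with one float per alternative allele, and a Flag such as DB becomes True.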

The following table shows the header line attributes for the annotations in the VCF example shown above.

ID    Number    Type       Description
NS    1         Integer    Number of Samples With Data
DP    1         Integer    Total Depth
AF    A         Float      Allele Frequency
AA    1         String     Ancestral Allele
DB    0         Flag       dbSNP membership, build 129

What can VCF.Filter do for me?

VCF.Filter is a standalone Java application for viewing and filtering the contents of VCF files. It is aimed at users who are not comfortable with command-line tools, or who do not want to use web-based tools with their proprietary data.

VCF.Filter builds fully customizable filter chains for fields declared in VCF header lines,

intersects variants with runs of homozygosity provided as BED files,

calculates variant recurrence values in your cohort and filters on cohort recurrence,

analyzes pedigrees for the presence of variants following a known type of disease inheritance pattern,

prints your variants on a Hilbert curve,

and allows you to find a variant reported in the literature in your cohort of VCF files.
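The idea of a filter chain over annotated fields can be pictured with a short sketch. This is not VCF.Filter's implementation: the names OPERATORS and passes, the example values, and the rule that a variant must satisfy every filter in the chain are all assumptions made for illustration.

```python
import operator

# Comparison operators a filter may use (illustrative subset)
OPERATORS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
}


def passes(annotations, chain):
    """Return True if the annotations satisfy every (key, operator, threshold) filter."""
    for key, op, threshold in chain:
        value = annotations.get(key)
        # A missing annotation fails the filter in this sketch
        if value is None or not OPERATORS[op](value, threshold):
            return False
    return True


# Already-converted annotations for three hypothetical variants
variants = [
    {"DP": 14, "AF": 0.5},
    {"DP": 11, "AF": 0.017},
    {"DP": 10, "AF": 0.333},
]
chain = [("DP", ">=", 11), ("AF", "<", 0.1)]
hitlist = [v for v in variants if passes(v, chain)]
```

With this chain only the second variant remains in the hitlist, since it is the only one with a depth of at least 11 and an allele frequency below 0.1.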