Generic variation analysis reporting
Version 1

Workflow Type: Galaxy

This workflow generates reports from a list of variants produced by a Variant Calling Workflow.

The workflow accepts a single input:

  • A collection of VCF files

The workflow produces two outputs (format description below):

  1. A list of variants grouped by Sample
  2. A list of variants grouped by Variant

Here is an example of the output by sample. In this table, all variants in all samples are explicitly listed:

Sample POS FILTER REF ALT DP AF AFcaller SB DP4 IMPACT FUNCLASS EFFECT GENE CODON AA TRID min(AF) max(AF) countunique(change) countunique(FUNCLASS) change
ERR3485786 11644 PASS A G 97 0.979381 0.907216 0 1,1,49,46 LOW SILENT SYNONYMOUS_CODING D7L tgT/tgC C512 AKG51361.1 0.979381 1 1 1 A>G
ERR3485786 11904 PASS T C 102 0.990196 0.95098 0 0,0,51,50 MODERATE MISSENSE NON_SYNONYMOUS_CODING D7L Act/Gct T426A AKG51361.1 0.990196 1 1 1 T>C

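Note the two alternative allele frequency fields: "AFcaller" and "AF". LoFreq reports the AF values listed in "AFcaller". They are incorrect due to a known LoFreq bug. To correct for this, we recompute AF values from the DP4 and DP fields as follows: AF = (DP4[2] + DP4[3]) / DP. Here DP4 holds four read counts (reference forward, reference reverse, alternate forward, alternate reverse) and is indexed from zero, so DP4[2] and DP4[3] are the reads supporting the alternate allele.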
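The recomputation can be sketched in a few lines of Python. This is only an illustration of the formula above, not the workflow's actual Compute step; the function name `recompute_af` is hypothetical, and the inputs are the DP4 and DP values from the two example rows.

```python
def recompute_af(dp4: str, dp: int) -> float:
    """Recompute allele frequency from a DP4 string and the total depth DP.

    DP4 holds four comma-separated read counts:
    ref-forward, ref-reverse, alt-forward, alt-reverse.
    (recompute_af is a hypothetical helper, not part of the workflow.)
    """
    counts = [int(c) for c in dp4.split(",")]
    if len(counts) != 4:
        raise ValueError(f"expected 4 DP4 values, got {len(counts)}")
    if dp <= 0:
        raise ValueError("DP must be positive")
    # Zero-based indices 2 and 3 are the reads supporting the alternate allele.
    return (counts[2] + counts[3]) / dp


# DP4 and DP taken from the two rows of the by-sample example table.
print(round(recompute_af("1,1,49,46", 97), 6))   # 0.979381
print(round(recompute_af("0,0,51,50", 102), 6))  # 0.990196
```

Both results match the "AF" column of the example rows, while the "AFcaller" column (0.907216 and 0.95098) does not.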

Here is an example of output by variant. In this table data is aggregated by variant across all samples in which this variant is present:

POS REF ALT IMPACT FUNCLASS EFFECT GENE CODON AA TRID countunique(Sample) min(AF) max(AF) SAMPLES(above-thresholds) SAMPLES(all) AFs(all) change
11644 A G LOW SILENT SYNONYMOUS_CODING D7L tgT/tgC C512 AKG51361.1 11 0.979381 1 ERR3485786,ERR3485787... ERR3485786,ERR3485787,ERR3485789 ... 0.979381,1.0... A>G
11904 T C MODERATE MISSENSE NON_SYNONYMOUS_CODING D7L Act/Gct T426A AKG51361.1 12 0.990196 1 ERR3485786,ERR3485787... ERR3485786,ERR3485787,ERR3485789... 0.990196,1.0,1.0... T>C
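As a rough sketch of how the by-variant table relates to the by-sample table, the aggregation can be written as a group-by on (POS, REF, ALT). The workflow itself uses Datamash and Join steps for this; the Python below only mirrors the idea, the function name `aggregate_by_variant` is hypothetical, and the input is reduced to three rows and five columns taken from the examples above.

```python
from collections import defaultdict


def aggregate_by_variant(rows):
    """Group per-sample rows by variant and summarise them across samples.

    Hypothetical helper: mirrors a subset of the by-variant report columns.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[(row["POS"], row["REF"], row["ALT"])].append(row)

    report = []
    for (pos, ref, alt), members in sorted(groups.items()):
        samples = [m["Sample"] for m in members]
        afs = [m["AF"] for m in members]
        report.append({
            "POS": pos,
            "REF": ref,
            "ALT": alt,
            "countunique(Sample)": len(set(samples)),
            "min(AF)": min(afs),
            "max(AF)": max(afs),
            "SAMPLES(all)": ",".join(samples),
            "AFs(all)": ",".join(str(a) for a in afs),
            "change": f"{ref}>{alt}",
        })
    return report


# Reduced input: only the columns needed for the aggregation.
rows = [
    {"Sample": "ERR3485786", "POS": 11644, "REF": "A", "ALT": "G", "AF": 0.979381},
    {"Sample": "ERR3485787", "POS": 11644, "REF": "A", "ALT": "G", "AF": 1.0},
    {"Sample": "ERR3485786", "POS": 11904, "REF": "T", "ALT": "C", "AF": 0.990196},
]

report = aggregate_by_variant(rows)
for line in report:
    print(line)
```

The "SAMPLES(above-thresholds)" column is left out here because the page does not spell out how it is derived.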

The workflow can be accessed at

The general idea of the workflow is:


Inputs

ID Name Description Type
AF Filter AF Filter Allele Frequency Filter. This is the minimum allele frequency required for variants to be included in the reports.
  • float?
DP Filter DP Filter Depth Filter. This is the minimum depth of all alignments at a variant site.
  • int?
DP_ALT Filter DP_ALT Filter Depth Filter for variant allele. This is the minimum depth of alignments supporting a variant.
  • int?
Variation data to report Variation data to report Variation data in VCF format. Can be the output of any of the workflows in
  • File[]
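The three threshold inputs can be read as a simple per-variant predicate. The sketch below is an assumption-laden illustration, not the SnpSift Filter expression the workflow builds: the function name `passes_filters`, the inclusive `>=` comparison, the treatment of an unset optional threshold as "no filtering", and the threshold values in the usage lines are all hypothetical.

```python
from typing import Optional


def passes_filters(
    af: float,
    dp: int,
    dp_alt: int,
    min_af: Optional[float] = None,
    min_dp: Optional[int] = None,
    min_dp_alt: Optional[int] = None,
) -> bool:
    """Return True if a variant meets every threshold that is set.

    Hypothetical helper. Assumes thresholds are inclusive minimums and
    that an unset (None) threshold disables that check.
    """
    if min_af is not None and af < min_af:
        return False
    if min_dp is not None and dp < min_dp:
        return False
    if min_dp_alt is not None and dp_alt < min_dp_alt:
        return False
    return True


# First example row: DP = 97, DP4 = 1,1,49,46, so 49 + 46 = 95 reads
# support the variant (taking DP_ALT from DP4 is an assumption).
print(passes_filters(af=0.979381, dp=97, dp_alt=95,
                     min_af=0.05, min_dp=1, min_dp_alt=10))   # True
print(passes_filters(af=0.979381, dp=97, dp_alt=95,
                     min_dp_alt=100))                          # False
```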


Steps

ID Name Description
4 SnpSift Filter
5 Compose text parameter value
6 Compose text parameter value
7 SnpSift Filter
8 SnpSift Extract Fields
9 Compute
10 Datamash
11 Replace
12 Replace
13 Replace
14 Collapse Collection
15 Compute
16 Compute
17 Replace
18 Datamash
19 Filter Filter1
20 Datamash
21 Join
22 Datamash
23 Datamash
24 Datamash
25 Join
26 Join
27 Cut Cut1
28 Join
29 Cut Cut1
30 Replace
31 Cut Cut1
32 Split file
33 Sort
34 Sort


Outputs

ID Name Description Type
prefiltered_variants prefiltered_variants n/a
  • File
filtered_variants filtered_variants n/a
  • File
filtered_extracted_variants filtered_extracted_variants n/a
  • File
af_recalculated af_recalculated n/a
  • File
collapsed_effects collapsed_effects n/a
  • File
highest_impact_effects highest_impact_effects n/a
  • File
cleaned_header cleaned_header n/a
  • File
processed_variants_collection processed_variants_collection n/a
  • File
all_variants_all_samples all_variants_all_samples n/a
  • File
variants_for_plotting variants_for_plotting n/a
  • File
by_variant_report by_variant_report n/a
  • File
combined_variant_report combined_variant_report n/a
  • File

Additional credit

Wolfgang Maier


