COVID-19: consensus construction
This workflow aims at generating reliable consensus sequences from variant calls according to transparent criteria that capture at least some of the complexity of variant calling.
It takes a collection of VCFs (with DP and DP4 INFO fields) and a collection of the corresponding aligned reads (used to calculate genome-wide coverage), such as those produced by any of the variant calling workflows in https://github.com/galaxyproject/iwc/tree/main/workflows/sars-cov-2-variant-calling, and generates a collection of viral consensus sequences plus a multisample FASTA of all these sequences.
Each consensus sequence is guaranteed to capture all called, filter-passing (as per the FILTER column of the VCF input) variants found in the VCF of its sample that reach a user-defined consensus allele frequency threshold.
Filter-failing variants and variants below a second user-defined minimal allele frequency threshold will be ignored.
Genomic positions of filter-passing variants with an allele frequency in between the two thresholds will be hard-masked (with N) in the consensus sequence of their sample.
Genomic positions with a coverage (calculated from the read alignments input) below another user-defined threshold will be hard-masked, too, unless they are consensus variant sites.
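The per-variant rules above can be pictured with a small Python sketch. It is an illustration only, not the workflow's actual tool chain: the function names, the example thresholds, the treatment of `PASS` as the only filter-passing value, the derivation of the allele frequency as alt-supporting reads from DP4 divided by DP, and the exact handling of values that sit right on a threshold are all assumptions.

```python
from typing import Tuple


def alt_allele_frequency(dp: int, dp4: Tuple[int, int, int, int]) -> float:
    """Estimate the alternate allele frequency from DP and DP4.

    DP4 holds (ref-forward, ref-reverse, alt-forward, alt-reverse) read counts.
    Dividing the alt-supporting reads by DP is an assumption of this sketch.
    """
    _ref_fwd, _ref_rev, alt_fwd, alt_rev = dp4
    if dp <= 0:
        return 0.0
    return (alt_fwd + alt_rev) / dp


def classify_variant(af: float, filter_status: str,
                     min_af_consensus: float, min_af_failed: float) -> str:
    """Return 'consensus', 'mask' or 'ignore' for a single variant call."""
    # Assumption: only the literal value PASS counts as filter-passing.
    if filter_status != "PASS":
        return "ignore"
    # Assumption: a call exactly at the consensus threshold is accepted.
    if af >= min_af_consensus:
        return "consensus"
    # Between the two thresholds: questionable call, site gets hard-masked.
    if af > min_af_failed:
        return "mask"
    return "ignore"


# Hypothetical example thresholds, chosen only for illustration.
af = alt_allele_frequency(100, (5, 5, 45, 45))
print(classify_variant(af, "PASS", 0.7, 0.25))
```

With these example values, a filter-passing call supported by 90 of 100 reads is incorporated into the consensus, one supported by 50 of 100 reads is masked, and one supported by 10 of 100 reads is ignored.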
Inputs
ID | Name | Description | Type |
---|---|---|---|
Depth-threshold for masking | Depth-threshold for masking | Sites in the viral genome covered by fewer than this number of reads are considered questionable and will be masked (with Ns) in the consensus sequence, regardless of whether a variant has been called at them. | |
Reference genome | Reference genome | The SARS-CoV-2 reference genome | |
Variant calls | Variant calls | Collection of VCFs produced by upstream workflows for variation analysis | |
aligned reads data for depth calculation | aligned reads data for depth calculation | Fully processed BAMs as generated by upstream workflows for variation analysis. Note: for ARTIC data, these BAMs should NOT have undergone processing with ivar removereads. | |
min-AF for consensus variant | min-AF for consensus variant | Only variant calls with an allele frequency greater than this value will be considered consensus variants. | |
min-AF for failed variants | min-AF for failed variants | Variant calls with an allele frequency higher than this value, but lower than the AF threshold for consensus variants, will be considered questionable and the respective sites will be masked (with Ns) in the consensus sequence. | |
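As an illustration of how the depth threshold input translates into maskable regions, the following Python sketch turns a list of per-position depths into runs of low-coverage positions. The function name and the 0-based, half-open interval representation are assumptions made for this example; the workflow itself derives coverage from the aligned reads with bedtools Genome Coverage.

```python
from typing import List, Tuple


def low_coverage_regions(depths: List[int],
                         depth_threshold: int) -> List[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) runs with depth below the threshold."""
    regions = []
    run_start = None
    for pos, depth in enumerate(depths):
        if depth < depth_threshold:
            # Open a new run unless one is already in progress.
            if run_start is None:
                run_start = pos
        elif run_start is not None:
            # Depth is sufficient again: close the current run.
            regions.append((run_start, pos))
            run_start = None
    # Close a run that extends to the end of the sequence.
    if run_start is not None:
        regions.append((run_start, len(depths)))
    return regions


print(low_coverage_regions([0, 3, 10, 12, 4, 4, 20], 5))
```

For the toy depth vector shown, positions 0 to 1 and 4 to 5 fall below a threshold of 5 and are reported as two separate regions.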
Steps
ID | Name | Description |
---|---|---|
6 | Compose text parameter value | toolshed.g2.bx.psu.edu/repos/iuc/compose_text_param/compose_text_param/0.1.1 |
7 | Compose text parameter value | toolshed.g2.bx.psu.edu/repos/iuc/compose_text_param/compose_text_param/0.1.1 |
8 | bedtools Genome Coverage | toolshed.g2.bx.psu.edu/repos/iuc/bedtools/bedtools_genomecoveragebed/2.29.2 |
9 | Compose text parameter value | toolshed.g2.bx.psu.edu/repos/iuc/compose_text_param/compose_text_param/0.1.1 |
10 | SnpSift Filter | toolshed.g2.bx.psu.edu/repos/iuc/snpsift/snpSift_filter/4.3+t.galaxy1 |
11 | SnpSift Filter | toolshed.g2.bx.psu.edu/repos/iuc/snpsift/snpSift_filter/4.3+t.galaxy1 |
12 | Filter | Filter1 |
13 | SnpSift Extract Fields | toolshed.g2.bx.psu.edu/repos/iuc/snpsift/snpSift_extractFields/4.3+t.galaxy0 |
14 | SnpSift Extract Fields | toolshed.g2.bx.psu.edu/repos/iuc/snpsift/snpSift_extractFields/4.3+t.galaxy0 |
15 | Compute | toolshed.g2.bx.psu.edu/repos/devteam/column_maker/Add_a_column1/2.0 |
16 | Compute | toolshed.g2.bx.psu.edu/repos/devteam/column_maker/Add_a_column1/2.0 |
17 | Concatenate | toolshed.g2.bx.psu.edu/repos/devteam/concat/gops_concat_1/1.0.1 |
18 | Merge | toolshed.g2.bx.psu.edu/repos/devteam/merge/gops_merge_1/1.0.0 |
19 | Subtract | toolshed.g2.bx.psu.edu/repos/devteam/subtract/gops_subtract_1/1.0.0 |
20 | Compute | toolshed.g2.bx.psu.edu/repos/devteam/column_maker/Add_a_column1/2.0 |
21 | bcftools consensus | toolshed.g2.bx.psu.edu/repos/iuc/bcftools_consensus/bcftools_consensus/1.15.1+galaxy2 |
22 | Collapse Collection | toolshed.g2.bx.psu.edu/repos/nml/collapse_collections/collapse_dataset/5.1.0 |
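Judging from the step names (Concatenate, Merge, Subtract, Compute) and the output names listed on this page, the masking regions appear to be assembled by combining low-coverage regions with questionable variant sites and then removing consensus variant sites. The Python sketch below illustrates that interval arithmetic under stated assumptions: intervals are 0-based and half-open, abutting intervals are merged, subtraction trims overlapping pieces rather than dropping whole intervals, and the final conversion yields 1-based inclusive coordinates. All function names are hypothetical and the Galaxy tools may be configured differently.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end)


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or abutting intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def subtract_intervals(regions: List[Interval],
                       holes: List[Interval]) -> List[Interval]:
    """Remove every part of `regions` that overlaps any interval in `holes`."""
    holes = merge_intervals(holes)
    result: List[Interval] = []
    for start, end in regions:
        cursor = start
        for h_start, h_end in holes:
            if h_end <= cursor or h_start >= end:
                continue  # no overlap with the remaining part of this region
            if h_start > cursor:
                result.append((cursor, h_start))
            cursor = max(cursor, h_end)
            if cursor >= end:
                break
        if cursor < end:
            result.append((cursor, end))
    return result


def to_one_based(intervals: List[Interval]) -> List[Interval]:
    """Convert 0-based half-open intervals to 1-based inclusive coordinates."""
    return [(start + 1, end) for start, end in intervals]


# Toy data, invented for illustration.
low_cov = [(0, 50), (100, 120)]
failed_sites = [(110, 111), (300, 301)]
consensus_sites = [(105, 106)]

combined = merge_intervals(low_cov + failed_sites)
masking = subtract_intervals(combined, consensus_sites)
print(to_one_based(masking))
```

In this toy example the consensus variant site at 0-based position 105 is cut out of the surrounding low-coverage region, so the variant can still be applied while its neighbours are masked.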
Outputs
ID | Name | Description | Type |
---|---|---|---|
consensus_af_threshold | consensus_af_threshold | n/a | |
non_consensus_af_threshold | non_consensus_af_threshold | n/a | |
depth_threshold | depth_threshold | n/a | |
coverage_depth | coverage_depth | n/a | |
consensus_variants | consensus_variants | n/a | |
filter_failed_variants | filter_failed_variants | n/a | |
low_cov_regions | low_cov_regions | n/a | |
chrom_pos_ref_called_variants | chrom_pos_ref_called_variants | n/a | |
chrom_pos_ref_failed_variants | chrom_pos_ref_failed_variants | n/a | |
called_variant_sites | called_variant_sites | n/a | |
failed_variant_sites | failed_variant_sites | n/a | |
low_cov_regions_plus_filter_failed | low_cov_regions_plus_filter_failed | n/a | |
low_cov_regions_plus_filter_failed_combined | low_cov_regions_plus_filter_failed_combined | n/a | |
masking_regions | masking_regions | n/a | |
1_based_masking_regions | 1_based_masking_regions | n/a | |
consensus | consensus | n/a | |
multisample_consensus_fasta | multisample_consensus_fasta | n/a | |
Version History
v0.2 (earliest) Created 23rd Jul 2021 at 10:18 by WorkflowHub Bot
Added/updated 10 files
Frozen
master
2175b4a
Creators
Not specified
Additional credit
Wolfgang Maier
Created: 23rd Jul 2021 at 10:18
Last updated: 25th Oct 2022 at 03:01