Workflow Type: Common Workflow Language
Frozen
Stable
A CWL-based pipeline for calling small germline variants, namely SNPs and small INDELs, from whole-genome sequencing (WGS) or targeted sequencing (e.g., whole-exome sequencing; WES) data.
The respective GitHub folder provides:
- The CWL wrappers and subworkflows for the workflow
- A pre-configured YAML template, based on validation analysis of publicly available HTS data
Briefly, the workflow performs the following steps:
- Quality control of Illumina reads (FastQC)
- Trimming of the reads (e.g., removal of adapter and/or low-quality sequences) (Trim Galore)
- Mapping to reference genome (BWA-MEM)
- Conversion of mapped reads from SAM (Sequence Alignment Map) to BAM (Binary Alignment Map) format (samtools)
- Sorting mapped reads based on read names (samtools)
- Adding information regarding paired-end reads (e.g., CIGAR field information) (samtools)
- Re-sorting mapped reads based on chromosomal coordinates (samtools)
- Adding basic Read-Group information regarding sample name, platform unit, platform (e.g., ILLUMINA), library and identifier (picard AddOrReplaceReadGroups)
- Marking PCR and/or optical duplicate reads (picard MarkDuplicates)
- Collection of summary statistics (samtools)
- Creation of indexes for coordinate-sorted BAM files to enable fast random access (samtools)
- Splitting the reference genome into a predefined number of intervals for parallel processing (GATK SplitIntervals)
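The samtools part of the list above (SAM-to-BAM conversion, name sorting, mate fixing, coordinate sorting) can be sketched as a sequence of command lines. This is a minimal illustration and not the workflow's actual CWL wrappers: the function name `preprocessing_commands`, the file naming scheme, and the default thread count are assumptions made for the example, and the commands are only built, never executed.

```python
import shlex
from typing import List


def preprocessing_commands(sample: str, threads: int = 4) -> List[List[str]]:
    """Build (but do not run) the samtools commands for one sample.

    File names are hypothetical; the real workflow derives them in its
    CWL wrappers.
    """
    sam = f"{sample}.sam"
    bam = f"{sample}.bam"
    name_sorted = f"{sample}.name_sorted.bam"
    fixmate = f"{sample}.fixmate.bam"
    coord_sorted = f"{sample}.sorted.bam"
    t = str(threads)
    return [
        # SAM -> BAM conversion
        ["samtools", "view", "-b", "-@", t, "-o", bam, sam],
        # sort by read name, as required by fixmate
        ["samtools", "sort", "-n", "-@", t, "-o", name_sorted, bam],
        # fill in mate-related fields for paired-end reads
        ["samtools", "fixmate", "-@", t, name_sorted, fixmate],
        # re-sort by chromosomal coordinates
        ["samtools", "sort", "-@", t, "-o", coord_sorted, fixmate],
    ]


if __name__ == "__main__":
    for cmd in preprocessing_commands("sample1"):
        print(shlex.join(cmd))
```

In the workflow itself, read-group assignment and duplicate marking (picard) follow the coordinate sort, and indexing is applied afterwards; those steps are left out of this sketch.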
At this point the multi-sample part of the workflow begins, in which the variant calls of multiple samples are combined into a single, unified VCF (Variant Call Format) file that contains the variant information for all samples:
- Application of Base Quality Score Recalibration (BQSR) (GATK BaseRecalibrator and ApplyBQSR tools)
- Variant calling in gVCF (genomic VCF) mode (-ERC GVCF) (GATK HaplotypeCaller)
- Merging of all genomic interval-split gVCF files for each sample (GATK MergeVcfs)
- Generation of the unified VCF file (GATK CombineGVCFs and GenotypeGVCFs tools)
- Separate annotation for SNP and INDEL variants, using the Variant Quality Score Recalibration (VQSR) method (GATK VariantRecalibrator and ApplyVQSR tools)
- Variant filtering based on the information added during VQSR and/or custom filters (bcftools)
- Normalization of INDELs (split multiallelic sites) (bcftools)
- Annotation of the final dataset of filtered variants with genomic, population-related and/or clinical information (ANNOVAR)
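The core of the multi-sample steps above (per-interval gVCF calling, combining gVCFs across samples, joint genotyping) can be sketched as GATK command lines. This is an illustrative sketch under stated assumptions: the helper function names and all file paths are hypothetical, and the workflow's own CWL wrappers may pass additional arguments. The commands are only constructed, not run.

```python
from typing import List


def haplotypecaller_cmd(reference: str, bam: str, interval: str, out_gvcf: str) -> List[str]:
    """Call variants for one sample on one interval in gVCF mode (-ERC GVCF)."""
    return [
        "gatk", "HaplotypeCaller",
        "-R", reference, "-I", bam, "-L", interval,
        "-O", out_gvcf, "-ERC", "GVCF",
    ]


def combine_gvcfs_cmd(reference: str, gvcfs: List[str], out_gvcf: str) -> List[str]:
    """Combine the per-sample gVCFs into one multi-sample gVCF."""
    cmd = ["gatk", "CombineGVCFs", "-R", reference]
    for gvcf in gvcfs:
        cmd += ["-V", gvcf]
    return cmd + ["-O", out_gvcf]


def genotype_gvcfs_cmd(reference: str, combined_gvcf: str, out_vcf: str) -> List[str]:
    """Joint-genotype the combined gVCF into the unified VCF."""
    return ["gatk", "GenotypeGVCFs", "-R", reference, "-V", combined_gvcf, "-O", out_vcf]


if __name__ == "__main__":
    ref = "reference.fasta"  # hypothetical path
    samples = ["sampleA", "sampleB"]  # hypothetical sample names
    per_sample = [f"{s}.g.vcf.gz" for s in samples]
    print(" ".join(haplotypecaller_cmd(ref, "sampleA.bqsr.bam", "0000.interval_list", per_sample[0])))
    print(" ".join(combine_gvcfs_cmd(ref, per_sample, "cohort.g.vcf.gz")))
    print(" ".join(genotype_gvcfs_cmd(ref, "cohort.g.vcf.gz", "cohort.vcf.gz")))
```

Calling per interval and merging afterwards is what allows the genome splitting from the earlier SplitIntervals step to be used for parallel processing.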
Inputs
ID | Name | Description | Type |
---|---|---|---|
raw_files_directory | n/a | n/a | |
input_file_split | n/a | n/a | |
input_file_split_fwd_single | n/a | n/a | |
input_file_split_rev | n/a | n/a | |
input_qc_check | n/a | n/a | |
input_trimming_check | n/a | n/a | |
tg_quality | n/a | n/a | |
tg_length | n/a | n/a | |
tg_compression | n/a | n/a | |
tg_do_not_compress | n/a | n/a | |
tg_strigency | n/a | n/a | |
tg_trim_suffix | n/a | n/a | |
reference_genome | n/a | n/a | |
bwa_mem_sec_shorter_split_hits | n/a | n/a | |
bwa_mem_num_threads | n/a | n/a | |
samtools_view_uncompressed | n/a | n/a | |
samtools_view_collapsecigar | n/a | n/a | |
samtools_view_readswithbits | n/a | n/a | |
samtools_view_readswithoutbits | n/a | n/a | |
samtools_view_fastcompression | n/a | n/a | |
samtools_view_samheader | n/a | n/a | |
samtools_view_count | n/a | n/a | |
samtools_view_readsingroup | n/a | n/a | |
samtools_view_readtagtostrip | n/a | n/a | |
samtools_view_readsquality | n/a | n/a | |
samtools_view_cigar | n/a | n/a | |
samtools_view_iscram | n/a | n/a | |
samtools_view_threads | n/a | n/a | |
samtools_view_randomseed | n/a | n/a | |
samtools_view_region | n/a | n/a | |
samtools_view_readsinlibrary | n/a | n/a | |
samtools_view_target_bed_file | n/a | n/a | |
samtools_fixmate_threads | n/a | n/a | |
samtools_fixmate_output_format | n/a | n/a | |
samtools_sort_compression_level | n/a | n/a | |
samtools_sort_threads | n/a | n/a | |
samtools_sort_memory | n/a | n/a | |
samtools_flagstat_threads | n/a | n/a | |
picard_addorreplacereadgroups_rgpl | n/a | n/a | |
gatk_splitintervals_include_intervalList | n/a | n/a | |
gatk_splitintervals_exclude_intervalList | n/a | n/a | |
gatk_splitintervals_scatter_count | n/a | n/a | |
sub_bqsr_known_sites_1 | n/a | n/a | |
sub_bqsr_known_sites_2 | n/a | n/a | |
sub_bqsr_known_sites_3 | n/a | n/a | |
sub_bqsr_interval_padding | n/a | n/a | |
sub_bqsr_hc_native_pairHMM_threads | n/a | n/a | |
sub_bqsr_hc_java_options | n/a | n/a | |
VariantRecalibrator_use_annotation | n/a | n/a | |
VariantRecalibrator_trust_all_polymorphic | n/a | n/a | |
VariantRecalibrator_truth_sensitivity_trance_indels | n/a | n/a | |
vqsr_arguments_indels_1 | n/a | n/a | |
vqsr_known_sites_indels_1 | n/a | n/a | |
vqsr_arguments_indels_2 | n/a | n/a | |
vqsr_known_sites_indels_2 | n/a | n/a | |
vqsr_arguments_indels_3 | n/a | n/a | |
vqsr_known_sites_indels_3 | n/a | n/a | |
VariantRecalibrator_truth_sensitivity_trance_snps | n/a | n/a | |
vqsr_arguments_snps_1 | n/a | n/a | |
vqsr_known_sites_snps_1 | n/a | n/a | |
vqsr_arguments_snps_2 | n/a | n/a | |
vqsr_known_sites_snps_2 | n/a | n/a | |
vqsr_arguments_snps_3 | n/a | n/a | |
vqsr_known_sites_snps_3 | n/a | n/a | |
vqsr_arguments_snps_4 | n/a | n/a | |
vqsr_known_sites_snps_4 | n/a | n/a | |
ApplyVQSR_ts_filter_level | n/a | n/a | |
bcftools_view_include_VQSR_filters | n/a | n/a | |
bcftools_view_threads | n/a | n/a | |
bcftools_norm_threads | n/a | n/a | |
bcftools_norm_multiallelics | n/a | n/a | |
table_annovar_database_location | n/a | n/a | |
table_annovar_build_over | n/a | n/a | |
table_annovar_remove | n/a | n/a | |
table_annovar_protocol | n/a | n/a | |
table_annovar_operation | n/a | n/a | |
table_annovar_na_string | n/a | n/a | |
table_annovar_vcfinput | n/a | n/a | |
table_annovar_otherinfo | n/a | n/a | |
table_annovar_convert_arg | n/a | n/a | |
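As a rough illustration of how a handful of the inputs listed above might be supplied, the sketch below assembles a CWL input object and serializes it as JSON, which CWL runners generally accept alongside YAML. Everything except the input IDs is an assumption: the table does not state the input types, so the `File`/`Directory` objects, the integer values, and all paths are hypothetical, and the function name `build_job` is introduced only for this example. The pre-configured YAML template in the GitHub folder remains the authoritative starting point.

```python
import json
from typing import Any, Dict


def build_job(raw_dir: str, reference: str, threads: int, scatter_count: int) -> Dict[str, Any]:
    """Assemble a partial input object using IDs from the Inputs table.

    The value types are assumptions; check them against the workflow's
    CWL definition and YAML template before use.
    """
    return {
        "raw_files_directory": {"class": "Directory", "path": raw_dir},  # assumed Directory
        "reference_genome": {"class": "File", "path": reference},  # assumed File
        "bwa_mem_num_threads": threads,  # assumed int
        "samtools_sort_threads": threads,  # assumed int
        "gatk_splitintervals_scatter_count": scatter_count,  # assumed int
        "picard_addorreplacereadgroups_rgpl": "ILLUMINA",  # platform, as in the step list
    }


if __name__ == "__main__":
    job = build_job("fastq/", "reference.fasta", threads=4, scatter_count=8)
    print(json.dumps(job, indent=2))
```

The many remaining inputs (Trim Galore, samtools view, VQSR, ANNOVAR settings) are omitted here and would need to be provided in the same way.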
Steps
ID | Name | Description |
---|---|---|
get_raw_files | n/a | n/a |
split_single_paired | n/a | n/a |
trim_galore_single | n/a | n/a |
trim_galore_paired | n/a | n/a |
fastqc_raw | n/a | n/a |
fastqc_single_trimmed | n/a | n/a |
fastqc_paired_trimmed | n/a | n/a |
cp_fastqc_raw_zip | n/a | n/a |
cp_fastqc_single_zip | n/a | n/a |
cp_fastqc_paired_zip | n/a | n/a |
rename_fastqc_raw_html | n/a | n/a |
rename_fastqc_single_html | n/a | n/a |
rename_fastqc_paired_html | n/a | n/a |
check_trimming | n/a | n/a |
rg_extraction_single | n/a | n/a |
bwa_mem_single | n/a | n/a |
split_paired_read1_read2 | n/a | n/a |
rg_extraction_paired | n/a | n/a |
bwa_mem_paired | n/a | n/a |
gather_bwa_sam_files | n/a | n/a |
samtools_view_conversion | n/a | n/a |
samtools_sort_by_name | n/a | n/a |
samtools_fixmate | n/a | n/a |
samtools_sort | n/a | n/a |
picard_addorreplacereadgroups | n/a | n/a |
picard_markduplicates | n/a | n/a |
samtools_flagstat | n/a | n/a |
samtools_view_count_total | n/a | n/a |
gatk_splitintervals | n/a | n/a |
samtools_index | n/a | n/a |
gatk_bqsr_subworkflow | n/a | n/a |
gatk_CombineGVCFs | n/a | n/a |
gatk_GenotypeGVCFs | n/a | n/a |
gatk_MakeSitesOnlyVcf | n/a | n/a |
gatk_VariantRecalibrator_indel | n/a | n/a |
gatk_VariantRecalibrator_snp | n/a | n/a |
gatk_ApplyVQSR_indel | n/a | n/a |
gatk_ApplyVQSR_snp | n/a | n/a |
gatk_VQSR_MergeVCFs | n/a | n/a |
bcftools_view_filter_vqsr | n/a | n/a |
bcftools_norm_vqsr | n/a | n/a |
table_annovar_filtered | n/a | n/a |
Outputs
ID | Name | Description | Type |
---|---|---|---|
o_trim_galore_single_fq | n/a | n/a | |
o_trim_galore_single_reports | n/a | n/a | |
o_trim_galore_paired_fq | n/a | n/a | |
o_trim_galore_paired_reports | n/a | n/a | |
o_fastqc_raw_html | n/a | n/a | |
o_fastqc_single_html | n/a | n/a | |
o_fastqc_paired_html | n/a | n/a | |
o_fastqc_raw_zip | n/a | n/a | |
o_fastqc_single_zip | n/a | n/a | |
o_fastqc_paired_zip | n/a | n/a | |
o_bwa_mem_single | n/a | n/a | |
o_bwa_mem_paired | n/a | n/a | |
o_gather_bwa_sam_files | n/a | n/a | |
o_samtools_view_conversion | n/a | n/a | |
samtools_sort_by_name | n/a | n/a | |
o_samtools_fixmate | n/a | n/a | |
o_samtools_sort | n/a | n/a | |
o_picard_addorreplacereadgroups | n/a | n/a | |
o_picard_markduplicates | n/a | n/a | |
o_picard_markduplicates_metrics | n/a | n/a | |
o_samtools_flagstat | n/a | n/a | |
o_samtools_view_count_total | n/a | n/a | |
o_samtools_index | n/a | n/a | |
o_gatk_splitintervals | n/a | n/a | |
o_gatk_bqsr_subworkflowbqsr_tables | n/a | n/a | |
o_gatk_bqsr_subworkflowbqsr_bqsr_bam | n/a | n/a | |
o_gatk_bqsr_subworkflowbqsr_hc | n/a | n/a | |
o_gatk_bqsr_subworkflowbqsr_mergevcfs | n/a | n/a | |
o_gatk_CombineGVCFs | n/a | n/a | |
o_gatk_GenotypeGVCFs | n/a | n/a | |
o_gatk_MakeSitesOnlyVcf | n/a | n/a | |
o_gatk_VariantRecalibrator_indel_recal | n/a | n/a | |
o_gatk_VariantRecalibrator_indel_tranches | n/a | n/a | |
o_gatk_VariantRecalibrator_snp_recal | n/a | n/a | |
o_gatk_VariantRecalibrator_snp_tranches | n/a | n/a | |
o_gatk_ApplyVQSR_indel | n/a | n/a | |
o_gatk_ApplyVQSR_snp | n/a | n/a | |
o_gatk_VQSR_MergeVCFs | n/a | n/a | |
o_bcftools_view_filter_vqsr | n/a | n/a | |
o_bcftools_norm_vqsr | n/a | n/a | |
o_table_annovar_filtered_multianno_vcf | n/a | n/a | |
o_table_annovar_filtered_multianno_txt | n/a | n/a | |
o_table_annovar_filtered_avinput | n/a | n/a | |
Version History
Version 1 (earliest) Created 5th Jul 2023 at 10:44 by Konstantinos Kyritsis
Initial commit
Frozen
Version-1
aceb8de
Citation
Kyritsis, K., Pechlivanis, N., & Psomopoulos, F. (2023). CWL-based (multi-sample) workflow for germline variant calling. WorkflowHub. https://doi.org/10.48546/WORKFLOWHUB.WORKFLOW.526.1