CWL-based (multi-sample) workflow for germline variant calling
Version 1

Workflow Type: Common Workflow Language
Stable

A CWL-based pipeline for calling small germline variants, namely SNPs and small INDELs, by processing data from Whole-genome Sequencing (WGS) or Targeted Sequencing (e.g., Whole-exome sequencing; WES) experiments.

The respective GitHub folder contains:

  • The CWL wrappers and subworkflows for the workflow
  • A pre-configured YAML template, based on validation analysis of publicly available HTS data

Briefly, the workflow performs the following steps:

  1. Quality control of Illumina reads (FastQC)
  2. Trimming of the reads (e.g., removal of adapter and/or low-quality sequences) (Trim Galore)
  3. Mapping to reference genome (BWA-MEM)
  4. Conversion of mapped reads from SAM (Sequence Alignment Map) to BAM (Binary Alignment Map) format (samtools)
  5. Sorting mapped reads based on read names (samtools)
  6. Adding information regarding paired-end reads (e.g., CIGAR field information) (samtools)
  7. Re-sorting mapped reads based on chromosomal coordinates (samtools)
  8. Adding basic Read-Group information regarding sample name, platform unit, platform (e.g., ILLUMINA), library and identifier (picard AddOrReplaceReadGroups)
  9. Marking PCR and/or optical duplicate reads (picard MarkDuplicates)
  10. Collection of summary statistics (samtools)
  11. Creation of indexes for coordinate-sorted BAM files to enable fast random access (samtools)
  12. Splitting the reference genome into a predefined number of intervals for parallel processing (GATK SplitIntervals)
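
The per-sample preprocessing steps 3 to 11 can be sketched as the command lines they roughly correspond to. This is a minimal Python illustration that only builds the commands; it assumes paired-end input, `bwa`, `samtools` and a `picard` wrapper script on the PATH, and uses hypothetical file names and a placeholder platform unit (`RGPU=unit1`). It is not the exact invocation used by the CWL wrappers, which is controlled through the workflow inputs (e.g., `bwa_mem_num_threads`, `picard_addorreplacereadgroups_rgpl`).

```python
import shlex


def shell_line(args, stdout=None):
    """Quote an argument list as one shell command, optionally redirecting stdout."""
    line = shlex.join(args)
    if stdout is not None:
        line += " > " + shlex.quote(stdout)
    return line


def preprocessing_commands(sample, ref, r1, r2, threads=4, platform="ILLUMINA"):
    """Build (but do not run) steps 3-11 for one paired-end sample.

    Output file names, the `picard` wrapper and RGPU=unit1 are hypothetical.
    """
    sam = f"{sample}.sam"
    bam = f"{sample}.bam"
    by_name = f"{sample}.name_sorted.bam"
    fixed = f"{sample}.fixmate.bam"
    by_coord = f"{sample}.sorted.bam"
    with_rg = f"{sample}.rg.bam"
    dedup = f"{sample}.markdup.bam"
    return [
        # 3. Map reads; BWA-MEM writes SAM to stdout
        shell_line(["bwa", "mem", "-t", str(threads), ref, r1, r2], stdout=sam),
        # 4. Convert SAM to BAM
        shell_line(["samtools", "view", "-b", "-o", bam, sam]),
        # 5. Sort by read name so that mates are adjacent
        shell_line(["samtools", "sort", "-n", "-o", by_name, bam]),
        # 6. Fill in mate information (fixmate expects name-grouped input)
        shell_line(["samtools", "fixmate", "-O", "bam", by_name, fixed]),
        # 7. Re-sort by chromosomal coordinate
        shell_line(["samtools", "sort", "-o", by_coord, fixed]),
        # 8. Add read-group fields (legacy Picard KEY=VALUE syntax)
        shell_line(["picard", "AddOrReplaceReadGroups", f"I={by_coord}", f"O={with_rg}",
                    f"RGID={sample}", f"RGLB={sample}", f"RGPL={platform}",
                    "RGPU=unit1", f"RGSM={sample}"]),
        # 9. Mark PCR/optical duplicates; M= is the metrics report
        shell_line(["picard", "MarkDuplicates", f"I={with_rg}", f"O={dedup}",
                    f"M={sample}.markdup_metrics.txt"]),
        # 10. Summary statistics, written to a text file
        shell_line(["samtools", "flagstat", dedup], stdout=f"{sample}.flagstat.txt"),
        # 11. Index the coordinate-sorted BAM for random access
        shell_line(["samtools", "index", dedup]),
    ]


if __name__ == "__main__":
    for cmd in preprocessing_commands("S1", "ref.fa", "S1_R1.fq.gz", "S1_R2.fq.gz"):
        print(cmd)
```

The name sort before `samtools fixmate` is there because fixmate needs mates grouped together; the second sort restores the coordinate order that `samtools index` requires.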

At this point, the multi-sample part of the workflow begins, during which all samples are combined into a single, unified VCF (Variant Call Format) file containing the variant information for every sample:

  1. Application of Base Quality Score Recalibration (BQSR) (GATK BaseRecalibrator and ApplyBQSR tools)
  2. Variant calling in gVCF (genomic VCF) mode (-ERC GVCF) (GATK HaplotypeCaller)
  3. Merging of all genomic interval-split gVCF files for each sample (GATK MergeVcfs)
  4. Generation of the unified VCF file (GATK CombineGVCFs and GenotypeGVCFs tools)
  5. Separate annotation for SNP and INDEL variants, using the Variant Quality Score Recalibration (VQSR) method (GATK VariantRecalibrator and ApplyVQSR tools)
  6. Variant filtering based on the information added during VQSR and/or custom filters (bcftools)
  7. Normalization of INDELs (split multiallelic sites) (bcftools)
  8. Annotation of the final dataset of filtered variants with genomic, population-related and/or clinical information (ANNOVAR)
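
A comparable sketch of the multi-sample stage follows, again only building command strings. It assumes that BQSR and HaplotypeCaller are scattered over the interval lists produced by GATK SplitIntervals (as the nested-array outputs of the BQSR subworkflow suggest) and uses a single known-sites file. The VQSR commands (step 5) are left out, so the bcftools steps read a hypothetical `cohort.vqsr.vcf.gz`; all other file names are hypothetical as well.

```python
import shlex


def joint_calling_commands(samples, intervals, ref, known_sites, keep_filters="PASS"):
    """Build (but do not run) the multi-sample stage as shell command strings.

    `samples` maps sample name -> analysis-ready BAM; `intervals` are interval-list
    files from SplitIntervals. Per-interval scattering is an assumption.
    """
    cmds = []
    sample_gvcfs = []
    for name, bam in samples.items():
        interval_gvcfs = []
        for k, interval in enumerate(intervals):
            table = f"{name}.{k}.recal.table"
            recal_bam = f"{name}.{k}.bqsr.bam"
            gvcf = f"{name}.{k}.g.vcf.gz"
            # 1. BQSR restricted to one interval
            cmds.append(shlex.join(["gatk", "BaseRecalibrator", "-R", ref, "-I", bam,
                                    "--known-sites", known_sites, "-L", interval, "-O", table]))
            cmds.append(shlex.join(["gatk", "ApplyBQSR", "-R", ref, "-I", bam,
                                    "--bqsr-recal-file", table, "-L", interval, "-O", recal_bam]))
            # 2. Per-interval variant calling in gVCF mode
            cmds.append(shlex.join(["gatk", "HaplotypeCaller", "-R", ref, "-I", recal_bam,
                                    "-L", interval, "-ERC", "GVCF", "-O", gvcf]))
            interval_gvcfs.append(gvcf)
        # 3. Merge the interval-split gVCFs of this sample
        merged = f"{name}.g.vcf.gz"
        merge = ["gatk", "MergeVcfs", "-O", merged]
        for g in interval_gvcfs:
            merge += ["-I", g]
        cmds.append(shlex.join(merge))
        sample_gvcfs.append(merged)

    # 4. Combine all per-sample gVCFs, then genotype them jointly
    combine = ["gatk", "CombineGVCFs", "-R", ref, "-O", "cohort.g.vcf.gz"]
    for g in sample_gvcfs:
        combine += ["-V", g]
    cmds.append(shlex.join(combine))
    cmds.append(shlex.join(["gatk", "GenotypeGVCFs", "-R", ref,
                            "-V", "cohort.g.vcf.gz", "-O", "cohort.vcf.gz"]))

    # 5. VQSR (VariantRecalibrator/ApplyVQSR) omitted; assumed to yield cohort.vqsr.vcf.gz
    # 6. Keep only records whose FILTER value is in `keep_filters`
    cmds.append(shlex.join(["bcftools", "view", "-f", keep_filters, "-O", "z",
                            "-o", "cohort.filtered.vcf.gz", "cohort.vqsr.vcf.gz"]))
    # 7. Split multiallelic sites into separate biallelic records
    cmds.append(shlex.join(["bcftools", "norm", "-m", "-any", "-O", "z",
                            "-o", "cohort.norm.vcf.gz", "cohort.filtered.vcf.gz"]))
    return cmds


if __name__ == "__main__":
    intervals = [f"{i:04d}-scattered.interval_list" for i in range(3)]
    for cmd in joint_calling_commands({"S1": "S1.bam", "S2": "S2.bam"},
                                      intervals, "ref.fa", "known_sites.vcf.gz"):
        print(cmd)
```

The gVCF intermediate is what allows each sample (and each interval) to be called independently and in parallel before a single joint-genotyping pass over all samples.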

Inputs

ID | Name | Description | Type
raw_files_directory | n/a | n/a | Directory
input_file_split | n/a | n/a | string?
input_file_split_fwd_single | n/a | n/a | string?
input_file_split_rev | n/a | n/a | string?
input_qc_check | n/a | n/a | boolean?
input_trimming_check | n/a | n/a | boolean?
tg_quality | n/a | n/a | int
tg_length | n/a | n/a | int
tg_compression | n/a | n/a | boolean
tg_do_not_compress | n/a | n/a | boolean
tg_strigency | n/a | n/a | int
tg_trim_suffix | n/a | n/a | string
reference_genome | n/a | n/a | File
bwa_mem_sec_shorter_split_hits | n/a | n/a | boolean
bwa_mem_num_threads | n/a | n/a | int
samtools_view_uncompressed | n/a | n/a | boolean
samtools_view_collapsecigar | n/a | n/a | boolean
samtools_view_readswithbits | n/a | n/a | int?
samtools_view_readswithoutbits | n/a | n/a | int
samtools_view_fastcompression | n/a | n/a | boolean
samtools_view_samheader | n/a | n/a | boolean
samtools_view_count | n/a | n/a | boolean
samtools_view_readsingroup | n/a | n/a | string?
samtools_view_readtagtostrip | n/a | n/a | string[]?
samtools_view_readsquality | n/a | n/a | int?
samtools_view_cigar | n/a | n/a | int?
samtools_view_iscram | n/a | n/a | boolean
samtools_view_threads | n/a | n/a | int?
samtools_view_randomseed | n/a | n/a | float?
samtools_view_region | n/a | n/a | string?
samtools_view_readsinlibrary | n/a | n/a | string?
samtools_view_target_bed_file | n/a | n/a | File?
samtools_fixmate_threads | n/a | n/a | int
samtools_fixmate_output_format | n/a | n/a | string
samtools_sort_compression_level | n/a | n/a | int?
samtools_sort_threads | n/a | n/a | int?
samtools_sort_memory | n/a | n/a | string?
samtools_flagstat_threads | n/a | n/a | int?
picard_addorreplacereadgroups_rgpl | n/a | n/a | string?
gatk_splitintervals_include_intervalList | n/a | n/a | File?
gatk_splitintervals_exclude_intervalList | n/a | n/a | File?
gatk_splitintervals_scatter_count | n/a | n/a | int
sub_bqsr_known_sites_1 | n/a | n/a | File
sub_bqsr_known_sites_2 | n/a | n/a | File?
sub_bqsr_known_sites_3 | n/a | n/a | File?
sub_bqsr_interval_padding | n/a | n/a | int?
sub_bqsr_hc_native_pairHMM_threads | n/a | n/a | int?
sub_bqsr_hc_java_options | n/a | n/a | string?
VariantRecalibrator_use_annotation | n/a | n/a | string[]
VariantRecalibrator_trust_all_polymorphic | n/a | n/a | boolean?
VariantRecalibrator_truth_sensitivity_trance_indels | n/a | n/a | float[]?
vqsr_arguments_indels_1 | n/a | n/a | string
vqsr_known_sites_indels_1 | n/a | n/a | File
vqsr_arguments_indels_2 | n/a | n/a | string?
vqsr_known_sites_indels_2 | n/a | n/a | File?
vqsr_arguments_indels_3 | n/a | n/a | string?
vqsr_known_sites_indels_3 | n/a | n/a | File?
VariantRecalibrator_truth_sensitivity_trance_snps | n/a | n/a | float[]?
vqsr_arguments_snps_1 | n/a | n/a | string
vqsr_known_sites_snps_1 | n/a | n/a | File
vqsr_arguments_snps_2 | n/a | n/a | string?
vqsr_known_sites_snps_2 | n/a | n/a | File?
vqsr_arguments_snps_3 | n/a | n/a | string?
vqsr_known_sites_snps_3 | n/a | n/a | File?
vqsr_arguments_snps_4 | n/a | n/a | string?
vqsr_known_sites_snps_4 | n/a | n/a | File?
ApplyVQSR_ts_filter_level | n/a | n/a | float?
bcftools_view_include_VQSR_filters | n/a | n/a | string
bcftools_view_threads | n/a | n/a | int?
bcftools_norm_threads | n/a | n/a | int?
bcftools_norm_multiallelics | n/a | n/a | string
table_annovar_database_location | n/a | n/a | Directory
table_annovar_build_over | n/a | n/a | string
table_annovar_remove | n/a | n/a | boolean?
table_annovar_protocol | n/a | n/a | string
table_annovar_operation | n/a | n/a | string
table_annovar_na_string | n/a | n/a | string?
table_annovar_vcfinput | n/a | n/a | boolean
table_annovar_otherinfo | n/a | n/a | boolean?
table_annovar_convert_arg | n/a | n/a | string?
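
Inputs are passed to a CWL runner through a job-order file (YAML or JSON). The sketch below builds one for a small subset of the inputs listed above, using CWL `File` and `Directory` objects, and reports which non-optional inputs (types without a trailing `?`) are still unset. All values are hypothetical placeholders, not the settings of the pre-configured YAML template, and a real run needs every required input of the workflow, not just this subset.

```python
import json

# A few input IDs and CWL types taken from the Inputs table
# (deliberately only a subset; the workflow declares many more).
DECLARED_INPUTS = {
    "raw_files_directory": "Directory",
    "input_qc_check": "boolean?",
    "tg_quality": "int",
    "tg_length": "int",
    "reference_genome": "File",
    "bwa_mem_num_threads": "int",
    "gatk_splitintervals_scatter_count": "int",
    "sub_bqsr_known_sites_1": "File",
    "sub_bqsr_known_sites_2": "File?",
    "bcftools_norm_multiallelics": "string",
}


def cwl_file(path):
    """Job-order object for a CWL File input."""
    return {"class": "File", "path": path}


def cwl_directory(path):
    """Job-order object for a CWL Directory input."""
    return {"class": "Directory", "path": path}


def missing_required(job, declared=DECLARED_INPUTS):
    """IDs whose type has no trailing '?' (not optional) but are absent from `job`."""
    return sorted(k for k, t in declared.items() if not t.endswith("?") and k not in job)


# Hypothetical values for illustration only.
job = {
    "raw_files_directory": cwl_directory("fastq"),
    "tg_quality": 20,
    "tg_length": 35,
    "reference_genome": cwl_file("ref/genome.fa"),
    "bwa_mem_num_threads": 8,
    "gatk_splitintervals_scatter_count": 4,
    "sub_bqsr_known_sites_1": cwl_file("ref/known_sites.vcf.gz"),
    "bcftools_norm_multiallelics": "-any",
}

if __name__ == "__main__":
    print(missing_required(job))  # empty list: every required input of the subset is set
    print(json.dumps(job, indent=2))
```

The printed JSON can be saved (e.g., as `job.json`) and given to a runner such as `cwltool` together with the workflow's main CWL file; because JSON is also valid YAML, it can serve as a starting point for a YAML job file.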

Steps

ID | Name | Description
get_raw_files | n/a | n/a
split_single_paired | n/a | n/a
trim_galore_single | n/a | n/a
trim_galore_paired | n/a | n/a
fastqc_raw | n/a | n/a
fastqc_single_trimmed | n/a | n/a
fastqc_paired_trimmed | n/a | n/a
cp_fastqc_raw_zip | n/a | n/a
cp_fastqc_single_zip | n/a | n/a
cp_fastqc_paired_zip | n/a | n/a
rename_fastqc_raw_html | n/a | n/a
rename_fastqc_single_html | n/a | n/a
rename_fastqc_paired_html | n/a | n/a
check_trimming | n/a | n/a
rg_extraction_single | n/a | n/a
bwa_mem_single | n/a | n/a
split_paired_read1_read2 | n/a | n/a
rg_extraction_paired | n/a | n/a
bwa_mem_paired | n/a | n/a
gather_bwa_sam_files | n/a | n/a
samtools_view_conversion | n/a | n/a
samtools_sort_by_name | n/a | n/a
samtools_fixmate | n/a | n/a
samtools_sort | n/a | n/a
picard_addorreplacereadgroups | n/a | n/a
picard_markduplicates | n/a | n/a
samtools_flagstat | n/a | n/a
samtools_view_count_total | n/a | n/a
gatk_splitintervals | n/a | n/a
samtools_index | n/a | n/a
gatk_bqsr_subworkflow | n/a | n/a
gatk_CombineGVCFs | n/a | n/a
gatk_GenotypeGVCFs | n/a | n/a
gatk_MakeSitesOnlyVcf | n/a | n/a
gatk_VariantRecalibrator_indel | n/a | n/a
gatk_VariantRecalibrator_snp | n/a | n/a
gatk_ApplyVQSR_indel | n/a | n/a
gatk_ApplyVQSR_snp | n/a | n/a
gatk_VQSR_MergeVCFs | n/a | n/a
bcftools_view_filter_vqsr | n/a | n/a
bcftools_norm_vqsr | n/a | n/a
table_annovar_filtered | n/a | n/a

Outputs

ID | Name | Description | Type
o_trim_galore_single_fq | n/a | n/a | File[]
o_trim_galore_single_reports | n/a | n/a | File[]
o_trim_galore_paired_fq | n/a | n/a | File[]
o_trim_galore_paired_reports | n/a | n/a | File[]
o_fastqc_raw_html | n/a | n/a | File[]?
o_fastqc_single_html | n/a | n/a | File[]?
o_fastqc_paired_html | n/a | n/a | File[]?
o_fastqc_raw_zip | n/a | n/a | Directory?
o_fastqc_single_zip | n/a | n/a | Directory?
o_fastqc_paired_zip | n/a | n/a | Directory?
o_bwa_mem_single | n/a | n/a | File[]
o_bwa_mem_paired | n/a | n/a | File[]
o_gather_bwa_sam_files | n/a | n/a | File[]
o_samtools_view_conversion | n/a | n/a | File[]
samtools_sort_by_name | n/a | n/a | File[]
o_samtools_fixmate | n/a | n/a | File[]
o_samtools_sort | n/a | n/a | File[]
o_picard_addorreplacereadgroups | n/a | n/a | File[]
o_picard_markduplicates | n/a | n/a | File[]
o_picard_markduplicates_metrics | n/a | n/a | File[]
o_samtools_flagstat | n/a | n/a | File[]
o_samtools_view_count_total | n/a | n/a | File[]
o_samtools_index | n/a | n/a | File[]
o_gatk_splitintervals | n/a | n/a | File[]
o_gatk_bqsr_subworkflowbqsr_tables | n/a | n/a | File[][] (array of arrays of File)
o_gatk_bqsr_subworkflowbqsr_bqsr_bam | n/a | n/a | File[][] (array of arrays of File)
o_gatk_bqsr_subworkflowbqsr_hc | n/a | n/a | File[][] (array of arrays of File)
o_gatk_bqsr_subworkflowbqsr_mergevcfs | n/a | n/a | File[]
o_gatk_CombineGVCFs | n/a | n/a | File
o_gatk_GenotypeGVCFs | n/a | n/a | File
o_gatk_MakeSitesOnlyVcf | n/a | n/a | File
o_gatk_VariantRecalibrator_indel_recal | n/a | n/a | File[]
o_gatk_VariantRecalibrator_indel_tranches | n/a | n/a | File[]
o_gatk_VariantRecalibrator_snp_recal | n/a | n/a | File[]
o_gatk_VariantRecalibrator_snp_tranches | n/a | n/a | File[]
o_gatk_ApplyVQSR_indel | n/a | n/a | File[]
o_gatk_ApplyVQSR_snp | n/a | n/a | File[]
o_gatk_VQSR_MergeVCFs | n/a | n/a | File
o_bcftools_view_filter_vqsr | n/a | n/a | File
o_bcftools_norm_vqsr | n/a | n/a | File
o_table_annovar_filtered_multianno_vcf | n/a | n/a | File
o_table_annovar_filtered_multianno_txt | n/a | n/a | File
o_table_annovar_filtered_avinput | n/a | n/a | File

Version History

Version 1 (earliest) Created 5th Jul 2023 at 10:44 by Konstantinos Kyritsis

Initial commit


Frozen Version-1 aceb8de
Citation
Kyritsis, K., Pechlivanis, N., & Psomopoulos, F. (2023). CWL-based (multi-sample) workflow for germline variant calling. WorkflowHub. https://doi.org/10.48546/WORKFLOWHUB.WORKFLOW.526.1

Total size: 33.7 KB