CWL-based (single-sample) workflow for germline variant calling
Version 1

Workflow Type: Common Workflow Language
Stable

A CWL-based pipeline for calling small germline variants, namely SNPs and small INDELs, by processing data from Whole-genome Sequencing (WGS) or Targeted Sequencing (e.g., Whole-exome sequencing; WES) experiments.

The respective GitHub folder contains:

  • The CWL wrappers and subworkflows for the workflow
  • A pre-configured YAML template, based on validation analysis of publicly available HTS data

Briefly, the workflow performs the following steps:

  1. Quality control of Illumina reads (FastQC)
  2. Trimming of the reads (e.g., removal of adapter and/or low-quality sequences) (Trim Galore)
  3. Mapping to the reference genome (BWA-MEM)
  4. Conversion of mapped reads from SAM (Sequence Alignment/Map) to BAM (Binary Alignment/Map) format (samtools)
  5. Sorting of mapped reads by read name (samtools)
  6. Adding mate information to paired-end reads (e.g., mate coordinates and mate CIGAR) (samtools)
  7. Re-sorting of mapped reads by chromosomal coordinates (samtools)
  8. Adding basic Read-Group information regarding sample name, platform unit, platform (e.g., ILLUMINA), library and identifier (picard AddOrReplaceReadGroups)
  9. Marking PCR and/or optical duplicate reads (picard MarkDuplicates)
  10. Collection of summary statistics (samtools)
  11. Creation of indexes for coordinate-sorted BAM files to enable fast random access (samtools)
  12. Splitting the reference genome into a predefined number of intervals for parallel processing (GATK SplitIntervals)
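
Step 12 can be illustrated with a minimal Python sketch that cuts a set of contigs into a fixed number of roughly equal-sized, non-overlapping shards, mirroring the role of the gatk_splitintervals_scatter_count input. The sketch treats all contigs as one concatenated sequence and cuts it into equal-length pieces; this is a simplification and not the algorithm GATK SplitIntervals itself uses. The contig names and lengths are hypothetical, and coordinates are 1-based and inclusive.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (contig, start, end), 1-based and inclusive


def split_intervals(contigs: List[Tuple[str, int]], scatter_count: int) -> List[List[Interval]]:
    """Split contigs into at most `scatter_count` shards of roughly equal length.

    Simplified illustration only: contigs are treated as one concatenated
    sequence cut into equal-length pieces, so a shard may end in the middle
    of a contig. This is not GATK SplitIntervals' own algorithm.
    """
    if scatter_count < 1:
        raise ValueError("scatter_count must be at least 1")
    total = sum(length for _, length in contigs)
    target = -(-total // scatter_count)  # ceiling division: bases per shard
    shards: List[List[Interval]] = [[] for _ in range(scatter_count)]
    offset = 0  # number of bases in all preceding contigs
    for name, length in contigs:
        pos = 1
        while pos <= length:
            shard = (offset + pos - 1) // target  # shard that holds this genome position
            end = min(length, (shard + 1) * target - offset)
            shards[shard].append((name, pos, end))
            pos = end + 1
        offset += length
    return [s for s in shards if s]  # drop empty shards (scatter_count > total bases)


def to_interval_strings(shard: List[Interval]) -> List[str]:
    """Format intervals as 'contig:start-end' strings."""
    return [f"{c}:{s}-{e}" for c, s, e in shard]


if __name__ == "__main__":
    # Hypothetical toy genome with two contigs, split into two shards
    for i, shard in enumerate(split_intervals([("chr1", 100), ("chr2", 50)], 2)):
        print(i, to_interval_strings(shard))
```

Because the shards do not overlap and together cover every base, each one can be processed independently and the per-shard results concatenated afterwards.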

Next, the single-sample workflow is applied. Multiple samples are accepted as input, but instead of being merged into a unified VCF file, each sample is processed separately at every step, producing one VCF file per sample:

  1. Application of Base Quality Score Recalibration (BQSR) (GATK BaseRecalibrator, GatherBQSRReports and ApplyBQSR tools)
  2. Variant calling (GATK HaplotypeCaller)
  3. Merging of all interval-split gVCF files of each sample (GATK MergeVcfs)
  4. Separate annotation of SNPs and INDELs based on pretrained Convolutional Neural Network (CNN) models (GATK SelectVariants, CNNScoreVariants and FilterVariantTranches tools)
  5. (Optional) Independent hard-filtering step (GATK VariantFiltration)
  6. Variant filtering based on the annotations added by the CNN-based step and/or the hard-filtering step, or on custom filters (bcftools)
  7. Normalization of INDELs (splitting of multiallelic sites) (bcftools)
  8. Annotation of the final dataset of filtered variants with genomic, population-related and/or clinical information (ANNOVAR)
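
Step 7 splits sites that carry several ALT alleles into separate biallelic records. The minimal Python sketch below shows the idea on a single-sample VCF data line; it is a simplification, not bcftools norm itself. Its assumptions: GT is the first FORMAT key, INFO and the other FORMAT values are copied unchanged (real tools also split per-allele fields), alleles are not trimmed or left-aligned, and genotype alleles that point to a different ALT are recoded as reference (0), which is a choice made only for this sketch.

```python
from typing import List


def split_multiallelic(vcf_line: str) -> List[str]:
    """Split one single-sample VCF data line into one line per ALT allele.

    Sketch assumptions: GT is the first FORMAT key; INFO and other FORMAT
    values are copied unchanged; alleles are not trimmed or left-aligned.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    alts = fields[4].split(",")
    if len(alts) == 1:
        return ["\t".join(fields)]  # already biallelic
    records = []
    for k, alt in enumerate(alts, start=1):  # k = index of this ALT in the original record
        new = list(fields)
        new[4] = alt
        if len(fields) > 9 and fields[8].split(":")[0] == "GT":
            sample = fields[9].split(":")
            sep = "|" if "|" in sample[0] else "/"
            recoded = []
            for allele in sample[0].replace("|", "/").split("/"):
                if allele == ".":
                    recoded.append(".")  # missing allele stays missing
                elif int(allele) == k:
                    recoded.append("1")  # the ALT kept in this record
                else:
                    recoded.append("0")  # REF or another ALT -> 0 (assumption of this sketch)
            sample[0] = sep.join(recoded)
            new[9] = ":".join(sample)
        records.append("\t".join(new))
    return records


if __name__ == "__main__":
    # Hypothetical record: the sample is heterozygous for two different ALT alleles
    for rec in split_multiallelic("chr1\t100\t.\tA\tG,T\t50\tPASS\t.\tGT:DP\t1/2:30"):
        print(rec)
```

Splitting first means that downstream filtering and annotation (e.g., ANNOVAR) see one ALT allele per record, so each allele can be evaluated on its own.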


Inputs

ID: Type (Name and Description: n/a for all inputs)

  • raw_files_directory: Directory
  • input_file_split: string?
  • input_file_split_fwd_single: string?
  • input_file_split_rev: string?
  • input_qc_check: boolean?
  • input_trimming_check: boolean?
  • tg_quality: int
  • tg_length: int
  • tg_compression: boolean
  • tg_do_not_compress: boolean
  • tg_strigency: int
  • tg_trim_suffix: string
  • reference_genome: File
  • bwa_mem_sec_shorter_split_hits: boolean
  • bwa_mem_num_threads: int
  • samtools_view_uncompressed: boolean
  • samtools_view_collapsecigar: boolean
  • samtools_view_readswithoutbits: int
  • samtools_view_fastcompression: boolean
  • samtools_view_samheader: boolean
  • samtools_view_count: boolean
  • samtools_view_readsingroup: string?
  • samtools_view_readtagtostrip: string[]?
  • samtools_view_readsquality: int?
  • samtools_view_readswithbits: int?
  • samtools_view_cigar: int?
  • samtools_view_iscram: boolean
  • samtools_view_threads: int?
  • samtools_view_randomseed: float?
  • samtools_view_region: string?
  • samtools_view_readsinlibrary: string?
  • samtools_fixmate_threads: int
  • samtools_fixmate_output_format: string
  • samtools_sort_compression_level: int?
  • samtools_sort_threads: int?
  • samtools_sort_memory: string?
  • samtools_flagstat_threads: int?
  • picard_addorreplacereadgroups_rgpl: string?
  • gatk_splitintervals_include_intervalList: File?
  • gatk_splitintervals_exclude_intervalList: File?
  • gatk_splitintervals_scatter_count: int
  • sub_bqsr_known_sites_1: File
  • sub_bqsr_known_sites_2: File
  • sub_bqsr_known_sites_3: File
  • sub_bqsr_interval_padding: int?
  • sub_hc_native_pairHMM_threads: int?
  • sub_hc_java_options: string?
  • VariantFiltration_window: int
  • VariantFiltration_cluster: int
  • VariantFiltration_filter_name_snp: string[]
  • VariantFiltration_filter_snp: string[]
  • VariantFiltration_filter_name_indel: string[]
  • VariantFiltration_filter_indel: string[]
  • FilterVariantTranches_resource_1: File
  • FilterVariantTranches_resource_2: File?
  • FilterVariantTranches_resource_3: File?
  • bcftools_view_include_hard_filters: string
  • bcftools_view_include_CNN_filters: string
  • bcftools_view_threads: int
  • bcftools_norm_threads: int?
  • bcftoomls_norm_multiallelics: string
  • table_annovar_database_location: Directory
  • table_annovar_build_over: string
  • table_annovar_remove: boolean?
  • table_annovar_protocol: string
  • table_annovar_operation: string
  • table_annovar_na_string: string?
  • table_annovar_vcfinput: boolean
  • table_annovar_otherinfo: boolean?
  • table_annovar_convert_arg: string?
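
The Type column uses CWL shorthand: a trailing "?" marks an optional input (a union with "null") and "[]" marks an array. The small Python helper below, written for illustration and not part of the workflow, expands this shorthand into the equivalent long form defined by the CWL specification.

```python
from typing import Any


def expand_cwl_type(shorthand: str) -> Any:
    """Expand CWL type shorthand (e.g. 'int?', 'string[]?') into its long form."""
    optional = shorthand.endswith("?")
    base = shorthand[:-1] if optional else shorthand
    if base.endswith("[]"):
        # 'T[]' is shorthand for an array schema whose items are of type T
        expanded: Any = {"type": "array", "items": expand_cwl_type(base[:-2])}
    else:
        expanded = base
    # 'T?' is shorthand for the union ["null", T], i.e. the input may be omitted
    return ["null", expanded] if optional else expanded


if __name__ == "__main__":
    for t in ("int", "File?", "string[]", "string[]?"):
        print(t, "->", expand_cwl_type(t))
```

For example, samtools_view_readtagtostrip (string[]?) may be left out entirely or given as a list of strings, whereas reference_genome (File) must always be supplied.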

Steps

ID (Name and Description: n/a for all steps)

  • get_raw_files
  • split_single_paired
  • trim_galore_single
  • trim_galore_paired
  • fastqc_raw
  • fastqc_single_trimmed
  • fastqc_paired_trimmed
  • cp_fastqc_raw_zip
  • cp_fastqc_single_zip
  • cp_fastqc_paired_zip
  • rename_fastqc_raw_html
  • rename_fastqc_single_html
  • rename_fastqc_paired_html
  • check_trimming
  • rg_extraction_single
  • bwa_mem_single
  • split_paired_read1_read2
  • rg_extraction_paired
  • bwa_mem_paired
  • gather_bwa_sam_files
  • samtools_view_conversion
  • samtools_sort_by_name
  • samtools_fixmate
  • samtools_sort
  • picard_addorreplacereadgroups
  • picard_markduplicates
  • samtools_flagstat
  • samtools_view_count_total
  • gatk_splitintervals
  • samtools_index
  • gatk_bqsr_subworkflow
  • gatk_applybqsr
  • samtools_index_2
  • gatk_haplotypecaller_subworkflow
  • gatk_SelectVariants_snps
  • gatk_SelectVariants_indels
  • gatk_VariantFiltration_snps
  • gatk_VariantFiltration_indels
  • bgzip_snps
  • tabix_snps
  • bgzip_indels
  • tabix_indels
  • bcftools_concat
  • bcftools_view_hard_filter
  • bcftools_norm_hard_filter
  • table_annovar_hard_filtered
  • gatk_CNNScoreVariants
  • gatk_FilterVariantTranches
  • bcftools_view_filter_cnn
  • bcftools_norm_cnn
  • table_annovar_cnn_filtered
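
The single-sample design follows a nested scatter/gather pattern: variant calling runs separately for every combination of sample and genomic interval, and the interval-split outputs are then merged within each sample only. The Python sketch below illustrates that pattern and nothing more: call_variants and merge are hypothetical placeholders standing in for GATK HaplotypeCaller and MergeVcfs, and the thread pool stands in for whatever parallel execution the CWL runner provides in the actual workflow.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def call_variants(sample: str, interval: str) -> str:
    """Hypothetical stand-in for HaplotypeCaller on one interval; returns a file name."""
    return f"{sample}.{interval}.g.vcf"


def merge(parts: List[str]) -> str:
    """Hypothetical stand-in for MergeVcfs; here it only joins the part names."""
    return "+".join(parts)


def single_sample_calling(samples: List[str], intervals: List[str]) -> Dict[str, str]:
    """Scatter over every (sample, interval) pair, then gather within each sample only."""
    with ThreadPoolExecutor() as pool:
        futures = {s: [pool.submit(call_variants, s, iv) for iv in intervals] for s in samples}
        # Collect results in interval order so each merge input is deterministic
        return {s: merge([f.result() for f in fs]) for s, fs in futures.items()}


if __name__ == "__main__":
    print(single_sample_calling(["S1", "S2"], ["0000", "0001"]))
```

Because the gather step never combines different samples, each sample ends up with its own merged file, matching the one-VCF-per-sample output described above.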

Outputs

ID: Type (Name and Description: n/a for all outputs)

  • o_trim_galore_single_fq: File[]
  • o_trim_galore_single_reports: File[]
  • o_trim_galore_paired_fq: File[]
  • o_trim_galore_paired_reports: File[]
  • o_fastqc_raw_html: File[]?
  • o_fastqc_single_html: File[]?
  • o_fastqc_paired_html: File[]?
  • o_fastqc_raw_zip: Directory?
  • o_fastqc_single_zip: Directory?
  • o_fastqc_paired_zip: Directory?
  • o_gather_bwa_sam_files: File[]
  • o_samtools_view_conversion: File[]
  • o_samtools_sort_by_name: File[]
  • o_samtools_fixmate: File[]
  • o_samtools_sort: File[]
  • o_picard_addorreplacereadgroups: File[]
  • o_picard_markduplicates: File[]
  • o_picard_markduplicates_metrics: File[]
  • o_samtools_flagstat: File[]
  • o_samtools_view_count_total: File[]
  • o_samtools_index: File[]
  • o_gatk_bqsr_subworkflow: File[]
  • o_gatk_ApplyBQSR: File[]
  • o_samtools_index_2: File[]
  • o_gatk_splitintervals: File[]
  • o_gatk_HaplotypeCaller: File[]
  • o_tabix_snps: File[]
  • o_tabix_indels: File[]
  • o_bcftools_concat: File[]
  • o_bcftools_view_hard_filter: File[]
  • o_bcftools_norm_hard_filter: File[]
  • o_gatk_CNNScoreVariants: File[]
  • o_gatk_FilterVariantTranches: File[]
  • o_bcftools_view_filter_cnn: File[]
  • o_bcftools_norm_cnn: File[]
  • o_table_annovar_cnn_filtered_multianno_vcf: File[]
  • o_table_annovar_cnn_filtered_multianno_txt: File[]
  • o_table_annovar_cnn_filtered_avinput: File[]
  • o_table_annovar_hard_filtered_multianno_vcf: File[]
  • o_table_annovar_hard_filtered_multianno_txt: File[]
  • o_table_annovar_hard_filtered_avinput: File[]

Version History

Version 1 (earliest) Created 5th Jul 2023 at 10:48 by Konstantinos Kyritsis

Initial commit


Frozen Version-1 be8c585
Citation
Kyritsis, K., Pechlivanis, N., & Psomopoulos, F. (2023). CWL-based (single-sample) workflow for germline variant calling. WorkflowHub. https://doi.org/10.48546/WORKFLOWHUB.WORKFLOW.527.1


Attributions: None

Total size: 35.2 KB