A CWL-based pipeline for calling small germline variants, namely SNPs and small INDELs, by processing data from Whole-genome Sequencing (WGS) or Targeted Sequencing (e.g., Whole-exome sequencing; WES) experiments.
The corresponding GitHub folder contains:
- The CWL wrappers and subworkflows for the workflow
- A pre-configured YAML template, based on validation analysis of publicly available HTS data
Briefly, the workflow performs the following steps:
- Quality control of Illumina reads (FastQC)
- Trimming of the reads (e.g., removal of adapter and/or low-quality sequences) (Trim Galore)
- Mapping to reference genome (BWA-MEM)
- Conversion of mapped reads from SAM (Sequence Alignment Map) to BAM (Binary Alignment Map) format (samtools)
- Sorting mapped reads based on read names (samtools)
- Adding mate information to paired-end reads (e.g., CIGAR field information) (samtools fixmate)
- Re-sorting mapped reads based on chromosomal coordinates (samtools)
- Adding basic Read-Group information regarding sample name, platform unit, platform (e.g., ILLUMINA), library and identifier (picard AddOrReplaceReadGroups)
- Marking PCR and/or optical duplicate reads (picard MarkDuplicates)
- Collection of summary statistics (samtools)
- Creation of indexes for coordinate-sorted BAM files to enable fast random access (samtools)
- Splitting the reference genome into a predefined number of intervals for parallel processing (GATK SplitIntervals)
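The interval-splitting step can be pictured with a minimal Python sketch. It is an illustration only, not the GATK SplitIntervals implementation: the function name `split_intervals`, the contig names and the contig lengths are hypothetical, and GATK's own balancing strategy may differ. The idea is simply to cut the genome into a fixed number of shards of roughly equal size so that later steps can run on each shard in parallel.

```python
from typing import Dict, List, Tuple

# (contig, start, end), 1-based and inclusive
Interval = Tuple[str, int, int]


def split_intervals(contig_lengths: Dict[str, int], scatter_count: int) -> List[List[Interval]]:
    """Divide contigs into at most `scatter_count` shards of roughly equal size."""
    if scatter_count < 1:
        raise ValueError("scatter_count must be at least 1")
    total = sum(contig_lengths.values())
    # Ceiling division, so the shards together always cover every base.
    target = -(-total // scatter_count)
    shards: List[List[Interval]] = [[]]
    room = target  # bases that still fit in the current shard
    for contig, length in contig_lengths.items():
        start = 1
        while start <= length:
            if room == 0:
                # Current shard is full: open a new one.
                shards.append([])
                room = target
            end = min(length, start + room - 1)
            shards[-1].append((contig, start, end))
            room -= end - start + 1
            start = end + 1
    return shards


if __name__ == "__main__":
    # Hypothetical toy genome with two contigs.
    for shard in split_intervals({"chr1": 100, "chr2": 50}, 2):
        print(shard)
```

A contig is cut wherever a shard fills up, so a single shard may span the end of one contig and the start of the next.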
From this point on, the single-sample workflow is applied: multiple samples are accepted as input, but they are not merged into a unified VCF file. Instead, each sample is processed separately at every step, producing one VCF file per sample:
- Application of Base Quality Score Recalibration (BQSR) (GATK BaseRecalibrator, GatherBQSRReports and ApplyBQSR tools)
- Variant calling (GATK HaplotypeCaller)
- Merging of all genomic interval-split gVCF files for each sample (GATK MergeVCFs)
- Separate annotation of SNPs and INDELs based on pretrained Convolutional Neural Network (CNN) models (GATK SelectVariants, CNNScoreVariants and FilterVariantTranches tools)
- (Optional) Independent step of hard-filtering (GATK VariantFiltration)
- Variant filtering based on the information added during CNN-based filtering and/or custom filters (bcftools)
- Normalization of INDELs (split multiallelic sites) (bcftools)
- Annotation of the final dataset of filtered variants with genomic, population-related and/or clinical information (ANNOVAR)
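The splitting of multiallelic sites mentioned in the normalization step can be sketched in a few lines of Python. This is a simplified illustration, not what bcftools does internally: the `Variant` type and `split_multiallelic` function are hypothetical names, and the sketch ignores the genotype and INFO fields that a real tool also has to rewrite, as well as left-alignment of INDELs.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str  # comma-separated ALT alleles, as in the VCF ALT column


def split_multiallelic(variant: Variant) -> List[Variant]:
    """Return one biallelic record per ALT allele of the input record."""
    return [variant._replace(alt=allele) for allele in variant.alt.split(",")]


if __name__ == "__main__":
    # Hypothetical site carrying a SNP and an insertion at the same position.
    site = Variant("chr1", 1000, "A", "G,AT")
    for record in split_multiallelic(site):
        print(record)
```

After splitting, each record describes a single alternate allele, which makes per-allele filtering and annotation straightforward.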
Inputs
ID | Name | Description | Type |
---|---|---|---|
raw_files_directory | n/a | n/a | |
input_file_split | n/a | n/a | |
input_file_split_fwd_single | n/a | n/a | |
input_file_split_rev | n/a | n/a | |
input_qc_check | n/a | n/a | |
input_trimming_check | n/a | n/a | |
tg_quality | n/a | n/a | |
tg_length | n/a | n/a | |
tg_compression | n/a | n/a | |
tg_do_not_compress | n/a | n/a | |
tg_strigency | n/a | n/a | |
tg_trim_suffix | n/a | n/a | |
reference_genome | n/a | n/a | |
bwa_mem_sec_shorter_split_hits | n/a | n/a | |
bwa_mem_num_threads | n/a | n/a | |
samtools_view_uncompressed | n/a | n/a | |
samtools_view_collapsecigar | n/a | n/a | |
samtools_view_readswithoutbits | n/a | n/a | |
samtools_view_fastcompression | n/a | n/a | |
samtools_view_samheader | n/a | n/a | |
samtools_view_count | n/a | n/a | |
samtools_view_readsingroup | n/a | n/a | |
samtools_view_readtagtostrip | n/a | n/a | |
samtools_view_readsquality | n/a | n/a | |
samtools_view_readswithbits | n/a | n/a | |
samtools_view_cigar | n/a | n/a | |
samtools_view_iscram | n/a | n/a | |
samtools_view_threads | n/a | n/a | |
samtools_view_randomseed | n/a | n/a | |
samtools_view_region | n/a | n/a | |
samtools_view_readsinlibrary | n/a | n/a | |
samtools_fixmate_threads | n/a | n/a | |
samtools_fixmate_output_format | n/a | n/a | |
samtools_sort_compression_level | n/a | n/a | |
samtools_sort_threads | n/a | n/a | |
samtools_sort_memory | n/a | n/a | |
samtools_flagstat_threads | n/a | n/a | |
picard_addorreplacereadgroups_rgpl | n/a | n/a | |
gatk_splitintervals_include_intervalList | n/a | n/a | |
gatk_splitintervals_exclude_intervalList | n/a | n/a | |
gatk_splitintervals_scatter_count | n/a | n/a | |
sub_bqsr_known_sites_1 | n/a | n/a | |
sub_bqsr_known_sites_2 | n/a | n/a | |
sub_bqsr_known_sites_3 | n/a | n/a | |
sub_bqsr_interval_padding | n/a | n/a | |
sub_hc_native_pairHMM_threads | n/a | n/a | |
sub_hc_java_options | n/a | n/a | |
VariantFiltration_window | n/a | n/a | |
VariantFiltration_cluster | n/a | n/a | |
VariantFiltration_filter_name_snp | n/a | n/a | |
VariantFiltration_filter_snp | n/a | n/a | |
VariantFiltration_filter_name_indel | n/a | n/a | |
VariantFiltration_filter_indel | n/a | n/a | |
FilterVariantTranches_resource_1 | n/a | n/a | |
FilterVariantTranches_resource_2 | n/a | n/a | |
FilterVariantTranches_resource_3 | n/a | n/a | |
bcftools_view_include_hard_filters | n/a | n/a | |
bcftools_view_include_CNN_filters | n/a | n/a | |
bcftools_view_threads | n/a | n/a | |
bcftools_norm_threads | n/a | n/a | |
bcftoomls_norm_multiallelics | n/a | n/a | |
table_annovar_database_location | n/a | n/a | |
table_annovar_build_over | n/a | n/a | |
table_annovar_remove | n/a | n/a | |
table_annovar_protocol | n/a | n/a | |
table_annovar_operation | n/a | n/a | |
table_annovar_na_string | n/a | n/a | |
table_annovar_vcfinput | n/a | n/a | |
table_annovar_otherinfo | n/a | n/a | |
table_annovar_convert_arg | n/a | n/a | |
Steps
ID | Name | Description |
---|---|---|
get_raw_files | n/a | n/a |
split_single_paired | n/a | n/a |
trim_galore_single | n/a | n/a |
trim_galore_paired | n/a | n/a |
fastqc_raw | n/a | n/a |
fastqc_single_trimmed | n/a | n/a |
fastqc_paired_trimmed | n/a | n/a |
cp_fastqc_raw_zip | n/a | n/a |
cp_fastqc_single_zip | n/a | n/a |
cp_fastqc_paired_zip | n/a | n/a |
rename_fastqc_raw_html | n/a | n/a |
rename_fastqc_single_html | n/a | n/a |
rename_fastqc_paired_html | n/a | n/a |
check_trimming | n/a | n/a |
rg_extraction_single | n/a | n/a |
bwa_mem_single | n/a | n/a |
split_paired_read1_read2 | n/a | n/a |
rg_extraction_paired | n/a | n/a |
bwa_mem_paired | n/a | n/a |
gather_bwa_sam_files | n/a | n/a |
samtools_view_conversion | n/a | n/a |
samtools_sort_by_name | n/a | n/a |
samtools_fixmate | n/a | n/a |
samtools_sort | n/a | n/a |
picard_addorreplacereadgroups | n/a | n/a |
picard_markduplicates | n/a | n/a |
samtools_flagstat | n/a | n/a |
samtools_view_count_total | n/a | n/a |
gatk_splitintervals | n/a | n/a |
samtools_index | n/a | n/a |
gatk_bqsr_subworkflow | n/a | n/a |
gatk_applybqsr | n/a | n/a |
samtools_index_2 | n/a | n/a |
gatk_haplotypecaller_subworkflow | n/a | n/a |
gatk_SelectVariants_snps | n/a | n/a |
gatk_SelectVariants_indels | n/a | n/a |
gatk_VariantFiltration_snps | n/a | n/a |
gatk_VariantFiltration_indels | n/a | n/a |
bgzip_snps | n/a | n/a |
tabix_snps | n/a | n/a |
bgzip_indels | n/a | n/a |
tabix_indels | n/a | n/a |
bcftools_concat | n/a | n/a |
bcftools_view_hard_filter | n/a | n/a |
bcftools_norm_hard_filter | n/a | n/a |
table_annovar_hard_filtered | n/a | n/a |
gatk_CNNScoreVariants | n/a | n/a |
gatk_FilterVariantTranches | n/a | n/a |
bcftools_view_filter_cnn | n/a | n/a |
bcftools_norm_cnn | n/a | n/a |
table_annovar_cnn_filtered | n/a | n/a |
Outputs
ID | Name | Description | Type |
---|---|---|---|
o_trim_galore_single_fq | n/a | n/a | |
o_trim_galore_single_reports | n/a | n/a | |
o_trim_galore_paired_fq | n/a | n/a | |
o_trim_galore_paired_reports | n/a | n/a | |
o_fastqc_raw_html | n/a | n/a | |
o_fastqc_single_html | n/a | n/a | |
o_fastqc_paired_html | n/a | n/a | |
o_fastqc_raw_zip | n/a | n/a | |
o_fastqc_single_zip | n/a | n/a | |
o_fastqc_paired_zip | n/a | n/a | |
o_gather_bwa_sam_files | n/a | n/a | |
o_samtools_view_conversion | n/a | n/a | |
o_samtools_sort_by_name | n/a | n/a | |
o_samtools_fixmate | n/a | n/a | |
o_samtools_sort | n/a | n/a | |
o_picard_addorreplacereadgroups | n/a | n/a | |
o_picard_markduplicates | n/a | n/a | |
o_picard_markduplicates_metrics | n/a | n/a | |
o_samtools_flagstat | n/a | n/a | |
o_samtools_view_count_total | n/a | n/a | |
o_samtools_index | n/a | n/a | |
o_gatk_bqsr_subworkflow | n/a | n/a | |
o_gatk_ApplyBQSR | n/a | n/a | |
o_samtools_index_2 | n/a | n/a | |
o_gatk_splitintervals | n/a | n/a | |
o_gatk_HaplotypeCaller | n/a | n/a | |
o_tabix_snps | n/a | n/a | |
o_tabix_indels | n/a | n/a | |
o_bcftools_concat | n/a | n/a | |
o_bcftools_view_hard_filter | n/a | n/a | |
o_bcftools_norm_hard_filter | n/a | n/a | |
o_gatk_CNNScoreVariants | n/a | n/a | |
o_gatk_FilterVariantTranches | n/a | n/a | |
o_bcftools_view_filter_cnn | n/a | n/a | |
o_bcftools_norm_cnn | n/a | n/a | |
o_table_annovar_cnn_filtered_multianno_vcf | n/a | n/a | |
o_table_annovar_cnn_filtered_multianno_txt | n/a | n/a | |
o_table_annovar_cnn_filtered_avinput | n/a | n/a | |
o_table_annovar_hard_filtered_multianno_vcf | n/a | n/a | |
o_table_annovar_hard_filtered_multianno_txt | n/a | n/a | |
o_table_annovar_hard_filtered_avinput | n/a | n/a | |
Version History
Version 1 (earliest) Created 5th Jul 2023 at 10:48 by Konstantinos Kyritsis
Initial commit
Frozen
Version-1
be8c585