Janis Germline Variant-Calling Workflow (GATK)
Version 1

Workflow Type: Janis
Work-in-progress

This is a genomics pipeline for single-sample germline variant calling, adapted from the GATK Best Practices workflow.

This workflow is a reference pipeline for using the Janis Python framework (pipelines assistant).

  • Alignment: bwa-mem
  • Variant-Calling: GATK HaplotypeCaller
  • Outputs the final variants in the VCF format.

Resources

This pipeline has been tested using the HG38 reference set, available on Google Cloud Storage at: https://console.cloud.google.com/storage/browser/genomics-public-data/references/hg38/v0/

This pipeline expects the assembly references to be as they appear in that storage (".fai", ".amb", ".ann", ".bwt", ".pac", ".sa", "^.dict"). The known sites (snps_dbsnp, snps_1000gp, known_indels, mills_indels) should be gzipped and tabix indexed.
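As a rough pre-flight check, the suffix list above can be resolved against a reference path before the workflow is submitted. The sketch below assumes that the leading "^" in "^.dict" follows the CWL secondaryFiles convention (remove one extension from the primary file, then append the remainder). The function names and the example path are hypothetical and are not part of the workflow itself.

```python
from pathlib import Path
from typing import List

# Suffix patterns as listed in the paragraph above.
REFERENCE_PATTERNS = [".fai", ".amb", ".ann", ".bwt", ".pac", ".sa", "^.dict"]


def secondary_path(primary: str, pattern: str) -> str:
    """Resolve one secondary-file pattern against a primary file path.

    ASSUMPTION: each leading "^" strips one extension from the primary
    path (CWL secondaryFiles convention) before the suffix is appended.
    """
    path = primary
    while pattern.startswith("^"):
        path = str(Path(path).with_suffix(""))
        pattern = pattern[1:]
    return path + pattern


def missing_reference_files(reference: str) -> List[str]:
    """Return the expected companion files that do not exist on disk."""
    expected = [secondary_path(reference, p) for p in REFERENCE_PATTERNS]
    return [p for p in expected if not Path(p).exists()]


if __name__ == "__main__":
    # Hypothetical path, for illustration only.
    for path in missing_reference_files("ref/genome.fasta"):
        print("missing:", path)
```

Under that assumption, "ref/genome.fasta" is expected to sit next to "ref/genome.fasta.fai" and "ref/genome.dict", among others.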

Infrastructure_deployment_metadata: Spartan (Unimelb)

Inputs

ID Name Description Type
sample_name n/a Sample name from which to generate the readGroupHeaderLine for BwaMem
  • string
fastqs n/a An array of FastqGz pairs. Each pair is aligned separately, and the alignments are then merged to give a higher depth of coverage from multiple sets of reads.
  • array containing
    • array containing
      • File
reference n/a The reference genome to which the reads are aligned. This requires a number of indexes (these can be generated with the 'IndexFasta' pipeline). This pipeline has been tested using the HG38 reference set and expects the assembly references to be as they appear in the GCP example. For example, HG38: https://console.cloud.google.com/storage/browser/genomics-public-data/references/hg38/v0/ (".fai", ".amb", ".ann", ".bwt", ".pac", ".sa", "^.dict").
  • File
snps_dbsnp n/a From the GATK resource bundle, passed to BaseRecalibrator as ``known_sites``
  • File
snps_1000gp n/a From the GATK resource bundle, passed to BaseRecalibrator as ``known_sites``. Accessible from the HG38 genomics-public-data Google Cloud bucket: https://console.cloud.google.com/storage/browser/genomics-public-data/references/hg38/v0/
  • File
known_indels n/a From the GATK resource bundle, passed to BaseRecalibrator as ``known_sites``
  • File
mills_indels n/a From the GATK resource bundle, passed to BaseRecalibrator as ``known_sites``
  • File
gatk_intervals n/a List of intervals over which to split the GATK variant calling. If no interval is provided, one interval for each chromosome in the reference will be generated.
  • array containing
    • File
cutadapt_adapters n/a Specifies a containment list for cutadapt: a list of sequences used to decide which overrepresented sequences from the FastQC report are valid to trim with Cutadapt. The file must contain sets of named adapters in the form ``name[tab]sequence``. Lines prefixed with a hash are ignored.
  • File
align_and_sort_sortsam_tmpDir n/a Undocumented option
  • string
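Two of the input conventions in the table above can be illustrated with a short sketch: the ``name[tab]sequence`` adapter file, and a read group header line derived from sample_name. Both functions are hypothetical helpers, not the workflow's own code. In particular, the choice of read group fields (ID, SM, LB, PL) and the default platform are assumptions, since the table does not specify them. The adapter names and sequences in the usage example are illustrative only.

```python
from typing import Dict


def parse_adapters(text: str) -> Dict[str, str]:
    """Parse name[tab]sequence lines, skipping blank lines and '#' lines."""
    adapters: Dict[str, str] = {}
    for line_number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first tab; tolerate extra tabs before the sequence.
        name, sep, sequence = line.partition("\t")
        if not sep or not sequence.strip():
            raise ValueError(f"line {line_number}: expected name<TAB>sequence")
        adapters[name.strip()] = sequence.strip()
    return adapters


def read_group_header(sample_name: str, platform: str = "ILLUMINA") -> str:
    """Build a read group header string for the -R option of bwa mem.

    ASSUMPTION: the field set (ID, SM, LB, PL) and the platform default
    are illustrative; the workflow may build its header differently.
    Fields are joined with a literal backslash-t, which bwa mem converts
    to a TAB character.
    """
    fields = [
        "@RG",
        f"ID:{sample_name}",
        f"SM:{sample_name}",
        f"LB:{sample_name}",
        f"PL:{platform}",
    ]
    return "\\t".join(fields)


if __name__ == "__main__":
    example = "# illustrative adapters\nTruSeq\tAGATCGGAAGAGC\nNextera\tCTGTCTCTTATA\n"
    print(parse_adapters(example))
    print(read_group_header("NA12878"))
```

Raising an error on a malformed line, rather than skipping it silently, makes a mis-tabbed adapter file visible before any trimming is attempted.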

Steps

ID Name Description
fastqc FastQC n/a
getfastqc_adapters Parse FastQC Adaptors n/a
align_and_sort Align and sort reads n/a
merge_and_mark Merge and Mark Duplicates n/a
calculate_performancesummary_genomefile Generate genome for BedtoolsCoverage n/a
performance_summary Performance summary workflow (whole genome) n/a
generate_gatk_intervals Generating genomic intervals by chromosome n/a
_evaluate_prescatter-bqsr-intervals n/a n/a
bqsr GATK Base Recalibration on Bam Perform base quality score recalibration
_evaluate_prescatter-vc_gatk-intervals n/a n/a
vc_gatk GATK4 Germline Variant Caller n/a
vc_gatk_merge GATK4: Gather VCFs n/a
vc_gatk_compressvcf BGZip n/a
vc_gatk_sort_combined BCFTools: Sort n/a
vc_gatk_uncompress UncompressArchive n/a
vc_gatk_addbamstats Annotate Bam Stats to Germline Vcf Workflow n/a
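The generate_gatk_intervals step corresponds to the fallback described for gatk_intervals: when no intervals are supplied, one interval per chromosome in the reference is generated. How the step derives these intervals is not documented here. The sketch below shows one plausible approach that reads sequence names and lengths from a ".fai" index and emits BED-style rows. The function names are hypothetical, and the offsets in the usage example are made up.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]


def chromosome_intervals(fai_text: str) -> List[Interval]:
    """Return one (name, start, end) interval per sequence in a .fai index.

    A .fai line holds NAME, LENGTH, OFFSET, LINEBASES and LINEWIDTH,
    separated by tabs; only the first two columns are used here.
    Coordinates are BED-style: 0-based start, exclusive end.
    """
    intervals: List[Interval] = []
    for line in fai_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        intervals.append((fields[0], 0, int(fields[1])))
    return intervals


def to_bed(intervals: List[Interval]) -> str:
    """Render intervals as tab-separated BED rows, one per line."""
    return "".join(f"{name}\t{start}\t{end}\n" for name, start, end in intervals)


if __name__ == "__main__":
    # Two illustrative index lines; the offset columns are invented.
    fai = "chr1\t248956422\t6\t60\t61\nchrM\t16569\t253105720\t60\t61\n"
    print(to_bed(chromosome_intervals(fai)), end="")
```

A real run would probably also filter out alternate and decoy contigs, which this sketch does not attempt.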

Outputs

ID Name Description Type
out_fastqc_reports n/a A zip file of the FastQC quality report.
  • array containing
    • array containing
      • File
out_bam n/a Aligned and indexed BAM.
  • File
out_performance_summary n/a A text file containing the performance summary of the BAM.
  • File
out_variants_gatk n/a Merged variants from the GATK caller
  • File
out_variants_gatk_split n/a Unmerged variants from the GATK caller (by interval)
  • array containing
    • File
out_variants_bamstats n/a n/a
  • File

Version History

Version 1 (earliest) Created 12th Nov 2021 at 02:30 by Richard Lupat

Added/updated 2 files


master 2e7a0bb
Creators and Submitter
Additional credit

Michael Franklin; Jiaan Yu; Juny Kesumadewi

Created: 12th Nov 2021 at 02:30

Last updated: 12th Nov 2021 at 02:41
