Workflow for Illumina Quality Control and Filtering
Version 1

Workflow Type: Common Workflow Language
Stable

Multiple paired datasets are merged into a single paired dataset.

Summary:

  • FastQC on raw data files
  • fastp for read quality trimming
  • BBDuk for PhiX and (optional) rRNA filtering
  • Kraken2 for taxonomic classification of reads (optional)
  • BBMap for (contamination) filtering using given references (optional)
  • FastQC on filtered (merged) data
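
The stages above can be sketched as a small planning function. This is only an illustration: it follows the order of the summary list, and it assumes that Kraken2 runs when a database is supplied and that BBMap filtering runs when reference files are supplied. The function name `planned_stages` and the stage labels are hypothetical, and the actual step wiring in the CWL file may differ.

```python
def planned_stages(skip_qc_unfiltered=False, filter_rrna=False,
                   kraken2_database=None, filter_references=None,
                   skip_qc_filtered=False):
    """Return the stages expected to run, in the order of the summary list.

    Argument names mirror the workflow inputs; the on/off rules for the
    optional stages are assumptions, not taken from the CWL itself.
    """
    stages = []
    if not skip_qc_unfiltered:
        stages.append("FastQC (raw)")
    stages.append("fastp")
    stages.append("BBDuk PhiX filter")
    if filter_rrna:
        stages.append("BBDuk rRNA filter")
    if kraken2_database:          # assumed: Kraken2 runs only with a database
        stages.append("Kraken2")
    if filter_references:         # assumed: BBMap runs only with references
        stages.append("BBMap reference filter")
    if not skip_qc_filtered:
        stages.append("FastQC (filtered)")
    return stages


default_plan = planned_stages()
full_plan = planned_stages(filter_rrna=True,
                           kraken2_database=["/path/to/kraken2_db"],
                           filter_references=["host.fasta"])
```

With all defaults, only the two FastQC runs, fastp, and the PhiX filter are planned; the optional stages appear once their inputs are set.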

Other UNLOCK workflows on WorkflowHub: https://workflowhub.eu/projects/16/workflows?view=default

All tool CWL files and other workflows can be found here:
https://gitlab.com/m-unlock/cwl

How to setup and use an UNLOCK workflow:
https://m-unlock.gitlab.io/docs/setup/setup.html

Inputs

ID | Name | Description | Type
identifier | identifier used | Identifier for this dataset used in this workflow | string
threads | Number of threads | Number of threads to use for computational processes | int?
memory | Maximum memory in MB | Maximum memory usage in megabytes | int?
forward_reads | Forward reads | Local forward sequence fastq file(s) | File[]
reverse_reads | Reverse reads | Local reverse sequence fastq file(s) | File[]
skip_qc_unfiltered | Skip QC unfiltered | Skip FastQC analyses of raw input reads (default false) | boolean?
skip_qc_filtered | Skip QC filtered | Skip FastQC analyses of filtered input reads (default false) | boolean?
filter_rrna | filter rRNA | Optionally remove rRNA sequences from the reads (default false) | boolean?
filter_references | Filter reference file(s) | Reference fasta file(s) for filtering | File[]?
deduplicate | Deduplicate reads | Remove exact duplicate reads with fastp | boolean?
kraken2_confidence | Kraken2 confidence threshold | Confidence score threshold; must be in [0, 1] (default 0.0) | float?
kraken2_database | Kraken2 database | Kraken2 database location; multiple databases are possible | Directory[]?
kraken2_standard_report | Kraken2 standard report | Also output the Kraken2 standard report with per-read classification. These can be large. (default false) | boolean
keep_reference_mapped_reads | Keep mapped reads | Keep the reads mapped to the given reference (default false) | boolean?
prepare_reference | Prepare references | Prepare references into a single fasta file with unique headers (default true). When false, a single fasta file with unique headers is expected as reference | boolean
step | Output Step number | Step number for output folder numbering (default 1) | int?
destination | Output Destination | Optional output destination, only used for cwl-prov reporting | string?
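
As a rough illustration of how these inputs fit together, the sketch below builds a CWL job object in Python and serializes it to JSON. It is not part of the workflow: the helper names `cwl_file` and `build_job` and all file names are hypothetical. The equal-length check on the read lists is an assumption based on the inputs being paired, and the [0, 1] bound on `kraken2_confidence` comes from the input description.

```python
import json


def cwl_file(path):
    """Wrap a local path as a CWL File object."""
    return {"class": "File", "path": path}


def build_job(identifier, forward_reads, reverse_reads, **optional):
    """Assemble a job object for the workflow inputs listed above.

    `optional` carries any of the optional inputs (threads, filter_rrna, ...).
    """
    # Assumption: paired data means one reverse file per forward file.
    if len(forward_reads) != len(reverse_reads):
        raise ValueError("forward_reads and reverse_reads differ in length")

    confidence = optional.get("kraken2_confidence")
    if confidence is not None and not 0.0 <= confidence <= 1.0:
        raise ValueError("kraken2_confidence must be within [0, 1]")

    job = {
        "identifier": identifier,
        "forward_reads": [cwl_file(p) for p in forward_reads],
        "reverse_reads": [cwl_file(p) for p in reverse_reads],
        # Defaults as stated in the input descriptions.
        "kraken2_standard_report": False,
        "prepare_reference": True,
    }
    job.update(optional)
    return job


# Hypothetical sample with two sequencing lanes.
job = build_job(
    "sample1",
    ["sample1_L1_R1.fq.gz", "sample1_L2_R1.fq.gz"],
    ["sample1_L1_R2.fq.gz", "sample1_L2_R2.fq.gz"],
    threads=4,
    filter_rrna=True,
    kraken2_confidence=0.1,
)
job_json = json.dumps(job, indent=2)
```

The resulting JSON text can be saved to a file and passed as the job file to a CWL runner alongside the workflow's CWL document.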

Steps

ID | Name | Description
fastqc_illumina_before | FastQC before | Quality assessment and report of reads
fastq_merge_fwd | Merge forward reads | Merge multiple forward fastq reads to a single file
fastq_merge_rev | Merge reverse reads | Merge multiple reverse fastq reads to a single file
fastq_fwd_array_to_file | Fwd reads array to file | Forward file of single file array to file object
fastq_rev_array_to_file | Rev reads array to file | Reverse file of single file array to file object
fastp | fastp | Read quality filtering and (barcode) trimming
rrna_filter | rRNA filter (bbduk) | Filters rRNA sequences from reads using BBDuk
reference_array_to_file | Reference array to file | Array to file object when the reference does not need to be prepared
prepare_fasta_db | Prepare references | Prepare references into a single fasta file with unique headers
reference_filter_illumina | Reference read mapping | Map reads against references using BBMap
phix_filter | PhiX filter (bbduk) | Filters Illumina spike-in PhiX sequences from reads using BBDuk
illumina_kraken2_unfiltered | Kraken2 unfiltered | Taxonomic classification on unfiltered files
illumina_kraken2_filtered | Kraken2 filtered | Taxonomic classification on filtered files
illumina_kraken2_compress | Compress kraken2 | Compress large Kraken2 report file
illumina_kraken2_krona | Krona Kraken2 | Visualization of Kraken2 results with Krona
fastqc_illumina_after | FastQC after | Quality assessment and report of reads
reports_files_to_folder | Reports to folder | Preparation of fastp output files to a specific output folder
kraken2_files_to_folder | Kraken2 folder | Kraken2 files to a single folder

Outputs

ID | Name | Description | Type
reports_folder | Filtering reports folder | Folder containing all reports of filtering and quality control | Directory
kraken2_folder | Kraken2 folder | Folder with Kraken2 output files | Directory?
QC_forward_reads | Filtered forward read | Filtered forward read | File
QC_reverse_reads | Filtered reverse read | Filtered reverse read | File

Version History

Version 1 (earliest) Created 21st Apr 2022 at 14:00 by Bart Nijsse

Initial commit


Open master 5c2e0e5
Created: 21st Apr 2022 at 14:00

Last updated: 7th Apr 2023 at 15:02
