Workflow for Illumina Quality Control and Filtering
Version 1

Workflow Type: Common Workflow Language
Stable

Multiple paired datasets are merged into a single paired dataset.
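
As a rough illustration of what merging paired datasets involves, the sketch below concatenates forward files and reverse files in the same order so that read mates stay aligned. It is not the workflow's own merge step; the function names `merge_fastq` and `merge_paired` are hypothetical, and it assumes that all inputs are either plain FASTQ or all gzip-compressed (concatenated gzip files remain a valid gzip stream).

```python
import shutil
from pathlib import Path


def merge_fastq(inputs, output):
    """Concatenate FASTQ files byte for byte, in the order given."""
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
    return Path(output)


def merge_paired(forward, reverse, out_fwd, out_rev):
    """Merge forward and reverse files in matching order (hypothetical helper)."""
    if len(forward) != len(reverse):
        raise ValueError("forward and reverse lists must have the same length")
    return merge_fastq(forward, out_fwd), merge_fastq(reverse, out_rev)
```

The only design point that matters here is the ordering: the n-th forward file and the n-th reverse file must come from the same dataset, otherwise downstream paired-end tools see mismatched mates.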

Summary:

  • FastQC on raw data files
  • fastp for read quality trimming
  • BBduk for phiX and (optional) rRNA filtering
  • Kraken2 for taxonomic classification of reads (optional)
  • BBmap for (contamination) filtering using given references (optional)
  • FastQC on filtered (merged) data
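
The list above can be read as a sequence of stages, some of which are switched on or off by workflow inputs. The sketch below shows one way to reason about which stages a given set of options would enable. It mirrors the summary order only, not the dependency graph of the CWL file, and it assumes that supplying `kraken2_database` or `filter_references` is what enables the corresponding optional stage. The function `planned_stages` and the stage labels are hypothetical.

```python
def planned_stages(skip_qc_unfiltered=False, skip_qc_filtered=False,
                   filter_rrna=False, kraken2_database=None,
                   filter_references=None):
    """Return the stage labels, in summary order, enabled by the options."""
    stages = []
    if not skip_qc_unfiltered:
        stages.append("FastQC (raw)")
    stages.append("fastp")              # always listed in the summary
    stages.append("BBDuk phiX filter")  # always listed in the summary
    if filter_rrna:
        stages.append("BBDuk rRNA filter")
    if kraken2_database:                # assumption: a database enables Kraken2
        stages.append("Kraken2")
    if filter_references:               # assumption: references enable BBMap
        stages.append("BBMap reference filter")
    if not skip_qc_filtered:
        stages.append("FastQC (filtered)")
    return stages


print(planned_stages())
# ['FastQC (raw)', 'fastp', 'BBDuk phiX filter', 'FastQC (filtered)']
```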

Other UNLOCK workflows on WorkflowHub: https://workflowhub.eu/projects/16/workflows?view=default

All tool CWL files and other workflows can be found here:
https://gitlab.com/m-unlock/cwl

How to set up and use an UNLOCK workflow:
https://m-unlock.gitlab.io/docs/setup/setup.html

Inputs

ID | Name | Description | Type
identifier | identifier used | Identifier for this dataset used in this workflow | string
threads | Number of threads | Number of threads to use for computational processes | int?
memory | Maximum memory in MB | Maximum memory usage in megabytes | int?
forward_reads | Forward reads | Local forward-read FASTQ file(s) | File[]
reverse_reads | Reverse reads | Local reverse-read FASTQ file(s) | File[]
skip_qc_unfiltered | Skip QC unfiltered | Skip FastQC analysis of the raw input reads (default false) | boolean?
skip_qc_filtered | Skip QC filtered | Skip FastQC analysis of the filtered reads (default false) | boolean?
filter_rrna | filter rRNA | Optionally remove rRNA sequences from the reads (default false) | boolean?
filter_references | Filter reference file(s) | Reference FASTA file(s) for filtering | File[]?
deduplicate | Deduplicate reads | Remove exact duplicate reads with fastp | boolean?
kraken2_confidence | Kraken2 confidence threshold | Confidence score threshold in the range [0, 1] (default 0.0) | float?
kraken2_database | Kraken2 database | Kraken2 database location; multiple databases are possible | Directory[]?
kraken2_standard_report | Kraken2 standard report | Also output the Kraken2 standard report with per-read classification; these files can be large (default false) | boolean
keep_reference_mapped_reads | Keep mapped reads | Keep the reads mapped to the given reference (default false) | boolean?
prepare_reference | Prepare references | Combine the references into a single FASTA file with unique headers (default true). When false, a single reference FASTA file with unique headers is expected | boolean
step | Output Step number | Step number for output folder numbering (default 1) | int?
destination | Output Destination | Optional output destination, only used for cwl-prov reporting | string?
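
To make the input table more concrete, here is a minimal sketch of building a job file for a CWL runner from Python. The input IDs and types come from the table above; the helper names (`cwl_file`, `cwl_dir`, `build_job`) and the example paths in the usage note are hypothetical. It relies on CWL runners accepting a JSON input object in which files and directories are written as `{"class": "File", "path": ...}` and `{"class": "Directory", "path": ...}`.

```python
import json


def cwl_file(path):
    """CWL File object for a local path."""
    return {"class": "File", "path": path}


def cwl_dir(path):
    """CWL Directory object for a local path."""
    return {"class": "Directory", "path": path}


def build_job(identifier, forward, reverse, kraken2_databases=None,
              kraken2_confidence=None, threads=None, memory=None):
    """Assemble an input object using the IDs from the Inputs table."""
    if len(forward) != len(reverse):
        raise ValueError("forward_reads and reverse_reads must pair up")
    if kraken2_confidence is not None and not 0.0 <= kraken2_confidence <= 1.0:
        raise ValueError("kraken2_confidence must be between 0 and 1")
    job = {
        "identifier": identifier,
        "forward_reads": [cwl_file(p) for p in forward],
        "reverse_reads": [cwl_file(p) for p in reverse],
        # Set explicitly to the defaults documented in the table.
        "kraken2_standard_report": False,
        "prepare_reference": True,
    }
    if kraken2_databases:
        job["kraken2_database"] = [cwl_dir(p) for p in kraken2_databases]
    # Optional scalar inputs are only written when a value was given.
    optional = {"kraken2_confidence": kraken2_confidence,
                "threads": threads, "memory": memory}
    job.update({k: v for k, v in optional.items() if v is not None})
    return job


def write_job(job, path):
    """Write the input object as JSON."""
    with open(path, "w") as handle:
        json.dump(job, handle, indent=2)
```

For example, `write_job(build_job("sample1", ["s1_1.fq.gz"], ["s1_2.fq.gz"], threads=4), "job.json")` would produce a file that can be passed to a CWL runner together with the workflow's CWL file.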

Steps

ID | Name | Description
fastqc_illumina_before | FastQC before | Quality assessment and report of reads
fastq_merge_fwd | Merge forward reads | Merge multiple forward FASTQ files into a single file
fastq_merge_rev | Merge reverse reads | Merge multiple reverse FASTQ files into a single file
fastq_fwd_array_to_file | Fwd reads array to file | Convert a single-file array of forward reads to a File object
fastq_rev_array_to_file | Rev reads array to file | Convert a single-file array of reverse reads to a File object
fastp | fastp | Read quality filtering and (barcode) trimming
rrna_filter | rRNA filter (bbduk) | Filters rRNA sequences from reads using BBDuk
reference_array_to_file | Reference array to file | Array to File object when the reference does not need to be prepared
prepare_fasta_db | Prepare references | Combine the references into a single FASTA file with unique headers
reference_filter_illumina | Reference read mapping | Map reads against references using BBMap
phix_filter | PhiX filter (bbduk) | Filters Illumina spike-in PhiX sequences from reads using BBDuk
illumina_kraken2_unfiltered | Kraken2 unfiltered | Taxonomic classification of unfiltered files
illumina_kraken2_filtered | Kraken2 filtered | Taxonomic classification of filtered files
illumina_kraken2_compress | Compress kraken2 | Compress large Kraken2 report file
illumina_kraken2_krona | Krona Kraken2 | Visualization of Kraken2 results with Krona
fastqc_illumina_after | FastQC after | Quality assessment and report of reads
reports_files_to_folder | Reports to folder | Collect fastp output files into a specific output folder
kraken2_files_to_folder | Kraken2 folder | Collect Kraken2 files into a single folder

Outputs

ID | Name | Description | Type
reports_folder | Filtering reports folder | Folder containing all reports of filtering and quality control | Directory
kraken2_folder | Kraken2 folder | Folder with Kraken2 output files | Directory?
QC_forward_reads | Filtered forward read | Filtered forward read | File
QC_reverse_reads | Filtered reverse read | Filtered reverse read | File

Version History

Version 1 (earliest) Created 21st Apr 2022 at 14:00 by Bart Nijsse

Initial commit


Created: 21st Apr 2022 at 14:00

Last updated: 7th Apr 2023 at 15:02
