Workflow Type: Common Workflow Language
Work-in-progress

Workflow (hybrid) metagenomic assembly and binning

  • Workflow Illumina Quality: https://workflowhub.eu/workflows/336?version=1
    • FastQC (control)
    • fastp (quality trimming)
    • kraken2 (taxonomy)
    • bbmap contamination filter
  • Workflow Longread Quality:
    • NanoPlot (control)
    • filtlong (quality trimming)
    • kraken2 (taxonomy)
    • minimap2 contamination filter
  • Kraken2 taxonomic classification of FASTQ reads
  • SPAdes/Flye (Assembly)
  • Pilon/Medaka/PyPolCA (Assembly polishing)
  • QUAST (Assembly quality report)

  • Binning workflow (optional)
  • Workflow Genome-scale metabolic models from bins (optional): https://workflowhub.eu/workflows/372
    • CarveMe (GEM generation)
    • MEMOTE (GEM test suite)
    • SMETANA (Species METabolic interaction ANAlysis)

Other UNLOCK workflows on WorkflowHub: https://workflowhub.eu/projects/16/workflows?view=default

All tool CWL files and other workflows can be found here:
Tools: https://gitlab.com/m-unlock/cwl/-/tree/master/cwl
Workflows: https://gitlab.com/m-unlock/cwl/-/tree/master/cwl/workflows

How to set up and use an UNLOCK workflow:
https://m-unlock.gitlab.io/docs/setup/setup.html


Inputs

ID Name Description Type
identifier Identifier Identifier for this dataset used in this workflow (required)
  • string
illumina_forward_reads Forward reads Illumina Forward sequence file(s)
  • File[]?
illumina_reverse_reads Reverse reads Illumina Reverse sequence file(s)
  • File[]?
pacbio_reads PacBio reads File(s) with PacBio reads in FASTQ format
  • File[]?
nanopore_reads Oxford Nanopore reads File(s) with Oxford Nanopore reads in FASTQ format
  • File[]?
fastq_rich Fastq rich (ONT) Input fastq is generated by albacore, MinKNOW or guppy with additional information concerning channel and time. Used to create more informative quality plots (default false)
  • boolean
longread_minimum_length Minimum read length Minimum read length threshold (default 1000)
  • int
longread_keep_percent Keep percentage Keep only this percentage of the best reads (measured by bases) (default 90)
  • float
longread_length_weight Length weight Weight given to the length score (default 10)
  • float
filter_references Reference file(s) Reference fasta file(s) used for pre-filtering. Can be gzipped (not mixed)
  • File[]?
use_reference_mapped_reads Keep mapped reads Continue with reads mapped to the given reference (default false)
  • boolean
keep_filtered_reads Keep filtered reads Keep filtered reads in the final output (default false)
  • boolean
deduplicate Deduplicate reads Remove exact duplicate Illumina reads with fastp (default false)
  • boolean
kraken2_confidence Kraken2 confidence threshold Confidence score threshold must be in [0, 1] (default 0.0)
  • float?
kraken2_database Kraken2 database Database location of kraken2
  • Directory[]?
skip_bracken Skip Bracken Skip Bracken analysis (default false)
  • boolean
gtdbtk_data gtdbtk data directory Directory containing the GTDBTK repository
  • Directory?
busco_data BUSCO dataset Path to the BUSCO dataset downloaded location
  • Directory?
ont_basecall_model ONT basecalling model for Medaka Basecalling model used with guppy, passed to Medaka (default r941_min_high). Available: r103_fast_g507, r103_fast_snp_g507, r103_fast_variant_g507, r103_hac_g507, r103_hac_snp_g507, r103_hac_variant_g507, r103_min_high_g345, r103_min_high_g360, r103_prom_high_g360, r103_prom_snp_g3210, r103_prom_variant_g3210, r103_sup_g507, r103_sup_snp_g507, r103_sup_variant_g507, r1041_e82_400bps_fast_g615, r1041_e82_400bps_fast_variant_g615, r1041_e82_400bps_hac_g615, r1041_e82_400bps_hac_variant_g615, r1041_e82_400bps_sup_g615, r1041_e82_400bps_sup_variant_g615, r104_e81_fast_g5015, r104_e81_fast_variant_g5015, r104_e81_hac_g5015, r104_e81_hac_variant_g5015, r104_e81_sup_g5015, r104_e81_sup_g610, r104_e81_sup_variant_g610, r10_min_high_g303, r10_min_high_g340, r941_e81_fast_g514, r941_e81_fast_variant_g514, r941_e81_hac_g514, r941_e81_hac_variant_g514, r941_e81_sup_g514, r941_e81_sup_variant_g514, r941_min_fast_g303, r941_min_fast_g507, r941_min_fast_snp_g507, r941_min_fast_variant_g507, r941_min_hac_g507, r941_min_hac_snp_g507, r941_min_hac_variant_g507, r941_min_high_g303, r941_min_high_g330, r941_min_high_g340_rle, r941_min_high_g344, r941_min_high_g351, r941_min_high_g360, r941_min_sup_g507, r941_min_sup_snp_g507, r941_min_sup_variant_g507, r941_prom_fast_g303, r941_prom_fast_g507, r941_prom_fast_snp_g507, r941_prom_fast_variant_g507, r941_prom_hac_g507, r941_prom_hac_snp_g507, r941_prom_hac_variant_g507, r941_prom_high_g303, r941_prom_high_g330, r941_prom_high_g344, r941_prom_high_g360, r941_prom_high_g4011, r941_prom_snp_g303, r941_prom_snp_g322, r941_prom_snp_g360, r941_prom_sup_g507, r941_prom_sup_snp_g507, r941_prom_sup_variant_g507, r941_prom_variant_g303, r941_prom_variant_g322, r941_prom_variant_g360, r941_sup_plant_g610, r941_sup_plant_variant_g610 (required for Medaka)
  • string?
pilon_fixlist Pilon fix list A comma-separated list of categories of issues to try to fix: "snps": try to fix individual base errors; "indels": try to fix small indels; "gaps": try to fill gaps; "local": try to detect and fix local misassemblies; "all": all of the above (Pilon default); "bases": shorthand for "snps" and "indels" (for backward compatibility). Workflow default: snps,gaps,local (conservative)
  • string
genome_size Genome Size Estimated genome size (for example, 5m or 2.6g)
  • string?
metagenome When working with metagenomes Metagenome option for assemblers (default true)
  • boolean
semibin_environment SemiBin Environment SemiBin built-in models; human_gut/dog_gut/ocean/soil/cat_gut/human_oral/mouse_gut/pig_gut/built_environment/wastewater/chicken_caecum/global (default global)
  • string
run_binspreader n/a Whether to use BinSPreader for bin refinement
  • boolean?
annotate_bins Annotate bins Annotate bins. Default false
  • boolean
annotate_unbinned Annotate unbinned Annotate unbinned contigs. Will be treated as metagenome. Default false
  • boolean
bakta_db Bakta DB Bakta Database directory (required when annotating bins)
  • Directory?
skip_bakta_crispr Skip bakta CRISPR Skip bakta CRISPR array prediction using PILER-CR. Default false
  • boolean
interproscan_directory InterProScan 5 directory Directory of the (full) InterProScan 5 program. Used for annotating bins. (optional)
  • Directory?
eggnog_dbs n/a n/a
  • record containing
    • Directory?
    • File?
    • File?
run_kofamscan Run kofamscan Run with KEGG KO KoFamKOALA annotation. Default false
  • boolean
kofamscan_limit_sapp SAPP kofamscan limit Limit max number of entries of kofamscan hits per locus in SAPP. Default 5
  • int?
run_eggnog Run eggNOG-mapper Run with eggNOG-mapper annotation. Requires eggnog database files. Default false
  • boolean
run_interproscan Run InterProScan Run with InterProScan annotation. Requires InterProScan v5 program files. Default false
  • boolean
interproscan_applications InterProScan applications Comma-separated list of analyses: FunFam,SFLD,PANTHER,Gene3D,Hamap,PRINTS,ProSiteProfiles,Coils,SUPERFAMILY,SMART,CDD,PIRSR,ProSitePatterns,AntiFam,Pfam,MobiDBLite,PIRSF,NCBIfam (default Pfam,SFLD,SMART,AntiFam,NCBIfam)
  • string
run_spades Use SPAdes Run with SPAdes assembler (default true)
  • boolean
only_assembler_mode_spades Only SPAdes assembler Run SPAdes in assembler-only mode (without read error correction) (default false)
  • boolean
run_flye Use Flye Run with Flye assembler (default false)
  • boolean
run_pilon Use Pilon Run with Pilon assembly polishing using Illumina reads (default false)
  • boolean
run_medaka Use Medaka Run with Medaka assembly polishing with nanopore reads (default false)
  • boolean
run_pypolca Use PyPolCA Run with PyPolCA polishing of long-read assemblies with Illumina data (default false)
  • boolean
assembly_choice Assembly choice User's choice of assembly for post-assembly (binning) processes ('spades', 'medaka', 'flye', 'pilon', 'pypolca'). Optional. Only one choice allowed.
  • enum of: spades, medaka, flye, pilon, pypolca
binning Run binning workflow Run with contig binning workflow (default false)
  • boolean
run_GEM Run GEM workflow Run the community GEnome-scale Metabolic models workflow on bins (default false). NOTE: uses private Docker containers by default
  • boolean
run_smetana Run SMETANA Run SMETANA (Species METabolic interaction ANAlysis) (default false)
  • boolean
smetana_solver n/a Solver to be used in SMETANA (now only run with cplex)
  • string?
memote_solver MEMOTE solver MEMOTE solver choice (cplex, glpk, gurobi, glpk_exact) (default glpk)
  • string?
gapfill Gap fill Gap fill model for given media
  • string?
mediadb Media database Media database file
  • File?
carveme_solver CarveMe solver CarveMe solver (default scip), possible to use cplex in private container (not provided in public container)
  • string?
skip_qc_unfiltered Skip QC unfiltered Skip quality analyses of unfiltered input reads (default false)
  • boolean
threads Number of threads Number of threads to use for each computational process (default 2)
  • int
memory Memory usage (MB) Maximum memory usage in megabytes (default 8GB)
  • int
destination Output Destination (prov only) Not used in this workflow. Output destination used for cwl-prov reporting only.
  • string?
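
The inputs listed above are supplied to a CWL runner as an input object (a YAML or JSON job file). The following is a minimal sketch of how such an object could be assembled and checked in Python. The input IDs and the constraints being checked come from the table above. The helper names (`cwl_file`, `build_job`, `check_job`), the sample identifier and all file paths are hypothetical and only serve as illustration.

```python
import json

# Allowed values copied from the assembly_choice row of the Inputs table.
ASSEMBLY_CHOICES = {"spades", "medaka", "flye", "pilon", "pypolca"}


def cwl_file(path):
    """Return a CWL File object for a (hypothetical) local path."""
    return {"class": "File", "path": path}


def build_job(identifier, forward, reverse, nanopore=None, **overrides):
    """Assemble an input object using input IDs from the Inputs table.

    Inputs that are left out fall back to the workflow defaults.
    """
    job = {
        "identifier": identifier,
        "illumina_forward_reads": [cwl_file(p) for p in forward],
        "illumina_reverse_reads": [cwl_file(p) for p in reverse],
    }
    if nanopore:
        job["nanopore_reads"] = [cwl_file(p) for p in nanopore]
    job.update(overrides)
    return job


def check_job(job):
    """Return a list of problems, based only on constraints stated in the table."""
    problems = []
    if not job.get("identifier"):
        problems.append("identifier is required")
    confidence = job.get("kraken2_confidence", 0.0)
    if not 0.0 <= confidence <= 1.0:
        problems.append("kraken2_confidence must be in [0, 1]")
    choice = job.get("assembly_choice")
    if choice is not None and choice not in ASSEMBLY_CHOICES:
        problems.append("assembly_choice must be one of " + ", ".join(sorted(ASSEMBLY_CHOICES)))
    if job.get("run_medaka") and not job.get("ont_basecall_model"):
        problems.append("ont_basecall_model is required for Medaka")
    if job.get("annotate_bins") and "bakta_db" not in job:
        problems.append("bakta_db is required when annotating bins")
    return problems


# Hypothetical hybrid run: Illumina plus Nanopore reads, Flye assembly, Medaka polishing.
job = build_job(
    "sample1",
    forward=["reads_1.fq.gz"],
    reverse=["reads_2.fq.gz"],
    nanopore=["ont.fq.gz"],
    run_flye=True,
    run_medaka=True,
    ont_basecall_model="r941_min_hac_g507",
    assembly_choice="medaka",
    threads=8,
)

print(check_job(job))
print(json.dumps(job, indent=2))
```

The printed JSON could be saved to a file (for example `job.json`) and passed to a CWL runner together with the workflow file. The checks cover only the constraints that the table states explicitly; the workflow itself may enforce more.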

Steps

ID Name Description
prepare_fasta_db Prepare references Prepare references to a single fasta file and unique headers
workflow_quality_illumina Quality and filtering workflow Quality, filtering and taxonomic classification of Illumina reads
workflow_quality_nanopore Oxford Nanopore quality workflow Quality, filtering and taxonomic classification workflow for Oxford Nanopore reads
workflow_quality_pacbio PacBio quality and filtering workflow Quality, filtering and taxonomic classification for PacBio reads
spades SPAdes assembly Genome assembly using SPAdes with Illumina and/or long reads
compress_spades SPAdes compressed Compress the large SPAdes assembly output files
flye Flye assembly De novo assembly of single-molecule reads with Flye
medaka Medaka polishing of assembly Medaka polishing of an assembled genome with ONT reads
metaquast_medaka Assembly evaluation Evaluation of polished assembly with metaQUAST
workflow_pilon Pilon workflow Illumina reads assembly polishing with Pilon
metaquast_pilon Illumina assembly evaluation Illumina evaluation of Pilon polished assembly with metaQUAST
workflow_pypolca Run PyPolCA assembly polishing PyPolCA polishing of long-read assembly with Illumina reads
metaquast_pypolca Pypolca polished assembly evaluation with QUAST Run Evaluation of PyPolCA polished assembly with metaQUAST
assembly_read_mapping_illumina Minimap2 Illumina read mapping using Minimap2 on assembled scaffolds
contig_read_counts Samtools idxstats Reports alignment summary statistics
workflow_binning Binning workflow Binning workflow to create bins
workflow_GEM GEM workflow CarveMe community genomescale metabolic models workflow from bins
keep_readfilter_files_to_folder Read filtering output folder Preparation of read filtering output files to a specific output folder
readfilter_files_to_folder Read filtering output folder Preparation of read filtering output files to a specific output folder
spades_files_to_folder SPAdes output to folder Preparation of SPAdes output files to a specific output folder
flye_files_to_folder Flye output folder Preparation of Flye output files to a specific output folder
metaquast_medaka_files_to_folder Medaka metaQUAST output folder Preparation of metaQUAST output files to a specific output folder
medaka_files_to_folder Medaka output folder Preparation of Medaka output files to a specific output folder
metaquast_pilon_files_to_folder Illumina metaQUAST output folder Preparation of QUAST output files to a specific output folder
pilon_files_to_folder Pilon output folder Preparation of Pilon output files to a specific output folder
metaquast_pypolca_files_to_folder PyPolca metaQUAST output folder Preparation of PyPolCA metaQUAST output files to a specific output folder
pypolca_files_to_folder PyPolca output folder Preparation of PyPolCA output files to a specific output folder
assembly_files_to_folder Assembly output folder Preparation of assembly output files to a specific output folder
binning_files_to_folder Binning output to folder Preparation of binning output files and folders to a specific output folder
GEM_files_to_folder GEM workflow output to folder Preparation of GEM workflow output files and folders to a specific output folder

Outputs

ID Name Description Type
read_filtering_output_keep Read filtering output Read filtering stats + filtered reads
  • Directory?
read_filtering_output Read filtering output Read filtering stats + filtered reads
  • Directory?
assembly_output Assembly output Output from different assembly steps
  • Directory
binning_output Binning output Binning output folders
  • Directory?
gem_output Community GEM output Community GEM output folder
  • Directory?

Version History

Version 2 (latest) Created 16th Dec 2024 at 07:46 by Bart Nijsse

No revision comments

Open master b854e66

Version 1 (earliest) Created 14th Jun 2022 at 09:14 by Bart Nijsse

Initial commit


Frozen Version-1 1e42c47
Activity

Views: 3102   Downloads: 300

Created: 14th Jun 2022 at 09:14

Last updated: 18th Dec 2024 at 10:14


Total size: 887 KB
Copyright © 2008 - 2024 The University of Manchester and HITS gGmbH