Workflow Type: Common Workflow Language

Workflow for (hybrid) metagenomic assembly and binning

  • Workflow Illumina Quality:
    • FastQC (control)
    • fastp (quality trimming)
    • kraken2 (taxonomy)
    • bbmap contamination filter
  • Workflow Longread Quality:
    • NanoPlot (control)
    • filtlong (quality trimming)
    • kraken2 (taxonomy)
    • minimap2 contamination filter
  • Kraken2 taxonomic classification of FASTQ reads
  • SPAdes/Flye (Assembly)
  • Pilon/Medaka/PyPolCA (Assembly polishing)
  • QUAST (Assembly quality report)



  • Workflow Genome-scale metabolic models from bins
    • CarveMe (GEM generation)
    • MEMOTE (GEM test suite)
    • SMETANA (Species METabolic interaction ANAlysis)

Inputs

ID Name Description Type
identifier Identifier Identifier for this dataset used in this workflow (required)
  • string
illumina_forward_reads Forward reads Illumina Forward sequence file(s)
  • File[]?
illumina_reverse_reads Reverse reads Illumina Reverse sequence file(s)
  • File[]?
pacbio_reads PacBio reads File(s) with PacBio reads in FASTQ format
  • File[]?
nanopore_reads Oxford Nanopore reads File(s) with Oxford Nanopore reads in FASTQ format
  • File[]?
fastq_rich Fastq rich (ONT) Input fastq is generated by albacore, MinKNOW or guppy with additional information concerning channel and time. Used to create more informative quality plots (default false)
  • boolean
longread_minimum_length Minimum read length Minimum read length threshold (default 1000)
  • int
longread_keep_percent Keep percentage Keep only this percentage of the best reads (measured by bases) (default 90)
  • float
longread_length_weight Length weight Weight given to the length score (default 10)
  • float
filter_references Reference file(s) Reference fasta file(s) used for pre-filtering. Can be gzipped (do not mix gzipped and uncompressed files)
  • File[]?
use_reference_mapped_reads Keep mapped reads Continue with reads mapped to the given reference (default false)
  • boolean
keep_filtered_reads Keep filtered reads Keep filtered reads in the final output (default false)
  • boolean
deduplicate Deduplicate reads Remove exact duplicate Illumina reads with fastp (default false)
  • boolean
kraken2_confidence Kraken2 confidence threshold Confidence score threshold must be in [0, 1] (default 0.0)
  • float?
kraken2_database Kraken2 database Database location of kraken2
  • Directory[]?
skip_bracken Run Bracken Skip Bracken analysis. Default false.
  • boolean
gtdbtk_data gtdbtk data directory Directory containing the GTDBTK repository
  • Directory?
busco_data BUSCO dataset Path to the downloaded BUSCO dataset
  • Directory?
ont_basecall_model ONT basecalling model for Medaka Basecalling model that was used with guppy, passed on to Medaka (default r941_min_high). Available: r103_fast_g507, r103_fast_snp_g507, r103_fast_variant_g507, r103_hac_g507, r103_hac_snp_g507, r103_hac_variant_g507, r103_min_high_g345, r103_min_high_g360, r103_prom_high_g360, r103_prom_snp_g3210, r103_prom_variant_g3210, r103_sup_g507, r103_sup_snp_g507, r103_sup_variant_g507, r1041_e82_400bps_fast_g615, r1041_e82_400bps_fast_variant_g615, r1041_e82_400bps_hac_g615, r1041_e82_400bps_hac_variant_g615, r1041_e82_400bps_sup_g615, r1041_e82_400bps_sup_variant_g615, r104_e81_fast_g5015, r104_e81_fast_variant_g5015, r104_e81_hac_g5015, r104_e81_hac_variant_g5015, r104_e81_sup_g5015, r104_e81_sup_g610, r104_e81_sup_variant_g610, r10_min_high_g303, r10_min_high_g340, r941_e81_fast_g514, r941_e81_fast_variant_g514, r941_e81_hac_g514, r941_e81_hac_variant_g514, r941_e81_sup_g514, r941_e81_sup_variant_g514, r941_min_fast_g303, r941_min_fast_g507, r941_min_fast_snp_g507, r941_min_fast_variant_g507, r941_min_hac_g507, r941_min_hac_snp_g507, r941_min_hac_variant_g507, r941_min_high_g303, r941_min_high_g330, r941_min_high_g340_rle, r941_min_high_g344, r941_min_high_g351, r941_min_high_g360, r941_min_sup_g507, r941_min_sup_snp_g507, r941_min_sup_variant_g507, r941_prom_fast_g303, r941_prom_fast_g507, r941_prom_fast_snp_g507, r941_prom_fast_variant_g507, r941_prom_hac_g507, r941_prom_hac_snp_g507, r941_prom_hac_variant_g507, r941_prom_high_g303, r941_prom_high_g330, r941_prom_high_g344, r941_prom_high_g360, r941_prom_high_g4011, r941_prom_snp_g303, r941_prom_snp_g322, r941_prom_snp_g360, r941_prom_sup_g507, r941_prom_sup_snp_g507, r941_prom_sup_variant_g507, r941_prom_variant_g303, r941_prom_variant_g322, r941_prom_variant_g360, r941_sup_plant_g610, r941_sup_plant_variant_g610 (required for Medaka)
  • string?
pilon_fixlist Pilon fix list A comma-separated list of categories of issues to try to fix: "snps": try to fix individual base errors; "indels": try to fix small indels; "gaps": try to fill gaps; "local": try to detect and fix local misassemblies; "all": all of the above (Pilon's own default); "bases": shorthand for "snps" and "indels" (for back compatibility). Default in this workflow: snps,gaps,local (conservative)
  • string
genome_size Genome Size Estimated genome size (for example, 5m or 2.6g)
  • string?
metagenome When working with metagenomes Metagenome option for assemblers (default true)
  • boolean
semibin_environment SemiBin Environment SemiBin built-in models; human_gut/dog_gut/ocean/soil/cat_gut/human_oral/mouse_gut/pig_gut/built_environment/wastewater/chicken_caecum/global (default global)
  • string
run_binspreader n/a Whether to use BinSPreader for bin refinement
  • boolean?
annotate_bins Annotate bins Annotate bins. Default false
  • boolean
annotate_unbinned Annotate unbinned Annotate unbinned contigs. Will be treated as metagenome. Default false
  • boolean
bakta_db Bakta DB Bakta Database directory (required when annotating bins)
  • Directory?
skip_bakta_crispr Skip bakta CRISPR Skip bakta CRISPR array prediction using PILER-CR. Default false
  • boolean
interproscan_directory InterProScan 5 directory Directory of the (full) InterProScan 5 program. Used for annotating bins. (optional)
  • Directory?
eggnog_dbs n/a n/a
  • record containing
    • Directory?
    • File?
    • File?
run_kofamscan Run kofamscan Run with KEGG KO KoFamKOALA annotation. Default false
  • boolean
kofamscan_limit_sapp SAPP kofamscan limit Limit max number of entries of kofamscan hits per locus in SAPP. Default 5
  • int?
run_eggnog Run eggNOG-mapper Run with eggNOG-mapper annotation. Requires eggnog database files. Default false
  • boolean
run_interproscan Run InterProScan Run with InterProScan annotation. Requires InterProScan v5 program files. Default false
  • boolean
interproscan_applications InterProScan applications Comma-separated list of analyses: FunFam,SFLD,PANTHER,Gene3D,Hamap,PRINTS,ProSiteProfiles,Coils,SUPERFAMILY,SMART,CDD,PIRSR,ProSitePatterns,AntiFam,Pfam,MobiDBLite,PIRSF,NCBIfam (default Pfam,SFLD,SMART,AntiFam,NCBIfam)
  • string
run_spades Use SPAdes Run with SPAdes assembler (default true)
  • boolean
only_assembler_mode_spades Only SPAdes assembler Run SPAdes in assembler-only mode (without read error correction) (default false)
  • boolean
run_flye Use Flye Run with Flye assembler (default false)
  • boolean
run_pilon Use Pilon Run with Pilon assembly polishing using Illumina reads (default false)
  • boolean
run_medaka Use Medaka Run with Medaka assembly polishing with nanopore reads (default false)
  • boolean
run_pypolca Use PyPolCA Run with PyPolCA polishing of long-read assemblies with Illumina data (default false)
  • boolean
assembly_choice Assembly choice User's choice of assembly for post-assembly (binning) processes ('spades', 'medaka', 'flye', 'pilon', 'pypolca'). Optional. Only one choice allowed.
  • enum of: spades, medaka, flye, pilon, pypolca
binning Run binning workflow Run with contig binning workflow (default false)
  • boolean
run_GEM Run GEM workflow Run the community GEnomescale Metabolic models workflow on bins (default false). NOTE: uses private Docker containers by default
  • boolean
run_smetana Run SMETANA Run SMETANA (Species METabolic interaction ANAlysis) (default false)
  • boolean
smetana_solver n/a Solver to be used in SMETANA (currently runs only with cplex)
  • string?
memote_solver MEMOTE solver MEMOTE solver choice: cplex, glpk, gurobi or glpk_exact (default glpk)
  • string?
gapfill Gap fill Gap fill model for given media
  • string?
mediadb Media database Media database file
  • File?
carveme_solver CarveMe solver CarveMe solver (default scip), possible to use cplex in private container (not provided in public container)
  • string?
skip_qc_unfiltered Skip QC unfiltered Skip quality analyses of unfiltered input reads (default false)
  • boolean
threads Number of threads Number of threads to use for each computational process (default 2)
  • int
memory Memory usage (MB) Maximum memory usage in megabytes (default 8GB)
  • int
destination Output Destination (prov only) Not used in this workflow. Output destination used for cwl-prov reporting only.
  • string?
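
Several inputs listed above carry constraints (value ranges, fixed choices, or "required when" conditions) that are easy to get wrong in a job file. The following is a minimal Python sketch of a pre-flight check, not part of the workflow itself. The function name `check_job`, the sample identifier, and the read file names are hypothetical; the allowed values are copied from the descriptions above, and treating a missing value as its documented default is an assumption.

```python
import json

# Allowed values copied from the input descriptions above.
ASSEMBLY_CHOICES = {"spades", "medaka", "flye", "pilon", "pypolca"}
SEMIBIN_ENVIRONMENTS = {
    "human_gut", "dog_gut", "ocean", "soil", "cat_gut", "human_oral",
    "mouse_gut", "pig_gut", "built_environment", "wastewater",
    "chicken_caecum", "global",
}
MEMOTE_SOLVERS = {"cplex", "glpk", "gurobi", "glpk_exact"}


def check_job(job):
    """Return a list of problems found in a job dict (empty list means OK)."""
    problems = []

    identifier = job.get("identifier")
    if not isinstance(identifier, str) or not identifier:
        problems.append("identifier is required and must be a non-empty string")

    # Assumption: a missing or null value falls back to the documented default.
    confidence = job.get("kraken2_confidence")
    if confidence is None:
        confidence = 0.0
    if not 0.0 <= confidence <= 1.0:
        problems.append("kraken2_confidence must be in [0, 1]")

    choice = job.get("assembly_choice")
    if choice is not None and choice not in ASSEMBLY_CHOICES:
        problems.append("assembly_choice must be one of " + ", ".join(sorted(ASSEMBLY_CHOICES)))

    environment = job.get("semibin_environment") or "global"
    if environment not in SEMIBIN_ENVIRONMENTS:
        problems.append("semibin_environment is not a SemiBin built-in model")

    solver = job.get("memote_solver") or "glpk"
    if solver not in MEMOTE_SOLVERS:
        problems.append("memote_solver must be one of " + ", ".join(sorted(MEMOTE_SOLVERS)))

    # "Required when" conditions stated in the descriptions.
    if job.get("run_medaka", False) and not job.get("ont_basecall_model"):
        problems.append("ont_basecall_model is required when run_medaka is true")
    if job.get("annotate_bins", False) and not job.get("bakta_db"):
        problems.append("bakta_db is required when annotate_bins is true")

    return problems


# Hypothetical job: identifier and file names are placeholders.
job = {
    "identifier": "example_sample",
    "illumina_forward_reads": [{"class": "File", "path": "reads_1.fastq.gz"}],
    "illumina_reverse_reads": [{"class": "File", "path": "reads_2.fastq.gz"}],
    "kraken2_confidence": 0.1,
    "run_spades": True,
    "binning": True,
    "threads": 8,
    "memory": 16000,
}

print(check_job(job))
print(json.dumps(job, indent=2))
```

The printed JSON has the shape of a CWL input object, which CWL runners generally accept as a job file in either JSON or YAML form.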


Steps

ID Name Description
prepare_fasta_db Prepare references Prepare references to a single fasta file and unique headers
workflow_quality_illumina Quality and filtering workflow Quality, filtering and taxonomic classification of Illumina reads
workflow_quality_nanopore Oxford Nanopore quality workflow Quality, filtering and taxonomic classification workflow for Oxford Nanopore reads
workflow_quality_pacbio PacBio quality and filtering workflow Quality, filtering and taxonomic classification for PacBio reads
spades SPAdes assembly Genome assembly using SPAdes with Illumina and/or long reads
compress_spades SPAdes compressed Compress the large Spades assembly output files
flye Flye assembly De novo assembly of single-molecule reads with Flye
medaka Medaka polishing of assembly Medaka polishing of an assembled genome with ONT reads
metaquast_medaka Assembly evaluation Evaluation of the Medaka-polished assembly with metaQUAST
workflow_pilon Pilon workflow Illumina reads assembly polishing with Pilon
metaquast_pilon Illumina assembly evaluation Evaluation of the Pilon-polished assembly with metaQUAST
workflow_pypolca Run PyPolCA assembly polishing PyPolCA polishing of long-read assembly with Illumina reads
metaquast_pypolca PyPolCA polished assembly evaluation with QUAST Evaluation of the PyPolCA-polished assembly with metaQUAST
assembly_read_mapping_illumina Minimap2 Illumina read mapping using Minimap2 on assembled scaffolds
contig_read_counts Samtools idxstats Reports alignment summary statistics
workflow_binning Binning workflow Binning workflow to create bins
workflow_GEM GEM workflow CarveMe community genomescale metabolic models workflow from bins
keep_readfilter_files_to_folder Read filtering output folder Preparation of read filtering output files to a specific output folder
readfilter_files_to_folder Read filtering output folder Preparation of read filtering output files to a specific output folder
spades_files_to_folder SPAdes output to folder Preparation of SPAdes output files to a specific output folder
flye_files_to_folder Flye output folder Preparation of Flye output files to a specific output folder
metaquast_medaka_files_to_folder Medaka metaQUAST output folder Preparation of metaQUAST output files to a specific output folder
medaka_files_to_folder Medaka output folder Preparation of Medaka output files to a specific output folder
metaquast_pilon_files_to_folder Illumina metaQUAST output folder Preparation of QUAST output files to a specific output folder
pilon_files_to_folder Pilon output folder Preparation of Pilon output files to a specific output folder
metaquast_pypolca_files_to_folder PyPolca metaQUAST output folder Preparation of PyPolCA metaQUAST output files to a specific output folder
pypolca_files_to_folder PyPolca output folder Preparation of PyPolCA output files to a specific output folder
assembly_files_to_folder Assembly output folder Preparation of assembly output files to a specific output folder
binning_files_to_folder Binning output to folder Preparation of binning output files and folders to a specific output folder
GEM_files_to_folder GEM workflow output to folder Preparation of GEM workflow output files and folders to a specific output folder
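
Many of the steps above are only relevant when the matching boolean input is enabled. The sketch below pairs flags with step IDs purely by name, which is an assumption: the real conditions live in the CWL file and may depend on more than these flags (for example, on which read types are supplied). The defaults are taken from the input descriptions; `expected_steps` is a hypothetical helper name.

```python
# Flag -> step IDs, paired by name only (an assumption, not read from the CWL).
FLAG_TO_STEPS = {
    "run_spades": ["spades", "compress_spades"],
    "run_flye": ["flye"],
    "run_medaka": ["medaka", "metaquast_medaka"],
    "run_pilon": ["workflow_pilon", "metaquast_pilon"],
    "run_pypolca": ["workflow_pypolca", "metaquast_pypolca"],
    "binning": ["workflow_binning"],
    "run_GEM": ["workflow_GEM"],
}

# Documented defaults for the flags above.
DEFAULTS = {
    "run_spades": True,
    "run_flye": False,
    "run_medaka": False,
    "run_pilon": False,
    "run_pypolca": False,
    "binning": False,
    "run_GEM": False,
}


def expected_steps(job):
    """List the step IDs one would expect to run for the given job flags."""
    steps = []
    for flag, step_ids in FLAG_TO_STEPS.items():
        if job.get(flag, DEFAULTS[flag]):
            steps.extend(step_ids)
    return steps


# With no flags set, only the SPAdes-related steps are expected.
print(expected_steps({}))

# A long-read configuration: Flye assembly followed by Medaka polishing.
print(expected_steps({"run_spades": False, "run_flye": True, "run_medaka": True}))
```

A list like this can serve as a quick sanity check of a job file before launching a long run; it does not replace inspecting the workflow definition.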


Outputs

ID Name Description Type
read_filtering_output_keep Read filtering output Read filtering stats + filtered reads
  • Directory?
read_filtering_output Read filtering output Read filtering stats + filtered reads
  • Directory?
assembly_output Assembly output Output from different assembly steps
  • Directory
binning_output Binning output Binning output folders
  • Directory?
gem_output Community GEM output Community GEM output folder
  • Directory?
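
Of the outputs above, only `assembly_output` is a required Directory; the others are optional and may be absent depending on the chosen inputs. The following sketch shows one way to summarize the output object that a CWL runner reports at the end of a run. It assumes the runner emits a JSON object keyed by output ID with `location` or `path` fields on each Directory; the example paths and the helper name `summarize_outputs` are hypothetical.

```python
import json

# Output ID -> whether the table marks it optional ("Directory?").
OUTPUTS = {
    "read_filtering_output_keep": True,
    "read_filtering_output": True,
    "assembly_output": False,
    "binning_output": True,
    "gem_output": True,
}


def summarize_outputs(output_object):
    """Map each output ID to its location, or to a note when it is absent."""
    summary = {}
    for output_id, optional in OUTPUTS.items():
        value = output_object.get(output_id)
        if value is None:
            summary[output_id] = "not produced" if optional else "MISSING (required)"
        else:
            summary[output_id] = value.get("location") or value.get("path")
    return summary


# Hypothetical runner output: assembly only, no binning or GEM results.
raw = """
{
  "assembly_output": {"class": "Directory", "location": "file:///results/Assembly"},
  "read_filtering_output": {"class": "Directory", "location": "file:///results/Read_filtering"},
  "binning_output": null,
  "gem_output": null
}
"""

for output_id, status in summarize_outputs(json.loads(raw)).items():
    print(output_id, "->", status)
```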

Version History

Version 2 (latest) Created 16th Dec 2024 at 07:46 by Bart Nijsse

No revision comments

Open master b854e66

Version 1 (earliest) Created 14th Jun 2022 at 09:14 by Bart Nijsse

Initial commit

Frozen Version-1 1e42c47
