Workflow Type: Common Workflow Language
Workflow (hybrid) metagenomic assembly and binning
- Workflow Illumina Quality: https://workflowhub.eu/workflows/336?version=1
  - FastQC (control)
  - fastp (quality trimming)
  - kraken2 (taxonomy)
  - bbmap contamination filter
- Workflow Longread Quality:
  - NanoPlot (control)
  - filtlong (quality trimming)
  - kraken2 (taxonomy)
  - minimap2 contamination filter
- Kraken2 taxonomic classification of FASTQ reads
- SPAdes/Flye (Assembly)
- Pilon/Medaka/PyPolCA (Assembly polishing)
- QUAST (Assembly quality report)
(optional)
- Workflow binning: https://workflowhub.eu/workflows/64?version=11
  - Metabat2/MaxBin2/SemiBin
  - DAS Tool
  - CheckM
  - BUSCO
  - GTDB-Tk
  - (optional) Workflow bin annotation: https://workflowhub.eu/workflows/1170
    - bakta
    - KoFam scan (optional)
    - InterProScan (optional)
    - eggNOG-mapper (optional)
    - Workflow SAPP conversion (optional, default on): https://workflowhub.eu/workflows/1174/
(optional)
- Workflow Genome-scale metabolic models from bins: https://workflowhub.eu/workflows/372
  - CarveMe (GEM generation)
  - MEMOTE (GEM test suite)
  - SMETANA (Species METabolic interaction ANAlysis)
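The outline above can be read as a set of switchable stages. The following is a minimal Python sketch of that reading, not the workflow's actual conditional logic (which lives in the CWL files and is not shown on this page). The flag names and defaults are taken from the Inputs table; the stage labels are informal names rather than CWL step IDs, and the nesting of bin annotation, GEM and SMETANA under binning is an assumption based on the outline.

```python
# Defaults copied from the Inputs table; stage labels are informal, hypothetical names.
DEFAULTS = {
    "run_spades": True,
    "run_flye": False,
    "run_medaka": False,
    "run_pilon": False,
    "run_pypolca": False,
    "binning": False,
    "annotate_bins": False,
    "run_GEM": False,
    "run_smetana": False,
}


def planned_stages(flags=None, has_illumina=True, has_longreads=False):
    """Return the outline stages expected to run for a set of flags (illustrative only)."""
    f = {**DEFAULTS, **(flags or {})}
    stages = []
    if has_illumina:
        stages.append("illumina_quality")
    if has_longreads:
        stages.append("longread_quality")
    # Assembly and polishing stages are switched on one flag at a time.
    for flag, stage in [
        ("run_spades", "spades"),
        ("run_flye", "flye"),
        ("run_medaka", "medaka"),
        ("run_pilon", "pilon"),
        ("run_pypolca", "pypolca"),
    ]:
        if f[flag]:
            stages.append(stage)
    if f["binning"]:
        stages.append("binning")
        # Assumption: annotation and GEM steps consume bins, so they need binning.
        if f["annotate_bins"]:
            stages.append("bin_annotation")
        if f["run_GEM"]:
            stages.append("gem")
            if f["run_smetana"]:
                stages.append("smetana")
    return stages


print(planned_stages())
print(planned_stages({"run_flye": True, "run_medaka": True, "binning": True},
                     has_longreads=True))
```

With the defaults and Illumina-only input, this sketch yields only the Illumina quality stage and SPAdes; every other stage has to be enabled explicitly.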
Other UNLOCK workflows on WorkflowHub: https://workflowhub.eu/projects/16/workflows?view=default
All tool CWL files and other workflows can be found here:
Tools: https://gitlab.com/m-unlock/cwl/-/tree/master/cwl
Workflows: https://gitlab.com/m-unlock/cwl/-/tree/master/cwl/workflows
How to set up and use an UNLOCK workflow:
https://m-unlock.gitlab.io/docs/setup/setup.html
Inputs
ID | Name | Description | Type |
---|---|---|---|
identifier | Identifier | Identifier for this dataset used in this workflow (required) | |
illumina_forward_reads | Forward reads | Illumina forward sequence file(s) | |
illumina_reverse_reads | Reverse reads | Illumina reverse sequence file(s) | |
pacbio_reads | PacBio reads | File(s) with PacBio reads in FASTQ format | |
nanopore_reads | Oxford Nanopore reads | File(s) with Oxford Nanopore reads in FASTQ format | |
fastq_rich | Fastq rich (ONT) | Input FASTQ was generated by Albacore, MinKNOW or Guppy and carries additional channel and time information. Used to create more informative quality plots (default false) | |
longread_minimum_length | Minimum read length | Minimum read length threshold (default 1000) | |
longread_keep_percent | Keep percentage | Keep only this percentage of the best reads (measured by bases) (default 90) | |
longread_length_weight | Length weight | Weight given to the length score (default 10) | |
filter_references | Reference file(s) | Reference FASTA file(s) used for pre-filtering. May be gzipped (do not mix gzipped and uncompressed files) | |
use_reference_mapped_reads | Keep mapped reads | Continue with reads mapped to the given reference (default false) | |
keep_filtered_reads | Keep filtered reads | Keep filtered reads in the final output (default false) | |
deduplicate | Deduplicate reads | Remove exact duplicate Illumina reads with fastp (default false) | |
kraken2_confidence | Kraken2 confidence threshold | Confidence score threshold; must be in [0, 1] (default 0.0) | |
kraken2_database | Kraken2 database | Location of the Kraken2 database | |
skip_bracken | Skip Bracken | Skip Bracken analysis (default false) | |
gtdbtk_data | GTDB-Tk data directory | Directory containing the GTDB-Tk repository | |
busco_data | BUSCO dataset | Path to the downloaded BUSCO dataset | |
ont_basecall_model | ONT basecalling model for Medaka | Basecalling model that was used with Guppy; used by Medaka (default r941_min_high). Available: r103_fast_g507, r103_fast_snp_g507, r103_fast_variant_g507, r103_hac_g507, r103_hac_snp_g507, r103_hac_variant_g507, r103_min_high_g345, r103_min_high_g360, r103_prom_high_g360, r103_prom_snp_g3210, r103_prom_variant_g3210, r103_sup_g507, r103_sup_snp_g507, r103_sup_variant_g507, r1041_e82_400bps_fast_g615, r1041_e82_400bps_fast_variant_g615, r1041_e82_400bps_hac_g615, r1041_e82_400bps_hac_variant_g615, r1041_e82_400bps_sup_g615, r1041_e82_400bps_sup_variant_g615, r104_e81_fast_g5015, r104_e81_fast_variant_g5015, r104_e81_hac_g5015, r104_e81_hac_variant_g5015, r104_e81_sup_g5015, r104_e81_sup_g610, r104_e81_sup_variant_g610, r10_min_high_g303, r10_min_high_g340, r941_e81_fast_g514, r941_e81_fast_variant_g514, r941_e81_hac_g514, r941_e81_hac_variant_g514, r941_e81_sup_g514, r941_e81_sup_variant_g514, r941_min_fast_g303, r941_min_fast_g507, r941_min_fast_snp_g507, r941_min_fast_variant_g507, r941_min_hac_g507, r941_min_hac_snp_g507, r941_min_hac_variant_g507, r941_min_high_g303, r941_min_high_g330, r941_min_high_g340_rle, r941_min_high_g344, r941_min_high_g351, r941_min_high_g360, r941_min_sup_g507, r941_min_sup_snp_g507, r941_min_sup_variant_g507, r941_prom_fast_g303, r941_prom_fast_g507, r941_prom_fast_snp_g507, r941_prom_fast_variant_g507, r941_prom_hac_g507, r941_prom_hac_snp_g507, r941_prom_hac_variant_g507, r941_prom_high_g303, r941_prom_high_g330, r941_prom_high_g344, r941_prom_high_g360, r941_prom_high_g4011, r941_prom_snp_g303, r941_prom_snp_g322, r941_prom_snp_g360, r941_prom_sup_g507, r941_prom_sup_snp_g507, r941_prom_sup_variant_g507, r941_prom_variant_g303, r941_prom_variant_g322, r941_prom_variant_g360, r941_sup_plant_g610, r941_sup_plant_variant_g610 (required for Medaka) | |
pilon_fixlist | Pilon fix list | A comma-separated list of categories of issues to try to fix: "snps": try to fix individual base errors; "indels": try to fix small indels; "gaps": try to fill gaps; "local": try to detect and fix local misassemblies; "all": all of the above (Pilon's own default); "bases": shorthand for "snps" and "indels" (for backward compatibility). Default: snps,gaps,local (conservative) | |
genome_size | Genome Size | Estimated genome size (for example, 5m or 2.6g) | |
metagenome | When working with metagenomes | Metagenome option for assemblers (default true) | |
semibin_environment | SemiBin Environment | SemiBin built-in models: human_gut/dog_gut/ocean/soil/cat_gut/human_oral/mouse_gut/pig_gut/built_environment/wastewater/chicken_caecum/global (default global) | |
run_binspreader | n/a | Whether to use BinSPreader for bin refinement | |
annotate_bins | Annotate bins | Annotate bins (default false) | |
annotate_unbinned | Annotate unbinned | Annotate unbinned contigs. These are treated as a metagenome (default false) | |
bakta_db | Bakta DB | Bakta database directory (required when annotating bins) | |
skip_bakta_crispr | Skip bakta CRISPR | Skip bakta CRISPR array prediction using PILER-CR (default false) | |
interproscan_directory | InterProScan 5 directory | Directory of the (full) InterProScan 5 program. Used for annotating bins (optional) | |
eggnog_dbs | n/a | n/a | |
run_kofamscan | Run kofamscan | Run with KEGG KO KoFamKOALA annotation (default false) | |
kofamscan_limit_sapp | SAPP kofamscan limit | Maximum number of kofamscan hits per locus in SAPP (default 5) | |
run_eggnog | Run eggNOG-mapper | Run with eggNOG-mapper annotation. Requires eggNOG database files (default false) | |
run_interproscan | Run InterProScan | Run with InterProScan annotation. Requires InterProScan v5 program files (default false) | |
interproscan_applications | InterProScan applications | Comma-separated list of analyses: FunFam,SFLD,PANTHER,Gene3D,Hamap,PRINTS,ProSiteProfiles,Coils,SUPERFAMILY,SMART,CDD,PIRSR,ProSitePatterns,AntiFam,Pfam,MobiDBLite,PIRSF,NCBIfam. Default: Pfam,SFLD,SMART,AntiFam,NCBIfam | |
run_spades | Use SPAdes | Run with SPAdes assembler (default true) | |
only_assembler_mode_spades | Only SPAdes assembler | Run SPAdes in assembler-only mode (without read error correction) (default false) | |
run_flye | Use Flye | Run with Flye assembler (default false) | |
run_pilon | Use Pilon | Run with Pilon Illumina assembly polishing (default false) | |
run_medaka | Use Medaka | Run with Medaka assembly polishing with Nanopore reads (default false) | |
run_pypolca | Use PyPolCA | Run with PyPolCA polishing of long-read assemblies with Illumina data (default false) | |
assembly_choice | Assembly choice | User's choice of assembly for post-assembly (binning) processes ('spades', 'medaka', 'flye', 'pilon', 'pypolca'). Optional. Only one choice allowed. | |
binning | Run binning workflow | Run with contig binning workflow (default false) | |
run_GEM | Run GEM workflow | Run the community GEnome-scale Metabolic models workflow on bins (default false). NOTE: uses private Docker containers by default | |
run_smetana | Run SMETANA | Run SMETANA (Species METabolic interaction ANAlysis) (default false) | |
smetana_solver | n/a | Solver to be used in SMETANA (currently only runs with cplex) | |
memote_solver | MEMOTE solver | MEMOTE solver choice (cplex, glpk, gurobi, glpk_exact); default glpk | |
gapfill | Gap fill | Gap fill model for given media | |
mediadb | Media database | Media database file | |
carveme_solver | CarveMe solver | CarveMe solver (default scip); cplex can be used in a private container (not provided in the public container) | |
skip_qc_unfiltered | Skip QC unfiltered | Skip quality analyses of unfiltered input reads (default false) | |
threads | Number of threads | Number of threads to use for each computational process (default 2) | |
memory | Memory usage (MB) | Maximum memory usage in megabytes (default 8 GB) | |
destination | Output Destination (prov only) | Not used in this workflow. Output destination used for cwl-prov reporting only. | |
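The inputs above are supplied to a CWL runner as an input object (a job file). The following is a minimal Python sketch that assembles such an object and checks a few of the constraints stated in the table before writing it out as JSON. The sample identifier, file names and chosen values are hypothetical. Treating the read inputs as arrays of CWL `File` objects is an assumption, since the Type column is not available on this page.

```python
import json

# Allowed values as listed in the assembly_choice description.
ASSEMBLY_CHOICES = {"spades", "medaka", "flye", "pilon", "pypolca"}


def cwl_file(path):
    """Wrap a path as a CWL File object."""
    return {"class": "File", "path": path}


def validate_job(job):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    if not job.get("identifier"):
        problems.append("identifier is required")
    confidence = job.get("kraken2_confidence", 0.0)
    if not 0.0 <= confidence <= 1.0:
        problems.append("kraken2_confidence must be in [0, 1]")
    choice = job.get("assembly_choice")
    if choice is not None and choice not in ASSEMBLY_CHOICES:
        problems.append("assembly_choice must be one of " + ", ".join(sorted(ASSEMBLY_CHOICES)))
    if job.get("run_medaka") and not job.get("ont_basecall_model"):
        problems.append("ont_basecall_model is required for Medaka")
    if job.get("annotate_bins") and not job.get("bakta_db"):
        problems.append("bakta_db is required when annotating bins")
    return problems


# Hypothetical hybrid run: Illumina plus Nanopore reads, Flye assembly, Medaka polishing.
job = {
    "identifier": "sample01",
    "illumina_forward_reads": [cwl_file("reads_1.fastq.gz")],  # assumed to be File[]
    "illumina_reverse_reads": [cwl_file("reads_2.fastq.gz")],  # assumed to be File[]
    "nanopore_reads": [cwl_file("ont.fastq.gz")],              # assumed to be File[]
    "run_flye": True,
    "run_medaka": True,
    "ont_basecall_model": "r941_min_high_g360",
    "kraken2_confidence": 0.1,
    "threads": 8,
    "memory": 16000,  # megabytes, per the table
}

problems = validate_job(job)
if problems:
    raise SystemExit("\n".join(problems))

job_text = json.dumps(job, indent=2)
print(job_text)
```

CWL runners such as cwltool accept the input object as either JSON or YAML, so the printed text can be saved to a file and passed alongside the workflow. The checks cover only constraints that the table states explicitly. The runner validates types itself.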
Steps
ID | Name | Description |
---|---|---|
prepare_fasta_db | Prepare references | Prepare references to a single fasta file and unique headers |
workflow_quality_illumina | Quality and filtering workflow | Quality, filtering and taxonomic classification of Illumina reads |
workflow_quality_nanopore | Oxford Nanopore quality workflow | Quality, filtering and taxonomic classification workflow for Oxford Nanopore reads |
workflow_quality_pacbio | PacBio quality and filtering workflow | Quality, filtering and taxonomic classification for PacBio reads |
spades | SPAdes assembly | Genome assembly using SPAdes with Illumina and/or long reads |
compress_spades | SPAdes compressed | Compress the large SPAdes assembly output files |
flye | Flye assembly | De novo assembly of single-molecule reads with Flye |
medaka | Medaka polishing of assembly | Medaka polishing of an assembled genome with ONT reads |
metaquast_medaka | Assembly evaluation | Evaluation of polished assembly with metaQUAST |
workflow_pilon | Pilon workflow | Assembly polishing with Illumina reads using Pilon |
metaquast_pilon | Illumina assembly evaluation | Evaluation of Pilon-polished assembly with metaQUAST |
workflow_pypolca | Run PyPolCA assembly polishing | PyPolCA polishing of long-read assembly with Illumina reads |
metaquast_pypolca | PyPolCA polished assembly evaluation with metaQUAST | Evaluation of PyPolCA-polished assembly with metaQUAST |
assembly_read_mapping_illumina | Minimap2 | Illumina read mapping using Minimap2 on assembled scaffolds |
contig_read_counts | Samtools idxstats | Reports alignment summary statistics |
workflow_binning | Binning workflow | Binning workflow to create bins |
workflow_GEM | GEM workflow | CarveMe community genome-scale metabolic models workflow from bins |
keep_readfilter_files_to_folder | Read filtering output folder | Preparation of read filtering output files to a specific output folder |
readfilter_files_to_folder | Read filtering output folder | Preparation of read filtering output files to a specific output folder |
spades_files_to_folder | SPAdes output to folder | Preparation of SPAdes output files to a specific output folder |
flye_files_to_folder | Flye output folder | Preparation of Flye output files to a specific output folder |
metaquast_medaka_files_to_folder | Medaka metaQUAST output folder | Preparation of metaQUAST output files to a specific output folder |
medaka_files_to_folder | Medaka output folder | Preparation of Medaka output files to a specific output folder |
metaquast_pilon_files_to_folder | Illumina metaQUAST output folder | Preparation of QUAST output files to a specific output folder |
pilon_files_to_folder | Pilon output folder | Preparation of Pilon output files to a specific output folder |
metaquast_pypolca_files_to_folder | PyPolCA metaQUAST output folder | Preparation of PyPolCA metaQUAST output files to a specific output folder |
pypolca_files_to_folder | PyPolCA output folder | Preparation of PyPolCA output files to a specific output folder |
assembly_files_to_folder | Assembly output folder | Preparation of assembly output files to a specific output folder |
binning_files_to_folder | Binning output to folder | Preparation of binning output files and folders to a specific output folder |
GEM_files_to_folder | GEM workflow output to folder | Preparation of GEM workflow output files and folders to a specific output folder |
Outputs
ID | Name | Description | Type |
---|---|---|---|
read_filtering_output_keep | Read filtering output | Read filtering stats + filtered reads | |
read_filtering_output | Read filtering output | Read filtering stats + filtered reads | |
assembly_output | Assembly output | Output from different assembly steps | |
binning_output | Binning output | Binning output folders | |
gem_output | Community GEM output | Community GEM output folder | |
Version History
Version 2 (latest) Created 16th Dec 2024 at 07:46 by Bart Nijsse
No revision comments
Open
master
b854e66
Version 1 (earliest) Created 14th Jun 2022 at 09:14 by Bart Nijsse
Initial commit
Frozen
Version-1
1e42c47
Activity
Views: 3102 Downloads: 300
Created: 14th Jun 2022 at 09:14
Last updated: 18th Dec 2024 at 10:14