Workflow Type: Common Workflow Language
Work-in-progress

Workflow for Metagenomics binning from assembly

Minimal inputs are: an identifier, an assembly (fasta), and an associated sorted BAM file

Summary

  • MetaBAT2 (binning)
  • MaxBin2 (binning)
  • SemiBin (binning)
  • DAS Tool (bin merging)
  • EukRep (eukaryotic classification)
  • CheckM (bin completeness and contamination)
  • BUSCO (bin completeness)
  • GTDB-Tk (bin taxonomic classification)

Other UNLOCK workflows on WorkflowHub: https://workflowhub.eu/projects/16/workflows?view=default

All tool CWL files and other workflows can be found here:
Tools: https://gitlab.com/m-unlock/cwl
Workflows: https://gitlab.com/m-unlock/cwl/workflows

How to set up and use an UNLOCK workflow:
https://m-unlock.gitlab.io/docs/setup/setup.html

Inputs

| ID | Name | Description | Type |
| --- | --- | --- | --- |
| identifier | Identifier used | Identifier for this dataset used in this workflow | string |
| assembly | Assembly fasta | Assembly in fasta format | File |
| bam_file | Bam file | Mapping file in sorted bam format containing reads mapped to the assembly | File |
| threads | Threads | Number of threads to use for computational processes | int? |
| memory | Memory usage (MB) | Maximum memory usage in megabytes | int? |
| gtdbtk_data | gtdbtk data directory | Directory containing the GTDB database. When none is given, GTDB-Tk will be skipped. | Directory? |
| busco_data | BUSCO dataset | Directory containing the BUSCO dataset. | Directory? |
| run_semibin | Run SemiBin | Run with SemiBin binner | boolean? |
| semibin_environment | SemiBin Environment | SemiBin built-in models (human_gut/dog_gut/ocean/soil/cat_gut/human_oral/mouse_gut/pig_gut/built_environment/wastewater/global) | string? |
| sub_workflow | Sub workflow Run | Use this when you need the output bins as File[] for subsequent analysis workflow steps in another workflow. | boolean |
| step | CWL base step number | Step number for order of steps | int? |
| destination | Output destination (not used in the workflow itself) | Optional output destination path for cwl-prov reporting. | string? |
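
The inputs above can be supplied to a CWL runner as a job file. The following is a minimal sketch in Python, not part of the UNLOCK repository, that builds such a job object and prints it as JSON. The input IDs come from the Inputs table; the file names, the sample identifier, and the helper names (cwl_file, cwl_directory, build_job) are hypothetical. It sets sub_workflow explicitly because the table lists it as a non-optional boolean; whether the workflow defines a default for it is not stated here.

```python
import json


def cwl_file(path):
    """Return a CWL File object for a local path."""
    return {"class": "File", "path": path}


def cwl_directory(path):
    """Return a CWL Directory object for a local path."""
    return {"class": "Directory", "path": path}


def build_job(identifier, assembly, bam_file, sub_workflow=False,
              threads=None, memory=None, gtdbtk_data=None, busco_data=None,
              run_semibin=None, semibin_environment=None):
    """Build a job object; optional inputs left as None are omitted."""
    job = {
        "identifier": identifier,
        "assembly": cwl_file(assembly),
        "bam_file": cwl_file(bam_file),
        "sub_workflow": sub_workflow,
    }
    optional = {
        "threads": threads,
        "memory": memory,
        "gtdbtk_data": cwl_directory(gtdbtk_data) if gtdbtk_data else None,
        "busco_data": cwl_directory(busco_data) if busco_data else None,
        "run_semibin": run_semibin,
        "semibin_environment": semibin_environment,
    }
    for key, value in optional.items():
        if value is not None:
            job[key] = value
    return job


# Hypothetical sample name and paths, for illustration only.
job = build_job(
    "sample1",
    "sample1_assembly.fasta",
    "sample1_sorted.bam",
    threads=8,
    run_semibin=True,
    semibin_environment="global",
)
print(json.dumps(job, indent=2))
```

Leaving gtdbtk_data out of the job object corresponds to the case described in the table where GTDB-Tk is skipped.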

Steps

| ID | Name | Description |
| --- | --- | --- |
| metabat2_contig_depths | contig depths | MetabatContigDepths to obtain the depth file used in the MetaBAT2 and SemiBin binning process |
| contig_read_counts | samtools idxstats | Reports alignment summary statistics |
| assembly_read_counts | samtools flagstat | Reports alignment summary statistics |
| eukrep | EukRep | EukRep, eukaryotic sequence classification |
| eukrep_stats | EukRep stats | EukRep fasta statistics |
| metabat2 | MetaBAT2 binning | Binning procedure using MetaBAT2 |
| metabat2_filter_bins | Filter MetaBAT2 bins | Remove unwanted fasta files from the MetaBAT2 bin directory (like TooShort.fa) |
| metabat2_contig2bin | MetaBAT2 to contig to bins | List the contigs and their corresponding bin. |
| maxbin2 | MaxBin2 binning | Binning procedure using MaxBin2 |
| maxbin2_to_folder | MaxBin2 bins to folder | Create folder with MaxBin2 bins |
| maxbin2_contig2bin | MaxBin2 to contig to bins | List the contigs and their corresponding bin. |
| semibin | SemiBin binning | Binning procedure using SemiBin |
| semibin_contig2bin | SemiBin to contig to bins | List the contigs and their corresponding bin. |
| das_tool | DAS Tool | Integrate predictions from multiple binning tools |
| das_tool_bins | Bin dir to files[] | DAS Tool bins folder to File array for further analysis |
| aggregate_bin_depths | Depths per bin | Depths per bin |
| bins_summary | Bins summary | Table of all bins and their statistics, such as size, contigs, and completeness |
| bin_readstats | Bin and assembly read stats | Table of general bin and assembly read mapping stats |
| checkm | CheckM | CheckM bin quality assessment |
| busco | BUSCO | BUSCO assembly completeness workflow |
| merge_busco_summaries | Merge BUSCO summaries | n/a |
| gtdbtk | GTDB-Tk | Taxonomic assignment of bins with GTDB-Tk |
| compress_gtdbtk | Compress GTDB-Tk | Compress GTDB-Tk output folder |
| metabat2_files_to_folder | MetaBAT2 output folder | Preparation of MetaBAT2 output files + unbinned contigs to a specific output folder |
| maxbin2_files_to_folder | MaxBin2 output folder | Preparation of MaxBin2 output files to a specific output folder |
| semibin_files_to_folder | SemiBin output folder | Preparation of SemiBin output files to a specific output folder |
| das_tool_files_to_folder | DAS Tool output folder | Preparation of DAS Tool output files to a specific output folder |
| checkm_files_to_folder | CheckM output | Preparation of CheckM output files to a specific output folder |
| busco_files_to_folder | BUSCO output folder | Preparation of BUSCO output files to a specific output folder |
| gtdbtk_files_to_folder | GTDB-Tk output folder | Preparation of GTDB-Tk output files to a specific output folder |
| output_bin_files | Bin files | Bin files for subsequent workflow runs when sub_workflow = true |
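
The three "contig to bins" steps each list contigs together with the bin they belong to. The following Python sketch illustrates that idea only; it is not the script the workflow uses. It assumes each bin is a fasta file whose header lines start with ">" followed by the contig ID, and it assumes a tab-separated output of contig ID and bin name. The function names and the example bins are hypothetical.

```python
import io


def contigs_in_fasta(handle):
    """Yield contig IDs: the first word of each '>' header line."""
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                yield fields[0]


def contig2bin(bins):
    """Map contigs to bins.

    bins: dict of bin name -> open fasta handle.
    Returns a list of (contig_id, bin_name) tuples in input order.
    """
    rows = []
    for bin_name, handle in bins.items():
        for contig in contigs_in_fasta(handle):
            rows.append((contig, bin_name))
    return rows


def to_tsv(rows):
    """Format rows as tab-separated lines: contig ID, then bin name."""
    return "".join(f"{contig}\t{bin_name}\n" for contig, bin_name in rows)


# Two tiny in-memory bins stand in for real bin fasta files.
example_bins = {
    "bin.1": io.StringIO(">contig_1 len=5\nACGTA\n>contig_2\nGGCC\n"),
    "bin.2": io.StringIO(">contig_3\nTTAA\n"),
}
rows = contig2bin(example_bins)
print(to_tsv(rows), end="")
```

With real data, the handles would come from opening each fasta file in a bin directory, with the bin name derived from the file name.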

Outputs

| ID | Name | Description | Type |
| --- | --- | --- | --- |
| bins | Bin files | Bin files in fasta format. To be used in other workflows. | File[]? |
| metabat2_output | MetaBAT2 | MetaBAT2 output directory | Directory |
| maxbin2_output | MaxBin2 | MaxBin2 output directory | Directory |
| semibin_output | SemiBin | SemiBin output directory | Directory? |
| das_tool_output | DAS Tool | DAS Tool output directory | Directory |
| checkm_output | CheckM | CheckM output directory | Directory |
| busco_output | BUSCO | BUSCO output directory | Directory |
| gtdbtk_output | GTDB-Tk | GTDB-Tk output directory | Directory? |
| bins_summary_table | Bins summary | Summary of info about the bins | File |
| bins_read_stats | Assembly/Bin read stats | General assembly and bin coverage | File |
| eukrep_fasta | EukRep fasta | EukRep eukaryotic classified contigs | File |
| eukrep_stats_file | EukRep stats | EukRep fasta statistics | File |
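
Three of the outputs above (bins, semibin_output, gtdbtk_output) have optional types, so downstream code should expect them to be absent. The following Python sketch assumes the CWL runner reports its outputs as a JSON object in which an optional output that was not produced is null. The output IDs come from the table; the locations in the example and the function name summarize_outputs are hypothetical.

```python
import json


def summarize_outputs(output_object):
    """Return a short status string for every output in the object."""
    summary = {}
    for key, value in output_object.items():
        if value is None:
            # Optional output that the run did not produce.
            summary[key] = "not produced"
        elif isinstance(value, list):
            summary[key] = f"{len(value)} file(s)"
        else:
            summary[key] = value.get("class", "unknown")
    return summary


# Hypothetical, shortened output object for a run without SemiBin.
example = json.loads("""
{
  "bins": null,
  "metabat2_output": {"class": "Directory", "location": "file:///tmp/out/MetaBAT2"},
  "semibin_output": null,
  "bins_summary_table": {"class": "File", "location": "file:///tmp/out/bins_summary.tsv"}
}
""")
summary = summarize_outputs(example)
for name, status in summary.items():
    print(f"{name}: {status}")
```

Checking for null before reading a location avoids errors when SemiBin or GTDB-Tk was skipped, or when sub_workflow was not enabled.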

Version History

Version 11 (latest) Created 18th Oct 2021 at 10:49 by Jasper Koehorst

Added more binning and assembly reports


Open master d4b13ee

Version 10 Created 7th Jun 2021 at 18:34 by Jasper Koehorst

No revision comments

Frozen master c2519b1

Version 9 Created 1st Jun 2021 at 11:43 by Jasper Koehorst

No revision comments

Frozen master d6fcbfa

Version 8 Created 6th May 2021 at 07:03 by Jasper Koehorst

No revision comments

Frozen master 0660405

Version 7 Created 8th Jan 2021 at 10:15 by Jasper Koehorst

No revision comments

Frozen master f3919f2

Created: 15th Oct 2020 at 14:55

Last updated: 2nd Nov 2022 at 15:29

Last used: 4th Dec 2022 at 20:43
