Workflow Type: Galaxy
Stable

Post-genome assembly quality control workflow using Quast, BUSCO, Meryl, Merqury and Fasta Statistics, with updates November 2024.

Workflow inputs: reads in fastqsanger.gz format (not fastq.gz), and the primary assembly as a FASTA file. (To change the reads format: click on the pencil icon next to the file in the Galaxy history, then "Datatypes", then set "New type" to fastqsanger.gz.) Note: the reads should be those that were used for the assembly (i.e., the filtered/cleaned reads), not the raw reads.
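Before uploading, it can help to confirm locally that a reads file is gzip-compressed FASTQ with quality characters in the Sanger (Phred+33) range. The following is a minimal sketch and not part of the workflow; the function name check_fastqsanger_gz and the four-lines-per-record assumption are illustrative.

```python
import gzip


def check_fastqsanger_gz(path, max_records=1000):
    """Check the first records of a gzipped FASTQ file.

    Returns (ok, lowest_quality_char). ok is True when the file is gzip
    compressed, uses four-line records, and every quality character lies in
    the printable range used by Sanger/Phred+33 encoding ('!' to '~').
    Hypothetical helper for illustration only.
    """
    with open(path, "rb") as raw:
        if raw.read(2) != b"\x1f\x8b":  # gzip magic number
            return False, None

    lowest = None
    with gzip.open(path, "rt") as fh:
        for _ in range(max_records):
            header = fh.readline()
            if not header:
                break  # end of file
            seq = fh.readline().rstrip("\n")
            plus = fh.readline()
            qual = fh.readline().rstrip("\n")
            if not header.startswith("@") or not plus.startswith("+"):
                return False, lowest
            if not qual or len(seq) != len(qual):
                return False, lowest
            if min(qual) < "!" or max(qual) > "~":
                return False, lowest
            lowest = min(qual) if lowest is None else min(lowest, min(qual))
    # An empty file has no records, so it is reported as not OK.
    return lowest is not None, lowest
```

A lowest quality character well below "@" is consistent with Phred+33 data; the range check alone cannot rule out older Phred+64 files, so treat the result as a hint rather than proof.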

What it does: Computes read coverage. Runs Quast. Runs Fasta Statistics. Runs Meryl and Merqury. Runs BUSCO. (New default settings for BUSCO: lineage = eukaryota; for Quast: lineage = eukaryotes, genome = large.)
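The read coverage figure follows the step description "Compute coverage, total reads length divided by assembly length". The workflow does this with Galaxy tools; the sketch below only reproduces the arithmetic in plain Python, and the function names and the four-lines-per-record FASTQ assumption are illustrative.

```python
import gzip


def total_read_bases(fastq_gz):
    """Sum the lengths of all reads in a gzipped four-line-per-record FASTQ."""
    total = 0
    with gzip.open(fastq_gz, "rt") as fh:
        for i, line in enumerate(fh):
            if i % 4 == 1:  # the second line of each record is the sequence
                total += len(line.strip())
    return total


def assembly_length(fasta):
    """Sum the lengths of all sequences in an uncompressed FASTA file."""
    total = 0
    with open(fasta) as fh:
        for line in fh:
            if not line.startswith(">"):
                total += len(line.strip())
    return total


def read_coverage(fastq_gz, fasta):
    """Coverage = total read bases / total assembly bases."""
    asm = assembly_length(fasta)
    if asm == 0:
        raise ValueError("assembly contains no sequence")
    return total_read_bases(fastq_gz) / asm
```

For example, 16 bases of reads against an 8-base assembly gives a coverage of 2.0.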

Workflow outputs: Reports assembly stats into a table called metrics.tsv, including selected metrics from Fasta Stats, and read coverage; reports BUSCO versions and dependencies; and displays these tables in the workflow report.
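The metrics table includes contig N50/L50 and N90/L90 values taken from Fasta Statistics. As a reminder of what those numbers mean, here is a minimal sketch of the usual definition, computed from a list of contig lengths. It is not the workflow's implementation, and nx_lx is a hypothetical helper name.

```python
def nx_lx(lengths, fraction):
    """Return (Nx, Lx) for a list of contig lengths.

    Nx is the length of the shortest contig in the smallest set of longest
    contigs that together cover at least `fraction` of the assembly;
    Lx is the number of contigs in that set.
    """
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * fraction
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    raise ValueError("no contigs supplied")


# Illustrative contig lengths (made up for this example), total 290.
contigs = [80, 70, 50, 40, 30, 20]
n50, l50 = nx_lx(contigs, 0.5)  # threshold 145 -> (70, 2)
n90, l90 = nx_lx(contigs, 0.9)  # threshold 261 -> (30, 5)
```

A higher N50 with a lower L50 indicates that more of the assembly sits in fewer, longer contigs.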

Note: a known bug sometimes resets the workflow report text to the default text.

To check and restore: open the workflow in Galaxy for editing.

Click on the "Edit Report" icon (top right, pencil icon).

Copy and paste the following text into the workflow report, then exit this report page, then save the workflow.

Workflow Execution Report

Workflow name: Genome assessment post assembly

Genome assembly metrics

Selected statistics from the workflow outputs. Additional metrics are available in other outputs in the history.

history_dataset_display(output="Genome assembly metrics")

Software

Busco version and dependencies:

history_dataset_display(output="Busco and dependencies version")

Galaxy Australia

Thanks for using Galaxy! When you use Galaxy Australia to support your publication or project, please acknowledge its use with the following statement: "This work is supported by Galaxy Australia, a service provided by the Australian Biocommons and its partners. The service receives NCRIS funding through Bioplatforms Australia and the Australian Research Data Commons (https://doi.org/10.47486/PL105), as well as The University of Melbourne and Queensland Government RICF funding."

Inputs

ID | Name | Description | Type
FASTA contigs - Primary Assembly | #main/FASTA contigs - Primary Assembly | n/a | File
Reads that were used for the assembly | #main/Reads that were used for the assembly | n/a | File

Steps

ID | Name | Description
2 | FASTQ to FASTA | toolshed.g2.bx.psu.edu/repos/devteam/fastqtofasta/fastq_to_fasta_python/1.1.5
3 | Meryl | toolshed.g2.bx.psu.edu/repos/iuc/meryl/meryl/1.3+galaxy6
4 | Fasta Statistics | toolshed.g2.bx.psu.edu/repos/iuc/fasta_stats/fasta-stats/2.0
5 | Quast | toolshed.g2.bx.psu.edu/repos/iuc/quast/quast/5.0.2+galaxy1
6 | Busco | toolshed.g2.bx.psu.edu/repos/iuc/busco/busco/5.4.6+galaxy0
7 | Fasta Statistics | toolshed.g2.bx.psu.edu/repos/iuc/fasta_stats/fasta-stats/2.0
8 | Merqury | toolshed.g2.bx.psu.edu/repos/iuc/merqury/merqury/1.3
9 | Search in textfiles | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_grep_tool/1.1.1
10 | Relabel some items in Fasta stats | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_sed_tool/1.1.1
11 | Get required Busco stats | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_grep_tool/1.1.1
12 | Get Busco version | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_grep_tool/1.1.1
13 | Get Busco dependencies | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_grep_tool/1.1.1
14 | Search in textfiles | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_grep_tool/1.1.1
15 | Cut | Cut1
16 | Filter out unneeded lines from fasta stats | toolshed.g2.bx.psu.edu/repos/iuc/filter_tabular/filter_tabular/3.3.0
17 | Rename some items and add in delimiters for later | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_sed_tool/1.1.1
18 | Reformat some text | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_sed_tool/1.1.1
19 | Cut | Cut1
20 | Extract assembly size | toolshed.g2.bx.psu.edu/repos/iuc/filter_tabular/filter_tabular/3.3.0
21 | Extract number of contigs | toolshed.g2.bx.psu.edu/repos/iuc/filter_tabular/filter_tabular/3.3.0
22 | Extract Contig N and L 50s and 90s | toolshed.g2.bx.psu.edu/repos/iuc/filter_tabular/filter_tabular/3.3.0
23 | Extract longest contig | toolshed.g2.bx.psu.edu/repos/iuc/filter_tabular/filter_tabular/3.3.0
24 | Extract GC content | toolshed.g2.bx.psu.edu/repos/iuc/filter_tabular/filter_tabular/3.3.0
25 | Convert commas to tabs | Convert characters1
26 | Collate Busco info | cat1
27 | Paste | Paste1
28 | Add blank header | toolshed.g2.bx.psu.edu/repos/bgruening/add_line_to_file/add_line_to_file/0.1.0
29 | Transpose cols to rows | toolshed.g2.bx.psu.edu/repos/iuc/datamash_transpose/datamash_transpose/1.8+galaxy0
30 | Convert to table | Convert characters1
31 | Compute coverage, total reads length divided by assembly length | toolshed.g2.bx.psu.edu/repos/devteam/column_maker/Add_a_column1/2.0
32 | Convert underscores to tabs | Convert characters1
33 | Keep two columns | Cut1
34 | Round the percentage to 2 decimal places | toolshed.g2.bx.psu.edu/repos/devteam/column_maker/Add_a_column1/2.0
35 | Label the column | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_replace_in_column/1.1.3
36 | Join info into one table | cat1
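
Several of the steps above pull the BUSCO scores out of the BUSCO text summary and split them into columns. The workflow does this with grep, sed and character-conversion tools; the sketch below shows the same idea in Python. It assumes the one-line score format written by BUSCO 5 short summaries (C, S, D, F, M percentages and n), and the sample values are made up for illustration.

```python
import re

# Assumed layout of the BUSCO 5 one-line score summary, e.g.
# C:98.8%[S:98.0%,D:0.8%],F:0.4%,M:0.8%,n:255
BUSCO_LINE = re.compile(
    r"C:(?P<complete>[\d.]+)%"
    r"\[S:(?P<single>[\d.]+)%,D:(?P<duplicated>[\d.]+)%\],"
    r"F:(?P<fragmented>[\d.]+)%,M:(?P<missing>[\d.]+)%,"
    r"n:(?P<total>\d+)"
)


def parse_busco_summary(text):
    """Extract BUSCO percentages and the total BUSCO count from summary text."""
    match = BUSCO_LINE.search(text)
    if match is None:
        raise ValueError("no BUSCO score line found")
    fields = match.groupdict()
    total = int(fields.pop("total"))
    scores = {name: float(value) for name, value in fields.items()}
    scores["total"] = total
    return scores


# Illustrative values only, not results from this workflow.
example = "\tC:98.8%[S:98.0%,D:0.8%],F:0.4%,M:0.8%,n:255\t"
scores = parse_busco_summary(example)
```

Here scores["complete"] is 98.8 and scores["total"] is 255; a line in any other layout raises ValueError rather than returning partial values.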

Outputs

ID | Name | Description | Type
Busco and dependencies version | #main/Busco and dependencies version | n/a | File
Busco on input dataset(s): full table | #main/Busco on input dataset(s): full table | n/a | File
Fasta Statistics on input dataset(s): summary stats | #main/Fasta Statistics on input dataset(s): summary stats | n/a | File
Genome assembly metrics | #main/Genome assembly metrics | n/a | File
Genome coverage | #main/Genome coverage | n/a | File
Merqury on input dataset(s): bed | #main/Merqury on input dataset(s): bed | n/a | File
Merqury on input dataset(s): png | #main/Merqury on input dataset(s): png | n/a | File
Merqury on input dataset(s): qv | #main/Merqury on input dataset(s): qv | n/a | File
Merqury on input dataset(s): size files | #main/Merqury on input dataset(s): size files | n/a | File
Merqury on input dataset(s): stats | #main/Merqury on input dataset(s): stats | n/a | File
Merqury on input dataset(s): wig | #main/Merqury on input dataset(s): wig | n/a | File
Meryl on input dataset(s): read-db.meryldb | #main/Meryl on input dataset(s): read-db.meryldb | n/a | File
Quast on input dataset(s): HTML report | #main/Quast on input dataset(s): HTML report | n/a | File
Quast on input dataset(s): PDF report | #main/Quast on input dataset(s): PDF report | n/a | File
Quast on input dataset(s): Log | #main/Quast on input dataset(s): Log | n/a | File
Quast on input dataset(s): tabular report | #main/Quast on input dataset(s): tabular report | n/a | File
_anonymous_output_1 | #main/_anonymous_output_1 | n/a | File
_anonymous_output_10 | #main/_anonymous_output_10 | n/a | File
_anonymous_output_11 | #main/_anonymous_output_11 | n/a | File
_anonymous_output_12 | #main/_anonymous_output_12 | n/a | File
_anonymous_output_13 | #main/_anonymous_output_13 | n/a | File
_anonymous_output_14 | #main/_anonymous_output_14 | n/a | File
_anonymous_output_15 | #main/_anonymous_output_15 | n/a | File
_anonymous_output_16 | #main/_anonymous_output_16 | n/a | File
_anonymous_output_17 | #main/_anonymous_output_17 | n/a | File
_anonymous_output_18 | #main/_anonymous_output_18 | n/a | File
_anonymous_output_19 | #main/_anonymous_output_19 | n/a | File
_anonymous_output_2 | #main/_anonymous_output_2 | n/a | File
_anonymous_output_20 | #main/_anonymous_output_20 | n/a | File
_anonymous_output_21 | #main/_anonymous_output_21 | n/a | File
_anonymous_output_22 | #main/_anonymous_output_22 | n/a | File
_anonymous_output_23 | #main/_anonymous_output_23 | n/a | File
_anonymous_output_24 | #main/_anonymous_output_24 | n/a | File
_anonymous_output_25 | #main/_anonymous_output_25 | n/a | File
_anonymous_output_26 | #main/_anonymous_output_26 | n/a | File
_anonymous_output_3 | #main/_anonymous_output_3 | n/a | File
_anonymous_output_4 | #main/_anonymous_output_4 | n/a | File
_anonymous_output_5 | #main/_anonymous_output_5 | n/a | File
_anonymous_output_6 | #main/_anonymous_output_6 | n/a | File
_anonymous_output_7 | #main/_anonymous_output_7 | n/a | File
_anonymous_output_8 | #main/_anonymous_output_8 | n/a | File
_anonymous_output_9 | #main/_anonymous_output_9 | n/a | File
out_file1 | #main/out_file1 | n/a | File
outfile | #main/outfile | n/a | File

Version History

v2.0.6 (latest) Created 4th Dec 2024 at 07:02 by Anna Syme

Merge pull request #10 from AnnaSyme/wf-name-change

Rename updated-Galaxy-Workflow-Genome_assessment_post_assembly.ga to …


Frozen master 0154e28

v2.0.6 - ignore Created 4th Dec 2024 at 02:15 by Anna Syme

Merge pull request #10 from AnnaSyme/wf-name-change

Rename updated-Galaxy-Workflow-Genome_assessment_post_assembly.ga to …


Frozen master 0154e28

v2.0.5 Created 6th Aug 2024 at 11:04 by Anna Syme

Merge pull request #7 from AustralianBioCommons/supernord-workflow-name-fix

Update workflow name in ro-crate-metadata.json


Frozen v2.0.5 fe2213b

v2.0.4 Created 19th Apr 2024 at 02:56 by Anna Syme

Merge pull request #7 from AustralianBioCommons/supernord-workflow-name-fix

Update workflow name in ro-crate-metadata.json


Frozen v2.0.4 fe2213b

v2.0.2 Created 16th Apr 2024 at 08:19 by Johan Gustafsson

Update .lifemonitor.yaml


Frozen v2.0.2 4ad99a2

v1.1.0 Created 9th May 2023 at 01:59 by Johan Gustafsson

Add missing raw data input


Frozen v1.1.0 46d8253

v1.0.0 (earliest) Created 7th Nov 2022 at 07:10 by Johan Gustafsson

Update links


Frozen v1.0.0 efaf002
Citation
Price, G., & Syme, A. (2024). Genome-assessment-post-assembly. WorkflowHub. https://doi.org/10.48546/WORKFLOWHUB.WORKFLOW.403.7
Activity

Views: 5844   Downloads: 740   Runs: 6

Created: 7th Nov 2022 at 07:10

Last updated: 4th Dec 2024 at 02:15
