Post-assembly genome quality control workflow using Quast, BUSCO, Meryl, Merqury and Fasta Statistics. Updated November 2024.
Workflow inputs: the reads in fastqsanger.gz format (not fastq.gz) and the primary assembly as a FASTA file. To change the reads datatype, click the pencil icon next to the file in the Galaxy history, select "Datatypes", then set "New type" to fastqsanger.gz. Note: the reads should be the ones that were used for the assembly (i.e., the filtered/cleaned reads), not the raw reads.
What it does: computes read coverage, then runs Quast, Fasta Statistics, Meryl, Merqury and BUSCO. New default settings: for BUSCO, lineage = eukaryota; for Quast, lineage = eukaryotes and genome = large.
Workflow outputs: writes assembly statistics, including selected metrics from Fasta Statistics and the read coverage, to a table called metrics.tsv; reports the BUSCO version and its dependencies; and displays these tables in the workflow report.
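The step list describes the coverage value as total reads length divided by assembly length. As a rough illustration only, the Python sketch below reproduces that arithmetic outside Galaxy. The function names and the simple FASTA parsing are assumptions made for this example and are not taken from the workflow; the rounding to two decimal places follows the name of the rounding step, but the exact expression used in Galaxy may differ.

```python
def total_bases(fasta_lines):
    """Sum the sequence lengths in FASTA-formatted lines, skipping '>' headers."""
    return sum(
        len(line.strip())
        for line in fasta_lines
        if line.strip() and not line.startswith(">")
    )


def estimate_coverage(read_bases, assembly_bases):
    """Return read coverage: total read length divided by assembly length."""
    if assembly_bases <= 0:
        raise ValueError("assembly length must be positive")
    # Hypothetical rounding choice, mirroring the two-decimal rounding step.
    return round(read_bases / assembly_bases, 2)


# Toy data standing in for the reads (converted to FASTA) and the assembly.
reads = [">read1", "ACGTACGT", ">read2", "ACGT"]
assembly = [">contig1", "ACGTAC"]

coverage = estimate_coverage(total_bases(reads), total_bases(assembly))
print(f"Estimated coverage: {coverage}x")
```

With real data, the two totals would come from the Fasta Statistics outputs for the reads and for the assembly rather than from parsing the files by hand.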
Note: a known bug is that the workflow report text sometimes resets to the default text.
To check and restore it: open the workflow in Galaxy for editing.
Click on the "Edit Report" icon (top right, pencil icon).
Copy and paste the following text into the workflow report, then exit this report page, then save the workflow.
Workflow Execution Report
Workflow name: Genome assessment post assembly
Genome assembly metrics
Selected statistics from the workflow outputs. Additional metrics are available in other outputs in the history.
history_dataset_display(output="Genome assembly metrics")
Software
Busco version and dependencies:
history_dataset_display(output="Busco and dependencies version")
Galaxy Australia
Thanks for using Galaxy! When you use Galaxy Australia to support your publication or project, please acknowledge its use with the following statement: "This work is supported by Galaxy Australia, a service provided by the Australian Biocommons and its partners. The service receives NCRIS funding through Bioplatforms Australia and the Australian Research Data Commons (https://doi.org/10.47486/PL105), as well as The University of Melbourne and Queensland Government RICF funding."
Inputs
ID | Name | Description | Type |
---|---|---|---|
FASTA contigs - Primary Assembly | #main/FASTA contigs - Primary Assembly | n/a | |
Reads that were used for the assembly | #main/Reads that were used for the assembly | n/a | |
Steps
ID | Name | Description (Galaxy tool ID) |
---|---|---|
2 | FASTQ to FASTA | toolshed.g2.bx.psu.edu/repos/devteam/fastqtofasta/fastq_to_fasta_python/1.1.5 |
3 | Meryl | toolshed.g2.bx.psu.edu/repos/iuc/meryl/meryl/1.3+galaxy6 |
4 | Fasta Statistics | toolshed.g2.bx.psu.edu/repos/iuc/fasta_stats/fasta-stats/2.0 |
5 | Quast | toolshed.g2.bx.psu.edu/repos/iuc/quast/quast/5.0.2+galaxy1 |
6 | Busco | toolshed.g2.bx.psu.edu/repos/iuc/busco/busco/5.4.6+galaxy0 |
7 | Fasta Statistics | toolshed.g2.bx.psu.edu/repos/iuc/fasta_stats/fasta-stats/2.0 |
8 | Merqury | toolshed.g2.bx.psu.edu/repos/iuc/merqury/merqury/1.3 |
9 | Search in textfiles | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_grep_tool/1.1.1 |
10 | Relabel some items in Fasta stats | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_sed_tool/1.1.1 |
11 | Get required Busco stats | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_grep_tool/1.1.1 |
12 | Get Busco version | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_grep_tool/1.1.1 |
13 | Get Busco dependencies | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_grep_tool/1.1.1 |
14 | Search in textfiles | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_grep_tool/1.1.1 |
15 | Cut | Cut1 |
16 | Filter out unneeded lines from fasta stats | toolshed.g2.bx.psu.edu/repos/iuc/filter_tabular/filter_tabular/3.3.0 |
17 | Rename some items and add in delimiters for later | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_sed_tool/1.1.1 |
18 | Reformat some text | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_sed_tool/1.1.1 |
19 | Cut | Cut1 |
20 | Extract assembly size | toolshed.g2.bx.psu.edu/repos/iuc/filter_tabular/filter_tabular/3.3.0 |
21 | Extract number of contigs | toolshed.g2.bx.psu.edu/repos/iuc/filter_tabular/filter_tabular/3.3.0 |
22 | Extract Contig N and L 50s and 90s | toolshed.g2.bx.psu.edu/repos/iuc/filter_tabular/filter_tabular/3.3.0 |
23 | Extract longest contig | toolshed.g2.bx.psu.edu/repos/iuc/filter_tabular/filter_tabular/3.3.0 |
24 | Extract GC content | toolshed.g2.bx.psu.edu/repos/iuc/filter_tabular/filter_tabular/3.3.0 |
25 | Convert commas to tabs | Convert characters1 |
26 | Collate Busco info | cat1 |
27 | Paste | Paste1 |
28 | Add blank header | toolshed.g2.bx.psu.edu/repos/bgruening/add_line_to_file/add_line_to_file/0.1.0 |
29 | Transpose cols to rows | toolshed.g2.bx.psu.edu/repos/iuc/datamash_transpose/datamash_transpose/1.8+galaxy0 |
30 | Convert to table | Convert characters1 |
31 | Compute coverage, total reads length divided by assembly length | toolshed.g2.bx.psu.edu/repos/devteam/column_maker/Add_a_column1/2.0 |
32 | Convert underscores to tabs | Convert characters1 |
33 | Keep two columns | Cut1 |
34 | Round the percentage to 2 decimal places | toolshed.g2.bx.psu.edu/repos/devteam/column_maker/Add_a_column1/2.0 |
35 | Label the column | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_replace_in_column/1.1.3 |
36 | Join info into one table | cat1 |
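Among the steps above, the workflow extracts contig N50, L50, N90 and L90 values from the Fasta Statistics output. For readers unfamiliar with these metrics, the sketch below shows one common way to compute them from a list of contig lengths. It is an illustrative assumption, not the code used by Fasta Statistics; the function name `nx_lx` and the example lengths are invented for this example.

```python
def nx_lx(contig_lengths, fraction):
    """Return (Nx, Lx) for a fraction such as 0.5 (N50/L50) or 0.9 (N90/L90).

    Nx is the length of the contig at which the running total of contig
    lengths, sorted longest first, reaches the given fraction of the
    assembly size. Lx is the number of contigs needed to get there.
    """
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    ordered = sorted(contig_lengths, reverse=True)
    threshold = sum(ordered) * fraction
    running_total = 0
    for count, length in enumerate(ordered, start=1):
        running_total += length
        if running_total >= threshold:
            return length, count
    # Only reachable if fraction > 1; fall back to the shortest contig.
    return ordered[-1], len(ordered)


# Hypothetical contig lengths in base pairs (assembly size 290).
lengths = [80, 70, 50, 40, 30, 20]
n50, l50 = nx_lx(lengths, 0.5)
n90, l90 = nx_lx(lengths, 0.9)
print(f"N50={n50} L50={l50} N90={n90} L90={l90}")
```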
Outputs
ID | Name | Description | Type |
---|---|---|---|
Busco and dependencies version | #main/Busco and dependencies version | n/a | |
Busco on input dataset(s): full table | #main/Busco on input dataset(s): full table | n/a | |
Fasta Statistics on input dataset(s): summary stats | #main/Fasta Statistics on input dataset(s): summary stats | n/a | |
Genome assembly metrics | #main/Genome assembly metrics | n/a | |
Genome coverage | #main/Genome coverage | n/a | |
Merqury on input dataset(s): bed | #main/Merqury on input dataset(s): bed | n/a | |
Merqury on input dataset(s): png | #main/Merqury on input dataset(s): png | n/a | |
Merqury on input dataset(s): qv | #main/Merqury on input dataset(s): qv | n/a | |
Merqury on input dataset(s): size files | #main/Merqury on input dataset(s): size files | n/a | |
Merqury on input dataset(s): stats | #main/Merqury on input dataset(s): stats | n/a | |
Merqury on input dataset(s): wig | #main/Merqury on input dataset(s): wig | n/a | |
Meryl on input dataset(s): read-db.meryldb | #main/Meryl on input dataset(s): read-db.meryldb | n/a | |
Quast on input dataset(s): HTML report | #main/Quast on input dataset(s): HTML report | n/a | |
Quast on input dataset(s): PDF report | #main/Quast on input dataset(s): PDF report | n/a | |
Quast on input dataset(s): Log | #main/Quast on input dataset(s): Log | n/a | |
Quast on input dataset(s): tabular report | #main/Quast on input dataset(s): tabular report | n/a | |
_anonymous_output_1 | #main/_anonymous_output_1 | n/a | |
_anonymous_output_10 | #main/_anonymous_output_10 | n/a | |
_anonymous_output_11 | #main/_anonymous_output_11 | n/a | |
_anonymous_output_12 | #main/_anonymous_output_12 | n/a | |
_anonymous_output_13 | #main/_anonymous_output_13 | n/a | |
_anonymous_output_14 | #main/_anonymous_output_14 | n/a | |
_anonymous_output_15 | #main/_anonymous_output_15 | n/a | |
_anonymous_output_16 | #main/_anonymous_output_16 | n/a | |
_anonymous_output_17 | #main/_anonymous_output_17 | n/a | |
_anonymous_output_18 | #main/_anonymous_output_18 | n/a | |
_anonymous_output_19 | #main/_anonymous_output_19 | n/a | |
_anonymous_output_2 | #main/_anonymous_output_2 | n/a | |
_anonymous_output_20 | #main/_anonymous_output_20 | n/a | |
_anonymous_output_21 | #main/_anonymous_output_21 | n/a | |
_anonymous_output_22 | #main/_anonymous_output_22 | n/a | |
_anonymous_output_23 | #main/_anonymous_output_23 | n/a | |
_anonymous_output_24 | #main/_anonymous_output_24 | n/a | |
_anonymous_output_25 | #main/_anonymous_output_25 | n/a | |
_anonymous_output_26 | #main/_anonymous_output_26 | n/a | |
_anonymous_output_3 | #main/_anonymous_output_3 | n/a | |
_anonymous_output_4 | #main/_anonymous_output_4 | n/a | |
_anonymous_output_5 | #main/_anonymous_output_5 | n/a | |
_anonymous_output_6 | #main/_anonymous_output_6 | n/a | |
_anonymous_output_7 | #main/_anonymous_output_7 | n/a | |
_anonymous_output_8 | #main/_anonymous_output_8 | n/a | |
_anonymous_output_9 | #main/_anonymous_output_9 | n/a | |
out_file1 | #main/out_file1 | n/a | |
outfile | #main/outfile | n/a | |
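The Merqury qv output listed above reports a consensus quality value (QV) for the assembly. As a hedged aid to interpreting that file, the sketch below applies the k-mer based formula published by the Merqury authors: the per-base error rate is estimated from the share of assembly k-mers that are absent from the read k-mer database, then converted to a Phred-style score. The function name, argument names and example counts are hypothetical, and the sketch does not read Merqury's files or call the tool.

```python
import math


def merqury_style_qv(asm_only_kmers, total_asm_kmers, k):
    """Estimate a Phred-scaled consensus QV from k-mer counts.

    asm_only_kmers: k-mers found in the assembly but not in the reads.
    total_asm_kmers: all k-mers found in the assembly.
    k: k-mer size used to build the databases.
    """
    if total_asm_kmers <= 0 or not 0 <= asm_only_kmers <= total_asm_kmers:
        raise ValueError("k-mer counts are inconsistent")
    # Probability that a base is correct, given the shared k-mer fraction.
    p_correct = (1 - asm_only_kmers / total_asm_kmers) ** (1 / k)
    error_rate = 1 - p_correct
    if error_rate == 0:
        return math.inf  # no assembly-only k-mers: no detectable errors
    return -10 * math.log10(error_rate)


# Hypothetical counts: 2,100 assembly-only k-mers out of 1,000,000, with k = 21.
qv = merqury_style_qv(2_100, 1_000_000, 21)
print(f"QV is approximately {qv:.1f}")
```

A higher QV means fewer assembly k-mers are unsupported by the reads; the score depends on the chosen k and on the reads used to build the Meryl database.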
Version History
v2.0.6 (latest) Created 4th Dec 2024 at 07:02 by Anna Syme
Merge pull request #10 from AnnaSyme/wf-name-change
Rename updated-Galaxy-Workflow-Genome_assessment_post_assembly.ga to …
Frozen
master
0154e28
v2.0.6 - ignore Created 4th Dec 2024 at 02:15 by Anna Syme
Merge pull request #10 from AnnaSyme/wf-name-change
Rename updated-Galaxy-Workflow-Genome_assessment_post_assembly.ga to …
Frozen
master
0154e28
v2.0.2 Created 16th Apr 2024 at 08:19 by Johan Gustafsson
Update .lifemonitor.yaml
Frozen
v2.0.2
4ad99a2
v1.1.0 Created 9th May 2023 at 01:59 by Johan Gustafsson
Add missing raw data input
Frozen
v1.1.0
46d8253
v1.0.0 (earliest) Created 7th Nov 2022 at 07:10 by Johan Gustafsson
Update links
Frozen
v1.0.0
efaf002
Created: 7th Nov 2022 at 07:10
Last updated: 4th Dec 2024 at 02:15