Post-assembly quality control workflow for genome assemblies, using Quast, BUSCO, Meryl, Merqury and Fasta Statistics. Updates November 2023. Inputs: reads as fastqsanger.gz (not fastq.gz) and assembly.fasta. New default settings: for BUSCO, lineage = eukaryota; for Quast, lineage = eukaryotes and genome = large. The workflow reports assembly statistics in a table called metrics.tsv, including selected metrics from Fasta Statistics and read coverage; reports BUSCO versions and dependencies; and displays these tables in the workflow report. Known bug: the workflow report text sometimes resets to the default text. To restore it, find an earlier workflow version with the correct report text, then copy and paste that text into the current version.
Inputs
ID | Name | Description | Type |
---|---|---|---|
FASTA contigs - Primary Assembly | #main/FASTA contigs - Primary Assembly | n/a | |
Raw reads | #main/Raw reads | n/a | |
Steps
ID | Name | Description |
---|---|---|
2 | FASTQ to FASTA | toolshed.g2.bx.psu.edu/repos/devteam/fastqtofasta/fastq_to_fasta_python/1.1.5 |
3 | Meryl | toolshed.g2.bx.psu.edu/repos/iuc/meryl/meryl/1.3+galaxy6 |
4 | Fasta Statistics | toolshed.g2.bx.psu.edu/repos/iuc/fasta_stats/fasta-stats/2.0 |
5 | Quast | toolshed.g2.bx.psu.edu/repos/iuc/quast/quast/5.0.2+galaxy1 |
6 | Busco | toolshed.g2.bx.psu.edu/repos/iuc/busco/busco/5.4.6+galaxy0 |
7 | Fasta Statistics | toolshed.g2.bx.psu.edu/repos/iuc/fasta_stats/fasta-stats/2.0 |
8 | Merqury | toolshed.g2.bx.psu.edu/repos/iuc/merqury/merqury/1.3 |
9 | Search in textfiles | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_grep_tool/1.1.1 |
10 | Relabel some items in Fasta stats | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_sed_tool/1.1.1 |
11 | Get required Busco stats | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_grep_tool/1.1.1 |
12 | Get Busco version | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_grep_tool/1.1.1 |
13 | Get Busco dependencies | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_grep_tool/1.1.1 |
14 | Search in textfiles | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_grep_tool/1.1.1 |
15 | Cut | Cut1 |
16 | Filter out unneeded lines from fasta stats | toolshed.g2.bx.psu.edu/repos/iuc/filter_tabular/filter_tabular/3.3.0 |
17 | Rename some items and add in delimiters for later | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_sed_tool/1.1.1 |
18 | Reformat some text | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_sed_tool/1.1.1 |
19 | Cut | Cut1 |
20 | Extract assembly size | toolshed.g2.bx.psu.edu/repos/iuc/filter_tabular/filter_tabular/3.3.0 |
21 | Extract number of contigs | toolshed.g2.bx.psu.edu/repos/iuc/filter_tabular/filter_tabular/3.3.0 |
22 | Extract Contig N and L 50s and 90s | toolshed.g2.bx.psu.edu/repos/iuc/filter_tabular/filter_tabular/3.3.0 |
23 | Extract longest contig | toolshed.g2.bx.psu.edu/repos/iuc/filter_tabular/filter_tabular/3.3.0 |
24 | Extract GC content | toolshed.g2.bx.psu.edu/repos/iuc/filter_tabular/filter_tabular/3.3.0 |
25 | Convert commas to tabs | Convert characters1 |
26 | Collate Busco info | cat1 |
27 | Paste | Paste1 |
28 | Add blank header | toolshed.g2.bx.psu.edu/repos/bgruening/add_line_to_file/add_line_to_file/0.1.0 |
29 | Transpose cols to rows | toolshed.g2.bx.psu.edu/repos/iuc/datamash_transpose/datamash_transpose/1.8+galaxy0 |
30 | Convert to table | Convert characters1 |
31 | Compute coverage, total reads length divided by assembly length | toolshed.g2.bx.psu.edu/repos/devteam/column_maker/Add_a_column1/2.0 |
32 | Convert underscores to tabs | Convert characters1 |
33 | Keep two columns | Cut1 |
34 | Round the percentage to 2 decimal places | toolshed.g2.bx.psu.edu/repos/devteam/column_maker/Add_a_column1/2.0 |
35 | Label the column | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_replace_in_column/1.1.3 |
36 | Join info into one table | cat1 |
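The contig statistics and coverage steps in the table above (steps 22 and 31 to 34) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the workflow's own implementation, which chains Galaxy tools. The function names `nx_lx` and `coverage`, the contig lengths, and the read total are all hypothetical. N50/L50 use the usual definition (the contig length and contig count at which the cumulative length of contigs, sorted longest first, reaches 50% of the assembly), and coverage follows the step description: total reads length divided by assembly length, rounded to 2 decimal places.

```python
from typing import Iterable, Tuple


def nx_lx(lengths: Iterable[int], fraction: float) -> Tuple[int, int]:
    """Return (Nx, Lx) for the given fraction (0.5 for N50/L50, 0.9 for N90/L90).

    Nx is the length of the contig at which the running total of lengths,
    sorted longest first, first reaches `fraction` of the assembly size.
    Lx is the number of contigs needed to reach that point.
    """
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * fraction
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    raise ValueError("no contig lengths given")


def coverage(total_read_bases: int, assembly_bases: int) -> float:
    """Total reads length divided by assembly length, rounded to 2 places."""
    if assembly_bases <= 0:
        raise ValueError("assembly length must be positive")
    return round(total_read_bases / assembly_bases, 2)


# Made-up contig lengths and read total, for illustration only.
contigs = [100, 80, 60, 40, 20]
assembly_size = sum(contigs)
n50, l50 = nx_lx(contigs, 0.5)
n90, l90 = nx_lx(contigs, 0.9)
cov = coverage(8250, assembly_size)

print(f"Assembly size\t{assembly_size}")
print(f"N50\t{n50}\nL50\t{l50}")
print(f"N90\t{n90}\nL90\t{l90}")
print(f"Coverage\t{cov}")
```

With these made-up numbers the assembly size is 300, so the 50% threshold (150) is reached at the second contig and the 90% threshold (270) at the fourth.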
Outputs
ID | Name | Description | Type |
---|---|---|---|
Busco and dependencies version | #main/Busco and dependencies version | n/a | |
Busco on input dataset(s): full table | #main/Busco on input dataset(s): full table | n/a | |
Fasta Statistics on input dataset(s): summary stats | #main/Fasta Statistics on input dataset(s): summary stats | n/a | |
Genome assembly metrics | #main/Genome assembly metrics | n/a | |
Genome coverage | #main/Genome coverage | n/a | |
Merqury on input dataset(s): bed | #main/Merqury on input dataset(s): bed | n/a | |
Merqury on input dataset(s): png | #main/Merqury on input dataset(s): png | n/a | |
Merqury on input dataset(s): qv | #main/Merqury on input dataset(s): qv | n/a | |
Merqury on input dataset(s): size files | #main/Merqury on input dataset(s): size files | n/a | |
Merqury on input dataset(s): stats | #main/Merqury on input dataset(s): stats | n/a | |
Merqury on input dataset(s): wig | #main/Merqury on input dataset(s): wig | n/a | |
Meryl on input dataset(s): read-db.meryldb | #main/Meryl on input dataset(s): read-db.meryldb | n/a | |
Quast on input dataset(s): HTML report | #main/Quast on input dataset(s): HTML report | n/a | |
Quast on input dataset(s): PDF report | #main/Quast on input dataset(s): PDF report | n/a | |
Quast on input dataset(s): Log | #main/Quast on input dataset(s): Log | n/a | |
Quast on input dataset(s): tabular report | #main/Quast on input dataset(s): tabular report | n/a | |
_anonymous_output_1 | #main/_anonymous_output_1 | n/a | |
_anonymous_output_10 | #main/_anonymous_output_10 | n/a | |
_anonymous_output_11 | #main/_anonymous_output_11 | n/a | |
_anonymous_output_12 | #main/_anonymous_output_12 | n/a | |
_anonymous_output_13 | #main/_anonymous_output_13 | n/a | |
_anonymous_output_14 | #main/_anonymous_output_14 | n/a | |
_anonymous_output_15 | #main/_anonymous_output_15 | n/a | |
_anonymous_output_16 | #main/_anonymous_output_16 | n/a | |
_anonymous_output_17 | #main/_anonymous_output_17 | n/a | |
_anonymous_output_18 | #main/_anonymous_output_18 | n/a | |
_anonymous_output_19 | #main/_anonymous_output_19 | n/a | |
_anonymous_output_2 | #main/_anonymous_output_2 | n/a | |
_anonymous_output_20 | #main/_anonymous_output_20 | n/a | |
_anonymous_output_21 | #main/_anonymous_output_21 | n/a | |
_anonymous_output_22 | #main/_anonymous_output_22 | n/a | |
_anonymous_output_23 | #main/_anonymous_output_23 | n/a | |
_anonymous_output_24 | #main/_anonymous_output_24 | n/a | |
_anonymous_output_25 | #main/_anonymous_output_25 | n/a | |
_anonymous_output_26 | #main/_anonymous_output_26 | n/a | |
_anonymous_output_3 | #main/_anonymous_output_3 | n/a | |
_anonymous_output_4 | #main/_anonymous_output_4 | n/a | |
_anonymous_output_5 | #main/_anonymous_output_5 | n/a | |
_anonymous_output_6 | #main/_anonymous_output_6 | n/a | |
_anonymous_output_7 | #main/_anonymous_output_7 | n/a | |
_anonymous_output_8 | #main/_anonymous_output_8 | n/a | |
_anonymous_output_9 | #main/_anonymous_output_9 | n/a | |
out_file1 | #main/out_file1 | n/a | |
outfile | #main/outfile | n/a | |
Version History
v2.0.5 (latest) Created 6th Aug 2024 at 11:04 by Anna Syme
Merge pull request #7 from AustralianBioCommons/supernord-workflow-name-fix: Update workflow name in ro-crate-metadata.json
Frozen: v2.0.5, fe2213b
v2.0.2 Created 16th Apr 2024 at 08:19 by Johan Gustafsson
Update .lifemonitor.yaml
Frozen: v2.0.2, 4ad99a2
v1.1.0 Created 9th May 2023 at 01:59 by Johan Gustafsson
Add missing raw data input
Frozen: v1.1.0, 46d8253
v1.0.0 (earliest) Created 7th Nov 2022 at 07:10 by Johan Gustafsson
Update links
Frozen: v1.0.0, efaf002
Created: 7th Nov 2022 at 07:10
Last updated: 6th Aug 2024 at 11:04