Assess genome quality
Version 1

Workflow Type: Galaxy

Assess genome quality; can run alone or as part of a combined workflow for large genome assembly.

  • What it does: Assesses the quality of a genome assembly: generates summary statistics, checks whether expected genes are present, and aligns contigs to a reference genome.
  • Inputs: polished assembly; reference_genome.fasta (e.g. of a closely-related species, if available).
  • Outputs: Busco table of genes found; Quast HTML report with a link to the Icarus contig browser, which shows contigs aligned to the reference genome.
  • Tools used: Busco, Quast
  • Input parameters: None required

Workflow steps:

Polished assembly => Busco

  • First: predict genes in the assembly using Metaeuk.
  • Second: compare the set of predicted genes to the set of genes expected in a particular lineage. Default lineage: Eukaryota.
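
The result of this comparison is usually summarised by Busco as a one-line score string (complete, single-copy, duplicated, fragmented, missing, total). The following is a minimal sketch of how such a line could be parsed, assuming the usual `C:..%[S:..%,D:..%],F:..%,M:..%,n:..` notation; the function name and the example values are hypothetical and are not results from this workflow.

```python
import re

# One-line score notation as printed in Busco short summaries (assumed format).
SUMMARY_PATTERN = re.compile(
    r"C:(?P<complete>[\d.]+)%\[S:(?P<single>[\d.]+)%,D:(?P<duplicated>[\d.]+)%\],"
    r"F:(?P<fragmented>[\d.]+)%,M:(?P<missing>[\d.]+)%,n:(?P<total>\d+)"
)


def parse_busco_summary(line):
    """Return the percentages and the total gene count found in `line`."""
    match = SUMMARY_PATTERN.search(line)
    if match is None:
        raise ValueError("no Busco score string found")
    scores = {key: float(value) for key, value in match.groupdict().items()}
    scores["total"] = int(scores["total"])  # n is a count, not a percentage
    return scores


# Made-up example values for illustration only.
example = "C:92.5%[S:90.0%,D:2.5%],F:3.5%,M:4.0%,n:255"
scores = parse_busco_summary(example)
```

A high "missing" percentage suggests that expected genes are absent from the assembly, or that the chosen lineage is a poor match for the species.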

Polished assembly and a reference genome => Quast

  • Contigs/scaffolds file: polished assembly
  • Type of assembly: Genome
  • Use a reference genome: Yes
  • Reference genome: Arabidopsis genome
  • Is the genome large (> 100 Mbp)? Yes.
  • All other settings as defaults, except the second-last setting, "Distinguish contigs with more than 50% unaligned bases as a separate group of contigs?": change to No.
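
Among the summary statistics that Quast reports is N50. As a rough illustration of what that number means, here is a minimal sketch that computes N50 from a list of contig lengths. It does not reproduce Quast's own filtering (for example, any minimum contig length cutoff), and the example lengths are made up.

```python
def n50(contig_lengths):
    """Return the length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    if not contig_lengths:
        raise ValueError("need at least one contig")
    half_total = sum(contig_lengths) / 2
    running = 0
    # Walk from the longest contig down, accumulating covered bases.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Made-up contig lengths (total 300 bases) for illustration only.
example_lengths = [80, 70, 50, 40, 30, 20, 10]
example_n50 = n50(example_lengths)
```

Here the two longest contigs (80 + 70) already cover half of the 300 bases, so the N50 is the length of the second contig.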

Options

Gene prediction:

  • Busco can predict genes in the assembly with Augustus instead of Metaeuk.
  • To do this, select: Use Augustus; Use another predefined species model; then choose a species from the drop-down list.
  • The list draws on a database of trained species models, listed here: https://github.com/Gaius-Augustus/Augustus/tree/master/config/species
  • Note: Augustus may fail if the input assembly is too small (e.g. a test-size assembly), because it cannot complete its training step properly.

Compare genes found to another lineage:

  • Busco has databases of lineages and their expected genes, so the lineage can be changed.
  • Not all lineages are available; there is a mix of broader and narrower lineages. List of lineages: https://busco.ezlab.org/list_of_lineages.html
  • To see the groups arranged in taxonomic hierarchies (Eukaryotes): https://busco.ezlab.org/frames/euka.htm
  • For example, if you have a plant species from Fabales, you could set that as the lineage.
  • The narrower the taxonomic group, the more genes are expected in total.
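
For readers running Busco outside Galaxy, the two options above correspond roughly to command-line arguments. The sketch below builds a Busco 5 argument list; the Galaxy tool assembles its own command, so this is only an approximation of the equivalent standalone call. The function name, file names and output name are hypothetical, and the dataset names (`eukaryota_odb10`, `fabales_odb10`) are assumed to match the lineage list linked above.

```python
def busco_command(assembly, lineage="eukaryota_odb10", out_name="busco_out",
                  augustus_species=None):
    """Build an argument list for a standalone Busco 5 genome-mode run.

    When `augustus_species` is None, Busco's default eukaryote gene
    predictor is used; otherwise Augustus is requested with the given
    pre-trained species model.
    """
    cmd = ["busco", "-i", assembly, "-o", out_name, "-m", "genome", "-l", lineage]
    if augustus_species is not None:
        cmd += ["--augustus", "--augustus_species", augustus_species]
    return cmd


# Default run: broad Eukaryota lineage, default gene predictor.
default_cmd = busco_command("polished_assembly.fasta")

# Narrower lineage plus Augustus with a predefined species model.
fabales_cmd = busco_command(
    "polished_assembly.fasta",
    lineage="fabales_odb10",
    augustus_species="arabidopsis",
)
```

The list can be passed to `subprocess.run` if Busco is installed locally; printing it with `" ".join(cmd)` gives the shell command.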

Infrastructure_deployment_metadata: Galaxy Australia (Galaxy)

Inputs

ID | Name | Description | Type
Polished assembly | Polished assembly | n/a | n/a
Reference genome | Reference genome | n/a | n/a

Steps

ID | Name | Description
0 | Polished assembly |
1 | Reference genome |
2 | Busco: assess assembly | toolshed.g2.bx.psu.edu/repos/iuc/busco/busco/5.2.2+galaxy0
3 | Quast: assess assembly | toolshed.g2.bx.psu.edu/repos/iuc/quast/quast/5.0.2+galaxy1

Outputs

ID Name Description Type
busco_sum busco_sum n/a
busco_table busco_table n/a
busco_missing busco_missing n/a
summary_image summary_image n/a
quast_tabular quast_tabular n/a
report_html report_html n/a
report_pdf report_pdf n/a
log log n/a
mis_ass mis_ass n/a
unalign unalign n/a
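
As an illustration of how the busco_table output might be inspected, here is a minimal sketch that counts genes per status. It assumes the table is tab-separated, uses `#` for header lines, and has the gene identifier in the first column and the status (Complete, Duplicated, Fragmented or Missing) in the second; the function name and example rows are made up.

```python
import csv
import io
from collections import Counter


def count_busco_statuses(handle):
    """Count genes per status, counting each gene identifier once.

    Duplicated genes appear on several rows (one per copy), so rows
    are collapsed by identifier before counting.
    """
    status_by_id = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines, '#' header lines and rows without a status.
        if len(row) < 2 or row[0].startswith("#"):
            continue
        status_by_id[row[0]] = row[1]
    return Counter(status_by_id.values())


# Made-up rows for illustration only.
example_table = io.StringIO(
    "# Busco id\tStatus\tSequence\n"
    "1001at2759\tComplete\tcontig_1\n"
    "1002at2759\tDuplicated\tcontig_2\n"
    "1002at2759\tDuplicated\tcontig_5\n"
    "1003at2759\tMissing\n"
    "1004at2759\tFragmented\tcontig_3\n"
)
status_counts = count_busco_statuses(example_table)
```

With a real file, `open(path, newline="")` can be passed in place of the `io.StringIO` object.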

Version History

Version 1 (earliest) Created 8th Nov 2021 at 06:03 by Anna Syme

Added/updated 2 files


Citation
Syme, A. (2021). Assess genome quality. WorkflowHub. https://doi.org/10.48546/WORKFLOWHUB.WORKFLOW.229.1
Last updated: 9th Nov 2021 at 01:12
