Fgenesh Annotation - TSI Workflow Description
Overview
This is one of a series of workflows for annotating a genome, tagged TSI-annotation. It is based on command-line code by Luke Silver, converted into Galaxy Australia workflows.
Workflow Sequence
Run in this order:
- Repeat masking
- RNAseq QC and read trimming
- Find transcripts
- Combine transcripts
- Extract transcripts
- Convert formats
- Fgenesh annotation (this workflow)
Inputs Required
Files uploaded by the user:
- assembled_genome.fasta: the assembled genome
- hard_masked_genome.fasta: the hard repeat-masked genome
- mRNA_sequences.fasta: known mRNAs in Fgenesh header format (typically the output of the upstream "Convert formats" workflow). Optional; see "Running without mRNA" below.
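Before uploading, it can be worth confirming that the two genome files describe the same assembly. The sketch below is not part of the workflow. It assumes the hard-masked file was produced from the assembled genome, so that sequence IDs and lengths should match (hard masking replaces bases with N rather than removing them). The function names `read_fasta` and `check_masked_matches` are hypothetical.

```python
def read_fasta(path):
    """Return {sequence_id: sequence} for a FASTA file."""
    seqs = {}
    seq_id = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Assumption: the sequence ID is the first word of the header.
                seq_id = line[1:].split()[0]
                seqs[seq_id] = []
            elif seq_id is not None:
                seqs[seq_id].append(line)
    return {key: "".join(parts) for key, parts in seqs.items()}


def check_masked_matches(assembled_path, masked_path):
    """Return a list of problems; an empty list means the files are consistent."""
    assembled = read_fasta(assembled_path)
    masked = read_fasta(masked_path)
    problems = []
    for seq_id, seq in assembled.items():
        if seq_id not in masked:
            problems.append(f"{seq_id}: missing from masked genome")
        elif len(masked[seq_id]) != len(seq):
            problems.append(f"{seq_id}: length differs")
    for seq_id in masked:
        if seq_id not in assembled:
            problems.append(f"{seq_id}: not in assembled genome")
    return problems
```

A call such as `check_masked_matches("assembled_genome.fasta", "hard_masked_genome.fasta")` returning an empty list suggests the pair is consistent; any entries name the sequences to inspect.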
Selected at runtime (dropdowns / tick-boxes, not uploads):
- Closely-related species (Fgenesh species matrix, from those installed on Galaxy Australia)
- Mammal or non-mammal
- NR database (for Fgenesh get proteins)
- BUSCO lineage
- Licence agreement (tick to accept Fgenesh terms)
Running without mRNA
If no known mRNA sequences are available, edit the Fgenesh-annotate step's mRNA option to "no" and disconnect the mRNA input.
Processing Steps
The workflow splits each input genome into single sequences (to reduce runtime), annotates each sequence with Fgenesh++, and merges the outputs into one annotation.
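The splitting idea can be illustrated with a minimal sketch that writes each record of a multi-FASTA file to its own file. This is not the implementation of the FGENESH split tool, whose internals are not described here. The function name `split_fasta` and the file-naming scheme are assumptions made for illustration.

```python
from pathlib import Path


def split_fasta(fasta_path, out_dir):
    """Write each FASTA record to its own file and return the list of paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    handle = None
    with open(fasta_path) as src:
        for line in src:
            if line.startswith(">"):
                if handle is not None:
                    handle.close()
                # Assumption: the first word of the header is a safe file name
                # (no slashes or other path characters).
                seq_id = line[1:].split()[0]
                path = out_dir / f"{seq_id}.fasta"
                handle = open(path, "w")
                paths.append(path)
            if handle is not None:
                # Copy the header and sequence lines unchanged.
                handle.write(line)
    if handle is not None:
        handle.close()
    return paths
```

Each resulting file can then be processed independently, which is what allows the per-sequence annotation to be run separately before the results are merged.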
Outputs
- Genome annotation (GFF3)
- Annotation stats
- FASTA files of mRNAs, cDNAs and proteins
- BUSCO report of proteins
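For a quick look at the GFF3 output outside Galaxy, the sketch below tallies feature types from column 3 of a GFF3 file. It is independent of the workflow's own annotation statistics step and does not reproduce its report. The function name `count_gff3_features` is hypothetical, and the feature types present in a real Fgenesh GFF3 are not assumed.

```python
from collections import Counter


def count_gff3_features(gff_path):
    """Count GFF3 records by feature type (the third tab-separated column)."""
    counts = Counter()
    with open(gff_path) as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                # An embedded FASTA section ends the feature records.
                break
            if line.startswith("#") or not line.strip():
                # Skip directives, comments and blank lines.
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 3:
                counts[fields[2]] += 1
    return counts
```

Calling `count_gff3_features` on the downloaded annotation file returns a `Counter` keyed by feature type, which can be compared against the annotation stats output as a sanity check.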
Key Note
The sequences passed to the mRNA/cDNA extraction tools are the unmasked assembly; there may be situations where the masked version is preferable.
Inputs
| ID | Name | Description | Type |
|---|---|---|---|
| Select an approximately closely-related species | Select an approximately closely-related species | n/a | |
| Select lineage | Select lineage | n/a | |
| Select mammal or non-mammal | Select mammal or non-mammal | n/a | |
| assembled_genome.fasta | assembled_genome.fasta | n/a | |
| hard_masked_genome.fasta | hard_masked_genome.fasta | n/a | |
Steps
| ID | Name | Description |
|---|---|---|
| 5 | FGENESH split | fgenesh_split |
| 6 | FGENESH split | fgenesh_split |
| 7 | FGENESH annotate | fgenesh_annotate |
| 8 | FGENESH merge | fgenesh_merge |
| 9 | Merge into a single annotation file | fgenesh_merge |
| 10 | get mRNA sequences | fgenesh_get_mrnas_gc |
| 11 | FGENESH get protein | fgenesh_get_proteins |
| 12 | get cDNA sequences | fgenesh_get_mrnas_gc |
| 13 | Genome annotation statistics | toolshed.g2.bx.psu.edu/repos/iuc/jcvi_gff_stats/jcvi_gff_stats/0.8.4 |
| 14 | Busco | toolshed.g2.bx.psu.edu/repos/iuc/busco/busco/5.5.0+galaxy0 |
Outputs
| ID | Name | Description | Type |
|---|---|---|---|
| output_gff | output_gff | n/a | |
| output_mrna_file | output_mrna_file | n/a | |
| output_prot_file | output_prot_file | n/a | |
| output_cdna_file | output_cdna_file | n/a | |
Version History
Version 3.2 (latest) Created 18th Apr 2026 at 11:50 by Anna Syme
Shortened the licence-agreement parameter input label from the full BioCommons acknowledgement text to "Licence agreement (tick to accept Fgenesh terms)" to tidy the workflow diagram. Updated the workflow description to accurately list inputs (assembled_genome.fasta, hard_masked_genome.fasta, mRNA_sequences.fasta) and clarify that species matrix, NR database and BUSCO lineage are dropdown selections rather than uploaded files. No functional changes; .ga steps and tool versions unchanged from Version 3.1.
Frozen
Version-3.2
af61f4b
Version 3.1 Created 18th Apr 2026 at 11:21 by Anna Syme
Fixed: added license_agreements input, mRNA_sequences.fasta input, updated fgenesh_annotate/fgenesh_get_proteins to v2024.2+galaxy1. Tested on S. cerevisiae (BUSCO C:84.0%).
Frozen
Version-3.1
fe9c08f
Version 3 Created 17th Nov 2024 at 23:41 by Anna Syme
Changes made 13 Nov 2024: Added correct input files and connected them to the split steps. Added inputs for db selections in the annotation step. Added lineage input to Busco. Added genome annotation stats derived from the gff3 output. Connected the assembly.fasta sequences to the "get mRNA/cDNA" tools. Expanded this information text and clarified the need for .cdna, .pro and .dat files as input.
Frozen
Version-3
fe9c08f
Version 2.1 Created 18th Jun 2024 at 10:08 by Anna Syme
add updated workflow image
Frozen
Version-2.1
3cebff9
Version 1 (earliest) Created 8th May 2024 at 08:28 by Anna Syme
Initial commit
Frozen
Version-1
1b30a7e
Created: 8th May 2024 at 08:28
Last updated: 18th Apr 2026 at 11:56
https://orcid.org/0000-0002-9906-0673