This is part of a series of workflows to annotate a genome, tagged with TSI-annotation.
These workflows are based on command-line code by Luke Silver, converted into Galaxy Australia workflows.
The workflows can be run in this order:
- Repeat masking
- RNAseq QC and read trimming
- Find transcripts
- Combine transcripts
- Extract transcripts
- Convert formats
- Fgenesh annotation
Inputs required: assembled-genome.fasta, hard-repeat-masked-genome.fasta, and (because this workflow maps known mRNA sequences) .cdna, .pro and .dat files. You must also select the databases to use for Fgenesh-annotate and for Busco.
This workflow splits each input genome into single sequences (to decrease computation time), annotates them using FgenesH++, and merges the output.
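The split-and-merge idea can be illustrated with a minimal Python sketch. This is not the code behind the FGENESH split or FGENESH merge tools; the function names `read_fasta`, `split_fasta` and `merge_outputs` are hypothetical, and the sketch only assumes plain multi-FASTA text with non-empty header lines.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_fasta(text: str) -> Dict[str, str]:
    """Map each sequence ID (first word of the header) to a one-record FASTA string."""
    parts: Dict[str, str] = {}
    for header, seq in read_fasta(text.splitlines()):
        seq_id = header.split()[0]  # assumes the header is not empty
        parts[seq_id] = f">{header}\n{seq}\n"
    return parts


def merge_outputs(per_sequence: Dict[str, str], order: List[str]) -> str:
    """Concatenate per-sequence results in the given sequence order."""
    return "".join(per_sequence[seq_id] for seq_id in order)


# Hypothetical two-scaffold genome used only for illustration.
EXAMPLE_GENOME = ">chr1 first scaffold\nACGT\nAC\n>chr2\nNNNN\n"
PARTS = split_fasta(EXAMPLE_GENOME)
```

Because each single-sequence piece is independent, the pieces could in principle be processed separately and their results joined afterwards, which is the reason the text gives for splitting.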
Outputs: genome annotation in gff3 format, genome annotation stats, fasta files of mRNAs, cDNAs and proteins, Busco report of proteins.
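As a quick check on the gff3 output, feature types can be tallied with a few lines of Python. This is a minimal sketch and not the method used by the "Genome annotation statistics" step; the function name `count_gff3_features` and the example records (including the `FGENESH` source label) are made up for illustration. It relies only on the GFF3 convention of nine tab-separated columns with the feature type in column 3.

```python
from collections import Counter


def count_gff3_features(gff3_text: str) -> Counter:
    """Count records per feature type (column 3) in GFF3 text."""
    counts: Counter = Counter()
    for line in gff3_text.splitlines():
        # Skip blank lines, comments and directives such as ##gff-version.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Skip anything that is not a nine-column feature line.
        if len(cols) != 9:
            continue
        counts[cols[2]] += 1
    return counts


# Hypothetical annotation of one gene with two coding exons.
EXAMPLE_GFF3 = (
    "##gff-version 3\n"
    "scaf1\tFGENESH\tgene\t1\t900\t.\t+\t.\tID=g1\n"
    "scaf1\tFGENESH\tmRNA\t1\t900\t.\t+\t.\tID=m1;Parent=g1\n"
    "scaf1\tFGENESH\tCDS\t1\t300\t.\t+\t0\tParent=m1\n"
    "scaf1\tFGENESH\tCDS\t600\t900\t.\t+\t0\tParent=m1\n"
)
FEATURE_COUNTS = count_gff3_features(EXAMPLE_GFF3)
```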
Note: The tools that extract mRNA and cDNA here take the unmasked assembly.fasta sequences as input. There may be reasons to prefer the masked version, but we are unsure when that would be the case.
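To see why the choice of input matters, the sketch below extracts the same region from an unmasked and a hard-masked sequence. It assumes the common convention that hard masking replaces repeat bases with `N`; the sequences, coordinates and function names (`extract_region`, `n_fraction`) are hypothetical, and this is not how the "get mRNA/cDNA" tools are implemented.

```python
def extract_region(seq: str, start: int, end: int) -> str:
    """Return the subsequence for 1-based, inclusive coordinates (as used in GFF3)."""
    return seq[start - 1:end]


def n_fraction(seq: str) -> float:
    """Fraction of bases that are N; note that N can also mark assembly gaps."""
    if not seq:
        return 0.0
    return seq.upper().count("N") / len(seq)


# Hypothetical 16-base region; the masked copy has a 5-base repeat replaced by N.
UNMASKED = "ATGGCCATTAGGCTAA"
MASKED = "ATGGCCNNNNNGCTAA"

TRANSCRIPT_UNMASKED = extract_region(UNMASKED, 1, 16)
TRANSCRIPT_MASKED = extract_region(MASKED, 1, 16)
```

A transcript extracted from the masked sequence carries the `N` run wherever a repeat overlaps the gene model, whereas the unmasked input returns the actual bases.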
Note: To use this workflow without an input of known mRNAs, save a copy of the workflow and edit the "Fgenesh annotate" tool, setting this option to "no". You will then not need the .cdna, .pro and .dat input files.
Changes made 13 Nov 2024: Added correct input files and connected them to the split steps. Added inputs for db selections in the annotation step. Added lineage input to Busco. Added genome annotation stats derived from gff3 output. Connected in assembly.fasta sequences to "get mRNA/cDNA" tools. Expanded this information text and clarified the need for .cdna .pro and .dat files as input.
Inputs
ID | Name | Description | Type |
---|---|---|---|
Select an approximately closely-related species | Select an approximately closely-related species | n/a | |
Select lineage | Select lineage | n/a | |
Select mammal or non-mammal | Select mammal or non-mammal | n/a | |
assembled_genome.fasta | assembled_genome.fasta | n/a | |
hard_masked_genome.fasta | hard_masked_genome.fasta | n/a | |
Steps
ID | Name | Description |
---|---|---|
5 | FGENESH split | fgenesh_split |
6 | FGENESH split | fgenesh_split |
7 | FGENESH annotate | fgenesh_annotate |
8 | FGENESH merge | fgenesh_merge |
9 | Merge into a single annotation file | fgenesh_merge |
10 | get mRNA sequences | fgenesh_get_mrnas_gc |
11 | FGENESH get protein | fgenesh_get_proteins |
12 | get cDNA sequences | fgenesh_get_mrnas_gc |
13 | Genome annotation statistics | toolshed.g2.bx.psu.edu/repos/iuc/jcvi_gff_stats/jcvi_gff_stats/0.8.4 |
14 | Busco | toolshed.g2.bx.psu.edu/repos/iuc/busco/busco/5.5.0+galaxy0 |
Outputs
ID | Name | Description | Type |
---|---|---|---|
output_gff | output_gff | n/a | |
output_mrna_file | output_mrna_file | n/a | |
output_prot_file | output_prot_file | n/a | |
output_cdna_file | output_cdna_file | n/a | |
Version History
Version 3 (latest) Created 17th Nov 2024 at 23:41 by Anna Syme
Changes made 13 Nov 2024: Added correct input files and connected them to the split steps. Added inputs for db selections in the annotation step. Added lineage input to Busco. Added genome annotation stats derived from gff3 output. Connected in assembly.fasta sequences to "get mRNA/cDNA" tools. Expanded this information text and clarified the need for .cdna .pro and .dat files as input.
Frozen
Version-3
fe9c08f
Version 2.1 Created 18th Jun 2024 at 10:08 by Anna Syme
add updated workflow image
Frozen
Version-2.1
3cebff9
Version 1 (earliest) Created 8th May 2024 at 08:28 by Anna Syme
Initial commit
Frozen
Version-1
1b30a7e
Created: 8th May 2024 at 08:28
Last updated: 17th Nov 2024 at 23:51