# Fgenesh Annotation - TSI Workflow Description

## Overview

One of a series of workflows to annotate a genome, tagged `TSI-annotation`. Based on command-line code by Luke Silver, converted into Galaxy Australia workflows.

## Workflow Sequence

Run the workflows in this order:

- Repeat masking
- RNAseq QC and read trimming
- Find transcripts
- Combine transcripts
- Extract transcripts
- **Fgenesh annotation** (this workflow)

## Inputs Required

**Files uploaded by the user:**

- `assembled_genome.fasta`: the assembled genome
- `hard_masked_genome.fasta`: the hard repeat-masked genome
- `mRNA_sequences.fasta`: TransDecoder CDS output from the upstream "Extract transcripts" workflow (use the "Results (CDS/FASTA)" output)

**Selected at runtime (dropdowns or tick-boxes, not uploads):**

- Closely related species (Fgenesh species matrix, chosen from those installed on Galaxy Australia)
- Mammal or non-mammal
- NR database (for Fgenesh get proteins)
- BUSCO lineage
- Licence agreement (tick to accept the Fgenesh terms)

## Running without mRNA

If no known mRNA sequences are available, set the mRNA option in the Fgenesh-annotate step to "no" and disconnect the mRNA input.

## Processing Steps

The workflow splits the input genomes into single sequences (to reduce runtime), annotates each with Fgenesh++, and merges the outputs. TransDecoder CDS sequences are automatically filtered and reformatted for Fgenesh before annotation.

## Outputs

- Genome annotation (GFF3)
- Annotation stats
- FASTA files of mRNAs, cDNAs and proteins
- BUSCO report of proteins

## Key Note

The genome passed to the mRNA/cDNA extraction tools is the unmasked assembly; in some situations the masked version may be preferable.
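
As a rough illustration of the split step described under Processing Steps, the sketch below writes each record of a multi-FASTA genome to its own file, so that each sequence can be annotated on its own and the results merged afterwards. It is a hypothetical Python sketch, not the Galaxy tool the workflow actually uses; the function names `read_fasta` and `split_fasta`, the `seq_00000.fasta` file naming and the 60-character line width are all assumptions made for illustration.

```python
"""Hypothetical sketch of the split step: one FASTA record per output file.

Not the tool used by the Galaxy workflow; names and formatting are assumptions.
"""
from pathlib import Path
from typing import Iterator, List, Optional, Tuple


def read_fasta(path: Path) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA file, skipping blank lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    with path.open() as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if header is not None:
                    yield header, "".join(chunks)
                header = line[1:]
                chunks = []
            else:
                chunks.append(line)
    # Emit the final record in the file.
    if header is not None:
        yield header, "".join(chunks)


def split_fasta(path: Path, out_dir: Path, width: int = 60) -> List[Path]:
    """Write each record to its own file in out_dir (hypothetical naming scheme)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for i, (header, seq) in enumerate(read_fasta(path)):
        out = out_dir / f"seq_{i:05d}.fasta"
        with out.open("w") as handle:
            handle.write(f">{header}\n")
            # Re-wrap the sequence at a fixed line width (assumed 60).
            for start in range(0, len(seq), width):
                handle.write(seq[start:start + width] + "\n")
        written.append(out)
    return written
```

For example, `split_fasta(Path("hard_masked_genome.fasta"), Path("parts"))` would write one file per scaffold into `parts/`, each of which could then be annotated independently before the per-sequence outputs are combined.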