Combine transcripts - TSI
Version 1

Workflow Type: Galaxy

This is part of a series of workflows to annotate a genome, tagged with TSI-annotation. These workflows are based on command-line code by Luke Silver, converted into Galaxy Australia workflows.

The workflows can be run in this order:

  • Repeat masking
  • RNAseq QC and read trimming
  • Find transcripts
  • Combine transcripts
  • Extract transcripts
  • Convert formats
  • Fgenesh annotation

About this workflow:

  • Inputs: multiple transcriptome.gtf files from different tissues, genome.fasta, coding_seqs.fasta, non_coding_seqs.fasta
  • Runs StringTie merge to combine transcriptomes, with default settings except for -m = 30 and -F = 0.1, to produce a merged_transcriptomes.gtf.
  • Runs Convert GTF to BED12 with default settings, to produce a merged_transcriptomes.bed.
  • Runs bedtools getfasta with default settings except for -name = yes, -s = yes, -split = yes, to produce a merged_transcriptomes.fasta.
  • Runs CPAT to identify seqs with high coding probability.
  • Filters out non-coding seqs from the merged_transcriptomes.fasta.
  • Output: filtered_merged_transcriptomes.fasta
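
For readers who want to see roughly what the StringTie merge and bedtools getfasta steps correspond to outside Galaxy, here is a minimal sketch that only builds the command lines. The function names and default output file names are hypothetical, and the Galaxy tool wrappers may pass further arguments not shown here. The Convert GTF to BED12 and CPAT steps are left out because their exact command lines are not given in this description.

```python
import shlex
from typing import List


def stringtie_merge_cmd(gtfs: List[str],
                        out_gtf: str = "merged_transcriptomes.gtf") -> List[str]:
    """Build a StringTie merge command with the two non-default settings
    named above: -m 30 and -F 0.1. Function name is hypothetical."""
    return ["stringtie", "--merge", "-m", "30", "-F", "0.1", "-o", out_gtf, *gtfs]


def getfasta_cmd(genome: str, bed: str,
                 out_fasta: str = "merged_transcriptomes.fasta") -> List[str]:
    """Build a bedtools getfasta command with -name, -s and -split switched on.
    Function name is hypothetical."""
    return ["bedtools", "getfasta", "-fi", genome, "-bed", bed,
            "-name", "-s", "-split", "-fo", out_fasta]


if __name__ == "__main__":
    # Example file names are placeholders, not files shipped with the workflow.
    print(shlex.join(stringtie_merge_cmd(["liver.gtf", "brain.gtf"])))
    print(shlex.join(getfasta_cmd("genome.fasta", "merged_transcriptomes.bed")))
```

Each list could be passed to subprocess.run(cmd, check=True) on a machine where the tools are installed; the sketch itself only prints the commands.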


Inputs

ID Name Description Type
Collection of transcriptome.gtf files #main/Collection of transcriptome.gtf files n/a
  • array containing
    • File
coding_seqs.fasta #main/coding_seqs.fasta n/a
  • File
genome.fasta #main/genome.fasta n/a
  • File
non_coding_seqs.fasta #main/non_coding_seqs.fasta n/a
  • File


Steps

ID Name Description
4 StringTie merge
5 Convert GTF to BED12
6 bedtools getfasta
7 CPAT (check settings)
  • The table of best probabilities is called orf_seqs_prob_best; it is converted to tabular format.
8 Filter and keep only seqs with >0.5 coding prob, skipping 1 header line
  • Filter1
9 Keep only column 1 - read headers
10 Fix headers to overwrite some uppercase
  • Part of each header has become capitalized; this step reverts everything after the :: to lowercase. It may need to be changed if the headers do not share the same format with a :: in them.
11 Filter out non-coding seqs (check output)
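
The logic of steps 8 to 11 can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the Galaxy tools themselves: the function names are hypothetical, the coding probability column index is passed in by the caller because the real orf_seqs_prob_best table has more columns than the toy table used here, and FASTA headers are matched as whole strings.

```python
import csv
from typing import Iterable, Iterator, Set


def fix_header(header: str) -> str:
    """Step 10: lowercase everything after the first '::'.
    Headers without '::' are returned unchanged."""
    head, sep, tail = header.partition("::")
    return head + sep + tail.lower()


def coding_ids(table_lines: Iterable[str], prob_col: int,
               threshold: float = 0.5) -> Set[str]:
    """Steps 8-9: skip 1 header line, keep rows whose coding probability
    is > threshold, and keep only column 1 (the sequence header)."""
    reader = csv.reader(table_lines, delimiter="\t")
    next(reader, None)  # skip the single header line
    ids = set()
    for row in reader:
        if row and float(row[prob_col]) > threshold:
            ids.add(fix_header(row[0]))
    return ids


def filter_fasta(fasta_lines: Iterable[str], keep: Set[str]) -> Iterator[str]:
    """Step 11: yield only the records whose full header is in `keep`."""
    keeping = False
    for line in fasta_lines:
        if line.startswith(">"):
            keeping = line[1:].strip() in keep
        if keeping:
            yield line


if __name__ == "__main__":
    # Toy two-column table; the real table layout is an assumption here.
    table = ["seq_ID\tCoding_prob",
             "TX1::CHR1:0-10(+)\t0.9",
             "TX2::CHR1:20-30(-)\t0.2"]
    fasta = [">TX1::chr1:0-10(+)\n", "ACGT\n", ">TX2::chr1:20-30(-)\n", "GGCC\n"]
    keep = coding_ids(table, prob_col=1)
    print("".join(filter_fasta(fasta, keep)), end="")
```

As the step 10 note warns, the lowercase rule only suits headers in which the text after the :: was originally lowercase; other header formats would need a different fix.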


Outputs

ID Name Description Type
bed_file #main/bed_file n/a
  • File
no_orf_seqs #main/no_orf_seqs n/a
  • File
orf_seqs #main/orf_seqs n/a
  • File
orf_seqs_prob #main/orf_seqs_prob n/a
  • File
orf_seqs_prob_best #main/orf_seqs_prob_best n/a
  • File
out_file1 #main/out_file1 n/a
  • File
out_gtf #main/out_gtf n/a
  • File
output #main/output n/a
  • File
output_pos #main/output_pos n/a
  • File

Version History

Version 1 (earliest) Created 8th May 2024 at 08:07 by Anna Syme

Initial commit

Frozen Version-1 ff43cfe
Creators and Submitter
Silver, L., & Syme, A. (2024). Combine transcripts - TSI. WorkflowHub.

Created: 8th May 2024 at 08:07

Last updated: 9th May 2024 at 05:06
