Preparing genomic data for phylogeny reconstruction (GTN)
Version 1

Workflow Type: Galaxy

This workflow starts from a set of genome assemblies from different samples, strains, or species. Each genome is first annotated with Funannotate. The predicted proteins are further annotated with BUSCO. Next, Proteinortho finds orthologs across the samples and groups them into orthogroups. Orthogroups in which all samples are represented are extracted. The orthologs in each orthogroup are then aligned with ClustalW. Test dataset: https://zenodo.org/record/6610704#.Ypn3FzlBw5k
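To make the "all samples are represented" criterion concrete, here is a minimal Python sketch that keeps only complete orthogroups from a Proteinortho-style table. It assumes the usual Proteinortho layout: three leading columns (Species, Genes, Alg.-Conn.), then one column per sample, with `*` marking a sample that has no member in the group. The table contents, the sample names, and the function name `complete_orthogroups` are hypothetical. The exact condition used by the workflow's Filter step is not stated on this page, so this is an illustration of the idea rather than a copy of that step.

```python
import csv
import io

# Hypothetical Proteinortho-style table: one row per orthogroup,
# one column per sample after the three summary columns.
# "*" marks a sample with no protein in that orthogroup.
EXAMPLE_TABLE = (
    "# Species\tGenes\tAlg.-Conn.\tsampleA.faa\tsampleB.faa\tsampleC.faa\n"
    "3\t3\t1\tA_g1\tB_g7\tC_g2\n"
    "2\t2\t1\tA_g2\t*\tC_g9\n"
    "3\t4\t0.8\tA_g3,A_g4\tB_g1\tC_g5\n"
)


def complete_orthogroups(handle):
    """Return the per-sample member cells of orthogroups present in every sample."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    n_samples = len(header) - 3  # columns after Species, Genes, Alg.-Conn.
    kept = []
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and any further comment lines
        members = row[3:]
        # Keep the group only if every sample column holds at least one protein ID.
        if len(members) == n_samples and all(cell != "*" for cell in members):
            kept.append(members)
    return kept


groups = complete_orthogroups(io.StringIO(EXAMPLE_TABLE))
for members in groups:
    print(members)
```

Note that this check keeps groups containing paralogs (several comma-separated IDs for one sample, as in the third row). A stricter single-copy filter would additionally require the Genes value to equal the number of samples.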

SEEK ID: https://workflowhub.eu/workflows/358?version=1

Inputs

ID	Name	Description	Type
Input genomes as collection	Input genomes as collection	n/a	File[]

Steps

ID	Name	Description
1	Replace Text	toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_replace_in_line/1.1.2
2	RepeatMasker	toolshed.g2.bx.psu.edu/repos/bgruening/repeat_masker/repeatmasker_wrapper/4.1.2-p1+galaxy0
3	Funannotate predict annotation	toolshed.g2.bx.psu.edu/repos/iuc/funannotate_predict/funannotate_predict/1.8.9+galaxy2
4	Extract ORF	toolshed.g2.bx.psu.edu/repos/bgruening/glimmer_gbk_to_orf/glimmer_gbk_to_orf/3.02
5	Regex Find And Replace	toolshed.g2.bx.psu.edu/repos/galaxyp/regex_find_replace/regex1/1.0.1
6	Collapse Collection	toolshed.g2.bx.psu.edu/repos/nml/collapse_collections/collapse_dataset/4.2
7	Proteinortho	toolshed.g2.bx.psu.edu/repos/iuc/proteinortho/proteinortho/6.0.14+galaxy2.9.1
8	Busco	toolshed.g2.bx.psu.edu/repos/iuc/busco/busco/4.1.4
9	Filter	Filter1
10	Proteinortho grab proteins	toolshed.g2.bx.psu.edu/repos/iuc/proteinortho_grab_proteins/proteinortho_grab_proteins/6.0.14+galaxy2.9.1
11	Regex Find And Replace	toolshed.g2.bx.psu.edu/repos/galaxyp/regex_find_replace/regex1/1.0.1
12	ClustalW	toolshed.g2.bx.psu.edu/repos/devteam/clustalw/clustalw/2.1
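Steps 5 and 6 apply a regular-expression replacement and then collapse the collection into one file. The output names `sample_names_to_headers` and `proteomes_to_one_file` suggest that sample names are written into the FASTA headers before the proteomes are merged. The sketch below illustrates that idea in Python. The actual regular expressions are not given on this page, so the `_` separator, the function name `prefix_headers`, and the toy sequences are all assumptions.

```python
def prefix_headers(sample_name, fasta_text):
    """Prepend the sample name to every FASTA header line (assumed '_' separator)."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            out.append(f">{sample_name}_{line[1:]}")
        else:
            out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical per-sample proteomes; both samples reuse the protein ID "g1".
proteomes = {
    "sampleA": ">g1\nMKT\n",
    "sampleB": ">g1\nMKS\n",
}

# Concatenate all proteomes into one FASTA text after renaming the headers.
combined = "".join(prefix_headers(name, text) for name, text in proteomes.items())
print(combined, end="")
```

Renaming before merging keeps headers unique across samples and lets each protein be traced back to its source genome after the files are combined.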

Outputs

ID	Name	Description	Type
headers_shortened	headers_shortened	n/a	File
repeat_masked	repeat_masked	n/a	File
funannotate_predicted_proteins	funannotate_predicted_proteins	n/a	File
extracted_ORFs	extracted_ORFs	n/a	File
_anonymous_output_1	_anonymous_output_1	n/a	File
sample_names_to_headers	sample_names_to_headers	n/a	File
proteomes_to_one_file	proteomes_to_one_file	n/a	File
_anonymous_output_2	_anonymous_output_2	n/a	File
Proteinortho on input dataset(s): orthology-groups	Proteinortho on input dataset(s): orthology-groups	n/a	File
_anonymous_output_3	_anonymous_output_3	n/a	File
_anonymous_output_4	_anonymous_output_4	n/a	File
_anonymous_output_5	_anonymous_output_5	n/a	File
_anonymous_output_6	_anonymous_output_6	n/a	File
_anonymous_output_7	_anonymous_output_7	n/a	File
Proteinortho_extract_by_orthogroup	Proteinortho_extract_by_orthogroup	n/a	File
fasta_header_cleaned	fasta_header_cleaned	n/a	File
ClustalW on input dataset(s): clustal	ClustalW on input dataset(s): clustal	n/a	File