Framework para execução de workflows de redes filogenéticas em ambientes de computação de alto desempenho (Framework for running phylogenetic network workflows in high-performance computing environments)

Abstract:

In recent years, advances in technologies such as next-generation sequencing and high-performance computing have made it possible to run Bioinformatics experiments that are highly complex and computationally intensive. Several Bioinformatics fields rely on high-performance computing platforms to exploit parallelism and task distribution through scientific workflow management systems. One such field is phylogeny, which describes the evolutionary relationships between genes and organisms and establishes which of them are most closely related. Phylogeny is applied in species classification, in determining kinship between individuals, in identifying the origins of pathogens, and in conservation biology. One way to represent these phylogenetic relationships is with phylogenetic networks. However, constructing these networks relies on computationally intensive algorithms that require constant manipulation of different input data. This work aims to develop a framework for constructing explicit phylogenetic networks by modeling a scientific workflow that combines different network construction methods with the required input data treatment. The framework was designed to run multiple flows of the workflow in an automated, parallel, and distributed manner within a single execution, and to be executable in high-performance computing environments. This is a challenging task, since the tools involved were not developed with such environments in mind. To orchestrate the workflow tasks, the scalable parallel programming library Parsl was used, which made it possible to optimize task execution and manage resources better.
Two versions of the framework were developed, called Single Partition and Multi Partition, which differ in how resources are used. In the tests performed, execution was about five times faster than the sequential execution of a flow without the optimizations. The framework was validated using public Dengue virus genome data, which were processed, annotated, and run through the framework on the Santos Dumont supercomputer. The construction of explicit phylogenetic networks for these genomes indicates that the framework is a functional, efficient, and easy-to-use tool.
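
Illustrative sketch (not taken from the thesis): the idea of running several independent flows of one workflow in parallel within a single execution can be shown with Python's standard library. This does not use Parsl and is not the framework's code; the step functions, method names, and dataset label below are hypothetical placeholders that only tag their input, standing in for the real input treatment, tree inference, and network construction tools.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the real tools: each step only tags its input
# so the chaining of steps inside a flow is visible in the result.
def treat_input(dataset):
    return f"treated({dataset})"

def build_trees(treated, method):
    return f"trees[{method}]({treated})"

def build_network(trees, method):
    return f"network[{method}]({trees})"

def run_flow(dataset, tree_method, network_method):
    """One flow: input treatment, then tree inference, then network construction."""
    treated = treat_input(dataset)
    trees = build_trees(treated, tree_method)
    return build_network(trees, network_method)

def run_flows(flows, max_workers=4):
    """Run independent flows concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_flow, *flow) for flow in flows]
        return [future.result() for future in futures]

if __name__ == "__main__":
    # Two flows over the same (hypothetical) dataset with different methods.
    flows = [("dengue", "method_a", "net_x"), ("dengue", "method_b", "net_y")]
    for result in run_flows(flows):
        print(result)
```

Each flow is sequential internally, while separate flows share no state and can therefore be scheduled side by side. A workflow library such as Parsl plays the role of the thread pool here, with the added ability to distribute tasks across the nodes of a cluster.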

SEEK ID: https://workflowhub.eu/publications/33

Teams: HP2NET - Framework for construction of phylogenetic networks on High Per...

Publication type: Master's Thesis

Publisher: Laboratório Nacional de Computação Científica

Citation: TERRA, R. S. Framework para execução de workflows de redes filogenéticas em ambientes de computação de alto desempenho. 2022. 71 f. Tese. (Programa de Pós-Graduação em Modelagem Computacional) - Laboratório Nacional de Computação Científica, Petrópolis, 2022.

Date Published: 18th Feb 2022

URL: https://tede.lncc.br/handle/tede/351

Registered Mode: manually

Authors: Rafael Terra, Kary Ocaña, Carla Osthoff, Diego Carvalho


Created: 9th Jan 2024 at 13:16
