Publications

Abstract

In recent years, improvements in software and hardware performance have made biomolecular simulations a mature tool for the study of biological processes. Simulation length and the size and complexity of the analyzed systems make simulations both complementary to and compatible with other bioinformatics disciplines. However, the characteristics of the software packages used for simulation have prevented the adoption of technologies accepted in other bioinformatics fields, such as automated deployment systems, workflow orchestration, and software containers. We present here a comprehensive exercise to bring biomolecular simulations to the “bioinformatics way of working”. The exercise has led to the development of the BioExcel Building Blocks (BioBB) library. BioBBs are built as Python wrappers to provide an interoperable architecture. BioBBs have been integrated into a chain of common software management tools to generate data ontologies, documentation, installation packages, software containers, and means of integration with workflow managers, making them usable in most computational environments.

Authors: Pau Andrio, Adam Hospital, Javier Conejero, Luis Jordá, Marc Del Pino, Laia Codo, Stian Soiland-Reyes, Carole Goble, Daniele Lezzi, Rosa M. Badia, Modesto Orozco, Josep Ll. Gelpi

Date Published: 1st Dec 2019

Publication Type: Journal

Abstract

Background: A new era of flu surveillance has already started, based on the genetic characterization and exploration of influenza virus evolution at whole-genome scale. Although this has been prioritized by national and international health authorities, the required technological transition to whole-genome sequencing (WGS)-based flu surveillance has been particularly delayed by the lack of bioinformatics infrastructure and/or expertise to deal with primary next-generation sequencing (NGS) data. Results: We developed and implemented INSaFLU (“INSide the FLU”), the first free, influenza-oriented, web-based bioinformatics suite that handles primary NGS data (reads) and automatically generates the output data that are the core first-line “genetic requests” for effective and timely influenza laboratory surveillance (e.g., type and subtype, gene and whole-genome consensus sequences, variant annotation, alignments, and phylogenetic trees). By handling NGS data collected from any amplicon-based schema, the implemented pipeline enables any laboratory to perform multi-step, software-intensive analyses in a user-friendly manner without prior advanced training in bioinformatics. INSaFLU gives access to user-restricted sample databases and project management, and is a transparent and flexible tool specifically designed to automatically update project outputs as more samples are uploaded. Data integration is thus cumulative and scalable, fitting the need for continuous epidemiological surveillance during flu epidemics. Multiple outputs are provided in nomenclature-stable, standardized formats that can be explored in situ or through multiple compatible downstream applications for fine-tuned data analysis.
This platform additionally flags samples as “putative mixed infections” if the population admixture includes influenza viruses with clearly distinct genetic backgrounds, and enriches the traditional “consensus-based” influenza genetic characterization with relevant data on influenza sub-population diversification through an in-depth analysis of intra-patient minor variants. This dual approach is expected to strengthen our ability not only to detect the emergence of antigenic and drug-resistance variants but also to decode alternative pathways of influenza evolution and to unveil intricate routes of transmission. Conclusions: In summary, INSaFLU supplies public health laboratories and influenza researchers with an open “one size fits all” framework, supporting the operationalization of harmonized, multi-country WGS-based surveillance for influenza virus.

Authors: Vítor Borges, Miguel Pinheiro, Pedro Pechirra, Raquel Guiomar, João Paulo Gomes

Date Published: 1st Dec 2018

Publication Type: InProceedings

Abstract

We here introduce the concept of Canonical Workflow Building Blocks (CWBB), a methodology for describing and wrapping computational tools so that they can be used in a reproducible manner from multiple workflow languages and execution platforms. We argue that such practice is a necessary requirement for FAIR Computational Workflows [Goble 2020] to improve widespread adoption and reuse of a computational method across workflow language barriers.

Authors: Stian Soiland-Reyes, Genís Bayarri, Pau Andrio, Robin Long, Douglas Lowe, Ania Niewielska, Adam Hospital

Date Published: 1st Mar 2021

Publication Type: Misc

Abstract

A widely used standard for portable multilingual data analysis pipelines would bring considerable benefits to scholarly publication reuse, research/industry collaboration, regulatory cost control, and the environment. Published research that used multiple computer languages for its analysis pipelines would include a complete and reusable description of that analysis that is runnable on a diverse set of computing environments. Researchers would be able to collaborate more easily and reuse these pipelines, adding or exchanging components regardless of the programming language used; collaborations with and within industry would be easier; and approval of new medical interventions that rely on such pipelines would be faster. Time would be saved and environmental impact reduced, as these descriptions contain enough information for advanced optimization without user intervention. Workflows are widely used in data analysis pipelines, enabling innovation and decision-making for modern society. In many domains the analysis components are numerous and written by third parties in multiple different computer languages. However, without a standard for reusable and portable multilingual workflows, reusing published multilingual workflows, collaborating on open problems, and optimizing their execution are severely hampered. Moreover, only a widely used standard for multilingual data analysis pipelines would bring considerable benefits to research-industry collaboration, regulatory cost control, and preserving the environment. Before the start of the Common Workflow Language (CWL) project, there was no standard for describing multilingual analysis pipelines in a portable and reusable manner. Even today, although there are hundreds of single-vendor and other single-source systems that run workflows, none is a general, community-driven, consensus-built standard. Preprint, submitted to Communications of the ACM (CACM).

Authors: Michael R. Crusoe, Sanne Abeln, Alexandru Iosup, Peter Amstutz, John Chilton, Nebojša Tijanić, Hervé Ménager, Stian Soiland-Reyes, Carole Goble

Date Published: 14th May 2021

Publication Type: Unpublished

Abstract

BACKGROUND: Oxford Nanopore Technology (ONT) long-read sequencing has become a popular platform for microbial researchers due to the accessibility and affordability of its devices. However, easy and automated construction of high-quality bacterial genomes using nanopore reads remains challenging. Here we aimed to create a reproducible end-to-end bacterial genome assembly pipeline using ONT in combination with Illumina sequencing. RESULTS: We evaluated the performance of several popular tools used during genome reconstruction, including base-calling, filtering, assembly, and polishing. We also assessed overall genome accuracy using ONT both alone and in combination with Illumina. All steps were validated using the high-quality complete reference genome for the Escherichia coli sequence type (ST)131 strain EC958. The software chosen at each stage was incorporated into our final pipeline, MicroPIPE. Further validation of MicroPIPE was carried out using 11 additional ST131 E. coli isolates, which demonstrated that complete circularised chromosomes and plasmids could be achieved without manual intervention. Twelve publicly available Gram-negative and Gram-positive bacterial genomes (with available raw ONT data and matched complete genomes) were also assembled using MicroPIPE. We found that revised base-calling and updated assembly of the majority of these genomes resulted in improved accuracy compared to the current publicly available complete genomes. CONCLUSIONS: MicroPIPE is built in modules using Singularity container images and the bioinformatics workflow manager Nextflow, allowing changes and adjustments to be made in response to future tool development. Overall, MicroPIPE provides an easy-access, end-to-end solution for attaining high-quality bacterial genomes. MicroPIPE is available at https://github.com/BeatsonLab-MicrobialGenomics/micropipe .

Authors: V. Murigneux, L. W. Roberts, B. M. Forde, M. D. Phan, N. T. K. Nhu, A. D. Irwin, P. N. A. Harris, D. L. Paterson, M. A. Schembri, D. M. Whiley, S. A. Beatson

Date Published: 25th Jun 2021

Publication Type: Journal

Copyright © 2008 - 2021 The University of Manchester and HITS gGmbH