Publications

39 Publications visible to you, out of a total of 39

A Community Roadmap for Scientific Workflows Research and Development

FAIR Computational Workflows

Abstract

Preprint: https://arxiv.org/abs/2110.02168 The landscape of workflow systems for scientific applications is notoriously convoluted with hundreds of seemingly equivalent workflow systems, many isolated …

Authors: Rafael Ferreira da Silva, Henri Casanova, Kyle Chard, Ilkay Altintas, Rosa M Badia, Bartosz Balis, Taina Coleman, Frederik Coppens, Frank Di Natale, Bjoern Enders, Thomas Fahringer, Rosa Filgueira, Grigori Fursin, Daniel Garijo, Carole Goble, Dorran Howell, Shantenu Jha, Daniel S. Katz, Daniel Laney, Ulf Leser, Maciej Malawski, Kshitij Mehta, Loic Pottier, Jonathan Ozik, J. Luc Peterson, Lavanya Ramakrishnan, Stian Soiland-Reyes, Douglas Thain, Matthew Wolf

Date Published: 1st Nov 2021

Publication Type: Journal

DOI: 10.1109/WORKS54523.2021.00016

Citation: 2021 IEEE Workshop on Workflows in Support of Large-Scale Science (WORKS), pp. 81-90, IEEE

Created: 25th Apr 2022 at 11:49, Last updated: 16th Jan 2023 at 13:34

Analysis of Protein-Protein Interactions networks and cross-species transfer learning comparison for seven organisms

yPublish - Bioinfo tools

Abstract

Motivation: Protein-protein interactions (PPIs) have many applications, such as inferring protein functions or supporting the drug discovery process. For the human species, a large amount of validated information and functional annotation is available for the proteins in its interactome. In other species, the known interactome is much smaller than the human one, and many proteins have few or no annotations by specialists. Understanding the interactomes of other species helps to trace evolutionary characteristics, compare important biological processes, and build interactomes for new organisms from more closely related organisms instead of relying only on the human interactome.
Results: In this study, we evaluate the performance of the PredPrIn workflow in predicting the interactome of seven organisms in terms of scalability and precision, showing that PredPrIn achieves over 70% precision and takes less than three days even on the largest datasets. We performed a transfer learning analysis, predicting each organism's interactome from every other organism, and then showed how the results relate to their evolutionary relationships, as reflected in the number of ortholog proteins shared between these organisms. We also present a functional enrichment analysis showing the proportion of shared annotations between predicted positive and false interactions, and we extract topological features of each organism's interactome, such as proteins acting as hubs and as bridges between modules. For each organism, one of the most frequent biological processes was selected, and the numbers of proteins and pairs involved in it were compared between the interactome available in the HINT database for that organism and the one predicted by PredPrIn. This comparison showed that PredPrIn covered the proteins and pairs found in HINT and also enriched these processes for almost all organisms.
Conclusions: In this work, we have demonstrated the efficiency of the PredPrIn workflow for protein interaction prediction in seven different organisms using scalability, performance, and transfer learning analyses. We have also made cross-species interactome comparisons showing the most frequent biological processes for each organism, as well as the topological features of each organism's interactome, which are consistent with hypotheses about biological networks. Finally, we described the enrichment provided by PredPrIn in selected biological processes, showing that its predictions enhance the information available about these organisms' interactomes.

Author: Yasmmin C Martins

Date Published: 7th Jun 2023

Publication Type: Journal

DOI: 10.1101/2023.06.05.543725

Citation: biorxiv;2023.06.05.543725v1,[Preprint]

Created: 23rd Oct 2023 at 15:23, Last updated: 23rd Oct 2023 at 15:24

Automatic, Efficient and Scalable Provenance Registration for FAIR HPC Workflows

Workflows and Distributed Computing

Abstract

Provenance registration is becoming more and more important, as we increase the size and number of experiments performed using computers. In particular, when provenance is recorded in HPC environments, …

Authors: Raul Sirvent, Javier Conejero, Francesc Lordan, Jorge Ejarque, Laura Rodriguez-Navas, Jose M. Fernandez, Salvador Capella-Gutierrez, Rosa M. Badia

Date Published: 1st Nov 2022

Publication Type: Proceedings

DOI: 10.1109/WORKS56498.2022.00006

Citation: 2022 IEEE/ACM Workshop on Workflows in Support of Large-Scale Science (WORKS), pp. 1-9, IEEE

Created: 2nd Aug 2023 at 15:37, Last updated: 2nd Aug 2023 at 15:41

BioExcel Building Blocks, a software library for interoperable biomolecular simulation workflows

BioBB Building Blocks

Abstract

In recent years, the improvement of software and hardware performance has made biomolecular simulations a mature tool for the study of biological processes. Simulation length and the size and …

Authors: Pau Andrio, Adam Hospital, Javier Conejero, Luis Jordá, Marc Del Pino, Laia Codo, Stian Soiland-Reyes, Carole Goble, Daniele Lezzi, Rosa M. Badia, Modesto Orozco, Josep Ll. Gelpi

Date Published: 1st Dec 2019

Publication Type: Journal

DOI: 10.1038/s41597-019-0177-4

Citation: Sci Data 6(1),169

Created: 14th Jun 2021 at 11:27, Last updated: 16th Jan 2023 at 13:34

Collection of wing images for conservation of honey bees (Apis mellifera) biodiversity in Europe

Apis-wings

Abstract

Identification of honey bee (Apis mellifera) from various parts of the world is essential for protection of their biodiversity. The identification can be based on wing measurements which is inexpensive …

Authors: Andrzej Oleksa, Eliza Căuia, Adrian Siceanu, Zlatko Puškadija, Marin Kovačić, M. Alice Pinto, Pedro João Rodrigues, Fani Hatjina, Leonidas Charistos, Maria Bouga, Janez Prešern, Irfan Kandemir, Slađan Rašić, Szilvia Kusza, Adam Tofilski

Date Published: 1st Oct 2022

Publication Type: Journal

DOI: 10.5281/zenodo.7244070

Citation:

Created: 28th Feb 2023 at 12:24, Last updated: 28th Feb 2023 at 14:26

Country-wide data of ecosystem structure from the third Dutch airborne laser scanning survey

Laserfarm applications to European demonstration sites

Abstract

The third Dutch national airborne laser scanning flight campaign (AHN3, Actueel Hoogtebestand Nederland), conducted between 2014 and 2019 during the leaf-off season (October–April) across the whole Netherlands, provides a free and open-access, country-wide dataset with ∼700 billion points and a point density of ∼10(–20) points/m². The AHN3 point cloud was obtained with Light Detection And Ranging (LiDAR) technology and contains for each point the x, y, z coordinates and additional characteristics (e.g. return number, intensity value, scan angle rank and GPS time). Moreover, the point cloud has been pre-processed by ‘Rijkswaterstaat’ (the executive agency of the Dutch Ministry of Infrastructure and Water Management), comes with a Digital Terrain Model (DTM) and a Digital Surface Model (DSM), and is delivered with a pre-classification of each point into one of six classes (0: Never Classified, 1: Unclassified, 2: Ground, 6: Building, 9: Water, 26: Reserved [bridges etc.]). However, no detailed information on vegetation structure is available from the AHN3 point cloud. We processed the AHN3 point cloud (∼16 TB uncompressed data volume) into 10 m resolution raster layers of ecosystem structure at a national extent, using a novel high-throughput workflow called ‘Laserfarm’ and a cluster of virtual machines with fast central processing units, high memory nodes and associated big data storage for managing the large number of files. The raster layers (available as GeoTIFF files) capture 25 LiDAR metrics of vegetation structure, including ecosystem height (e.g. 95th percentiles of normalized z), ecosystem cover (e.g. pulse penetration ratio, canopy cover, and density of vegetation points within defined height layers), and ecosystem structural complexity (e.g. skewness and variability of vertical vegetation point distribution).
The raster layers make use of the Dutch projected coordinate system (EPSG:28992 Amersfoort / RD New), are each ∼1 GB in size, and can be readily used by ecologists in a geographic information system (GIS) or analytical open-source software such as R and Python. Even though the class ‘1: Unclassified’ mainly includes vegetation points, other objects such as cars, fences, and boats can also be present in this class, introducing potential biases in the derived data products. We therefore validated the raster layers of ecosystem structure using >180,000 hand-labelled LiDAR points in 100 randomly selected sample plots (10 m × 10 m each) across the Netherlands. Besides vegetation, objects such as boats, fences, and cars were identified in the sampled plots. However, the misclassification rate of vegetation points (i.e. non-vegetation points that were assumed to be vegetation) was low (∼0.05) and the accuracy of the 25 LiDAR metrics derived from the AHN3 point cloud was high (∼90%). To minimize existing inaccuracies in this country-wide data product (e.g. ships on water bodies, chimneys on roofs, or cars on roads that might be incorrectly used as vegetation points), we provide an additional mask that captures water bodies, buildings and roads generated from the Dutch cadaster dataset. This newly generated country-wide ecosystem structure data product provides new opportunities for ecology and biodiversity science, e.g. for mapping the 3D vegetation structure of a variety of ecosystems or for modelling biodiversity, species distributions, abundance and ecological niches of animals and their habitats.

Authors: W. Daniel Kissling, Yifang Shi, Zsófia Koma, Christiaan Meijer, Ou Ku, Francesco Nattino, Arie C. Seijmonsbergen, Meiert W. Grootes

Date Published: 1st Feb 2023

Publication Type: Journal

DOI: 10.1016/j.dib.2022.108798

Citation: Data in Brief 46:108798

Created: 7th Feb 2025 at 08:41, Last updated: 24th Apr 2025 at 15:57

Data of vegetation structure metrics retrieved from airborne laser scanning surveys for European demonstration sites

Laserfarm applications to European demonstration sites

Abstract

This dataset provides a standardized collection of rasterized Light Detection And Ranging (LiDAR) metrics in GeoTIFF format, derived from country-wide airborne laser scanning (ALS) data across seven demonstration sites in five European countries: Mols Bjerge National Park (Denmark), Reserve Naturelle Nationale du Bagnas (France), Oostvaardersplassen (Netherlands), Salisbury Plain (United Kingdom), Knepp Estate (United Kingdom), Monks Wood (United Kingdom), and the island of Comino (Malta). The sites range in areal size from 0.08 km² to 54 km² and include habitat types such as forests, broadleaf and conifer woodlands, small plantations, dry and wet grasslands, marshes, reedbeds, arable fields, farmland, scrublands and Mediterranean garigue. A total of 35 LiDAR metrics were calculated, of which 28 represent vegetation structural attributes. These include vegetation height (seven metrics), vegetation cover (fourteen metrics), and vegetation vertical variability (seven metrics). Additionally, seven metrics describe point density (one metric), eigenvalues (three metrics), and normal vectors (three metrics). The rasterized LiDAR metrics have a spatial resolution of 10 m, with coverage and extent defined by shapefiles corresponding to each demonstration site. The raw ALS point clouds were clipped to the site boundaries and processed with the ‘Laserfarm’ workflow, a standardized computational workflow that includes modular pipelines for re-tiling, normalization, feature extraction, and rasterization. Laserfarm employs the feature extraction module of the open-source ‘Laserchicken’ software to compute the LiDAR metrics. The workflow was implemented using the IT services of the Dutch national facility for information and communication technology, SURF. The clipped LiDAR point clouds are available through a public repository, except for the LiDAR point clouds from Comino, Malta, which are not publicly available.
The 35 rasterized LiDAR metrics (GeoTIFF files, 10 m resolution) from all sites, including Comino, as well as the corresponding site boundary shapefiles (geospatial vector format), are provided in a Zenodo repository. Additionally, the Jupyter Notebooks with Python code for executing the Laserfarm workflow are available to facilitate reproducibility and further computational applications. Users should note that the rasterized LiDAR metrics may contain zero or NA values, particularly over water surfaces, with the pulse penetration ratio metric potentially indicating false high vegetation cover over water. Users may reclassify or mask areas with zero values accordingly. Some pixels exhibit abnormal vegetation height values, which can be filtered before analysis. Certain striping patterns, likely resulting from overlapping flight lines and increased point density, are present in some metrics, though their overall impact appears minimal. This dataset enables diverse applications, including canopy height measurements, mapping of hedgerows, treelines, and forest patches, as well as characterizing vegetation density, vertical stratification, and habitat openness. It supports landscape-scale habitat analysis and contributes to the standardization of vegetation metrics from ALS data for site-specific ecological monitoring (e.g., Natura 2000). Moreover, the dataset demonstrates the automated execution of LiDAR data processing workflows, which is crucial for establishing a transnational and multi-site biodiversity and ecosystem observation network.

Authors: W. Daniel Kissling, Wessel Mulder, Jinhu Wang, Yifang Shi

Date Published: 1st Jun 2025

Publication Type: Journal

DOI: 10.1016/j.dib.2025.111548

Citation: Data in Brief 60:111548

Created: 24th Apr 2025 at 15:32, Last updated: 24th Apr 2025 at 15:37

Dataset: Computer software for identification of honey bee subspecies and evolutionary lineages

Apis-wings

Abstract

Coordinates of 19 landmarks from honey bee (Apis mellifera) worker wings. They represent 1832 workers, 187 colonies, 25 subspecies and four evolutionary lineages. The material was obtained from the …

Authors: Anna Nawrocka, Irfan Kandemir, Stefan Fuchs, Adam Tofilski

Date Published: 1st Apr 2018

Publication Type: Journal

DOI: 10.5281/zenodo.7567336

Citation:

Created: 28th Feb 2023 at 14:25, Last updated: 28th Feb 2023 at 14:27

DSCrank: A Method for Selection and Ranking of Datasets

yPublish - Bioinfo tools

Abstract

Considerable efforts have been made to build the Web of Data. One of the main challenges has to do with how to identify the most related datasets to connect to. Another challenge is to publish a local …

Authors: Yasmmin Cortes Martins, Fábio Faria da Mota, Maria Cláudia Cavalcanti

Date Published: 2016

Publication Type: Journal

DOI: 10.1007/978-3-319-49157-8_29

Citation: Metadata and Semantics Research 672:333-344, Springer International Publishing

Created: 23rd Oct 2023 at 14:59, Last updated: 23rd Oct 2023 at 15:04

EpiCurator: an immunoinformatic workflow to predict and prioritize SARS-CoV-2 epitopes

yPublish - Bioinfo tools

Abstract

The ongoing coronavirus 2019 (COVID-19) pandemic, triggered by the emerging SARS-CoV-2 virus, represents a global public health challenge. Therefore, the development of effective vaccines is an urgent …

Authors: Cristina S. Ferreira, Yasmmin C. Martins, Rangel Celso Souza, Ana Tereza R. Vasconcelos

Date Published: 2021

Publication Type: Journal

DOI: 10.7717/peerj.12548

Citation: PeerJ 9:e12548

Created: 23rd Oct 2023 at 15:04, Last updated: 23rd Oct 2023 at 15:06
