Workflow Type: Common Workflow Language
Stable

Protein Conformational ensembles generation

Workflow included in the ELIXIR 3D-Bioinfo Implementation Study:

Building on PDBe-KB to chart and characterize the conformation landscape of native proteins

This tutorial aims to illustrate the process of generating protein conformational ensembles from 3D structures and analysing their molecular flexibility, step by step, using the BioExcel Building Blocks library (biobb).

Conformational landscape of native proteins

Proteins are dynamic systems that adopt multiple conformational states, a property essential for many biological processes (e.g. binding other proteins, nucleic acids, small molecule ligands, or switching between functionally active and inactive states). Characterizing the different conformational states of proteins and the transitions between them is therefore critical for gaining insight into their biological function and can help explain the effects of genetic variants in health and disease and the action of drugs.

Structural biology has become increasingly efficient in sampling the different conformational states of proteins. The PDB has currently archived more than 170,000 individual structures, but over two thirds of these structures represent multiple conformations of the same or related protein, observed in different crystal forms, when interacting with other proteins or other macromolecules, or upon binding small molecule ligands. Charting this conformational diversity across the PDB can therefore be employed to build a useful approximation of the conformational landscape of native proteins.

A number of resources and tools describing and characterizing various, often complementary, aspects of protein conformational diversity in known structures have been developed, notably by groups in Europe. These tools include algorithms of varying degrees of sophistication for aligning the 3D structures of individual protein chains or domains and of protein assemblies, and for evaluating their degree of structural similarity. Using such tools, one can align structures pairwise, compute the corresponding similarity matrix, and identify ensembles of structures/conformations with a defined similarity level that tend to recur in different PDB entries, an operation typically performed using clustering methods. Such workflows are at the basis of resources such as CATH, Contemplate, or PDBflex, which offer access to conformational ensembles composed of similar conformations clustered according to various criteria. Other types of tools focus on differences between protein conformations, identifying regions of proteins that undergo large collective displacements in different PDB entries, those that act as hinges or linkers, or regions that are inherently flexible.
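
The align, compare, and cluster sequence described above can be sketched in a few lines of Python. This is a minimal illustration, not the method used by any of the named resources: it assumes conformations are given as NumPy arrays of matching atoms, uses the Kabsch algorithm for superposition, and groups conformations with a simple single-linkage cutoff. The function names and the cutoff value are hypothetical.

```python
import numpy as np


def kabsch_rmsd(a, b):
    """RMSD between two (N, 3) coordinate sets after optimal superposition."""
    a = a - a.mean(axis=0)  # remove translation
    b = b - b.mean(axis=0)
    # Kabsch algorithm: optimal rotation from the SVD of the covariance matrix
    u, _, vt = np.linalg.svd(a.T @ b)
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0  # avoid reflections
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = (rot @ a.T).T - b
    return float(np.sqrt((diff ** 2).sum() / len(a)))


def rmsd_matrix(conformations):
    """Symmetric matrix of pairwise RMSD values."""
    n = len(conformations)
    matrix = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            value = kabsch_rmsd(conformations[i], conformations[j])
            matrix[i, j] = matrix[j, i] = value
    return matrix


def cluster_by_cutoff(matrix, cutoff):
    """Single-linkage grouping: conformations linked by RMSD <= cutoff share a label."""
    n = len(matrix)
    labels = [-1] * n
    current = 0
    for seed in range(n):
        if labels[seed] != -1:
            continue
        labels[seed] = current
        stack = [seed]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i, j] <= cutoff:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels
```

Production resources typically use more elaborate alignment and clustering schemes; the sketch only shows how a similarity matrix turns into ensemble labels.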

To build a meaningful approximation of the conformational landscape of native proteins, the conformational ensembles (and the differences between them), identified on the basis of structural similarity/dissimilarity measures alone, need to be biophysically characterized. This may be approached at two different levels.

  • At the biological level, it is important to link observed conformational ensembles to their functional roles by evaluating the correspondence with protein family classifications based on sequence information and functional annotations in public databases, e.g. UniProt and the PDBe-Knowledge Base (PDBe-KB). These links should provide valuable mechanistic insights into how the conformational and dynamic properties of proteins are exploited by evolution to regulate their biological function.

  • At the physical level, one needs to introduce energetic considerations to evaluate the likelihood that the identified conformational ensembles represent conformational states that the protein (or domain under study) samples in isolation. Such evaluation is notoriously challenging and can only be roughly approximated by using computational methods to evaluate the extent to which the observed conformational ensembles can be reproduced by algorithms that simulate the dynamic behavior of protein systems. These algorithms include the computationally expensive classical molecular dynamics (MD) simulations to sample local thermal fluctuations, but also faster, more approximate methods such as Elastic Network Models and Normal Mode Analysis (NMA) to model low-energy collective motions. Alternatively, enhanced sampling molecular dynamics can be used to model complex types of conformational changes, but at a very high computational cost.
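
To make the Elastic Network Model and NMA idea concrete, here is a minimal sketch of an anisotropic network model in Python with NumPy. It is an illustration under stated assumptions, not the implementation used by any tool in this workflow: nodes are assumed to be one point per residue, every pair within a hypothetical 15 Å cutoff is joined by a spring of uniform constant, and the function names are invented for the example.

```python
import numpy as np


def anm_hessian(coords, cutoff=15.0, gamma=1.0):
    """Build the 3N x 3N Hessian of an anisotropic network model.

    coords: (N, 3) array of node positions.
    cutoff and gamma are illustrative defaults, not values taken from the workflow.
    """
    n = len(coords)
    hessian = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            dist2 = float(d @ d)
            if dist2 > cutoff ** 2:
                continue  # no spring between distant nodes
            block = -gamma * np.outer(d, d) / dist2
            hessian[3 * i:3 * i + 3, 3 * j:3 * j + 3] = block
            hessian[3 * j:3 * j + 3, 3 * i:3 * i + 3] = block
            # diagonal super-elements are minus the sum of off-diagonal blocks
            hessian[3 * i:3 * i + 3, 3 * i:3 * i + 3] -= block
            hessian[3 * j:3 * j + 3, 3 * j:3 * j + 3] -= block
    return hessian


def normal_modes(coords, cutoff=15.0):
    """Eigenvalues (ascending) and eigenvectors (columns) of the ANM Hessian."""
    return np.linalg.eigh(anm_hessian(coords, cutoff))
```

For a well-connected structure, the first six eigenvalues are numerically zero (rigid-body translations and rotations); the modes that follow are the low-energy collective motions referred to above.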

The ELIXIR 3D-Bioinfo Implementation Study Building on PDBe-KB to chart and characterize the conformation landscape of native proteins focuses on:

  1. Mapping the conformational diversity of proteins and their homologs across the PDB.
  2. Characterizing the different flexibility properties of protein regions, and linking this information to sequence and functional annotation.
  3. Benchmarking computational methods that can predict a biophysical description of protein motions.

This notebook is part of the third objective, in which a list of computational resources able to predict protein flexibility and conformational ensembles has been collected, evaluated, and integrated into reproducible and interoperable workflows using the BioExcel Building Blocks library. Note that the list is not meant to be exhaustive; it reflects the expertise of the implementation study partners.


Copyright & Licensing

This software has been developed in the MMB group at the BSC & IRB for the European BioExcel, funded by the European Commission (EU H2020 823830, EU H2020 675728).

Licensed under the Apache License 2.0, see the file LICENSE for details.

Inputs

ID Name Description Type
step0_extract_model_input_structure_path Input file Input structure file path.
  • File
step0_extract_model_output_structure_path Output file Output structure file path.
  • string
step0_extract_model_config Config file Configuration file for biobb_structure_utils.extract_model tool.
  • string
step1_extract_chain_output_structure_path Output file Output structure file path.
  • string
step1_extract_chain_config Config file Configuration file for biobb_structure_utils.extract_chain tool.
  • string
step2_cpptraj_mask_output_cpptraj_path Output file Path to the output processed trajectory.
  • string
step2_cpptraj_mask_config Config file Configuration file for biobb_analysis.cpptraj_mask tool.
  • string
step3_cpptraj_mask_output_cpptraj_path Output file Path to the output processed trajectory.
  • string
step3_cpptraj_mask_config Config file Configuration file for biobb_analysis.cpptraj_mask tool.
  • string
step4_concoord_dist_output_pdb_path Output file Output pdb file.
  • string
step4_concoord_dist_output_gro_path Output file Output gro file.
  • string
step4_concoord_dist_output_dat_path Output file Output dat with structure interpretation and bond definitions.
  • string
step4_concoord_dist_config Config file Configuration file for biobb_flexdyn.concoord_dist tool.
  • string
step5_concoord_disco_output_traj_path Output file Output trajectory file.
  • string
step5_concoord_disco_output_rmsd_path Output file Output rmsd file.
  • string
step5_concoord_disco_output_bfactor_path Output file Output B-factor file.
  • string
step5_concoord_disco_config Config file Configuration file for biobb_flexdyn.concoord_disco tool.
  • string
step6_cpptraj_rms_output_cpptraj_path Output file Path to the output processed analysis.
  • string
step6_cpptraj_rms_config Config file Configuration file for biobb_analysis.cpptraj_rms tool.
  • string
step7_cpptraj_convert_output_cpptraj_path Output file Path to the output processed trajectory.
  • string
step7_cpptraj_convert_config Config file Configuration file for biobb_analysis.cpptraj_convert tool.
  • string
step8_prody_anm_output_pdb_path Output file Output multi-model PDB file with the generated ensemble.
  • string
step8_prody_anm_config Config file Configuration file for biobb_flexdyn.prody_anm tool.
  • string
step9_cpptraj_rms_output_cpptraj_path Output file Path to the output processed analysis.
  • string
step9_cpptraj_rms_config Config file Configuration file for biobb_analysis.cpptraj_rms tool.
  • string
step10_cpptraj_convert_output_cpptraj_path Output file Path to the output processed trajectory.
  • string
step10_cpptraj_convert_config Config file Configuration file for biobb_analysis.cpptraj_convert tool.
  • string
step11_bd_run_output_crd_path Output file Output ensemble.
  • string
step11_bd_run_output_log_path Output file Output log file.
  • string
step11_bd_run_config Config file Configuration file for biobb_flexserv.bd_run tool.
  • string
step12_cpptraj_rms_output_cpptraj_path Output file Path to the output processed analysis.
  • string
step12_cpptraj_rms_output_traj_path Output file Path to the output processed trajectory.
  • string
step12_cpptraj_rms_config Config file Configuration file for biobb_analysis.cpptraj_rms tool.
  • string
step13_dmd_run_output_crd_path Output file Output ensemble.
  • string
step13_dmd_run_output_log_path Output file Output log file.
  • string
step13_dmd_run_config Config file Configuration file for biobb_flexserv.dmd_run tool.
  • string
step14_cpptraj_rms_output_cpptraj_path Output file Path to the output processed analysis.
  • string
step14_cpptraj_rms_output_traj_path Output file Path to the output processed trajectory.
  • string
step14_cpptraj_rms_config Config file Configuration file for biobb_analysis.cpptraj_rms tool.
  • string
step15_nma_run_output_crd_path Output file Output ensemble.
  • string
step15_nma_run_output_log_path Output file Output log file.
  • string
step15_nma_run_config Config file Configuration file for biobb_flexserv.nma_run tool.
  • string
step16_cpptraj_rms_output_cpptraj_path Output file Path to the output processed analysis.
  • string
step16_cpptraj_rms_config Config file Configuration file for biobb_analysis.cpptraj_rms tool.
  • string
step17_cpptraj_convert_output_cpptraj_path Output file Path to the output processed trajectory.
  • string
step17_cpptraj_convert_config Config file Configuration file for biobb_analysis.cpptraj_convert tool.
  • string
step18_nolb_nma_output_pdb_path Output file Output multi-model PDB file with the generated ensemble.
  • string
step18_nolb_nma_config Config file Configuration file for biobb_flexdyn.nolb_nma tool.
  • string
step19_cpptraj_rms_output_cpptraj_path Output file Path to the output processed analysis.
  • string
step19_cpptraj_rms_config Config file Configuration file for biobb_analysis.cpptraj_rms tool.
  • string
step20_cpptraj_convert_output_cpptraj_path Output file Path to the output processed trajectory.
  • string
step20_cpptraj_convert_config Config file Configuration file for biobb_analysis.cpptraj_convert tool.
  • string
step21_imod_imode_output_dat_path Output file Output dat with normal modes.
  • string
step21_imod_imode_config Config file Configuration file for biobb_flexdyn.imod_imode tool.
  • string
step22_imod_imc_output_traj_path Output file Output multi-model PDB file with the generated ensemble.
  • string
step22_imod_imc_config Config file Configuration file for biobb_flexdyn.imod_imc tool.
  • string
step23_cpptraj_rms_output_cpptraj_path Output file Path to the output processed analysis.
  • string
step23_cpptraj_rms_config Config file Configuration file for biobb_analysis.cpptraj_rms tool.
  • string
step24_cpptraj_convert_output_cpptraj_path Output file Path to the output processed trajectory.
  • string
step24_cpptraj_convert_config Config file Configuration file for biobb_analysis.cpptraj_convert tool.
  • string
step26_make_ndx_output_ndx_path Output file Path to the output index NDX file.
  • string
step26_make_ndx_config Config file Configuration file for biobb_gromacs.make_ndx tool.
  • string
step27_gmx_cluster_output_pdb_path Output file Path to the output cluster file.
  • string
step27_gmx_cluster_config Config file Configuration file for biobb_analysis.gmx_cluster tool.
  • string
step28_cpptraj_rms_output_cpptraj_path Output file Path to the output processed analysis.
  • string
step28_cpptraj_rms_output_traj_path Output file Path to the output processed trajectory.
  • string
step28_cpptraj_rms_config Config file Configuration file for biobb_analysis.cpptraj_rms tool.
  • string
step29_pcz_zip_output_pcz_path Output file Output compressed trajectory.
  • string
step29_pcz_zip_config Config file Configuration file for biobb_flexserv.pcz_zip tool.
  • string
step30_pcz_zip_output_pcz_path Output file Output compressed trajectory.
  • string
step30_pcz_zip_config Config file Configuration file for biobb_flexserv.pcz_zip tool.
  • string
step31_pcz_info_output_json_path Output file Output json file with PCA info such as number of components, variance and dimensionality.
  • string
step32_pcz_evecs_output_json_path Output file Output json file with PCA Eigen Vectors.
  • string
step32_pcz_evecs_config Config file Configuration file for biobb_flexserv.pcz_evecs tool.
  • string
step33_pcz_animate_output_crd_path Output file Output PCA animated trajectory file.
  • string
step33_pcz_animate_config Config file Configuration file for biobb_flexserv.pcz_animate tool.
  • string
step34_cpptraj_convert_output_cpptraj_path Output file Path to the output processed trajectory.
  • string
step34_cpptraj_convert_config Config file Configuration file for biobb_analysis.cpptraj_convert tool.
  • string
step35_pcz_bfactor_output_dat_path Output file Output Bfactor x residue x PCA mode file.
  • string
step35_pcz_bfactor_output_pdb_path Output file Output PDB with Bfactor x residue x PCA mode file.
  • string
step35_pcz_bfactor_config Config file Configuration file for biobb_flexserv.pcz_bfactor tool.
  • string
step36_pcz_hinges_output_json_path Output file Output hinge regions x PCA mode file.
  • string
step36_pcz_hinges_config Config file Configuration file for biobb_flexserv.pcz_hinges tool.
  • string
step37_pcz_hinges_output_json_path Output file Output hinge regions x PCA mode file.
  • string
step37_pcz_hinges_config Config file Configuration file for biobb_flexserv.pcz_hinges tool.
  • string
step38_pcz_hinges_output_json_path Output file Output hinge regions x PCA mode file.
  • string
step38_pcz_hinges_config Config file Configuration file for biobb_flexserv.pcz_hinges tool.
  • string
step39_pcz_stiffness_output_json_path Output file Output json file with PCA Stiffness.
  • string
step39_pcz_stiffness_config Config file Configuration file for biobb_flexserv.pcz_stiffness tool.
  • string
step40_pcz_collectivity_output_json_path Output file Output json file with PCA Collectivity indexes per mode.
  • string
step40_pcz_collectivity_config Config file Configuration file for biobb_flexserv.pcz_collectivity tool.
  • string
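
As a rough guide to how these inputs fit together, the following Python sketch assembles a partial CWL job object for the first two steps. It relies on the types listed above (a File for the input structure, strings for everything else) and makes several assumptions that should be checked against the biobb documentation: that each config string is a JSON-encoded set of tool properties, that `models` and `chains` are valid property names for the two tools, and that the file names shown are acceptable. The helper name `build_job` is hypothetical.

```python
import json


def build_job(structure_path):
    """Return a partial CWL job object covering steps 0 and 1 only.

    Property names and output file names are illustrative assumptions.
    """
    return {
        # File inputs are passed as CWL File objects
        "step0_extract_model_input_structure_path": {
            "class": "File",
            "path": structure_path,
        },
        # Output paths and configs are plain strings
        "step0_extract_model_output_structure_path": "model.pdb",
        "step0_extract_model_config": json.dumps({"models": [1]}),
        "step1_extract_chain_output_structure_path": "chain.pdb",
        "step1_extract_chain_config": json.dumps({"chains": ["A"]}),
    }


if __name__ == "__main__":
    # Print the job object; a CWL runner would also need the remaining inputs
    print(json.dumps(build_job("input_structure.pdb"), indent=2))
```

The remaining steps would follow the same pattern, one output path per listed output and one config string per tool.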

Steps

ID Name Description
step0_extract_model extract_model This class is a wrapper of the Structure Checking tool to extract a model from a 3D structure.
step1_extract_chain extract_chain This class is a wrapper of the Structure Checking tool to extract a chain from a 3D structure.
step2_cpptraj_mask cpptraj_mask Wrapper of the Ambertools Cpptraj module for extracting a selection of atoms from a given cpptraj compatible trajectory.
step3_cpptraj_mask cpptraj_mask Wrapper of the Ambertools Cpptraj module for extracting a selection of atoms from a given cpptraj compatible trajectory.
step4_concoord_dist concoord_dist Wrapper of the Concoord_dist software.
step5_concoord_disco concoord_disco Wrapper of the Concoord_disco software.
step6_cpptraj_rms cpptraj_rms Wrapper of the Ambertools Cpptraj module for calculating the Root Mean Square deviation (RMSd) of a given cpptraj compatible trajectory.
step7_cpptraj_convert cpptraj_convert Wrapper of the Ambertools Cpptraj module for converting between cpptraj compatible trajectory file formats and/or extracting a selection of atoms or frames.
step8_prody_anm prody_anm Wrapper of the Prody software.
step9_cpptraj_rms cpptraj_rms Wrapper of the Ambertools Cpptraj module for calculating the Root Mean Square deviation (RMSd) of a given cpptraj compatible trajectory.
step10_cpptraj_convert cpptraj_convert Wrapper of the Ambertools Cpptraj module for converting between cpptraj compatible trajectory file formats and/or extracting a selection of atoms or frames.
step11_bd_run bd_run Run Brownian Dynamics from FlexServ
step12_cpptraj_rms cpptraj_rms Wrapper of the Ambertools Cpptraj module for calculating the Root Mean Square deviation (RMSd) of a given cpptraj compatible trajectory.
step13_dmd_run dmd_run Run Discrete Molecular Dynamics from FlexServ
step14_cpptraj_rms cpptraj_rms Wrapper of the Ambertools Cpptraj module for calculating the Root Mean Square deviation (RMSd) of a given cpptraj compatible trajectory.
step15_nma_run nma_run Run Normal Mode Analysis from FlexServ
step16_cpptraj_rms cpptraj_rms Wrapper of the Ambertools Cpptraj module for calculating the Root Mean Square deviation (RMSd) of a given cpptraj compatible trajectory.
step17_cpptraj_convert cpptraj_convert Wrapper of the Ambertools Cpptraj module for converting between cpptraj compatible trajectory file formats and/or extracting a selection of atoms or frames.
step18_nolb_nma nolb_nma Wrapper of the Nolb software.
step19_cpptraj_rms cpptraj_rms Wrapper of the Ambertools Cpptraj module for calculating the Root Mean Square deviation (RMSd) of a given cpptraj compatible trajectory.
step20_cpptraj_convert cpptraj_convert Wrapper of the Ambertools Cpptraj module for converting between cpptraj compatible trajectory file formats and/or extracting a selection of atoms or frames.
step21_imod_imode imod_imode Wrapper of the imods_imode software.
step22_imod_imc imod_imc Wrapper of the imods_imc software.
step23_cpptraj_rms cpptraj_rms Wrapper of the Ambertools Cpptraj module for calculating the Root Mean Square deviation (RMSd) of a given cpptraj compatible trajectory.
step24_cpptraj_convert cpptraj_convert Wrapper of the Ambertools Cpptraj module for converting between cpptraj compatible trajectory file formats and/or extracting a selection of atoms or frames.
step26_make_ndx make_ndx Creates a GROMACS index file (NDX) from an input selection and an input GROMACS structure file.
step27_gmx_cluster gmx_cluster Wrapper of the GROMACS cluster module for clustering structures from a given GROMACS compatible trajectory.
step28_cpptraj_rms cpptraj_rms Wrapper of the Ambertools Cpptraj module for calculating the Root Mean Square deviation (RMSd) of a given cpptraj compatible trajectory.
step29_pcz_zip pcz_zip Compress MD simulation trajectories with PCA suite
step30_pcz_zip pcz_zip Compress MD simulation trajectories with PCA suite
step31_pcz_info pcz_info Extract PCA info (variance, Dimensionality) from a compressed PCZ file
step32_pcz_evecs pcz_evecs Extract PCA Eigen Vectors from a compressed PCZ file
step33_pcz_animate pcz_animate Extract PCA animations from a compressed PCZ file
step34_cpptraj_convert cpptraj_convert Wrapper of the Ambertools Cpptraj module for converting between cpptraj compatible trajectory file formats and/or extracting a selection of atoms or frames.
step35_pcz_bfactor pcz_bfactor Extract residue bfactors x PCA mode from a compressed PCZ file
step36_pcz_hinges pcz_hinges Compute possible hinge regions (residues around which large protein movements are organized) of a molecule from a compressed PCZ file
step37_pcz_hinges pcz_hinges Compute possible hinge regions (residues around which large protein movements are organized) of a molecule from a compressed PCZ file
step38_pcz_hinges pcz_hinges Compute possible hinge regions (residues around which large protein movements are organized) of a molecule from a compressed PCZ file
step39_pcz_stiffness pcz_stiffness Extract PCA stiffness from a compressed PCZ file
step40_pcz_collectivity pcz_collectivity Extract PCA collectivity (numerical measure of how many atoms are affected by a given mode) from a compressed PCZ file
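
The collectivity measure mentioned for the last step can be illustrated with a short Python sketch. It uses one common entropy-based definition, in which the normalized squared displacement of each atom is treated as a probability; this is an assumption for illustration and not necessarily the exact formula the pcz_collectivity tool applies. The function name is hypothetical.

```python
import math


def collectivity(mode):
    """Collectivity of a mode given per-atom displacement vectors.

    mode: sequence of (dx, dy, dz) tuples, one per atom.
    Returns a value between 1/N (one atom moves) and 1 (all atoms move equally).
    """
    squared = [dx * dx + dy * dy + dz * dz for dx, dy, dz in mode]
    total = sum(squared)
    if total == 0.0:
        raise ValueError("mode has no displacement")
    entropy = 0.0
    for value in squared:
        p = value / total  # fraction of the motion carried by this atom
        if p > 0.0:
            entropy -= p * math.log(p)
    return math.exp(entropy) / len(squared)
```

A mode that moves every atom by the same amount gives a collectivity of 1, while a mode confined to a single atom out of N gives 1/N.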

Outputs

ID Name Description Type
step0_extract_model_out1 output_structure_path Output structure file path.
  • File
step1_extract_chain_out1 output_structure_path Output structure file path.
  • File
step2_cpptraj_mask_out1 output_cpptraj_path Path to the output processed trajectory.
  • File
step3_cpptraj_mask_out1 output_cpptraj_path Path to the output processed trajectory.
  • File
step4_concoord_dist_out1 output_pdb_path Output pdb file.
  • File
step4_concoord_dist_out2 output_gro_path Output gro file.
  • File
step4_concoord_dist_out3 output_dat_path Output dat with structure interpretation and bond definitions.
  • File
step5_concoord_disco_out1 output_traj_path Output trajectory file.
  • File
step5_concoord_disco_out2 output_rmsd_path Output rmsd file.
  • File
step5_concoord_disco_out3 output_bfactor_path Output B-factor file.
  • File
step6_cpptraj_rms_out1 output_cpptraj_path Path to the output processed analysis.
  • File
step7_cpptraj_convert_out1 output_cpptraj_path Path to the output processed trajectory.
  • File
step8_prody_anm_out1 output_pdb_path Output multi-model PDB file with the generated ensemble.
  • File
step9_cpptraj_rms_out1 output_cpptraj_path Path to the output processed analysis.
  • File
step10_cpptraj_convert_out1 output_cpptraj_path Path to the output processed trajectory.
  • File
step11_bd_run_out1 output_crd_path Output ensemble.
  • File
step11_bd_run_out2 output_log_path Output log file.
  • File
step12_cpptraj_rms_out1 output_cpptraj_path Path to the output processed analysis.
  • File
step12_cpptraj_rms_out2 output_traj_path Path to the output processed trajectory.
  • File
step13_dmd_run_out1 output_crd_path Output ensemble.
  • File
step13_dmd_run_out2 output_log_path Output log file.
  • File
step14_cpptraj_rms_out1 output_cpptraj_path Path to the output processed analysis.
  • File
step14_cpptraj_rms_out2 output_traj_path Path to the output processed trajectory.
  • File
step15_nma_run_out1 output_crd_path Output ensemble.
  • File
step15_nma_run_out2 output_log_path Output log file.
  • File
step16_cpptraj_rms_out1 output_cpptraj_path Path to the output processed analysis.
  • File
step17_cpptraj_convert_out1 output_cpptraj_path Path to the output processed trajectory.
  • File
step18_nolb_nma_out1 output_pdb_path Output multi-model PDB file with the generated ensemble.
  • File
step19_cpptraj_rms_out1 output_cpptraj_path Path to the output processed analysis.
  • File
step20_cpptraj_convert_out1 output_cpptraj_path Path to the output processed trajectory.
  • File
step21_imod_imode_out1 output_dat_path Output dat with normal modes.
  • File
step22_imod_imc_out1 output_traj_path Output multi-model PDB file with the generated ensemble.
  • File
step23_cpptraj_rms_out1 output_cpptraj_path Path to the output processed analysis.
  • File
step24_cpptraj_convert_out1 output_cpptraj_path Path to the output processed trajectory.
  • File
step26_make_ndx_out1 output_ndx_path Path to the output index NDX file.
  • File
step27_gmx_cluster_out1 output_pdb_path Path to the output cluster file.
  • File
step28_cpptraj_rms_out1 output_cpptraj_path Path to the output processed analysis.
  • File
step28_cpptraj_rms_out2 output_traj_path Path to the output processed trajectory.
  • File
step29_pcz_zip_out1 output_pcz_path Output compressed trajectory.
  • File
step30_pcz_zip_out1 output_pcz_path Output compressed trajectory.
  • File
step31_pcz_info_out1 output_json_path Output json file with PCA info such as number of components, variance and dimensionality.
  • File
step32_pcz_evecs_out1 output_json_path Output json file with PCA Eigen Vectors.
  • File
step33_pcz_animate_out1 output_crd_path Output PCA animated trajectory file.
  • File
step34_cpptraj_convert_out1 output_cpptraj_path Path to the output processed trajectory.
  • File
step35_pcz_bfactor_out1 output_dat_path Output Bfactor x residue x PCA mode file.
  • File
step35_pcz_bfactor_out2 output_pdb_path Output PDB with Bfactor x residue x PCA mode file.
  • File
step36_pcz_hinges_out1 output_json_path Output hinge regions x PCA mode file.
  • File
step37_pcz_hinges_out1 output_json_path Output hinge regions x PCA mode file.
  • File
step38_pcz_hinges_out1 output_json_path Output hinge regions x PCA mode file.
  • File
step39_pcz_stiffness_out1 output_json_path Output json file with PCA Stiffness.
  • File
step40_pcz_collectivity_out1 output_json_path Output json file with PCA Collectivity indexes per mode.
  • File

Version History

Version 2 (latest) Created 6th Jun 2023 at 11:10 by Genís Bayarri

Updated workflow descriptors


Frozen Version-2 70eb3d2

Version 1 (earliest) Created 31st May 2023 at 14:51 by Genís Bayarri

Initial commit


Frozen Version-1 ab62bfd
Citation
Hospital, A., & Bayarri, G. (2023). CWL Protein conformational ensembles generation. WorkflowHub. https://doi.org/10.48546/WORKFLOWHUB.WORKFLOW.488.2
License
Other (Open)