Protein Conformational Ensembles Generation
Workflow included in the ELIXIR 3D-Bioinfo Implementation Study:
Building on PDBe-KB to chart and characterize the conformation landscape of native proteins
This tutorial aims to illustrate, step by step, the process of generating protein conformational ensembles from **3D structures** and analysing their molecular flexibility using the BioExcel Building Blocks library (biobb).
Conformational landscape of native proteins
Proteins are dynamic systems that adopt multiple conformational states, a property essential for many biological processes (e.g. binding other proteins, nucleic acids, or small molecule ligands, or switching between functionally active and inactive states). Characterizing the different conformational states of proteins and the transitions between them is therefore critical for gaining insight into their biological function, and can help explain the effects of genetic variants in health and disease and the action of drugs.
Structural biology has become increasingly efficient in sampling the different conformational states of proteins. The PDB currently archives more than 170,000 individual structures, but over two thirds of them represent multiple conformations of the same or related proteins, observed in different crystal forms, when interacting with other proteins or other macromolecules, or upon binding small molecule ligands. Charting this conformational diversity across the PDB can therefore be employed to build a useful approximation of the conformational landscape of native proteins.
A number of resources and tools describing and characterizing various, often complementary, aspects of protein conformational diversity in known structures have been developed, notably by groups in Europe. These tools include algorithms, with varying degrees of sophistication, for aligning the 3D structures of individual protein chains or domains and of protein assemblies, and for evaluating their degree of structural similarity. Using such tools one can align structures pairwise, compute the corresponding similarity matrix, and identify ensembles of structures/conformations with a defined similarity level that tend to recur in different PDB entries, an operation typically performed using clustering methods. Such workflows underpin resources such as CATH, Contemplate, or PDBflex, which offer access to conformational ensembles composed of similar conformations clustered according to various criteria. Other types of tools focus on differences between protein conformations, identifying regions of proteins that undergo large collective displacements in different PDB entries, those that act as hinges or linkers, or regions that are inherently flexible.
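The align, compare and cluster procedure described above can be sketched in a few lines. This is a minimal illustration, not the method used by CATH, Contemplate or PDBflex: it assumes the conformations are already superposed and have the same atoms in the same order, and the function names (`rmsd`, `rmsd_matrix`, `single_linkage_clusters`) and the toy coordinates are hypothetical.

```python
import math


def rmsd(conf_a, conf_b):
    """RMSD between two pre-superposed conformations (lists of (x, y, z))."""
    if len(conf_a) != len(conf_b):
        raise ValueError("conformations must have the same number of atoms")
    squared = sum(
        (p - q) ** 2
        for atom_a, atom_b in zip(conf_a, conf_b)
        for p, q in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(conf_a))


def rmsd_matrix(conformations):
    """Symmetric pairwise RMSD (dissimilarity) matrix."""
    n = len(conformations)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = rmsd(conformations[i], conformations[j])
    return matrix


def single_linkage_clusters(matrix, cutoff):
    """Group conformations linked by a chain of pairs with RMSD <= cutoff."""
    n = len(matrix)
    assigned = [False] * n
    clusters = []
    for seed in range(n):
        if assigned[seed]:
            continue
        assigned[seed] = True
        members, stack = [seed], [seed]
        while stack:
            i = stack.pop()
            for j in range(n):
                if not assigned[j] and matrix[i][j] <= cutoff:
                    assigned[j] = True
                    members.append(j)
                    stack.append(j)
        clusters.append(sorted(members))
    return clusters


# Toy data (hypothetical): three 3-atom conformations, two of them very similar.
base = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
near = [(0.0, 0.1, 0.0), (1.0, 0.1, 0.0), (2.0, 0.1, 0.0)]
far = [(0.0, 3.0, 0.0), (1.0, 3.0, 0.0), (2.0, 3.0, 0.0)]

matrix = rmsd_matrix([base, near, far])
clusters = single_linkage_clusters(matrix, cutoff=0.5)
print(clusters)
```

With real structures, a superposition step (for example a least-squares fit) would normally precede the RMSD calculation, and the cutoff would be chosen for the system under study.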
To build a meaningful approximation of the conformational landscape of native proteins, the conformational ensembles (and the differences between them), identified on the basis of structural similarity/dissimilarity measures alone, need to be biophysically characterized. This may be approached at two different levels.
- At the biological level, it is important to link observed conformational ensembles to their functional roles by evaluating the correspondence with protein family classifications based on sequence information and functional annotations in public databases (e.g. UniProt, PDBe-Knowledge Base (PDBe-KB)). These links should provide valuable mechanistic insights into how the conformational and dynamic properties of proteins are exploited by evolution to regulate their biological function.
- At the physical level, one needs to introduce energetic considerations to evaluate the likelihood that the identified conformational ensembles represent conformational states that the protein (or domain under study) samples in isolation. Such evaluation is notoriously challenging and can only be roughly approximated by using computational methods to evaluate the extent to which the observed conformational ensembles can be reproduced by algorithms that simulate the dynamic behavior of protein systems. These algorithms include the computationally expensive classical molecular dynamics (MD) simulations to sample local thermal fluctuations, but also faster, more approximate methods such as Elastic Network Models and Normal Mode Analysis (NMA) to model low-energy collective motions. Alternatively, enhanced sampling molecular dynamics can be used to model complex types of conformational changes, but at a very high computational cost.
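To make the idea of an Elastic Network Model concrete, the sketch below builds the connectivity (Kirchhoff) matrix of a Gaussian Network Model, one of the simplest elastic network variants: every pair of C-alpha atoms closer than a cutoff is joined by an identical spring. The function name, the 7 Å cutoff and the toy coordinates are assumptions chosen for illustration; the tools wrapped in this workflow use their own network definitions and parameters.

```python
import math


def kirchhoff_matrix(coords, cutoff=7.0):
    """Connectivity matrix of a Gaussian Network Model.

    Off-diagonal entries are -1 for atom pairs within `cutoff` (same length
    unit as the coordinates) and 0 otherwise; each diagonal entry is the
    number of contacts of that atom.
    """
    n = len(coords)
    gamma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                gamma[i][j] = gamma[j][i] = -1.0
                gamma[i][i] += 1.0
                gamma[j][j] += 1.0
    return gamma


# Toy chain (hypothetical): four C-alpha atoms spaced 3.8 Angstrom apart on a line.
chain = [(3.8 * k, 0.0, 0.0) for k in range(4)]
gamma = kirchhoff_matrix(chain)
for row in gamma:
    print(row)
```

The low-energy collective motions mentioned above correspond to the eigenvectors of this matrix with the smallest non-zero eigenvalues, which can be obtained with any symmetric eigensolver (for example `numpy.linalg.eigh`).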
The ELIXIR 3D-Bioinfo Implementation Study "Building on PDBe-KB to chart and characterize the conformation landscape of native proteins" focuses on:
- Mapping the conformational diversity of proteins and their homologs across the PDB.
- Characterizing the different flexibility properties of protein regions, and linking this information to sequence and functional annotation.
- Benchmarking computational methods that can predict a biophysical description of protein motions.
This notebook is part of the third objective, in which a list of computational resources able to predict protein flexibility and conformational ensembles has been collected, evaluated, and integrated in reproducible and interoperable workflows using the BioExcel Building Blocks library. Note that the list is not meant to be exhaustive; it reflects the expertise of the implementation study partners.
Copyright & Licensing
This software has been developed in the MMB group at the BSC & IRB for the European BioExcel project, funded by the European Commission (EU H2020 823830, EU H2020 675728).
- (c) 2015-2023 Barcelona Supercomputing Center
- (c) 2015-2023 Institute for Research in Biomedicine
Licensed under the Apache License 2.0, see the file LICENSE for details.
Inputs
ID | Name | Description | Type |
---|---|---|---|
step0_extract_model_input_structure_path | Input file | Input structure file path. | |
step0_extract_model_output_structure_path | Output file | Output structure file path. | |
step0_extract_model_config | Config file | Configuration file for biobb_structure_utils.extract_model tool. | |
step1_extract_chain_output_structure_path | Output file | Output structure file path. | |
step1_extract_chain_config | Config file | Configuration file for biobb_structure_utils.extract_chain tool. | |
step2_cpptraj_mask_output_cpptraj_path | Output file | Path to the output processed trajectory. | |
step2_cpptraj_mask_config | Config file | Configuration file for biobb_analysis.cpptraj_mask tool. | |
step3_cpptraj_mask_output_cpptraj_path | Output file | Path to the output processed trajectory. | |
step3_cpptraj_mask_config | Config file | Configuration file for biobb_analysis.cpptraj_mask tool. | |
step4_concoord_dist_output_pdb_path | Output file | Output pdb file. | |
step4_concoord_dist_output_gro_path | Output file | Output gro file. | |
step4_concoord_dist_output_dat_path | Output file | Output dat with structure interpretation and bond definitions. | |
step4_concoord_dist_config | Config file | Configuration file for biobb_flexdyn.concoord_dist tool. | |
step5_concoord_disco_output_traj_path | Output file | Output trajectory file. | |
step5_concoord_disco_output_rmsd_path | Output file | Output rmsd file. | |
step5_concoord_disco_output_bfactor_path | Output file | Output B-factor file. | |
step5_concoord_disco_config | Config file | Configuration file for biobb_flexdyn.concoord_disco tool. | |
step6_cpptraj_rms_output_cpptraj_path | Output file | Path to the output processed analysis. | |
step6_cpptraj_rms_config | Config file | Configuration file for biobb_analysis.cpptraj_rms tool. | |
step7_cpptraj_convert_output_cpptraj_path | Output file | Path to the output processed trajectory. | |
step7_cpptraj_convert_config | Config file | Configuration file for biobb_analysis.cpptraj_convert tool. | |
step8_prody_anm_output_pdb_path | Output file | Output multi-model PDB file with the generated ensemble. | |
step8_prody_anm_config | Config file | Configuration file for biobb_flexdyn.prody_anm tool. | |
step9_cpptraj_rms_output_cpptraj_path | Output file | Path to the output processed analysis. | |
step9_cpptraj_rms_config | Config file | Configuration file for biobb_analysis.cpptraj_rms tool. | |
step10_cpptraj_convert_output_cpptraj_path | Output file | Path to the output processed trajectory. | |
step10_cpptraj_convert_config | Config file | Configuration file for biobb_analysis.cpptraj_convert tool. | |
step11_bd_run_output_crd_path | Output file | Output ensemble. | |
step11_bd_run_output_log_path | Output file | Output log file. | |
step11_bd_run_config | Config file | Configuration file for biobb_flexserv.bd_run tool. | |
step12_cpptraj_rms_output_cpptraj_path | Output file | Path to the output processed analysis. | |
step12_cpptraj_rms_output_traj_path | Output file | Path to the output processed trajectory. | |
step12_cpptraj_rms_config | Config file | Configuration file for biobb_analysis.cpptraj_rms tool. | |
step13_dmd_run_output_crd_path | Output file | Output ensemble. | |
step13_dmd_run_output_log_path | Output file | Output log file. | |
step13_dmd_run_config | Config file | Configuration file for biobb_flexserv.dmd_run tool. | |
step14_cpptraj_rms_output_cpptraj_path | Output file | Path to the output processed analysis. | |
step14_cpptraj_rms_output_traj_path | Output file | Path to the output processed trajectory. | |
step14_cpptraj_rms_config | Config file | Configuration file for biobb_analysis.cpptraj_rms tool. | |
step15_nma_run_output_crd_path | Output file | Output ensemble. | |
step15_nma_run_output_log_path | Output file | Output log file. | |
step15_nma_run_config | Config file | Configuration file for biobb_flexserv.nma_run tool. | |
step16_cpptraj_rms_output_cpptraj_path | Output file | Path to the output processed analysis. | |
step16_cpptraj_rms_config | Config file | Configuration file for biobb_analysis.cpptraj_rms tool. | |
step17_cpptraj_convert_output_cpptraj_path | Output file | Path to the output processed trajectory. | |
step17_cpptraj_convert_config | Config file | Configuration file for biobb_analysis.cpptraj_convert tool. | |
step18_nolb_nma_output_pdb_path | Output file | Output multi-model PDB file with the generated ensemble. | |
step18_nolb_nma_config | Config file | Configuration file for biobb_flexdyn.nolb_nma tool. | |
step19_cpptraj_rms_output_cpptraj_path | Output file | Path to the output processed analysis. | |
step19_cpptraj_rms_config | Config file | Configuration file for biobb_analysis.cpptraj_rms tool. | |
step20_cpptraj_convert_output_cpptraj_path | Output file | Path to the output processed trajectory. | |
step20_cpptraj_convert_config | Config file | Configuration file for biobb_analysis.cpptraj_convert tool. | |
step21_imod_imode_output_dat_path | Output file | Output dat with normal modes. | |
step21_imod_imode_config | Config file | Configuration file for biobb_flexdyn.imod_imode tool. | |
step22_imod_imc_output_traj_path | Output file | Output multi-model PDB file with the generated ensemble. | |
step22_imod_imc_config | Config file | Configuration file for biobb_flexdyn.imod_imc tool. | |
step23_cpptraj_rms_output_cpptraj_path | Output file | Path to the output processed analysis. | |
step23_cpptraj_rms_config | Config file | Configuration file for biobb_analysis.cpptraj_rms tool. | |
step24_cpptraj_convert_output_cpptraj_path | Output file | Path to the output processed trajectory. | |
step24_cpptraj_convert_config | Config file | Configuration file for biobb_analysis.cpptraj_convert tool. | |
step26_make_ndx_output_ndx_path | Output file | Path to the output index NDX file. | |
step26_make_ndx_config | Config file | Configuration file for biobb_gromacs.make_ndx tool. | |
step27_gmx_cluster_output_pdb_path | Output file | Path to the output cluster file. | |
step27_gmx_cluster_config | Config file | Configuration file for biobb_analysis.gmx_cluster tool. | |
step28_cpptraj_rms_output_cpptraj_path | Output file | Path to the output processed analysis. | |
step28_cpptraj_rms_output_traj_path | Output file | Path to the output processed trajectory. | |
step28_cpptraj_rms_config | Config file | Configuration file for biobb_analysis.cpptraj_rms tool. | |
step29_pcz_zip_output_pcz_path | Output file | Output compressed trajectory. | |
step29_pcz_zip_config | Config file | Configuration file for biobb_flexserv.pcz_zip tool. | |
step30_pcz_zip_output_pcz_path | Output file | Output compressed trajectory. | |
step30_pcz_zip_config | Config file | Configuration file for biobb_flexserv.pcz_zip tool. | |
step31_pcz_info_output_json_path | Output file | Output json file with PCA info such as number of components, variance and dimensionality. | |
step32_pcz_evecs_output_json_path | Output file | Output json file with PCA Eigen Vectors. | |
step32_pcz_evecs_config | Config file | Configuration file for biobb_flexserv.pcz_evecs tool. | |
step33_pcz_animate_output_crd_path | Output file | Output PCA animated trajectory file. | |
step33_pcz_animate_config | Config file | Configuration file for biobb_flexserv.pcz_animate tool. | |
step34_cpptraj_convert_output_cpptraj_path | Output file | Path to the output processed trajectory. | |
step34_cpptraj_convert_config | Config file | Configuration file for biobb_analysis.cpptraj_convert tool. | |
step35_pcz_bfactor_output_dat_path | Output file | Output Bfactor x residue x PCA mode file. | |
step35_pcz_bfactor_output_pdb_path | Output file | Output PDB with Bfactor x residue x PCA mode file. | |
step35_pcz_bfactor_config | Config file | Configuration file for biobb_flexserv.pcz_bfactor tool. | |
step36_pcz_hinges_output_json_path | Output file | Output hinge regions x PCA mode file. | |
step36_pcz_hinges_config | Config file | Configuration file for biobb_flexserv.pcz_hinges tool. | |
step37_pcz_hinges_output_json_path | Output file | Output hinge regions x PCA mode file. | |
step37_pcz_hinges_config | Config file | Configuration file for biobb_flexserv.pcz_hinges tool. | |
step38_pcz_hinges_output_json_path | Output file | Output hinge regions x PCA mode file. | |
step38_pcz_hinges_config | Config file | Configuration file for biobb_flexserv.pcz_hinges tool. | |
step39_pcz_stiffness_output_json_path | Output file | Output json file with PCA Stiffness. | |
step39_pcz_stiffness_config | Config file | Configuration file for biobb_flexserv.pcz_stiffness tool. | |
step40_pcz_collectivity_output_json_path | Output file | Output json file with PCA Collectivity indexes per mode. | |
step40_pcz_collectivity_config | Config file | Configuration file for biobb_flexserv.pcz_collectivity tool. | |
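The input IDs listed above appear to follow a regular pattern: `step<N>_<tool>_<parameter>`, where the parameter is either `config` or a name starting with `input_` or `output_`. The sketch below splits an ID into those three parts, which can help when grouping inputs by step in a job file. The pattern is inferred from the table only, and `split_input_id` and `group_by_step` are hypothetical helpers, not part of biobb.

```python
import re
from collections import defaultdict

# Inferred from the Inputs table: step number, tool name, then either
# "config" or a parameter name starting with "input_" / "output_".
ID_PATTERN = re.compile(r"^step(\d+)_(.+?)_((?:input|output)_\w+|config)$")


def split_input_id(input_id):
    """Return (step_number, tool_name, parameter_name) for a workflow input ID."""
    match = ID_PATTERN.match(input_id)
    if match is None:
        raise ValueError(f"unrecognised input ID: {input_id!r}")
    return int(match.group(1)), match.group(2), match.group(3)


def group_by_step(input_ids):
    """Map each step number to the list of (tool, parameter) pairs it exposes."""
    grouped = defaultdict(list)
    for input_id in input_ids:
        step, tool, parameter = split_input_id(input_id)
        grouped[step].append((tool, parameter))
    return dict(grouped)


example_ids = [
    "step0_extract_model_input_structure_path",
    "step5_concoord_disco_output_traj_path",
    "step5_concoord_disco_config",
    "step26_make_ndx_config",
]
grouped = group_by_step(example_ids)
print(grouped)
```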
Steps
ID | Name | Description |
---|---|---|
step0_extract_model | extract_model | This class is a wrapper of the Structure Checking tool to extract a model from a 3D structure. |
step1_extract_chain | extract_chain | This class is a wrapper of the Structure Checking tool to extract a chain from a 3D structure. |
step2_cpptraj_mask | cpptraj_mask | Wrapper of the Ambertools Cpptraj module for extracting a selection of atoms from a given cpptraj compatible trajectory. |
step3_cpptraj_mask | cpptraj_mask | Wrapper of the Ambertools Cpptraj module for extracting a selection of atoms from a given cpptraj compatible trajectory. |
step4_concoord_dist | concoord_dist | Wrapper of the Concoord_dist software. |
step5_concoord_disco | concoord_disco | Wrapper of the Concoord_disco software. |
step6_cpptraj_rms | cpptraj_rms | Wrapper of the Ambertools Cpptraj module for calculating the Root Mean Square deviation (RMSd) of a given cpptraj compatible trajectory. |
step7_cpptraj_convert | cpptraj_convert | Wrapper of the Ambertools Cpptraj module for converting between cpptraj compatible trajectory file formats and/or extracting a selection of atoms or frames. |
step8_prody_anm | prody_anm | Wrapper of the Prody software. |
step9_cpptraj_rms | cpptraj_rms | Wrapper of the Ambertools Cpptraj module for calculating the Root Mean Square deviation (RMSd) of a given cpptraj compatible trajectory. |
step10_cpptraj_convert | cpptraj_convert | Wrapper of the Ambertools Cpptraj module for converting between cpptraj compatible trajectory file formats and/or extracting a selection of atoms or frames. |
step11_bd_run | bd_run | Run Brownian Dynamics from FlexServ |
step12_cpptraj_rms | cpptraj_rms | Wrapper of the Ambertools Cpptraj module for calculating the Root Mean Square deviation (RMSd) of a given cpptraj compatible trajectory. |
step13_dmd_run | dmd_run | Run Discrete Molecular Dynamics from FlexServ |
step14_cpptraj_rms | cpptraj_rms | Wrapper of the Ambertools Cpptraj module for calculating the Root Mean Square deviation (RMSd) of a given cpptraj compatible trajectory. |
step15_nma_run | nma_run | Run Normal Mode Analysis from FlexServ |
step16_cpptraj_rms | cpptraj_rms | Wrapper of the Ambertools Cpptraj module for calculating the Root Mean Square deviation (RMSd) of a given cpptraj compatible trajectory. |
step17_cpptraj_convert | cpptraj_convert | Wrapper of the Ambertools Cpptraj module for converting between cpptraj compatible trajectory file formats and/or extracting a selection of atoms or frames. |
step18_nolb_nma | nolb_nma | Wrapper of the Nolb software. |
step19_cpptraj_rms | cpptraj_rms | Wrapper of the Ambertools Cpptraj module for calculating the Root Mean Square deviation (RMSd) of a given cpptraj compatible trajectory. |
step20_cpptraj_convert | cpptraj_convert | Wrapper of the Ambertools Cpptraj module for converting between cpptraj compatible trajectory file formats and/or extracting a selection of atoms or frames. |
step21_imod_imode | imod_imode | Wrapper of the imods_imode software. |
step22_imod_imc | imod_imc | Wrapper of the imods_imc software. |
step23_cpptraj_rms | cpptraj_rms | Wrapper of the Ambertools Cpptraj module for calculating the Root Mean Square deviation (RMSd) of a given cpptraj compatible trajectory. |
step24_cpptraj_convert | cpptraj_convert | Wrapper of the Ambertools Cpptraj module for converting between cpptraj compatible trajectory file formats and/or extracting a selection of atoms or frames. |
step26_make_ndx | make_ndx | Creates a GROMACS index file (NDX) from an input selection and an input GROMACS structure file. |
step27_gmx_cluster | gmx_cluster | Wrapper of the GROMACS cluster module for clustering structures from a given GROMACS compatible trajectory. |
step28_cpptraj_rms | cpptraj_rms | Wrapper of the Ambertools Cpptraj module for calculating the Root Mean Square deviation (RMSd) of a given cpptraj compatible trajectory. |
step29_pcz_zip | pcz_zip | Compress MD simulation trajectories with PCA suite |
step30_pcz_zip | pcz_zip | Compress MD simulation trajectories with PCA suite |
step31_pcz_info | pcz_info | Extract PCA info (variance, Dimensionality) from a compressed PCZ file |
step32_pcz_evecs | pcz_evecs | Extract PCA Eigen Vectors from a compressed PCZ file |
step33_pcz_animate | pcz_animate | Extract PCA animations from a compressed PCZ file |
step34_cpptraj_convert | cpptraj_convert | Wrapper of the Ambertools Cpptraj module for converting between cpptraj compatible trajectory file formats and/or extracting a selection of atoms or frames. |
step35_pcz_bfactor | pcz_bfactor | Extract residue bfactors x PCA mode from a compressed PCZ file |
step36_pcz_hinges | pcz_hinges | Compute possible hinge regions (residues around which large protein movements are organized) of a molecule from a compressed PCZ file |
step37_pcz_hinges | pcz_hinges | Compute possible hinge regions (residues around which large protein movements are organized) of a molecule from a compressed PCZ file |
step38_pcz_hinges | pcz_hinges | Compute possible hinge regions (residues around which large protein movements are organized) of a molecule from a compressed PCZ file |
step39_pcz_stiffness | pcz_stiffness | Extract PCA stiffness from a compressed PCZ file |
step40_pcz_collectivity | pcz_collectivity | Extract PCA collectivity (numerical measure of how many atoms are affected by a given mode) from a compressed PCZ file |
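The collectivity of a mode, described above as a measure of how many atoms are affected by it, is often defined through the information entropy of the normalised squared atomic displacements. The sketch below implements that common, unweighted definition as an illustration; whether `pcz_collectivity` uses exactly this formula (for instance with or without mass weighting) is an assumption that should be checked against the FlexServ documentation, and the function name is hypothetical.

```python
import math


def collectivity(displacements):
    """Collectivity index of one mode.

    `displacements` is a list of per-atom displacement vectors (x, y, z).
    The index is exp(-sum(p_i * ln p_i)) / N, where p_i is the fraction of
    the total squared displacement carried by atom i. It ranges from 1/N
    (a single atom moves) to 1 (all atoms move equally).
    """
    squared = [sum(c * c for c in vector) for vector in displacements]
    total = sum(squared)
    if total == 0:
        raise ValueError("mode has no displacement")
    fractions = [s / total for s in squared if s > 0]  # 0 * ln(0) is taken as 0
    entropy = -sum(p * math.log(p) for p in fractions)
    return math.exp(entropy) / len(squared)


# Toy modes (hypothetical) for a 4-atom system.
global_mode = [(1.0, 0.0, 0.0)] * 4                # every atom moves equally
local_mode = [(1.0, 0.0, 0.0)] + [(0.0, 0.0, 0.0)] * 3  # only one atom moves

print(collectivity(global_mode))
print(collectivity(local_mode))
```

Under this definition a fully delocalised mode scores close to 1, and a mode confined to a single atom scores 1/N.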
Outputs
ID | Name | Description | Type |
---|---|---|---|
step0_extract_model_out1 | output_structure_path | Output structure file path. | |
step1_extract_chain_out1 | output_structure_path | Output structure file path. | |
step2_cpptraj_mask_out1 | output_cpptraj_path | Path to the output processed trajectory. | |
step3_cpptraj_mask_out1 | output_cpptraj_path | Path to the output processed trajectory. | |
step4_concoord_dist_out1 | output_pdb_path | Output pdb file. | |
step4_concoord_dist_out2 | output_gro_path | Output gro file. | |
step4_concoord_dist_out3 | output_dat_path | Output dat with structure interpretation and bond definitions. | |
step5_concoord_disco_out1 | output_traj_path | Output trajectory file. | |
step5_concoord_disco_out2 | output_rmsd_path | Output rmsd file. | |
step5_concoord_disco_out3 | output_bfactor_path | Output B-factor file. | |
step6_cpptraj_rms_out1 | output_cpptraj_path | Path to the output processed analysis. | |
step7_cpptraj_convert_out1 | output_cpptraj_path | Path to the output processed trajectory. | |
step8_prody_anm_out1 | output_pdb_path | Output multi-model PDB file with the generated ensemble. | |
step9_cpptraj_rms_out1 | output_cpptraj_path | Path to the output processed analysis. | |
step10_cpptraj_convert_out1 | output_cpptraj_path | Path to the output processed trajectory. | |
step11_bd_run_out1 | output_crd_path | Output ensemble. | |
step11_bd_run_out2 | output_log_path | Output log file. | |
step12_cpptraj_rms_out1 | output_cpptraj_path | Path to the output processed analysis. | |
step12_cpptraj_rms_out2 | output_traj_path | Path to the output processed trajectory. | |
step13_dmd_run_out1 | output_crd_path | Output ensemble. | |
step13_dmd_run_out2 | output_log_path | Output log file. | |
step14_cpptraj_rms_out1 | output_cpptraj_path | Path to the output processed analysis. | |
step14_cpptraj_rms_out2 | output_traj_path | Path to the output processed trajectory. | |
step15_nma_run_out1 | output_crd_path | Output ensemble. | |
step15_nma_run_out2 | output_log_path | Output log file. | |
step16_cpptraj_rms_out1 | output_cpptraj_path | Path to the output processed analysis. | |
step17_cpptraj_convert_out1 | output_cpptraj_path | Path to the output processed trajectory. | |
step18_nolb_nma_out1 | output_pdb_path | Output multi-model PDB file with the generated ensemble. | |
step19_cpptraj_rms_out1 | output_cpptraj_path | Path to the output processed analysis. | |
step20_cpptraj_convert_out1 | output_cpptraj_path | Path to the output processed trajectory. | |
step21_imod_imode_out1 | output_dat_path | Output dat with normal modes. | |
step22_imod_imc_out1 | output_traj_path | Output multi-model PDB file with the generated ensemble. | |
step23_cpptraj_rms_out1 | output_cpptraj_path | Path to the output processed analysis. | |
step24_cpptraj_convert_out1 | output_cpptraj_path | Path to the output processed trajectory. | |
step26_make_ndx_out1 | output_ndx_path | Path to the output index NDX file. | |
step27_gmx_cluster_out1 | output_pdb_path | Path to the output cluster file. | |
step28_cpptraj_rms_out1 | output_cpptraj_path | Path to the output processed analysis. | |
step28_cpptraj_rms_out2 | output_traj_path | Path to the output processed trajectory. | |
step29_pcz_zip_out1 | output_pcz_path | Output compressed trajectory. | |
step30_pcz_zip_out1 | output_pcz_path | Output compressed trajectory. | |
step31_pcz_info_out1 | output_json_path | Output json file with PCA info such as number of components, variance and dimensionality. | |
step32_pcz_evecs_out1 | output_json_path | Output json file with PCA Eigen Vectors. | |
step33_pcz_animate_out1 | output_crd_path | Output PCA animated trajectory file. | |
step34_cpptraj_convert_out1 | output_cpptraj_path | Path to the output processed trajectory. | |
step35_pcz_bfactor_out1 | output_dat_path | Output Bfactor x residue x PCA mode file. | |
step35_pcz_bfactor_out2 | output_pdb_path | Output PDB with Bfactor x residue x PCA mode file. | |
step36_pcz_hinges_out1 | output_json_path | Output hinge regions x PCA mode file. | |
step37_pcz_hinges_out1 | output_json_path | Output hinge regions x PCA mode file. | |
step38_pcz_hinges_out1 | output_json_path | Output hinge regions x PCA mode file. | |
step39_pcz_stiffness_out1 | output_json_path | Output json file with PCA Stiffness. | |
step40_pcz_collectivity_out1 | output_json_path | Output json file with PCA Collectivity indexes per mode. | |
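Several of the outputs above are B-factor files. As a reminder of how such values relate to an ensemble, the sketch below converts per-atom positional fluctuations into isotropic B-factors with the standard relation B = (8π²/3)·⟨Δr²⟩. It is only an illustration on toy data: the function name is hypothetical, the frames are assumed to be already superposed, and whether the workflow tools apply exactly this conversion is an assumption.

```python
import math


def bfactors_from_ensemble(frames):
    """Per-atom isotropic B-factors from a superposed ensemble.

    `frames` is a list of conformations, each a list of (x, y, z) tuples.
    For every atom, the mean square fluctuation around its average position
    is scaled by 8 * pi^2 / 3.
    """
    n_frames = len(frames)
    n_atoms = len(frames[0])
    result = []
    for atom in range(n_atoms):
        mean = [sum(frame[atom][k] for frame in frames) / n_frames for k in range(3)]
        msf = sum(
            sum((frame[atom][k] - mean[k]) ** 2 for k in range(3))
            for frame in frames
        ) / n_frames
        result.append(8.0 * math.pi ** 2 / 3.0 * msf)
    return result


# Toy ensemble (hypothetical): atom 0 is static, atom 1 oscillates by +/- 1 along x.
frames = [
    [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (7.0, 0.0, 0.0)],
]
bfactors = bfactors_from_ensemble(frames)
print(bfactors)
```

If the coordinates are in Ångström, the resulting values are in Å², the unit used in the B-factor column of PDB files.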
Version History
Version 2 (latest) Created 6th Jun 2023 at 11:10 by Genís Bayarri
Updated workflow descriptors
Frozen
Version-2
70eb3d2
Version 1 (earliest) Created 31st May 2023 at 14:51 by Genís Bayarri
Initial commit
Frozen
Version-1
ab62bfd
