Galaxy Protein conformational ensembles generation
Version 1

Workflow Type: Galaxy
Stable

Protein Conformational ensembles generation

Workflow included in the ELIXIR 3D-Bioinfo Implementation Study:

Building on PDBe-KB to chart and characterize the conformation landscape of native proteins

This tutorial aims to illustrate the process of generating protein conformational ensembles from 3D structures and analysing their molecular flexibility, step by step, using the BioExcel Building Blocks library (biobb).

Conformational landscape of native proteins

Proteins are dynamic systems that adopt multiple conformational states, a property essential for many biological processes (e.g. binding other proteins, nucleic acids, small molecule ligands, or switching between functionally active and inactive states). Characterizing the different conformational states of proteins and the transitions between them is therefore critical for gaining insight into their biological function and can help explain the effects of genetic variants in health and disease and the action of drugs.

Structural biology has become increasingly efficient in sampling the different conformational states of proteins. The PDB has currently archived more than 170,000 individual structures, but over two thirds of these structures represent multiple conformations of the same or related protein, observed in different crystal forms, when interacting with other proteins or other macromolecules, or upon binding small molecule ligands. Charting this conformational diversity across the PDB can therefore be employed to build a useful approximation of the conformational landscape of native proteins.

A number of resources and tools describing and characterizing various, often complementary, aspects of protein conformational diversity in known structures have been developed, notably by groups in Europe. These tools include algorithms of varying degrees of sophistication for aligning the 3D structures of individual protein chains or domains and of protein assemblies, and for evaluating their degree of structural similarity. Using such tools, one can align structures pairwise, compute the corresponding similarity matrix, and identify ensembles of structures/conformations with a defined similarity level that tend to recur in different PDB entries, an operation typically performed using clustering methods. Such workflows underlie resources such as CATH, Contemplate, or PDBflex, which offer access to conformational ensembles composed of similar conformations clustered according to various criteria. Other types of tools focus on differences between protein conformations, identifying regions of proteins that undergo large collective displacements in different PDB entries, those that act as hinges or linkers, or regions that are inherently flexible.
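The align, compare, and cluster procedure described above can be illustrated with a minimal sketch. It assumes each conformation is already reduced to an (N, 3) NumPy array of equivalent atoms (for example Cα atoms), uses the Kabsch superposition to compute pairwise RMSD, and groups conformations by single-linkage at a fixed cutoff. The function names, the cutoff, and the clustering rule are illustrative assumptions; they are not the methods used by CATH, PDBflex, or the workflow tools listed on this page.

```python
import numpy as np


def kabsch_rmsd(a, b):
    """RMSD between two (N, 3) coordinate sets after optimal superposition."""
    a = a - a.mean(axis=0)  # remove translation
    b = b - b.mean(axis=0)
    u, _, vt = np.linalg.svd(a.T @ b)
    # Flip the last axis if needed so the result is a proper rotation.
    d = 1.0 if np.linalg.det(u @ vt) >= 0 else -1.0
    rot = u @ np.diag([1.0, 1.0, d]) @ vt  # row-vector convention: a @ rot ~ b
    diff = a @ rot - b
    return float(np.sqrt((diff ** 2).sum() / len(a)))


def rmsd_matrix(structures):
    """Symmetric matrix of pairwise superposed RMSD values."""
    n = len(structures)
    matrix = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i, j] = matrix[j, i] = kabsch_rmsd(structures[i], structures[j])
    return matrix


def cluster_by_cutoff(dist, cutoff):
    """Single-linkage grouping: structures linked by a chain of pairs
    closer than `cutoff` receive the same integer label."""
    n = len(dist)
    labels = [-1] * n
    current = 0
    for seed in range(n):
        if labels[seed] != -1:
            continue
        labels[seed] = current
        stack = [seed]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and dist[i, j] <= cutoff:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def rotate_z(coords, angle):
    """Rotate coordinates about the z axis (used only to build toy data)."""
    c, s = np.cos(angle), np.sin(angle)
    rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return coords @ rz.T


def toy_structures():
    """Two synthetic 'states', each seen twice in different orientations."""
    rng = np.random.default_rng(0)
    state_a = rng.normal(size=(20, 3)) * 5.0
    state_b = rng.normal(size=(20, 3)) * 5.0
    return [
        state_a,
        rotate_z(state_a, 0.7) + np.array([10.0, -3.0, 2.0]),
        state_b,
        rotate_z(state_b, -1.2) + np.array([-4.0, 8.0, 1.0]),
    ]


if __name__ == "__main__":
    dist = rmsd_matrix(toy_structures())
    print(np.round(dist, 2))
    print(cluster_by_cutoff(dist, cutoff=0.5))
```

On real data the coordinate arrays would come from a structure parser and a residue-level alignment, which this sketch leaves out.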

To build a meaningful approximation of the conformational landscape of native proteins, the conformational ensembles (and the differences between them), identified on the basis of structural similarity/dissimilarity measures alone, need to be biophysically characterized. This may be approached at two different levels.

  • At the biological level, it is important to link observed conformational ensembles to their functional roles by evaluating the correspondence with protein family classifications based on sequence information and functional annotations in public databases, e.g. UniProt and the PDBe-Knowledge Base (PDBe-KB). These links should provide valuable mechanistic insights into how the conformational and dynamic properties of proteins are exploited by evolution to regulate their biological function.

  • At the physical level, one needs to introduce energetic considerations to evaluate the likelihood that the identified conformational ensembles represent conformational states that the protein (or domain under study) samples in isolation. Such evaluation is notoriously challenging and can only be roughly approximated by using computational methods to evaluate the extent to which the observed conformational ensembles can be reproduced by algorithms that simulate the dynamic behavior of protein systems. These algorithms include the computationally expensive classical molecular dynamics (MD) simulations to sample local thermal fluctuations, but also faster, more approximate methods such as Elastic Network Models and Normal Mode Analysis (NMA) to model low-energy collective motions. Alternatively, enhanced sampling molecular dynamics can be used to model complex types of conformational changes, but at a very high computational cost.
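To make the Elastic Network Model and Normal Mode Analysis idea more concrete, the following is a minimal sketch of an anisotropic network model written with NumPy only. It assumes one node per residue (Cα positions), a uniform spring constant, and a 15 Å interaction cutoff. These values, the function names, and the idealised helix used as input are illustrative assumptions, and the code is not the implementation used by ProDy, iMod, NOLB, or FlexServ in this workflow.

```python
import numpy as np


def anm_hessian(coords, cutoff=15.0, gamma=1.0):
    """Build the 3N x 3N Hessian of an elastic network with harmonic
    springs between all node pairs closer than `cutoff`."""
    n = len(coords)
    hessian = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            dist2 = float(d @ d)
            if dist2 > cutoff ** 2:
                continue
            block = -gamma * np.outer(d, d) / dist2  # off-diagonal super-element
            si, sj = slice(3 * i, 3 * i + 3), slice(3 * j, 3 * j + 3)
            hessian[si, sj] = block
            hessian[sj, si] = block
            hessian[si, si] -= block  # diagonal blocks balance the off-diagonals
            hessian[sj, sj] -= block
    return hessian


def anm_modes(coords, cutoff=15.0, gamma=1.0):
    """Eigen-decompose the Hessian and drop the six rigid-body modes
    (three translations, three rotations) with near-zero eigenvalues."""
    evals, evecs = np.linalg.eigh(anm_hessian(coords, cutoff, gamma))
    return evals[6:], evecs[:, 6:]


def residue_fluctuations(evals, evecs):
    """Relative mean-square fluctuation per node: squared mode components
    weighted by 1/eigenvalue, so soft modes contribute the most."""
    n = evecs.shape[0] // 3
    weighted = (evecs ** 2) / evals
    return weighted.sum(axis=1).reshape(n, 3).sum(axis=1)


def helix_trace(n_residues=20, radius=2.3, rise=1.5, turn_deg=100.0):
    """Idealised alpha-helix-like C-alpha trace used as toy input."""
    t = np.arange(n_residues)
    ang = np.deg2rad(turn_deg) * t
    return np.column_stack([radius * np.cos(ang), radius * np.sin(ang), rise * t])


if __name__ == "__main__":
    ca = helix_trace()
    evals, evecs = anm_modes(ca)
    fluct = residue_fluctuations(evals, evecs)
    print("softest non-trivial eigenvalues:", np.round(evals[:3], 4))
    print("relative fluctuation per residue:", np.round(fluct, 3))
```

Displacing the starting structure along the lowest-frequency eigenvectors is one common way to turn such modes into a conformational ensemble; how each tool in the workflow does this is not specified here.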

The ELIXIR 3D-Bioinfo Implementation Study "Building on PDBe-KB to chart and characterize the conformation landscape of native proteins" focuses on:

  1. Mapping the conformational diversity of proteins and their homologs across the PDB.
  2. Characterizing the different flexibility properties of protein regions, and linking this information to sequence and functional annotation.
  3. Benchmarking computational methods that can predict a biophysical description of protein motions.

This workflow is part of the third objective, in which a list of computational resources able to predict protein flexibility and conformational ensembles has been collected, evaluated, and integrated into reproducible and interoperable workflows using the BioExcel Building Blocks library. Note that the list is not meant to be exhaustive; it was built following the expertise of the implementation study partners.


Copyright & Licensing

This software has been developed in the MMB group at the BSC & IRB for the European BioExcel project, funded by the European Commission (EU H2020 823830, EU H2020 675728).

Licensed under the Apache License 2.0, see the file LICENSE for details.

Steps

ID Name Description
0 Pdb biobb_io_pdb_ext
1 ExtractModel biobb_structure_utils_extract_model_ext
2 ExtractChain biobb_structure_utils_extract_chain_ext
3 Protein backbone biobb_analysis_cpptraj_mask_ext
4 ConcoordDist biobb_flexdyn_concoord_dist_ext
5 ProdyAnm biobb_flexdyn_prody_anm_ext
6 ImodImode biobb_flexdyn_imod_imode_ext
7 Protein CA biobb_analysis_cpptraj_mask_ext
8 ConcoordDisco biobb_flexdyn_concoord_disco_ext
9 CpptrajRms Prody biobb_analysis_cpptraj_rms_ext
10 CpptrajConvert Prody biobb_analysis_cpptraj_convert_ext
11 ImodImc biobb_flexdyn_imod_imc_ext
12 BdRun biobb_flexserv_bd_run_ext
13 DmdRun biobb_flexserv_dmd_run_ext
14 NolbNma biobb_flexdyn_nolb_nma_ext
15 MakeNdx biobb_gromacs_make_ndx_ext
16 NmaRun biobb_flexserv_nma_run_ext
17 CpptrajRms Concoord biobb_analysis_cpptraj_rms_ext
18 CpptrajConvert Concoord biobb_analysis_cpptraj_convert_ext
19 CpptrajRms Imod biobb_analysis_cpptraj_rms_ext
20 CpptrajConvert Imod biobb_analysis_cpptraj_convert_ext
21 CpptrajRms BD biobb_analysis_cpptraj_rms_ext
22 CpptrajRms DMD biobb_analysis_cpptraj_rms_ext
23 CpptrajRms NOLB biobb_analysis_cpptraj_rms_ext
24 CpptrajConvert NOLB biobb_analysis_cpptraj_convert_ext
25 CpptrajRms NMA biobb_analysis_cpptraj_rms_ext
26 CpptrajConvert NMA biobb_analysis_cpptraj_convert_ext
27 Zip toolshed.g2.bx.psu.edu/repos/cmonjeau/ziptool/zip/1.0.1
28 Trjcat biobb_gromacs_trjcat_ext
29 GmxCluster Ensemble biobb_analysis_gmx_cluster_ext
30 CpptrajRms Ensemble biobb_analysis_cpptraj_rms_ext
31 PczZip biobb_flexserv_pcz_zip_ext
32 PczZip Gaussian biobb_flexserv_pcz_zip_ext
33 PczInfo biobb_flexserv_pcz_info_ext
34 PczEvecs biobb_flexserv_pcz_evecs_ext
35 PczAnimate biobb_flexserv_pcz_animate_ext
36 PczBfactor biobb_flexserv_pcz_bfactor_ext
37 PczStiffness biobb_flexserv_pcz_stiffness_ext
38 PczCollectivity biobb_flexserv_pcz_collectivity_ext
39 PczHinges biobb_flexserv_pcz_hinges_ext
40 PczHinges biobb_flexserv_pcz_hinges_ext
41 PczHinges biobb_flexserv_pcz_hinges_ext
42 CpptrajConvert PCZ biobb_analysis_cpptraj_convert_ext

Outputs

ID Name Description Type
mypdb.pdb mypdb.pdb n/a File
myextract_model.pdb myextract_model.pdb n/a File
myextract_monomer.pdb myextract_monomer.pdb n/a File
mycpptraj_mask_backbone mycpptraj_mask_backbone n/a File
myconcoord_dist.dat myconcoord_dist.dat n/a File
myconcoord_dist.gro myconcoord_dist.gro n/a File
myconcoord_dist.pdb myconcoord_dist.pdb n/a File
myprody_anm_traj.pdb myprody_anm_traj.pdb n/a File
myimod_imode_evecs.dat myimod_imode_evecs.dat n/a File
mycpptraj_mask_ca.pdb mycpptraj_mask_ca.pdb n/a File
myconcoord_disco_rmsd.dat myconcoord_disco_rmsd.dat n/a File
myconcoord_disco_bfactor.pdb myconcoord_disco_bfactor.pdb n/a File
myconcoord_disco_traj.pdb myconcoord_disco_traj.pdb n/a File
mycpptraj_prody_rms.dat mycpptraj_prody_rms.dat n/a File
mycpptraj_prody_anm_traj.trr mycpptraj_prody_anm_traj.trr n/a File
myimod_imc.pdb myimod_imc.pdb n/a File
mybd_flexserv_bd_ensemble.mdcrd mybd_flexserv_bd_ensemble.mdcrd n/a File
mybd_flexserv_bd_ensemble.log mybd_flexserv_bd_ensemble.log n/a File
mydmd_flexserv_dmd_ensemble.mdcrd mydmd_flexserv_dmd_ensemble.mdcrd n/a File
mydmd_flexserv_dmd_ensemble.log mydmd_flexserv_dmd_ensemble.log n/a File
mynolb_ensemble.pdb mynolb_ensemble.pdb n/a File
mymake_gmx_ndx.ndx mymake_gmx_ndx.ndx n/a File
mynma_flexserv_nma_ensemble.log mynma_flexserv_nma_ensemble.log n/a File
mynma_flexserv_nma_ensemble.mdcrd mynma_flexserv_nma_ensemble.mdcrd n/a File
_anonymous_output_1 _anonymous_output_1 n/a File
mycpptraj_disco_traj.trr mycpptraj_disco_traj.trr n/a File
_anonymous_output_2 _anonymous_output_2 n/a File
mycpptraj_imods_ensemble.trr mycpptraj_imods_ensemble.trr n/a File
mycpptraj_flexserv_bd_rmsd.dat mycpptraj_flexserv_bd_rmsd.dat n/a File
mycpptraj_flexserv_bd_traj_fitted.trr mycpptraj_flexserv_bd_traj_fitted.trr n/a File
mycpptraj_flexserv_dmd_traj_fitted.trr mycpptraj_flexserv_dmd_traj_fitted.trr n/a File
mycpptraj_flexserv_dmd_rmsd.dat mycpptraj_flexserv_dmd_rmsd.dat n/a File
mycpptraj_nolb_rmsd.dat mycpptraj_nolb_rmsd.dat n/a File
mycpptraj_nolb_ensemble.trr mycpptraj_nolb_ensemble.trr n/a File
mycpptraj_flexserv_nma_rmsd.dat mycpptraj_flexserv_nma_rmsd.dat n/a File
mycpptraj_flexserv_nma_ensemble.trr mycpptraj_flexserv_nma_ensemble.trr n/a File
concat_traj.zip concat_traj.zip n/a File
mytrjcat_concat_traj.trr mytrjcat_concat_traj.trr n/a File
mygmx_cluster.xvg mygmx_cluster.xvg n/a File
mygmx_cluster.log mygmx_cluster.log n/a File
mygmx_concat.cluster.pdb mygmx_concat.cluster.pdb n/a File
mygmx_cluster.xpm mygmx_cluster.xpm n/a File
mycpptraj_meta_traj_rmsd.dat mycpptraj_meta_traj_rmsd.dat n/a File
mycpptraj_meta_traj_fitted.crd mycpptraj_meta_traj_fitted.crd n/a File
_anonymous_output_3 _anonymous_output_3 n/a File
_anonymous_output_4 _anonymous_output_4 n/a File
mypcz_report.json mypcz_report.json n/a File
mypcz_evecs.json mypcz_evecs.json n/a File
mypcz_proj1.crd mypcz_proj1.crd n/a File
mypcz_bfactor_all.dat mypcz_bfactor_all.dat n/a File
mypcz_bfactor_all.pdb mypcz_bfactor_all.pdb n/a File
mypcz_stiffness.json mypcz_stiffness.json n/a File
mypcz_collectivity.json mypcz_collectivity.json n/a File
mypcz_hinges_bfactor_report.json mypcz_hinges_bfactor_report.json n/a File
_anonymous_output_5 _anonymous_output_5 n/a File
mypcz_hinges_fcte_report.json mypcz_hinges_fcte_report.json n/a File
mycpptraj_pcz_proj1.dcd mycpptraj_pcz_proj1.dcd n/a File

Version History

Version 1 (earliest) Created 1st Jun 2023 at 10:50 by Genís Bayarri

Initial commit


Frozen Version-1 aacca2b
Citation
Hospital, A., & Bayarri, G. (2023). Python Protein conformational ensembles generation. WorkflowHub. https://doi.org/10.48546/WORKFLOWHUB.WORKFLOW.490.1
License
Other (Open)
Activity

Created: 1st Jun 2023 at 10:50

Last updated: 1st Jun 2023 at 10:53
