High-performance computing (HPC) environments are integral to quantum chemistry and other computationally intensive research, yet their complexity is a barrier for researchers without extensive computational training and hinders efficient use of domain-specific research software. As a result, the prediction of mass spectra for in silico annotation is inaccessible to many wet-lab scientists. Our main goal is to help non-experts navigate this complexity and to make semi-empirical quantum chemistry (QC)-based predictions available without requiring advanced computational skills. To this end, we propose a comprehensive approach. We chose specific file formats for storing molecular structures to ensure compatibility across diverse tools and platforms. We use the xTB quantum chemistry package for molecular geometry optimization because it balances accuracy against computational cost, which makes it well suited to applications outside dedicated HPC settings. Integrating QC-based mass spectrometry (QCxMS) into Galaxy enables the prediction of mass spectra and offers insights into molecular composition and properties. Our workflow demonstrates how spectra can be computed with QCxMS together with complementary tools, and we report runtime performance metrics for four distinct molecules, showing that non-HPC users can run these predictions with ease. We also provide a Docker image that encapsulates the necessary tools, accompanied by user-friendly wrappers, which simplifies the entire process for non-expert users. Finally, we consider potential improvements, in particular optimizing the Conda package for better performance by incorporating Fortran and Intel compiler optimizations. These considerations play a crucial role in refining the proposed methodology, enhancing the user experience, and expanding the reach of semi-empirical quantum chemistry for mass spectra prediction.
Inputs
ID | Name | Description | Type |
---|---|---|---|
Input Molecules with SMILES and NAME without a header. | Input Molecules with SMILES and NAME without a header. | The first column should contain the name of the molecule, the second the SMILES code. | |
Number of conformers to generate | Number of conformers to generate | Defaults to one conformer. | |
Optimization Levels | Optimization Levels | Level of accuracy for the optimization. | |
QC Method | QC Method | Available: GFN1-xTB and GFN2-xTB. | |
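The input table described above can be prepared with a few lines of Python. This is a minimal sketch, not part of the workflow itself: it assumes a tab-separated file (the usual Galaxy tabular layout), follows the column order given in the description (name first, SMILES second), and uses two illustrative molecules that do not come from the original workflow. The helper name `write_input_table` is hypothetical.

```python
import csv
import io

# Illustrative molecules only; replace with your own names and SMILES.
MOLECULES = [
    ("ethanol", "CCO"),
    ("benzene", "c1ccccc1"),
]


def write_input_table(rows, handle):
    """Write a headerless, tab-separated table: name first, SMILES second."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for name, smiles in rows:
        if not name.strip() or not smiles.strip():
            raise ValueError(f"empty name or SMILES in row: {(name, smiles)!r}")
        if any(ch.isspace() for ch in smiles):
            raise ValueError(f"SMILES must not contain whitespace: {smiles!r}")
        writer.writerow([name, smiles])


# Write to an in-memory buffer; pass an open file handle to write to disk.
buffer = io.StringIO()
write_input_table(MOLECULES, buffer)
table_text = buffer.getvalue()
print(table_text, end="")
```

No header row is written, because the workflow expects the table without one.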
Steps
ID | Name | Description |
---|---|---|
4 | Cut SMILES column | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_cut_tool/9.3+galaxy1 |
5 | Cut NAME column | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_cut_tool/9.3+galaxy1 |
6 | Split file | toolshed.g2.bx.psu.edu/repos/bgruening/split_file_to_collection/split_file_to_collection/0.5.2 |
7 | Split file | toolshed.g2.bx.psu.edu/repos/bgruening/split_file_to_collection/split_file_to_collection/0.5.2 |
8 | Parse parameter value | param_value_from_file |
9 | Convert compounds from SMILES to SDF and add the name as title. | toolshed.g2.bx.psu.edu/repos/bgruening/openbabel_compound_convert/openbabel_compound_convert/3.1.1+galaxy0 |
10 | Merge the individual SDF files | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_cat/9.3+galaxy1 |
11 | Generate conformers | Generate 3D conformers from the SDF input for each molecule. Requires the number of conformers as an input parameter (default: 1). toolshed.g2.bx.psu.edu/repos/bgruening/ctb_im_conformers/ctb_im_conformers/1.1.4+galaxy0 |
12 | Molecular format conversion | Convert the conformers to Cartesian coordinate (XYZ) format. toolshed.g2.bx.psu.edu/repos/bgruening/openbabel_compound_convert/openbabel_compound_convert/3.1.1+galaxy0 |
13 | xTB molecular optimization | Semi-empirical geometry optimization. toolshed.g2.bx.psu.edu/repos/recetox/xtb_molecular_optimization/xtb_molecular_optimization/6.6.1+galaxy3 |
14 | QCxMS neutral run | Prepare the input files for the production runs. toolshed.g2.bx.psu.edu/repos/recetox/qcxms_neutral_run/qcxms_neutral_run/5.2.1+galaxy4 |
15 | QCxMS production run | Calculate the mass spectra for a given molecule using QCxMS. This step generates .res files, which are collected and converted into MSP format in the last step. toolshed.g2.bx.psu.edu/repos/recetox/qcxms_production_run/qcxms_production_run/5.2.1+galaxy3 |
16 | Filter failed datasets | Remove failed runs. __FILTER_FAILED_DATASETS__ |
17 | QCxMS get results | Write the simulated mass spectra to an MSP file. toolshed.g2.bx.psu.edu/repos/recetox/qcxms_getres/qcxms_getres/5.2.1+galaxy2 |
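To make the data flow of the steps table easier to follow, the sketch below mimics three of its stages in plain Python: cutting the two columns (steps 4 and 5), pairing them into one record per molecule (steps 6 to 9), and dropping failed runs (step 16). It is an illustration under stated assumptions, not how the Galaxy tools are implemented: the column order (name first, SMILES second) follows the input description, and the `Molecule` class, the function names, and the run labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Molecule:
    name: str
    smiles: str


def cut_column(table_text, index):
    """Extract one column from a headerless tab-separated table (cf. steps 4 and 5)."""
    column = []
    for line_number, line in enumerate(table_text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if index >= len(fields):
            raise ValueError(f"line {line_number} has no column {index}")
        column.append(fields[index].strip())
    return column


def split_per_molecule(table_text):
    """Pair the two columns into one record per molecule (cf. steps 6 to 9)."""
    names = cut_column(table_text, 0)   # assumed: name in the first column
    smiles = cut_column(table_text, 1)  # assumed: SMILES in the second column
    return [Molecule(n, s) for n, s in zip(names, smiles)]


def filter_failed(runs):
    """Keep only the payloads of successful runs (cf. step 16)."""
    return [payload for ok, payload in runs if ok]


table = "ethanol\tCCO\nbenzene\tc1ccccc1\n"
molecules = split_per_molecule(table)
print(molecules)

# Hypothetical run outcomes: (succeeded, result label)
runs = [(True, "traj-1.res"), (False, "traj-2.res"), (True, "traj-3.res")]
kept = filter_failed(runs)
print(kept)
```

Filtering before the final collection step means a single failed trajectory does not block the conversion of the remaining results.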
Outputs
ID | Name | Description | Type |
---|---|---|---|
conformer_output | conformer_output | n/a | |
XYZ output | XYZ output | n/a | |
optimized output | optimized output | n/a | |
[.in] output | [.in] output | n/a | |
[.start] output | [.start] output | n/a | |
[.xyz] output | [.xyz] output | n/a | |
res output | res output | n/a | |
MSP output | MSP output | n/a | |
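The MSP output can be inspected with a small parser. The sketch below assumes the common MSP text layout: records separated by blank lines, `Key: value` metadata lines, and one whitespace-separated `m/z intensity` pair per peak line. The exact metadata fields written by the workflow may differ, and the sample spectra are made-up values for illustration, not predictions from QCxMS. The function names `parse_msp` and `base_peak` are hypothetical.

```python
import re


def parse_msp(text):
    """Parse MSP-style text into a list of {"meta": dict, "peaks": list} records."""
    spectra = []
    for block in re.split(r"\n\s*\n", text.strip()):
        if not block.strip():
            continue  # nothing to parse in an empty input
        meta, peaks = {}, []
        for line in block.splitlines():
            line = line.strip()
            if not line:
                continue
            if ":" in line:
                # Metadata line such as "NAME: ..." or "Num Peaks: 3"
                key, _, value = line.partition(":")
                meta[key.strip().lower()] = value.strip()
            else:
                # Peak line: m/z followed by intensity
                mz, intensity = line.split()[:2]
                peaks.append((float(mz), float(intensity)))
        spectra.append({"meta": meta, "peaks": peaks})
    return spectra


def base_peak(spectrum):
    """Return the (m/z, intensity) pair with the highest intensity."""
    return max(spectrum["peaks"], key=lambda peak: peak[1])


# Made-up spectra for illustration only.
SAMPLE = """NAME: molecule_a
Num Peaks: 3
15.0 120
31.0 999
45.0 480

NAME: molecule_b
Num Peaks: 2
50.0 200
78.0 999
"""

spectra = parse_msp(SAMPLE)
for spectrum in spectra:
    declared = int(spectrum["meta"]["num peaks"])
    # Sanity check: the declared peak count should match the parsed peaks.
    assert declared == len(spectrum["peaks"])
    print(spectrum["meta"]["name"], base_peak(spectrum))
```

Comparing the declared `Num Peaks` value with the number of parsed peak lines is a cheap way to catch truncated files.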
Version History
Galaxy Workflow End-to-end EI mass spectra prediction workflow using QCxMS (latest) Created 5th Aug 2024 at 14:53 by Helge Hecht
New version starting from a table with SMILES and NAMES to generate an SDF and then run the previous workflow.
Frozen
Galaxy-Workflow-End-to-end-EI-mass-spectra-prediction-workflow-using-QCxMS
1a50bdb
Version 1 (earliest) Created 3rd Jun 2024 at 14:52 by Wudmir Rojas
qcxms galaxy workflow
Frozen
Version-1
0007a6a
Creators
Additional credit
RECETOX SpecDat
Created: 3rd Jun 2024 at 14:52
Last updated: 5th Aug 2024 at 15:12