Training a CNN model for classification of transcriptional subtypes and survival prediction in glioblastoma
Version 1

Workflow Type: Python


Work in progress: predicting transcriptional subtypes (TS) and risk from glioblastoma whole slide images


Upcoming paper: stay tuned...


Requirements

python 3.7.7
randaugment by Khrystyna Faryna
tensorflow 2.1.0
scikit-survival 0.13.1
pandas 1.0.3
lifelines 0.25.0
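Before running the pipeline, it can help to confirm that the installed packages match the pinned versions listed above. The following is a minimal sketch, not part of the repository: the helper names `installed_version` and `find_mismatches` are hypothetical, and randaugment is left out because the source does not say how it is installed.

```python
from typing import Callable, Dict, Optional, Tuple

# Pinned versions as listed above (pip distribution names).
PINNED = {
    "tensorflow": "2.1.0",
    "scikit-survival": "0.13.1",
    "pandas": "1.0.3",
    "lifelines": "0.25.0",
}


def installed_version(name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python >= 3.8
    except ImportError:
        # Python 3.7 fallback; pkg_resources ships with setuptools.
        import pkg_resources
        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def find_mismatches(
    pinned: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package whose installed version differs to (expected, found)."""
    mismatches = {}
    for name, expected in pinned.items():
        found = lookup(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty result means every listed package is installed at exactly the pinned version.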


The pipeline implemented here predicts transcriptional subtypes and survival of glioblastoma patients based on H&E-stained whole slide scans. Sample data is provided in this repository. To test the basic functionality with 5-fold cross-validation, simply run the corresponding training script (one for survival, one for transcriptional subtypes). Please note that this will not reproduce the results from the manuscript, as only a small fraction of the image data can be provided in this repository due to size constraints. To reproduce the results from the manuscript, please refer to the step-by-step guide below. The whole dataset is available for download; the image tiles are hosted on Zenodo. If you wish to adapt this pipeline for your own use, please be sure to set the parameters correctly for your data.

Moreover, we provide a fully trained model for predicting new samples (supported WSI formats are ndpi and svs). To use GBMPredictor, initialize it by calling gbm_predictor = GBMPredictor() and predict your sample by calling (predicted_TS, risk_group, median_riskscore) = gbm_predictor.predict(path_to_slidescan). Heatmaps and detailed results are saved automatically in a subfolder of your sample path.
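To predict several slide scans in one go, the single call described above can be wrapped in a small loop. This is only a sketch under stated assumptions: `predict_folder` is a hypothetical helper that is not part of the repository, it assumes that `predict` accepts the path as a string, and it takes the predictor as an argument because the import location of GBMPredictor is not given here.

```python
from pathlib import Path
from typing import Dict, Tuple

SUPPORTED_SUFFIXES = {".ndpi", ".svs"}  # WSI formats named in the text above


def predict_folder(predictor, slide_dir) -> Dict[str, Tuple]:
    """Call predictor.predict on every supported slide scan in slide_dir.

    `predictor` is any object with a predict(path) method returning
    (predicted_TS, risk_group, median_riskscore), as described above.
    Passing the path as a string is an assumption.
    """
    results = {}
    for path in sorted(Path(slide_dir).iterdir()):
        if path.is_file() and path.suffix.lower() in SUPPORTED_SUFFIXES:
            predicted_ts, risk_group, median_riskscore = predictor.predict(str(path))
            results[path.name] = (predicted_ts, risk_group, median_riskscore)
    return results
```

With the real class this would be called as predict_folder(GBMPredictor(), folder), where folder contains the slide scans. Files with other extensions are skipped.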

Reproducing the manuscript results - step by step guide

Training the CNN model

  1. Clone this repository and install the dependencies in your environment. Make sure that the path for randaugment is set correctly (it should be correct by default).
  2. Download all included image tiles from Zenodo and replace the data/training/image_tiles folder with the downloaded image_tiles folder.
  3. Run the survival and/or transcriptional subtype training script to reproduce the training with 5-fold cross-validation. Models and results will be saved in the data/models folder.
  4. Run the survival and/or transcriptional subtype final-training script to train the final model on the whole training dataset.
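For readers unfamiliar with 5-fold cross-validation on image tiles, the sketch below shows one way to build the folds. It is an illustration, not the repository's implementation: `grouped_kfold` is a hypothetical helper, and keeping all tiles of one patient in the same fold is an assumption that the text above does not confirm.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def grouped_kfold(
    groups: Sequence[str], k: int = 5, seed: int = 0
) -> List[Tuple[List[int], List[int]]]:
    """Split sample indices into k (train, validation) folds without splitting a group.

    groups[i] is the group label (for example a patient ID) of sample i.
    Grouping by patient is an assumption made for this sketch.
    """
    by_group: Dict[str, List[int]] = defaultdict(list)
    for index, group in enumerate(groups):
        by_group[group].append(index)

    unique_groups = sorted(by_group)
    if k < 2 or k > len(unique_groups):
        raise ValueError("k must be between 2 and the number of groups")
    random.Random(seed).shuffle(unique_groups)  # reproducible shuffle

    folds = []
    for fold in range(k):
        val_groups = set(unique_groups[fold::k])  # every k-th group validates this fold
        val = [i for g in unique_groups if g in val_groups for i in by_group[g]]
        train = [i for g in unique_groups if g not in val_groups for i in by_group[g]]
        folds.append((sorted(train), sorted(val)))
    return folds
```

Each sample appears in exactly one validation fold, and no group contributes to both the training and the validation side of the same fold.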

Validate the CNN model on TCGA data

  1. Download the slide scans and clinical data of the TCGA glioblastoma cohort.
  2. Copy the tumor segmentations from GBMatch_CNN/data/validation/segmentation into the same folder as the TCGA slide scans.
  3. Predict TCGA samples with gbm_predictor (see above). (You can also find all prediction results in GBMatch_CNN/data/validation/TCGA_annotation_prediction.csv.)
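The copy in step 2 can be scripted. The following is a minimal sketch, assuming the segmentation folder contains plain files that should sit next to the slide scans; `copy_segmentations` is a hypothetical helper, not part of the repository.

```python
import shutil
from pathlib import Path
from typing import List


def copy_segmentations(segmentation_dir, slide_dir, overwrite: bool = False) -> List[str]:
    """Copy every file from segmentation_dir into slide_dir; return the copied file names."""
    source = Path(segmentation_dir)
    target = Path(slide_dir)
    if not target.is_dir():
        raise NotADirectoryError(f"slide folder not found: {target}")

    copied = []
    for path in sorted(source.iterdir()):
        if not path.is_file():
            continue  # subfolders are ignored in this sketch
        destination = target / path.name
        if destination.exists() and not overwrite:
            continue  # leave existing files untouched
        shutil.copy2(path, destination)
        copied.append(path.name)
    return copied
```

For example, copy_segmentations("GBMatch_CNN/data/validation/segmentation", tcga_folder), where tcga_folder is the folder holding the downloaded TCGA slide scans.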

Evaluation of the tumor microenvironment

  1. Install QuPath 0.3.0 (newer versions should also work).
  2. Download the immunohistochemical slides.
  3. Download the annotations (IHC_geojsons).
  4. Create a new QuPath project and import all immunohistochemical slides and annotations.
  5. Copy the CD34 and HLA-DR thresholders from GBMatch_CNN/qupath into your project.
  6. Run GBMatch_CNN/qupath/IHC_eval.groovy for all slides. Immunohistochemistry results will be saved to an IHC_results folder.
  7. Create a new QuPath project and import all HE image tiles.
  8. Run GBMatch_CNN/qupath/cellularity.groovy for all slides. Cellularity results will be saved to an HE-results folder.
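The Groovy scripts in steps 6 and 8 can be run from the QuPath script editor ("Run for project"). As an alternative, the sketch below builds a call to QuPath's command-line `script` subcommand. It rests on assumptions: the executable name differs between platforms and installations, the project file path is a placeholder, and whether these particular scripts behave identically without the GUI has not been verified here.

```python
import subprocess
from typing import List


def qupath_script_command(qupath_executable: str, project_file: str, script_file: str) -> List[str]:
    """Build the argument list for running a Groovy script over a QuPath project.

    Uses the `script` subcommand with its --project option. All three
    arguments are supplied by the caller; none of them is prescribed
    by the repository.
    """
    return [qupath_executable, "script", f"--project={project_file}", script_file]


def run_qupath_script(qupath_executable: str, project_file: str, script_file: str) -> None:
    """Run the script; raises subprocess.CalledProcessError on a non-zero exit code."""
    subprocess.run(
        qupath_script_command(qupath_executable, project_file, script_file),
        check=True,
    )
```

A call could look like run_qupath_script("QuPath", "my_project/project.qpproj", "GBMatch_CNN/qupath/IHC_eval.groovy"), where the first two values are placeholders for your own installation and project.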

Version History

main @ 9b4911f (earliest) Created 13th May 2024 at 08:10 by Thomas Roetzer-Pejrimovsky


Roetzer-Pejrimovsky, T. (2024). Training a CNN model for classification of transcriptional subtypes and survival prediction in glioblastoma. WorkflowHub.



Total size: 379 MB