ReMM score
master @ b1fbe1e

Workflow Type: Snakemake

The Regulatory Mendelian Mutation (ReMM) score was created for relevance prediction of non-coding variations (SNVs and small InDels) in the human genome (GRCh37) in terms of Mendelian diseases. This project updates the ReMM score for the genome build GRCh38 and combines GRCh37 and GRCh38 into one workflow.



We use Conda as software and dependency management tool. Conda installation guidelines can be found here:

Additional programs

These programs are used during the workflow. They usually need to be compiled, however, the repository already contains the executables or generated files.


The workflow is managed by Snakemake - a workflow management system used to create reproducible and scalable data analyses. To install Snakemake as well as all other required packages, you need to create a working environment according to the description in the file env/ReMM.yaml. For that, first

Clone the repository

git clone
cd ReMM

Create a working environment and activate it

conda env create -n ReMM --file workflow/envs/ReMM.yaml
conda activate ReMM

All paths are relative to the Snakemake file so you do not need to change any path variables. Additionally, Snakemake creates all missing directories, so no need to create any aditional folders either.


The workflow consists of four main parts:

  • Download of feature data
  • Data processing and cleaning
  • Model training and validation
  • Calculation of ReMM for the whole genome

The workflow folder contains a graph of the workflow and more detailed information on the most important steps.

To launch a snakemake workflow, you need to tell snakemake which file you want to generate. We defined all rules for multiple steps. They can be found here: workflow/Snakefile. For example, you want to generate all feature sets defined in a config file you can run:

snakemake -c1 all_feature_sets

To execute any step separately (see in the workflow folder for details on workflow steps), you need to look up the name of the desired output file in the scripts and call Snakemake with the exact name. Using a flag -n, you can initiate a 'dry run': Snakemake will check the consistency of all rules and files and show the number of steps. However, a clean dry run does not necessarily mean that no errors will occur during a normal run. ReMM score is not allele-specific so that you get only one score independent of the variant itself. The workflow from the download of data up to computing the scores may take several days or weeks depending on the computing power and internet connection.

The config files

The main config file can be found in config/config.yaml. This config file was used to generate the ReMM score. Here most of the configuration magic happens. There is a second config file config/features.yaml where all features are listed (with additional description). Config files are controled via json-schema.

We also provide a slurm config file for runtimes, memory and number of threads per rule: config/slurm.yaml.

Version History

master @ b1fbe1e (earliest) Created 3rd Jan 2023 at 09:09 by Max Schubach


Frozen master b1fbe1e
help Creators and Submitter
  • Max Schubach
Schubach, M. (2023). ReMM score. WorkflowHub.

Views: 1120   Downloads: 114

Created: 3rd Jan 2023 at 09:09

help Attributions


Total size: 49.9 MB
Powered by
Copyright © 2008 - 2024 The University of Manchester and HITS gGmbH