A workflow for marine Genomic Observatories data analysis
An EOSC-Life project
The workflows developed in the framework of this project are based on
pipeline-v5 of the MGnify resource.
This branch is a child of the
pipeline_5.1 branch that contains all CWL descriptions of the MGnify pipeline version 5.1.
The following comes from the initial repo and describes how to get the databases required.
This repository contains all CWL descriptions of the MGnify pipeline version 5.0.
For a thorough read-the-docs, click here.
We kindly recommend using the MGnify resource for data processing.
If you want to run the pipeline locally, we recommend you use our pre-built docker containers.
Requirements to run pipeline
python3 [v 3.6+]
docker [v 19.+] or singularity
cwltool [v 3.+] or toil [v 4.2+]
disk space for databases: ~133 GB
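The requirements above can be checked with a short script before a first run. This is a minimal sketch, not part of the repository: the helper names `have` and `check_any` are made up for illustration, and `toil-cwl-runner` is assumed to be the executable that a Toil installation with CWL support provides. It only checks that the tools are on PATH; it does not verify the minimum versions listed above or the available disk space.

```shell
#!/bin/sh
# have: succeeds if the named command is found on PATH.
have() { command -v "$1" >/dev/null 2>&1; }

# check_any LABEL CMD...: succeeds if at least one CMD is available.
check_any() {
  ca_label=$1
  shift
  for ca_tool in "$@"; do
    if have "$ca_tool"; then
      echo "ok: $ca_label ($ca_tool)"
      return 0
    fi
  done
  echo "missing: $ca_label (need one of: $*)"
  return 1
}

# Collect failures instead of stopping at the first missing tool.
status=0
check_any "Python 3" python3 || status=1
check_any "container runtime" docker singularity || status=1
check_any "CWL runner" cwltool toil-cwl-runner || status=1
echo "prerequisite check status: $status"
```

A status of 0 means every group has at least one tool installed; versions can then be confirmed by hand, for example with `python3 --version` or `docker --version`.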
All the tools are containerized.
Unfortunately, antiSMASH and InterProScan containers are very big. We provide two options:
Pre-install these tools. The instructions on how to set up the environment are here.
Use containers. First, uncomment the hints in InterProScan-v5.cwl and antismash_v4.cwl. Then pre-pull the containers from https://hub.docker.com/u/microbiomeinformatics
docker pull microbiomeinformatics/pipeline-v5.interproscan:v5.36-75.0
docker pull microbiomeinformatics/pipeline-v5.antismash:v4.2.0
Get the EOSC-Life marine GOs workflow
git clone https://github.com/EBI-Metagenomics/pipeline-v5.git
cd pipeline-v5
Download necessary dbs
You can download databases for the EOSC-Life GOs workflow by running the
If you have one or more already in your system, then create a symbolic link pointing
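Linking to a database that is already on disk could look like the sketch below. Both paths are hypothetical placeholders, since the directory where the workflow expects its databases is not stated here; the example runs in a scratch directory so it can be tried safely before adapting it.

```shell
#!/bin/sh
# Demonstration in a scratch directory; replace both paths with your own.
work=$(mktemp -d)
existing_db="$work/shared/some_db"      # hypothetical: a database you already have
workflow_dbs="$work/workflow/ref-dbs"   # hypothetical: where the workflow looks for databases

mkdir -p "$existing_db" "$workflow_dbs"

# -s creates a symbolic link, so no data is copied.
ln -s "$existing_db" "$workflow_dbs/some_db"

# Show the link and the location it resolves to.
ls -l "$workflow_dbs"
```

Using an absolute path for the link target keeps the link valid even if the workflow directory is later moved.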
How to run
activate the conda env
gos_wf.yml file to set the parameter values of your choice
If you are working on an HPC with Singularity, enable Singularity
./run_wf.sh -n false -n osd-short -d short-test-case -f test_input/wgs-paired-SRR1620013_1.fastq.gz -r test_input/wgs-paired-SRR1620013_2.fastq.gz
In case you are using Docker, it is strongly recommended to avoid installing it through
RuntimeError: slurm currently does not support shared caching, because it does not support cleaning up a worker after the last job finishes.
--disableCaching flag if you want to use this batch system.