A workflow for marine Genomic Observatories data analysis
eosc-life-gos @ deb5427 (latest)

eosc-life-gos @ deb5427 (latest)

eosc-life-gos @ deb5427

eosc-life-gos @ 28122db (earliest)

View on GitHub

Download RO-Crate

Workflow Type: Common Workflow Language

Stable

metaGOflow: A workflow for marine Genomic Observatories' data analysis

logo

An EOSC-Life project

The workflows developed in the framework of this project are based on pipeline-v5 of the MGnify resource.

This branch is a child of the pipeline_5.1 branch that contains all CWL descriptions of the MGnify pipeline version 5.1.

Dependencies

To run metaGOflow you need to make sure you have the following set on your computing environmnet first:

python3 [v 3.8+]
Docker [v 19.+] or Singularity [v 3.7.+]/Apptainer [v 1.+]
cwltool [v 3.+]
rdflib [v 6.+]
rdflib-jsonld [v 0.6.2]
ro-crate-py [v 0.7.0]
pyyaml [v 6.0]
Node.js [v 10.24.0+]
Available storage ~235GB for databases

Storage while running

Depending on the analysis you are about to run, disk requirements vary. Indicatively, you may have a look at the metaGOflow publication for computing resources used in various cases.

Installation

Get the EOSC-Life marine GOs workflow

git clone https://github.com/emo-bon/MetaGOflow
cd MetaGOflow

Download necessary databases (~235GB)

You can download databases for the EOSC-Life GOs workflow by running the download_dbs.sh script under the Installation folder.

bash Installation/download_dbs.sh -f [Output Directory e.g. ref-dbs]

If you have one or more already in your system, then create a symbolic link pointing at the ref-dbs folder or at one of its subfolders/files.

The final structure of the DB directory should be like the following:

user@server:~/MetaGOflow: ls ref-dbs/
db_kofam/  diamond/  eggnog/  GO-slim/  interproscan-5.57-90.0/  kegg_pathways/  kofam_ko_desc.tsv  Rfam/  silva_lsu/  silva_ssu/

How to run

Ensure that `Node.js` is installed on your system before running metaGOflow

If you have root access on your system, you can run the commands below to install it:

DEBIAN/UBUNTU

sudo apt-get update -y
sudo apt-get install -y nodejs

RH/CentOS

sudo yum install rh-nodejs (e.g. rh-nodejs10)

Set up the environment

Run once - Setup environment

```
conda create -n EOSC-CWL python=3.8
```
```
conda activate EOSC-CWL
```

pip install cwlref-runner cwltool[all] rdflib-jsonld rocrate pyyaml

Run every time

conda activate EOSC-CWL

Run the workflow

Edit the config.yml file to set the parameter values of your choice. For selecting all the steps, then set to true the variables in lines [2-6].

Using Singularity

Standalone

run:

./run_wf.sh -s -n osd-short -d short-test-case -f test_input/wgs-paired-SRR1620013_1.fastq.gz -r test_input/wgs-paired-SRR1620013_2.fastq.gz
``

Using a cluster with a queueing system (e.g. SLURM)

Create a job file (e.g., SBATCH file)
Enable Singularity, e.g. module load Singularity & all other dependencies
Add the run line to the job file

Using Docker

Standalone

run:

./run_wf.sh -n osd-short -d short-test-case -f test_input/wgs-paired-SRR1620013_1.fastq.gz -r test_input/wgs-paired-SRR1620013_2.fastq.gz

HINT: If you are using Docker, you may need to run the above command without the `-s' flag.

Testing samples

The samples are available in the test_input folder.

We provide metaGOflow with partial samples from the Human Metagenome Project (SRR1620013 and SRR1620014) They are partial as only a small part of their sequences have been kept, in terms for the pipeline to test in a fast way.

Hints and tips

In case you are using Docker, it is strongly recommended to avoid installing it through snap.
RuntimeError: slurm currently does not support shared caching, because it does not support cleaning up a worker after the last job finishes. Set the --disableCaching flag if you want to use this batch system.
In case you are having errors like:

cwltool.errors.WorkflowException: Singularity is not available for this tool

You may run the following command:

singularity pull --force --name debian:stable-slim.sif docker://debian:stable-sli

Contribution

To make contribution to the project a bit easier, all the MGnify conditionals and subworkflows under the workflows/ directory that are not used in the metaGOflow framework, have been removed.
However, all the MGnify tools/ and utils/ are available in this repo, even if they are not invoked in the current version of metaGOflow. This way, we hope we encourage people to implement their own conditionals and/or subworkflows by exploiting the currently supported tools and utils as well as by developing new tools and/or utils.

SEEK ID: https://workflowhub.eu/workflows/384?version=3

DOI: 10.48546/workflowhub.workflow.384.3

Version History

eosc-life-gos @ deb5427 (latest) Created 16th May 2023 at 21:41 by Haris Zafeiropoulos

Merge pull request #38 from emo-bon/fix-bugs

Fixes logical expression to keep tmp.

Frozen eosc-life-gos deb5427

eosc-life-gos @ deb5427 Created 16th May 2023 at 21:38 by Haris Zafeiropoulos

Merge pull request #38 from emo-bon/fix-bugs

Fixes logical expression to keep tmp.

Frozen eosc-life-gos deb5427

eosc-life-gos @ 28122db (earliest) Created 19th Sep 2022 at 19:00 by Haris Zafeiropoulos

running version with workaround in conditionals

Frozen eosc-life-gos 28122db

Creators and Submitter

Creators

Submitter

Haris Zafeiropoulos

Citation

Zafeiropoulos, H., & Beracochea, M. (2023). A workflow for marine Genomic Observatories data analysis. WorkflowHub. https://doi.org/10.48546/WORKFLOWHUB.WORKFLOW.384.3

License

Apache License 2.0

Activity

Views: 5360 Downloads: 1048

Created: 19th Sep 2022 at 19:00

Last updated: 16th May 2023 at 23:00

A workflow for marine Genomic Observatories data analysis eosc-life-gos @ deb5427 (latest) eosc-life-gos @ deb5427 (latest) eosc-life-gos @ deb5427 eosc-life-gos @ 28122db (earliest)

metaGOflow: A workflow for marine Genomic Observatories' data analysis

An EOSC-Life project

Dependencies

Storage while running

Installation

Get the EOSC-Life marine GOs workflow

Download necessary databases (~235GB)

How to run

Ensure that Node.js is installed on your system before running metaGOflow

DEBIAN/UBUNTU

RH/CentOS

Set up the environment

Run once - Setup environment

Run every time

Run the workflow

Using Singularity

Standalone

Using a cluster with a queueing system (e.g. SLURM)

Using Docker

Standalone

Testing samples

Hints and tips

Contribution

Version History

eosc-life-gos @ deb5427 (latest) Created 16th May 2023 at 21:41 by Haris Zafeiropoulos

eosc-life-gos @ deb5427 Created 16th May 2023 at 21:38 by Haris Zafeiropoulos

eosc-life-gos @ 28122db (earliest) Created 19th Sep 2022 at 19:00 by Haris Zafeiropoulos

Creators

Submitter

Related items

A workflow for marine Genomic Observatories data analysis
eosc-life-gos @ deb5427 (latest)

eosc-life-gos @ deb5427 (latest)

eosc-life-gos @ deb5427

eosc-life-gos @ 28122db (earliest)

Ensure that `Node.js` is installed on your system before running metaGOflow