Introduction
samba-norovirus is an adaptation of the SAMBA workflow to the specific needs of norovirus metabarcoding analyses. It is a FAIR, scalable workflow that integrates, into a single tool, state-of-the-art bioinformatics and statistical methods to conduct reproducible metabarcoding and eDNA analyses using Nextflow (Di Tommaso et al., 2017). SAMBA performs a complete metabarcoding analysis by:
- processing data using commonly used procedures with QIIME 2 (version 2024.2; Bolyen et al., 2019):
  - remove primers from raw reads and discard reads without a detected primer using the QIIME 2 cutadapt plugin (q2-cutadapt)
  - denoise reads and infer ASVs using the QIIME 2 DADA2 plugin (q2-dada2; Callahan et al., 2016)
  - cluster ASVs with a small local linking threshold using swarm (Mahé et al., 2022)
  - detect and remove chimeras using UCHIME (Edgar et al., 2011)
  - assign ASV taxonomy using the naive Bayesian classifier from the QIIME 2 plugin q2-feature-classifier
- post-processing the ASV table with several optional steps:
  - remove contaminants from biological samples using positive and/or negative control samples with the R package microDecon (McKnight et al., 2019)
  - remove ASVs belonging to undesired taxa using the QIIME 2 plugin q2-taxa (filter-table & filter-seqs)
  - remove ASVs based on their frequency, contingency and length using the QIIME 2 plugin q2-feature-table (filter-table, filter-seqs & filter-features)
- conducting extended statistical and ecological analyses using custom R scripts
The samba-norovirus pipeline can run tasks across multiple compute infrastructures in a highly portable manner. It ships with Singularity containers, which make installation trivial and results highly reproducible.
Requirements
i. You must have Nextflow (≥ v24.04.4) installed on your computing machine to run the workflow.
ii. You must have Singularity (≥ v3.6.4) installed on your computing machine for full pipeline reproducibility.
iii. If your HPC nodes have no internet access, download the Singularity images available on the samba-norovirus singularity image repository before running the workflow. Then set the $NXF_SINGULARITY_CACHEDIR environment variable to the path where you downloaded the images.
iv. To perform the taxonomic assignment of your ASVs and chimera detection, you must download the norovirus database and reference sequences formatted for the workflow. All files are available on the samba-norovirus database repository. Then set the database and uchime_ref parameters in the base.config file to the paths where you placed the downloaded files.
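The requirements above can be checked before a first run with a small preflight script. This is a sketch, not part of samba-norovirus. It assumes GNU `sort -V` for version comparison and assumes that the first x.y.z string printed by `nextflow -version` and `singularity --version` is the installed version. The paths (`$HOME/singularity_images`, `/path/to/norovirus/...`) and the variable names `DATABASE_PATH` and `UCHIME_REF_PATH` are hypothetical placeholders to replace with your own.

```shell
#!/bin/sh
# Preflight sketch for the requirements above (hypothetical helper, not part
# of samba-norovirus). Assumes GNU coreutils for `sort -V`.

# Succeed if version $1 is greater than or equal to version $2:
# the smaller of the two, in version order, must be the required one.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Report whether a tool is on PATH and recent enough. The installed version is
# taken as the first x.y.z string in the tool's output (an assumption).
check_tool() {
    tool=$1; min=$2; flag=$3
    if ! command -v "$tool" >/dev/null 2>&1; then
        echo "MISSING: $tool is not on PATH"
        return 0
    fi
    found=$("$tool" "$flag" 2>&1 | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1)
    if [ -n "$found" ] && version_ge "$found" "$min"; then
        echo "OK: $tool $found (>= $min)"
    else
        echo "CHECK: $tool version '$found' may be older than $min"
    fi
}

# Requirements i and ii
check_tool nextflow 24.04.4 -version
check_tool singularity 3.6.4 --version

# Requirement iii: on offline nodes, images must already be in this directory.
# The default path is a placeholder.
NXF_SINGULARITY_CACHEDIR=${NXF_SINGULARITY_CACHEDIR:-"$HOME/singularity_images"}
export NXF_SINGULARITY_CACHEDIR
[ -d "$NXF_SINGULARITY_CACHEDIR" ] || echo "MISSING: $NXF_SINGULARITY_CACHEDIR does not exist"

# Requirement iv: placeholder paths to the downloaded database files; use the
# same paths for the database and uchime_ref parameters in base.config.
DATABASE_PATH=${DATABASE_PATH:-/path/to/norovirus/database}
UCHIME_REF_PATH=${UCHIME_REF_PATH:-/path/to/norovirus/uchime_ref}
for f in "$DATABASE_PATH" "$UCHIME_REF_PATH"; do
    [ -e "$f" ] || echo "MISSING: $f"
done
```

The script only reports problems and never aborts, so it can be run safely on a login node before submitting the workflow.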
Quick Start
i. Download the pipeline
git clone https://gitlab.ifremer.fr/bioinfo/workflows/samba-norovirus
To use samba-norovirus on a computing cluster, you must provide a configuration file for your system. For some institutes, such a file already exists and is listed on nf-core/configs. If so, simply download your institute's custom config file and pass it to the workflow with the -c option. This will set the appropriate execution settings for your local compute environment.
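If your institute has no config on nf-core/configs, a minimal custom config can be written by hand. The sketch below assumes a SLURM scheduler; the file name `my_cluster.config`, the `normal` queue and the `queueSize` value are hypothetical placeholders, while `process.executor`, `process.queue` and `executor.queueSize` are standard Nextflow configuration settings.

```shell
#!/bin/sh
# Write a minimal, hypothetical cluster config. The SLURM executor, queue name
# and queue size are placeholders: adapt them to your scheduler and partitions.
CONFIG_FILE=my_cluster.config

# Quoted 'EOF' keeps the config text literal (no shell expansion).
cat > "$CONFIG_FILE" <<'EOF'
// Submit every workflow task as a SLURM job
process {
    executor = 'slurm'
    queue    = 'normal'   // hypothetical partition name
}

// Limit how many jobs Nextflow keeps queued at once
executor {
    queueSize = 50
}
EOF

echo "Wrote $CONFIG_FILE; pass it with: -c $CONFIG_FILE"
```

Container settings are not repeated here because the singularity profile used in the run command is expected to handle them.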
ii. Start running your own analysis!
Before you start analyzing your data, please read the SAMBA workflow documentation.
nextflow run main.nf -profile norovirus,singularity [-c ]
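A possible way to launch the run is sketched below under stated assumptions. `-resume` is a standard Nextflow option that reuses cached results of completed tasks when a run is relaunched after an interruption, and `-with-report` writes an HTML execution report. `my_cluster.config` is a hypothetical config file name: the script appends `-c` only if that file exists, and prints the command instead of running it when `nextflow` or `main.nf` is not found.

```shell
#!/bin/sh
# Sketch: launch (or relaunch) the analysis with common Nextflow run options.
# my_cluster.config is a placeholder for your own -c config file.
CONFIG_FILE=my_cluster.config

# Build the argument list in the positional parameters (POSIX-friendly).
set -- run main.nf -profile norovirus,singularity -resume -with-report report.html
if [ -f "$CONFIG_FILE" ]; then
    set -- "$@" -c "$CONFIG_FILE"
fi

if command -v nextflow >/dev/null 2>&1 && [ -f main.nf ]; then
    nextflow "$@"
else
    echo "Would run: nextflow $*"
fi
```

Run it from the cloned samba-norovirus directory so that main.nf is found.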
Credits
samba-norovirus is written by Cyril Noël from SeBiMER, the Bioinformatics Core Facility of IFREMER. This workflow was developed in close collaboration with members of the Ifremer LSEM lab.
Contributions
We welcome contributions to the pipeline. In that case, you can do one of the following:
- Use issues to submit your questions
- Fork the project, do your developments and submit a pull request
- Contact us (see email below)
Version History
Version 1 (earliest): created 10th Jun 2025 at 07:18 by Cyril Noel. Initial commit (frozen; commit 3b15d71).