metaBIOMx
Introduction
The metaBIOMx (metagenomics microbiomics) pipeline is a best-practice suite for the decontamination and annotation of sequencing data obtained via short-read shotgun sequencing. The pipeline contains nf-core modules alongside local modules written in a similar format. It can be run via both Docker and Singularity containers.
Pipeline summary
The pipeline can perform taxonomic annotation on either (single- or paired-end) reads or contigs. The different subworkflows can be enabled or disabled via --bypass flags; a full overview is shown by running with --help. By default, when a database path is provided, the pipeline checks whether the required databases are present in the correct formats. If this is not the case, compatible databases are downloaded automatically.
For both subworkflows, the pipeline performs read trimming via Trimmomatic and/or AdapterRemoval, followed by human read removal via KneadData. Before and after each step, quality is assessed via FastQC, and a MultiQC report is created as output. Taxonomic annotation is then performed as follows:
Read annotation
- Paired reads are interleaved using BBTools.
- MetaPhlAn3 and HUMAnN3 are used for taxonomic and functional profiling.
- Taxonomic profiles are merged into a single BIOM file using biom-format.
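The interleaving itself is handled by BBTools inside the pipeline; purely as an illustration of what interleaving means (not the pipeline's actual command), paired records can be merged alternately with a minimal shell sketch, assuming plain-text, tab-free FASTQ files:

```shell
# Illustration only: the pipeline uses BBTools for this step.
# Flatten each FASTQ to one tab-joined line per 4-line record, pair the
# records from R1 and R2 side by side, then split tabs back into lines.
interleave_fastq() {
  paste <(paste - - - - < "$1") <(paste - - - - < "$2") | tr '\t' '\n'
}
```

Usage: `interleave_fastq reads_R1.fastq reads_R2.fastq > interleaved.fastq`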
Contig annotation
- Read assembly is performed via SPAdes.
- Quality assessment of the contigs is done via BUSCO.
- Taxonomic profiles are created using CAT.
- Read abundance estimation on the contigs is performed using Bowtie2 and BCFtools.
- Contigs are selected if at least one read aligns against them, and a BIOM file is generated using biom-format.
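The pipeline performs this step with Bowtie2 and BCFtools; as a simplified illustration of the selection rule only, given a per-contig read-count table in samtools idxstats layout (contig, length, mapped reads, unmapped reads), contigs with at least one aligned read could be filtered like this:

```shell
# Illustration only: keep contigs with >= 1 mapped read.
# Input: tab-separated "contig  length  mapped  unmapped" (samtools idxstats layout).
select_covered_contigs() {
  awk -F '\t' '$3 > 0 { print $1 }' "$1"
}
```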
Installation
[!NOTE] Make sure you have installed the latest Nextflow version!
Clone the repository in a directory of your choice:
git clone https://github.com/CMG-GUTS/metabiomx.git
The pipeline is containerised, meaning it can be run via Docker or Singularity images. No further action is needed when using the docker profile, except that a Docker registry needs to be set on your local system, see docker. In case Singularity is used, please specify singularity.cacheDir in nextflow.config so that Singularity images are saved there and re-used.
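For example, a minimal cache setting in nextflow.config could look like the following (the path is a placeholder for a directory of your choice):

```groovy
// nextflow.config -- cache Singularity images so they are pulled only once
singularity {
    enabled  = true
    cacheDir = '/path/to/singularity_cache'   // placeholder path
}
```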
Usage
Since the latest version, metaBIOMx works with either a samplesheet (CSV) or a glob path to the input files. Samplesheets are the preferred input.
nextflow run main.nf --input samplesheet.csv -work-dir work -profile singularity
nextflow run main.nf --input '*_{1,R1,2,R2}.{fq,fq.gz,fastq,fastq.gz}' -work-dir work -profile singularity
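The exact samplesheet columns are not documented here; assuming a common three-column layout (sample, fastq_1, fastq_2 are hypothetical names, check the pipeline's schema), a samplesheet for paired files following the _R1/_R2 naming in the glob above could be generated with a small helper:

```shell
# Sketch: write a samplesheet for paired-end FASTQ files.
# The header names (sample,fastq_1,fastq_2) are an assumption, not taken
# from the metaBIOMx documentation.
make_samplesheet() {
  local out="$1"; shift
  echo "sample,fastq_1,fastq_2" > "$out"
  local r1 r2 sample
  for r1 in "$@"; do
    r2="${r1/_R1/_R2}"      # mate filename
    sample="${r1%%_R1*}"    # prefix before _R1 as the sample name
    echo "${sample},${r1},${r2}" >> "$out"
  done
}
```

Usage: `make_samplesheet samplesheet.csv *_R1.fastq.gz`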
Automatic database setup
The pipeline requires a set of databases used by the different tools within this workflow. The user is required to specify the location where the databases will be downloaded. It is also possible to download the databases manually. The configure subworkflow will automatically evaluate the database format and the presence of compatible files.
nextflow run main.nf \
--bowtie_db path/to/db/bowtie2 \
--metaphlan_db path/to/db/metaphlan \
--humann_db path/to/db/humann \
--cat_pack_db path/to/db/catpack \
--busco_db path/to/db/busco_downloads \
-work-dir work \
-profile singularity
Manual database setup
HUMAnN3 and MetaPhlAn3 DB
Make sure that path/to/db/humann contains a chocophlan, uniref and utility_mapping directory. These can be obtained with the following command:
docker pull biobakery/humann:latest
docker run --rm -v $(pwd):/scripts biobakery/humann:latest bash -c \
"humann_databases --download chocophlan full ./path/to/db/humann \
&& humann_databases --download uniref uniref90_diamond ./path/to/db/humann \
&& humann_databases --download utility_mapping full ./path/to/db/humann"
MetaPhlAn DB
wget http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/mpa_vJun23_CHOCOPhlAnSGB_202403.tar \
&& tar -xvf mpa_vJun23_CHOCOPhlAnSGB_202403.tar -C path/to/db/metaphlan \
&& rm mpa_vJun23_CHOCOPhlAnSGB_202403.tar
wget http://cmprod1.cibio.unitn.it/biobakery4/metaphlan_databases/bowtie2_indexes/mpa_vJun23_CHOCOPhlAnSGB_202403_bt2.tar \
&& tar -xvf mpa_vJun23_CHOCOPhlAnSGB_202403_bt2.tar -C path/to/db/metaphlan \
&& rm mpa_vJun23_CHOCOPhlAnSGB_202403_bt2.tar
echo 'mpa_vJun23_CHOCOPhlAnSGB_202403' > path/to/db/metaphlan/mpa_latest
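After the manual download, the directory layout can be sanity-checked with a small helper (a sketch; it only verifies that mpa_latest and a matching Bowtie2 index are in place):

```shell
# Sketch: verify that a MetaPhlAn database directory looks complete.
# Checks for the mpa_latest marker and a Bowtie2 index matching its contents.
check_metaphlan_db() {
  local db="$1" index
  [ -f "$db/mpa_latest" ] || { echo "missing mpa_latest"; return 1; }
  index="$(cat "$db/mpa_latest")"
  ls "$db/$index"*.bt2* > /dev/null 2>&1 || { echo "missing Bowtie2 index for $index"; return 1; }
  echo "OK: $index"
}
```

Usage: `check_metaphlan_db path/to/db/metaphlan`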
Kneaddata DB
docker pull agusinac/kneaddata:latest
docker run --rm -v $(pwd):/scripts agusinac/kneaddata:latest \
kneaddata_database \
--download human_genome bowtie2 ./path/to/db/bowtie2
CAT_pack DB
A pre-constructed DIAMOND database can be downloaded manually or via the following command:
docker pull agusinac/catpack:latest
docker run --rm -v $(pwd):/scripts agusinac/catpack:latest \
CAT_pack download \
--db nr \
-o path/to/db/catpack
BUSCO DB
BUSCO expects the database directory to be called busco_downloads.
docker pull ezlabgva/busco:v5.8.2_cv1
docker run --rm -v $(pwd):/scripts ezlabgva/busco:v5.8.2_cv1 \
busco \
--download bacteria_odb12 \
--download_path path/to/db/busco_downloads
Support
If you are having issues, please create an issue on the GitHub repository.
Version History
main @ 240bd6c (latest) Created 4th Jul 2025 at 17:08 by Alem Gusinac
Fixed crucial MetaPhlAn download issue in configure.
main @ dd5cf35 (earliest) Created 3rd Jul 2025 at 15:46 by Alem Gusinac
