ARG-Sniper
A Nextflow pipeline for antibiotic resistance gene detection from paired-end sequencing reads.
Introduction
ARG-Sniper is a Nextflow DSL-2 pipeline for metagenomic analysis that detects antibiotic resistance genes in paired-end FASTQ files. It runs five tools in parallel: GROOT, ARIBA, KMA (adopted from ARGprofiler), KARGA, and SRST2, each of which requires its own database. Any of the five tools can be skipped with a command-line flag (--skip_groot, --skip_ariba, etc.), which allows customized analysis workflows. After the selected tools finish, the pipeline collects their results and generates a summary report that consolidates the findings from each analysis. The output consists of a separate directory for each tool's results plus a final summary directory containing the integrated analysis.
Note: This pipeline focuses on detecting antibiotic resistance genes and does not report SNP-based resistance mechanisms.
How-2-Run
Before running the pipeline, make sure all required databases and tool dependencies are available.
Software Requirements
- Nextflow (≥22.04.0) with DSL-2 support
- Singularity container runtime
Bioinformatics Tools (via Singularity containers)
- SRST2 v2.0.0 - Short Read Sequence Typing
- GROOT v1.1.2 - Graph-based resistance gene detection
- ARIBA v2.14.6 - Antimicrobial Resistance Identification
- KARGA v1.02 - K-mer based resistance gene analysis
- KMA v1.4.9 - K-mer alignment tool (used by ARGprofiler)
Required Databases
All tools require pre-built databases from the panARG v2 collection:
- grootdb (indexed database)
- aribadb (prepared database)
- srst2db (FASTA sequences)
- kargadb (FASTA sequences)
- argprofilerdb (KMA indexed database)
- panARG annotations (TSV metadata file)
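A quick pre-flight check can catch a missing database before the pipeline is launched. The following is a minimal sketch, not part of ARG-Sniper itself: the helper name `missing_databases` and all paths are hypothetical placeholders, and only the database names are taken from the list above.

```python
from pathlib import Path


def missing_databases(db_paths):
    """Return the names of databases whose path does not exist on disk."""
    return [name for name, path in db_paths.items() if not Path(path).exists()]


if __name__ == "__main__":
    # Placeholder locations (assumption): replace with your panARG v2 paths.
    databases = {
        "grootdb": "/path/to/panARG/grootdb",
        "aribadb": "/path/to/panARG/aribadb",
        "srst2db": "/path/to/panARG/srst2db.fasta",
        "kargadb": "/path/to/panARG/kargadb.fasta",
        "argprofilerdb": "/path/to/panARG/argprofilerdb",
        "panARG annotations": "/path/to/panARG/annotations.tsv",
    }
    for name in missing_databases(databases):
        print(f"missing: {name} -> {databases[name]}")
```

The check only confirms that each path exists; it does not verify that a database was indexed or prepared correctly for its tool.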
System Requirements
- CPU: 8 cores (default)
- Memory: 16 GB RAM (default)
- Scheduler: SLURM (for HPC execution)
Usage
Run --help to see available options:
nextflow run ARG-sniper/main.nf --help
Usage:
nextflow run ARG-Sniper-pipeline.nf --offline -with-report
Required Arguments:
Input:
--reads Folder containing reads with file name *_R{1,2}.fastq.gz
--grootdb Path of indexed GROOT database
--aribadb Path to ARIBA database
--kargadb Path to KARGA database
--srst2db Path to SRST2 database
--argprofilerdb Path to ARGprofiler database
--output Folder for output files
# By default, the pipeline will run all supported tools.
Optional Arguments:
Skipping specific tools:
--skip_groot Skip running GROOT
--skip_kma Skip running KMA
--skip_ariba Skip running ARIBA
--skip_karga Skip running KARGA
--skip_srst2 Skip running SRST2
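When the pipeline is launched from a script, the command line can be assembled from the options listed above. This is a hedged sketch rather than an official wrapper: `build_command` is a hypothetical helper, the script path defaults to the one used in the `--help` example, and the database flags are supplied by the caller exactly as they appear in the help text. The example omits the database flags of skipped tools, which assumes the pipeline does not require them in that case.

```python
import shlex

# Tool names as used in the --skip_* flags of the help text.
TOOLS = ("groot", "kma", "ariba", "karga", "srst2")


def build_command(reads, output, databases, skip=(), script="ARG-sniper/main.nf"):
    """Assemble a `nextflow run` argument list.

    `databases` maps a database flag (e.g. "--aribadb") to its path.
    `skip` lists the tools to disable via their --skip_<tool> flag.
    """
    unknown = set(skip) - set(TOOLS)
    if unknown:
        raise ValueError(f"unknown tool(s): {sorted(unknown)}")
    cmd = ["nextflow", "run", script, "--reads", reads, "--output", output]
    for flag, path in databases.items():
        cmd += [flag, path]
    # Keep the skip flags in a fixed order so the command is reproducible.
    cmd += [f"--skip_{tool}" for tool in TOOLS if tool in skip]
    return cmd


if __name__ == "__main__":
    # Placeholder paths (assumption); GROOT and SRST2 are skipped here.
    cmd = build_command(
        reads="reads/",
        output="results/",
        databases={
            "--aribadb": "db/aribadb",
            "--kargadb": "db/kargadb.fasta",
            "--argprofilerdb": "db/argprofilerdb",
        },
        skip=["groot", "srst2"],
    )
    print(shlex.join(cmd))
```

Returning a list rather than a string lets the result be passed directly to `subprocess.run` without shell quoting issues.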
Expected Output
Upon successful execution with all tools, ARG-Sniper generates the following directory structure with results for each sample:
results/
├── argprofiler_results/
│ └── ARGprofiler_report_{sample}.txt
├── ariba_results/
│ ├── ariba_report_{sample}.tsv
│ └── ariba_summary_{sample}.csv
├── groot_results/
│ └── groot_report_{sample}.tsv
├── karga_results/
│ └── karga_report_{sample}.csv
├── srst2_results/
│ └── srst2_report_{sample}_fullgenes_sequence_results.txt
└── summary/
    └── summary_{sample}.tsv
Example output for multiple samples:
results/
├── argprofiler_results/
│ ├── ARGprofiler_report_dataset-100x-depth.txt
│ ├── ARGprofiler_report_dataset-90x-depth.txt
│ └── ARGprofiler_report_dataset-95x-depth.txt
├── ariba_results/
│ ├── ariba_report_dataset-100x-depth.tsv
│ ├── ariba_report_dataset-90x-depth.tsv
│ ├── ariba_report_dataset-95x-depth.tsv
│ ├── ariba_summary_dataset-100x-depth.csv
│ ├── ariba_summary_dataset-90x-depth.csv
│ └── ariba_summary_dataset-95x-depth.csv
├── groot_results/
│ ├── groot_report_dataset-100x-depth.tsv
│ ├── groot_report_dataset-90x-depth.tsv
│ └── groot_report_dataset-95x-depth.tsv
├── karga_results/
│ ├── karga_report_dataset-100x-depth.csv
│ ├── karga_report_dataset-90x-depth.csv
│ └── karga_report_dataset-95x-depth.csv
├── srst2_results/
│ ├── srst2_report_dataset-100x-depth_fullgenes_sequence_results.txt
│ ├── srst2_report_dataset-90x-depth_fullgenes_sequence_results.txt
│ └── srst2_report_dataset-95x-depth_fullgenes_sequence_results.txt
└── summary/
    ├── summary_dataset-100x-depth.tsv
    ├── summary_dataset-90x-depth.tsv
    └── summary_dataset-95x-depth.tsv
The summary/ directory contains consolidated results from all tools for each sample.
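The layout above can be used to verify that a run produced every expected file. The sketch below is illustrative only: the file name templates are copied from the directory listing, while the helper names, the mapping of `--skip_kma` to `argprofiler_results/`, and the assumption that a summary file is still written when tools are skipped are not confirmed by the pipeline documentation.

```python
from pathlib import Path

# Relative output paths per tool, copied from the directory listing.
# Assumption: KMA results are the ones written to argprofiler_results/.
OUTPUT_TEMPLATES = {
    "kma": ["argprofiler_results/ARGprofiler_report_{sample}.txt"],
    "ariba": [
        "ariba_results/ariba_report_{sample}.tsv",
        "ariba_results/ariba_summary_{sample}.csv",
    ],
    "groot": ["groot_results/groot_report_{sample}.tsv"],
    "karga": ["karga_results/karga_report_{sample}.csv"],
    "srst2": ["srst2_results/srst2_report_{sample}_fullgenes_sequence_results.txt"],
}
SUMMARY_TEMPLATE = "summary/summary_{sample}.tsv"


def expected_outputs(sample, skip=()):
    """List the relative paths expected for one sample, minus skipped tools."""
    paths = [
        template.format(sample=sample)
        for tool, templates in OUTPUT_TEMPLATES.items()
        if tool not in skip
        for template in templates
    ]
    paths.append(SUMMARY_TEMPLATE.format(sample=sample))
    return paths


def missing_outputs(results_dir, samples, skip=()):
    """Return the expected files that are absent under `results_dir`."""
    root = Path(results_dir)
    return [
        path
        for sample in samples
        for path in expected_outputs(sample, skip)
        if not (root / path).is_file()
    ]


if __name__ == "__main__":
    samples = ["dataset-100x-depth", "dataset-90x-depth", "dataset-95x-depth"]
    for path in missing_outputs("results", samples):
        print(f"missing: {path}")
```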
Version History
v1.0.1 (earliest) Created 5th Feb 2026 at 16:34 by Sumeet Tiwari
Add subsample_genomes.py for genome processing
This script processes genome files, merges them with metadata, samples strains based on AMR counts, and generates coverage and combined FASTA outputs.
Frozen: v1.0.1 (9d04754)
https://orcid.org/0000-0002-3366-1812