ONTViSc (ONT-based Viral Screening for Biosecurity)
main @ d333445

Workflow Type: Nextflow
Work-in-progress

Introduction

eresearchqut/ontvisc is a Nextflow-based bioinformatics pipeline designed to support the diagnosis of viral and viroid pathogens for biosecurity. It takes as input fastq files generated by either amplicon or whole-genome sequencing on Oxford Nanopore Technologies platforms.

The pipeline can: 1) search the sequenced reads directly, 2) generate clusters, 3) assemble the reads into longer contigs, or 4) map reads directly to a known reference.

Reads originating from a plant host can optionally be filtered out before downstream analysis.
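Host filtering amounts to dropping every read that aligns to the host reference and keeping the rest. The following is a minimal Python sketch of that idea, not the pipeline's own implementation. It assumes the alignments are available as PAF records (Minimap2's default output format, where column 1 is the read name); the function names, read names and sequences are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, Set


def aligned_read_ids(paf_lines: Iterable[str]) -> Set[str]:
    """Collect query names (column 1) from tab-separated PAF records."""
    return {ln.split("\t", 1)[0] for ln in paf_lines if ln.strip()}


def remove_host_reads(reads: Dict[str, str], host_ids: Set[str]) -> Dict[str, str]:
    """Keep only the reads whose names never appear in a host alignment."""
    return {name: seq for name, seq in reads.items() if name not in host_ids}


# Hypothetical input: two reads, one of which aligned to the host reference.
reads = {"read1": "ACGTACGT", "read2": "TTTTGGGG"}
paf = ["read1\t8\t0\t8\t+\thost_chr1\t1000\t100\t108\t8\t8\t60"]

kept = remove_host_reads(reads, aligned_read_ids(paf))
print(kept)
```

Working from read names alone keeps the sketch independent of the alignment details; a real run would typically also consider alignment quality before discarding a read.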

Pipeline overview

  • Data quality check (QC) and preprocessing
    • Merge fastq files (optional)
    • Raw fastq file QC (NanoPlot)
    • Trim adaptors (Porechop_ABI - optional)
    • Filter reads based on length and/or quality (Chopper - optional)
    • Reformat fastq files so read names are trimmed after the first whitespace (BBMap)
    • Processed fastq file QC, if Porechop_ABI and/or Chopper is run (NanoPlot)
  • Host read filtering
    • Align reads to the provided host reference (Minimap2)
    • Extract reads that do not align for downstream analysis (seqtk)
  • QC report
    • Derive read counts recovered before and after data processing and after host filtering
  • Clustering mode
    • Read clustering (Rattle)
    • Convert fastq to fasta format (seqtk)
    • Cluster scaffolding (Cap3)
    • Megablast homology search against NCBI or custom database (BLAST)
    • Derive top candidate viral hits
  • De novo assembly mode
    • De novo assembly (Canu or Flye)
    • Megablast homology search against NCBI or custom database or reference (BLAST)
    • Derive top candidate viral hits
  • Read classification mode
    • Option 1: Nucleotide-based taxonomic classification of reads (Kraken2, Bracken)
    • Option 2: Protein-based taxonomic classification of reads (Kaiju, Krona)
    • Option 3: Convert fastq to fasta format (seqtk) and perform direct homology search using megablast (BLAST)
  • Map to reference mode
    • Align reads to reference fasta file (Minimap2) and derive bam file and alignment statistics (Samtools)
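To make the preprocessing and QC-report steps above more concrete, here is a minimal Python sketch of three of them: trimming read names after the first whitespace, filtering reads by length and mean quality, and counting reads before and after processing. It is an illustration only; the pipeline itself delegates these steps to BBMap, Chopper and its own reporting step. The example reads, thresholds and function names are hypothetical, and the mean-quality definition (averaging per-base error probabilities with Phred+33 encoding) is an assumption that may differ from what Chopper computes.

```python
import math
from typing import Iterable, Iterator, List, Tuple

Read = Tuple[str, str, str]  # (name, sequence, quality string)

# Hypothetical input: three 4-line FASTQ records.
EXAMPLE = "\n".join([
    "@read1 runid=abc ch=1", "ACGTACGT", "+", "IIIIIIII",
    "@read2 runid=abc ch=2", "ACG", "+", "III",
    "@read3 runid=abc ch=3", "ACGTACGT", "+", "########",
])


def parse_fastq(text: str) -> Iterator[Read]:
    """Yield (name, sequence, quality) from 4-line FASTQ records."""
    lines = [ln for ln in text.splitlines() if ln]
    for i in range(0, len(lines) - 3, 4):
        yield lines[i][1:], lines[i + 1], lines[i + 3]


def trim_name(name: str) -> str:
    """Keep only the part of the read name before the first whitespace."""
    parts = name.split(None, 1)
    return parts[0] if parts else name


def mean_quality(qual: str) -> float:
    """Mean Phred score derived from the average per-base error probability."""
    if not qual:
        return 0.0
    probs = [10 ** (-(ord(c) - 33) / 10) for c in qual]
    return -10 * math.log10(sum(probs) / len(probs))


def preprocess(reads: Iterable[Read], min_length: int, min_quality: float) -> List[Read]:
    """Drop short or low-quality reads and trim the names of the survivors."""
    return [
        (trim_name(name), seq, qual)
        for name, seq, qual in reads
        if len(seq) >= min_length and mean_quality(qual) >= min_quality
    ]


raw = list(parse_fastq(EXAMPLE))
processed = preprocess(raw, min_length=5, min_quality=10)
report = {"raw": len(raw), "processed": len(processed)}
print(report)
```

In this example read2 is dropped for being shorter than 5 bases and read3 for its low mean quality, so only read1 survives, with its name trimmed to `read1`.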

Detailed instructions can be found on GitHub. A step-by-step guide to setting up and running the ONTViSc pipeline on one of three HPC systems, Lyra (Queensland University of Technology), Setonix (Pawsey) or Gadi (National Computational Infrastructure), is also available.

Authors

Marie-Emilie Gauthier
Craig Windell
Magdalena Antczak
Roberto Barrero

Version History

main @ d333445 (earliest) Created 4th Dec 2023 at 01:42 by Magdalena Antczak

update conditions in preprocessing steps


Citation
Gauthier, M.-E., Windell, C., Antczak, M., & Barrero, R. (2023). ONTViSc (ONT-based Viral Screening for Biosecurity). WorkflowHub. https://doi.org/10.48546/WORKFLOWHUB.WORKFLOW.683.1
Last updated: 19th Feb 2024 at 05:24
