Workflow Type: Common Workflow Language



VIRify is a recently developed pipeline for the detection, annotation, and taxonomic classification of viral contigs in metagenomic and metatranscriptomic assemblies. The pipeline is part of the repertoire of analysis services offered by MGnify. VIRify’s taxonomic classification relies on the detection of taxon-specific profile hidden Markov models (HMMs), built upon a set of 22,014 orthologous protein domains and referred to as ViPhOGs.

VIRify was implemented in CWL.

What do I need?

The current implementation uses CWL version 1.2 dev+2. It was tested using Toil version 4.10 as the workflow engine and conda to manage the software dependencies.

Docker - Singularity support


Setup environment

conda env create -f cwl/requirements/conda_env.yml
conda activate viral_pipeline

Basic execution

cd cwl/ -h

A note about metatranscriptomes

Although VIRify has been benchmarked and validated with metagenomic data in mind, it is also possible to use this tool to detect RNA viruses in metatranscriptome assemblies (e.g. SARS-CoV-2). However, some additional considerations for this purpose are outlined below:
1. Quality control: As for metagenomic data, a thorough quality control of the FASTQ sequence reads to remove low-quality bases, adapters and host contamination (if appropriate) is required prior to assembly. This is especially important for metatranscriptomes as small errors can further decrease the quality and contiguity of the assembly obtained. We have used TrimGalore for this purpose.

2. Assembly: There are many assemblers available that are appropriate for either metagenomic or single-species transcriptomic data. However, to our knowledge, there is no assembler currently available specifically for metatranscriptomic data. From our preliminary investigations, we have found that transcriptome-specific assemblers (e.g. rnaSPAdes) generate more contiguous and complete metatranscriptome assemblies compared to metagenomic alternatives (e.g. MEGAHIT and metaSPAdes).

3. Post-processing: Metatranscriptomes generate highly fragmented assemblies. Filtering contigs by a set minimum length therefore has a substantial impact on the number of contigs processed in VIRify. It has also been observed that VirFinder (one of the tools included in VIRify) produces fewer false-positive detections among larger contigs. The choice of a length threshold will depend on the complexity of the sample and the sequencing technology used, but in our experience any contigs <2 kb should be analysed with caution.

4. Classification: The classification module of VIRify depends on the presence of a minimum number and proportion of phylogenetically-informative genes within each contig in order to confidently assign a taxonomic lineage. Therefore, short contigs typically obtained from metatranscriptome assemblies remain generally unclassified. For targeted classification of RNA viruses (for instance, to search for Coronavirus-related sequences), alternative DNA- or protein-based classification methods can be used. Two of the possible options are: (i) using MashMap to screen the VIRify contigs against a database of RNA viruses (e.g. Coronaviridae) or (ii) using hmmsearch to screen the proteins obtained in the VIRify contigs against marker genes of the taxon of interest.
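
The length filtering described in the post-processing point can be sketched in a few lines of Python. This is only an illustration and not the code used by VIRify's own length_filter step: the function names read_fasta and filter_by_length are hypothetical, and the 2000-base cut-off simply mirrors the "<2 kb" guidance above and should be adjusted to the sample.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records, min_length=2000):
    """Keep only contigs with at least min_length bases (assumed threshold)."""
    for header, seq in records:
        if len(seq) >= min_length:
            yield header, seq


if __name__ == "__main__":
    # Toy input: one 2400-base contig and one 400-base contig.
    example = [">contig_1", "ACGT" * 600, ">contig_2", "ACGT" * 100]
    kept = list(filter_by_length(read_fasta(example), min_length=2000))
    print([header for header, _ in kept])  # prints ['contig_1']
```

Running the filter before the assembly is passed to the pipeline keeps the short, harder-to-interpret contigs out of the results; lowering min_length trades fewer discarded contigs for more detections that need manual review.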

Contact us

MGnify helpdesk



Inputs

| ID | Name | Description | Type |
|---|---|---|---|
| input_fasta_file | n/a | n/a | File |
| virsorter_virome | n/a | Set this parameter if the input fasta is mostly viral. See: | boolean |
| virsorter_data_dir | n/a | VirSorter supporting database files. | Directory |
| add_hmms_tsv | n/a | Additional metadata tsv | File |
| hmmscan_database_dir | n/a | HMMScan Viral HMM (databases/vpHMM/vpHMM_database). NOTE: it needs to be a full path. | Directory |
| ncbi_tax_db_file | n/a | ete3 NCBITaxa db. This file was manually built and placed in the corresponding path (on databases) | File |
| img_blast_database_dir | n/a | Downloaded from: | Directory |
| mashmap_reference_file | n/a | MashMap Reference file. Use MashMap to | File? |
| pprmeta_simg | n/a | PPR-Meta singularity simg file | File |
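
As a rough illustration of how the inputs listed above map onto a CWL job file, the sketch below builds an input object in Python and prints it as JSON, which CWL runners accept alongside YAML. The input IDs and types are taken from the table; everything else is an assumption: build_job_inputs is a hypothetical helper, the /path/to/... values are placeholders, and treating every type without a trailing "?" as required ignores any defaults the workflow itself may define.

```python
import json

# Input IDs and CWL types copied from the inputs table.
INPUT_TYPES = {
    "input_fasta_file": "File",
    "virsorter_virome": "boolean",
    "virsorter_data_dir": "Directory",
    "add_hmms_tsv": "File",
    "hmmscan_database_dir": "Directory",
    "ncbi_tax_db_file": "File",
    "img_blast_database_dir": "Directory",
    "mashmap_reference_file": "File?",
    "pprmeta_simg": "File",
}


def build_job_inputs(values):
    """Turn plain paths and booleans into a CWL input object (hypothetical helper)."""
    job = {}
    for input_id, cwl_type in INPUT_TYPES.items():
        optional = cwl_type.endswith("?")
        base_type = cwl_type.rstrip("?")
        value = values.get(input_id)
        if value is None:
            if optional:
                continue  # optional inputs may simply be left out
            raise ValueError(f"missing required input: {input_id}")
        if base_type in ("File", "Directory"):
            # CWL represents files and directories as typed objects.
            job[input_id] = {"class": base_type, "path": value}
        else:
            job[input_id] = value
    return job


if __name__ == "__main__":
    # All paths below are placeholders, not real database locations.
    example = {
        "input_fasta_file": "/path/to/contigs.fasta",
        "virsorter_virome": False,
        "virsorter_data_dir": "/path/to/virsorter-data",
        "add_hmms_tsv": "/path/to/additional_metadata.tsv",
        "hmmscan_database_dir": "/path/to/databases/vpHMM/vpHMM_database",
        "ncbi_tax_db_file": "/path/to/ncbi_taxonomy_db",
        "img_blast_database_dir": "/path/to/img_blast_db",
        "pprmeta_simg": "/path/to/pprmeta.simg",
    }
    print(json.dumps(build_job_inputs(example), indent=2))
```

Because hmmscan_database_dir must be a full path, it is worth passing absolute paths for every entry so the job file does not depend on the directory the runner is started from.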


Steps

| ID | Name | Description |
|---|---|---|
| fasta_rename | Filter contigs | n/a |
| length_filter | Filter contigs | Default length 1kb |
| virfinder | VirFinder | n/a |
| virsorter | VirSorter | n/a |
| pprmeta | PPR-Meta | n/a |
| parse_pred_contigs | Combine | n/a |
| prodigal | Prodigal | n/a |
| hmmscan | hmmscan | n/a |
| ratio_evalue | ratio evalue ViPhOG | n/a |
| annotation | ViPhOG annotations | n/a |
| assign | Taxonomic assign | n/a |
| krona | krona plots | n/a |
| fasta_restore_name_hc | Restore fasta names | n/a |
| fasta_restore_name_lc | Restore fasta names | n/a |
| fasta_restore_name_pp | Restore fasta names | n/a |
| imgvr_blast | Blast in a database of viral sequences including metagenomes | n/a |
| mashmap | MashMap | n/a |


Outputs

| ID | Name | Description | Type |
|---|---|---|---|
| filtered_contigs | n/a | n/a | File |
| virfinder_output | n/a | n/a | File |
| virsorter_output_fastas | n/a | n/a | File[] |
| high_confidence_contigs | n/a | n/a | File? |
| low_confidence_contigs | n/a | n/a | File? |
| parse_prophages_contigs | n/a | n/a | File? |
| high_confidence_faa | n/a | n/a | File? |
| low_confidence_faa | n/a | n/a | File? |
| prophages_faa | n/a | n/a | File? |
| taxonomy_assignations | n/a | n/a | array containing File |
| krona_plots | n/a | n/a | array containing File |
| krona_plot_all | n/a | n/a | File |
| blast_results | n/a | n/a | File[] |
| blast_result_filtereds | n/a | n/a | File[] |
| blast_merged_tsvs | n/a | n/a | File[] |
| mashmap_hits | n/a | n/a | array containing File |

Version History

v0.4.0 (earliest) Created 8th Jun 2020 at 11:21 by Laura Rodriguez-Navas

No revision comments

Frozen v0.4.0 c714c53
Creators and Submitter
Additional credit

Martin Hölzer, Alexandre Almeida, Guillermo Rangel-Pineros and Ekaterina Sakharova

Beracochea, M. (2022). VIRify. WorkflowHub.


Created: 8th Jun 2020 at 11:21

Last updated: 8th Mar 2021 at 21:38


