Workflows
This workflow performs the scaffolding of a genome assembly using Hi-C data with YAHS. Part of the VGP set of workflows.
This workflow generates Hi-C contact maps for genome assemblies in the Pretext format. It is compatible with one or two haplotypes. It includes tracks for PacBio read coverage, gaps, and telomeres. The Pretext files can be opened in PretextView for the manual curation of genome assemblies.
Genome Assembly with Hifi reads and Trio Data
Generate a phased assembly based on PacBio HiFi reads, using parental Illumina data for phasing. Part of the VGP workflow suite; it needs to be run after the Trio k-mer Profiling workflow (VGP2).
Inputs
- Hifi long reads [fastq]
- Concatenated Illumina reads: Paternal [fastq]
- Concatenated Illumina reads: Maternal [fastq]
- K-mer database [meryldb] generated by VGP2 workflow.
- Paternal hapmer database [meryldb] generated by VGP2 workflow.
...
Purge contigs marked as duplicates by purge_dups (could be haplotypic duplication or overlap duplication). This workflow is the 6th workflow of the VGP pipeline. It is meant to be run after one of the contigging steps (Workflow 3, 4, or 5).
Purge duplicates from one haplotype. Prerequisites: run after a k-mer profiling workflow (VGP 1 or 2) and a contigging workflow (VGP 3, 4, or 5).
Contigging Solo:
Generate assembly based on PacBio Hifi Reads.
Inputs
- Hifi long reads [fastq]
- K-mer database [meryldb]
- Genome profile summary generated by Genomescope [txt]
- Homozygous Read Coverage. Optional; use it if you think the estimate from Genomescope is inaccurate.
- Genomescope Model Parameters generated by Genomescope [tabular]
- Database for busco lineage (recommended: latest)
- Busco lineage (recommended: vertebrata)
- Name of first assembly
- Name of second ...
Generate Nx and Size plots for multiple assemblies
Inputs
Collection of fasta files. The name of each item in the collection is used as the label in the Nx and Size plots.
Outputs
- Nx plot
- Size plot
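As a rough illustration of what an Nx plot shows: for each percentage x, Nx is the length of the shortest contig in the smallest set of longest contigs that together cover at least x% of the assembly. The sketch below computes these values from a plain list of contig lengths. The function name `nx_values` and its parameters are hypothetical and are not taken from the workflow, which may compute and plot the values differently.

```python
def nx_values(lengths, thresholds=range(10, 101, 10)):
    """Return a dict mapping x to Nx for a list of contig lengths.

    Hypothetical helper for illustration only; not part of the workflow.
    """
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    total = sum(ordered)
    result = {}
    for x in thresholds:
        cumulative = 0
        for length in ordered:
            cumulative += length
            # Integer comparison avoids floating-point rounding issues.
            if cumulative * 100 >= total * x:
                result[x] = length
                break
    return result


if __name__ == "__main__":
    example_lengths = [100, 80, 60, 40, 20]  # made-up contig lengths
    for x, value in nx_values(example_lengths).items():
        print(f"N{x}\t{value}")
```

Plotting x against Nx for each assembly in the collection gives one curve per assembly, which is why each collection item needs a distinct name to serve as its label.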
Decontamination Workflow
Removes foreign contaminants and mitochondrial sequences from a genome assembly after the scaffolding step. Part of the VGP Suite.
Inputs
- Genome Assembly [fasta]
- Database for Kraken2, containing the possible contaminants.
Outputs
- List of contaminant scaffolds
- List of mitochondrial scaffolds
- Decontaminated assembly
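The relationship between the three outputs can be sketched as follows: once the contaminant and mitochondrial scaffold names are known, the decontaminated assembly is the original FASTA minus those records. This is a minimal sketch under that assumption, using only the Python standard library; `read_fasta` and `drop_scaffolds` are hypothetical names, and the tools the workflow actually uses for this step are not shown in this listing.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def drop_scaffolds(fasta_lines, names_to_drop):
    """Return the records whose ID is not listed in names_to_drop.

    The record ID is taken to be the first whitespace-separated word
    of the header line (an assumption of this sketch).
    """
    drop = set(names_to_drop)
    kept = []
    for header, sequence in read_fasta(fasta_lines):
        record_id = header.split(maxsplit=1)[0] if header.strip() else ""
        if record_id not in drop:
            kept.append((header, sequence))
    return kept


if __name__ == "__main__":
    # Made-up example data, not real workflow output.
    assembly = [">scaffold_1\n", "ACGT\n", ">scaffold_2\n", "GGCC\n"]
    for header, sequence in drop_scaffolds(assembly, ["scaffold_2"]):
        print(f">{header}\n{sequence}")
```

In practice the two scaffold lists would be merged into a single set of names before filtering, so that both contaminant and mitochondrial scaffolds are excluded in one pass.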
Scaffolding with Bionano
Scaffolding using Bionano optical map data
Inputs
- Bionano data [cmap]
- Estimated genome size [txt]
- Phased assembly generated by Hifiasm [gfa1]
Outputs
- Scaffolds
- Non-scaffolded contigs
- QC: Assembly statistics
- QC: Nx plot
- QC: Size plot