Combined workflow for large genome assembly
The tutorial document for this workflow is here: https://doi.org/10.5281/zenodo.5655813
What it does: A workflow for genome assembly, containing subworkflows:
- Data QC
- Kmer counting
- Trim and filter reads
- Assembly with Flye
- Assembly polishing
- Assess genome quality
Inputs:
- long reads and short reads in fastq format
- reference genome for Quast
Outputs:
- Data information - QC, kmers
- Filtered, trimmed reads
- Genome assembly, assembly graph, ...
Assess genome quality; can run alone or as part of a combined workflow for large genome assembly.
- What it does: Assesses the quality of the genome assembly: generates assembly statistics, determines whether expected genes are present, and aligns contigs to a reference genome.
- Inputs: polished assembly; reference_genome.fasta (e.g. of a closely-related species, if available).
- Outputs: Busco table of genes found; Quast HTML report, and link to Icarus contigs browser, showing contigs aligned to a reference ...
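As an illustration of the kind of statistic this step produces, here is a minimal Python sketch that computes N50, one of the summary values commonly reported for an assembly (Quast includes it in its report). The function names `n50` and `read_fasta_lengths` are hypothetical helpers written for this example; they are not part of the workflow, and Quast's own implementation may differ in detail.

```python
def read_fasta_lengths(lines):
    """Yield the sequence length of each record in an iterable of FASTA lines."""
    length = None  # None until the first header line is seen
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif length is not None:
            length += len(line)
    if length is not None:
        yield length


def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest until half of the
    # total assembly length is covered; that contig's length is the N50.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0
```

For a file on disk, something like `n50(list(read_fasta_lengths(open("assembly.fasta"))))` would apply both helpers; the file name is a placeholder.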
Assembly polishing subworkflow: Racon polishing with long reads
Inputs: long reads and assembly contigs
- minimap2: long reads are mapped to the assembly => overlaps.paf
- overlaps, long reads, assembly => Racon => polished assembly 1
- using polished assembly 1 as input, repeat minimap2 + Racon => polished assembly 2
- using polished assembly 2 as input, repeat minimap2 + Racon => polished assembly 3
- using polished assembly 3 as input, repeat minimap2 + Racon => ...
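The repeated minimap2 + Racon rounds above can be sketched outside Galaxy as a small Python helper that builds the command lines for each round. This is only a sketch under stated assumptions: the function names, the intermediate file names (`overlaps_1.paf`, `polished_1.fasta`, ...) and the `map-ont` preset are choices made for this example, not settings taken from the workflow, and the number of rounds is left as a required argument.

```python
import subprocess


def polishing_rounds(reads, assembly, rounds, preset="map-ont"):
    """Return a list of (minimap2_argv, racon_argv, polished_path) tuples.

    Each round maps the reads to the current assembly, then polishes it;
    the polished output becomes the input of the next round.
    """
    steps = []
    current = assembly
    for i in range(1, rounds + 1):
        paf = f"overlaps_{i}.paf"          # hypothetical file name
        polished = f"polished_{i}.fasta"   # hypothetical file name
        # minimap2 [options] <target> <query>; -o names the PAF output file.
        minimap2 = ["minimap2", "-x", preset, "-o", paf, current, reads]
        # racon <sequences> <overlaps> <target sequences>; FASTA goes to stdout.
        racon = ["racon", reads, paf, current]
        steps.append((minimap2, racon, polished))
        current = polished
    return steps


def run_polishing(steps):
    """Execute the planned rounds in order (requires minimap2 and racon on PATH)."""
    for minimap2, racon, polished in steps:
        subprocess.run(minimap2, check=True)
        with open(polished, "w") as out:
            subprocess.run(racon, stdout=out, check=True)
```

Building the commands separately from running them makes it easy to inspect the plan first, for example `for step in polishing_rounds("reads.fastq", "assembly.fasta", rounds=2): print(step)`. For PacBio reads a different minimap2 preset would be needed.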
Assembly with Flye; can run alone or as part of a combined workflow for large genome assembly.
- What it does: Assembles long reads with the tool Flye
- Inputs: long reads in fastq.gz format (may be raw, filtered, and/or corrected)
- Outputs: Flye assembly fasta; Fasta stats on assembly.fasta; Assembly graph image from Bandage; Bar chart of contig sizes; Quast reports of genome assembly
- Tools used: Flye, Fasta statistics, Bandage, Bar chart, Quast
- Input parameters: None required, but recommend ...
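For readers who want to reproduce the Flye step on the command line rather than in Galaxy, the following Python sketch assembles a Flye invocation. It is an assumption-laden illustration, not the workflow's configuration: the helper name `flye_command`, the default read type, and the thread count are hypothetical, and the right read-type flag depends on whether the reads are raw, corrected, or high-accuracy.

```python
# Read-type flags accepted by Flye (each is passed as --<name> <reads>).
FLYE_READ_TYPES = {
    "nano-raw", "nano-corr", "nano-hq",
    "pacbio-raw", "pacbio-corr", "pacbio-hifi",
}


def flye_command(reads, out_dir, read_type="nano-raw", threads=4):
    """Build the argv list for a Flye run; does not execute anything."""
    if read_type not in FLYE_READ_TYPES:
        raise ValueError(f"unknown Flye read type: {read_type}")
    return [
        "flye",
        f"--{read_type}", reads,   # input long reads, e.g. a fastq.gz file
        "--out-dir", out_dir,      # assembly.fasta is written here
        "--threads", str(threads),
    ]
```

The resulting list can be passed to `subprocess.run(..., check=True)` on a machine where Flye is installed.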