Workflows

Combined workflow for large genome assembly

The tutorial document for this workflow is here: https://doi.org/10.5281/zenodo.5655813

What it does: A workflow for genome assembly, containing subworkflows:

  • Data QC
  • Kmer counting
  • Trim and filter reads
  • Assembly with Flye
  • Assembly polishing
  • Assess genome quality

Inputs:

  • long reads and short reads in fastq format
  • reference genome for Quast

Outputs:

  • Data information - QC, kmers
  • Filtered, trimmed reads
  • Genome assembly, assembly graph, ...

Type: Galaxy

Creator: Anna Syme

Submitter: Anna Syme

DOI: 10.48546/workflowhub.workflow.230.1

Assess genome quality; can run alone or as part of a combined workflow for large genome assembly.

  • What it does: Assesses the quality of the genome assembly: generates summary statistics, determines whether expected genes are present, and aligns contigs to a reference genome.
  • Inputs: polished assembly; reference_genome.fasta (e.g. of a closely-related species, if available).
  • Outputs: Busco table of genes found; Quast HTML report, and link to Icarus contigs browser, showing contigs aligned to a reference ...
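
As a rough illustration of the kind of summary statistic such a quality report contains, the sketch below computes N50, a contiguity measure that Quast reports. It is a minimal stand-alone sketch, not Quast's own code; the function name `n50` and the plain list of contig lengths are assumptions made for this example.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    ordered = sorted(contig_lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one contig length is required")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # Stop at the first contig that pushes the running sum to half.
        if running >= half_total:
            return length
    # Unreachable for non-empty input, kept for completeness.
    raise ValueError("could not compute N50")


if __name__ == "__main__":
    # Hypothetical contig lengths, for illustration only.
    print(n50([2, 3, 4, 5, 6]))
```

A higher N50 generally indicates a more contiguous assembly, but it says nothing about correctness, which is why the workflow also checks for expected genes and aligns contigs to a reference.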

Type: Galaxy

Creator: Anna Syme

Submitter: Anna Syme

DOI: 10.48546/workflowhub.workflow.229.1

Assembly polishing subworkflow: Racon polishing with long reads

Inputs: long reads and assembly contigs

Workflow steps:

  • minimap2: long reads are mapped to the assembly => overlaps.paf
  • overlaps, long reads, assembly => Racon => polished assembly 1
  • using polished assembly 1 as input, repeat minimap2 + racon => polished assembly 2
  • using polished assembly 2 as input, repeat minimap2 + racon => polished assembly 3
  • using polished assembly 3 as input, repeat minimap2 + racon => ...
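
The repeated minimap2 + Racon rounds listed above can be sketched as a small command planner. This is only an illustration of the loop structure outside Galaxy, not the workflow's actual definition: the helper name `plan_racon_rounds`, the intermediate file names, and the `map-ont` minimap2 preset (which assumes nanopore reads) are all assumptions, and the number of rounds is left to the caller because the listing above is truncated.

```python
from typing import List


def plan_racon_rounds(reads: str, assembly: str, rounds: int,
                      preset: str = "map-ont") -> List[str]:
    """Return shell command strings for repeated minimap2 + Racon rounds.

    Each round maps the reads to the current assembly, then polishes
    that assembly with Racon; the polished output feeds the next round.
    """
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    commands: List[str] = []
    current = assembly
    for i in range(1, rounds + 1):
        paf = f"overlaps_{i}.paf"          # hypothetical file name
        polished = f"polished_{i}.fasta"   # hypothetical file name
        # minimap2 writes PAF overlaps to stdout by default.
        commands.append(f"minimap2 -x {preset} {current} {reads} > {paf}")
        # Racon takes reads, overlaps, then the target assembly.
        commands.append(f"racon {reads} {paf} {current} > {polished}")
        current = polished
    return commands


if __name__ == "__main__":
    for cmd in plan_racon_rounds("long_reads.fastq.gz", "assembly.fasta", rounds=2):
        print(cmd)
```

The planner only builds strings, so the commands can be inspected before anything is run.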

Type: Galaxy

Creator: Anna Syme

Submitter: Anna Syme

DOI: 10.48546/workflowhub.workflow.227.1

Assembly with Flye; can run alone or as part of a combined workflow for large genome assembly.

  • What it does: Assembles long reads with the tool Flye
  • Inputs: long reads (may be raw, or filtered, and/or corrected); fastq.gz format
  • Outputs: Flye assembly fasta; Fasta stats on assembly.fasta; Assembly graph image from Bandage; Bar chart of contig sizes; Quast reports of genome assembly
  • Tools used: Flye, Fasta statistics, Bandage, Bar chart, Quast
  • Input parameters: None required, but recommend ...

Type: Galaxy

Creator: Anna Syme

Submitter: Anna Syme

DOI: 10.48546/workflowhub.workflow.225.1

Trim and filter reads; can run alone or as part of a combined workflow for large genome assembly.

  • What it does: Trims and filters raw sequence reads according to specified settings.
  • Inputs: Long reads (format fastq); Short reads R1 and R2 (format fastq)
  • Outputs: Trimmed and filtered reads: fastp_filtered_long_reads.fastq.gz (But note: no trimming or filtering is on by default), fastp_filtered_R1.fastq.gz, fastp_filtered_R2.fastq.gz
  • Reports: fastp report on long reads, html; fastp report ...

Type: Galaxy

Creator: Anna Syme

Submitter: Anna Syme

DOI: 10.48546/workflowhub.workflow.224.1

Kmer counting step; can run alone or as part of a combined workflow for large genome assembly.

  • What it does: Estimates genome size and heterozygosity based on counts of kmers
  • Inputs: One set of short reads: e.g. R1.fq.gz
  • Outputs: GenomeScope graphs
  • Tools used: Meryl, GenomeScope
  • Input parameters: None required
  • Workflow steps: The tool meryl counts kmers in the input reads (k=21), then converts this into a histogram. GenomeScope: runs a model on the histogram; reports estimates. k-mer ...
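
To make the "count kmers, then convert to a histogram" step concrete, here is a toy sketch in plain Python. It is not how meryl is implemented and would be far too slow for real data; the function name `kmer_histogram` is an assumption, and unlike production k-mer counters it does not merge a k-mer with its reverse complement.

```python
from collections import Counter
from typing import Dict, Iterable


def kmer_histogram(reads: Iterable[str], k: int = 21) -> Dict[int, int]:
    """Map each multiplicity to the number of distinct k-mers seen that often."""
    counts: Counter = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip k-mers containing ambiguous bases.
            if "N" not in kmer:
                counts[kmer] += 1
    # Histogram: how many distinct k-mers occur once, twice, and so on.
    histogram = Counter(counts.values())
    return dict(sorted(histogram.items()))


if __name__ == "__main__":
    # Tiny made-up read and a small k, for illustration only.
    print(kmer_histogram(["ACGTACGT"], k=4))
```

A histogram of this shape (multiplicity against number of distinct k-mers) is the input that a model such as GenomeScope fits in order to estimate genome size and heterozygosity.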

Type: Galaxy

Creator: Anna Syme

Submitter: Anna Syme

DOI: 10.48546/workflowhub.workflow.223.1

Data QC step; can run alone or as part of a combined workflow for large genome assembly.

  • What it does: Reports statistics from sequencing reads.
  • Inputs: long reads (fastq.gz format), short reads (R1 and R2) (fastq.gz format).
  • Outputs: For long reads: a nanoplot report (the HTML report summarizes all the information). For short reads: a MultiQC report.
  • Tools used: Nanoplot, FastQC, MultiQC.
  • Input parameters: None required.
  • Workflow steps: Long reads are analysed by Nanoplot; Short reads ...

Type: Galaxy

Creator: Anna Syme

Submitter: Anna Syme

DOI: 10.48546/workflowhub.workflow.222.1

Assembly polishing subworkflow: Racon polishing with short reads

Inputs: short reads and assembly (usually pre-polished with other tools first, e.g. Racon + long reads; Medaka)

Workflow steps:

  • minimap2: short reads (R1 only) are mapped to the assembly => overlaps.paf. Minimap2 setting is for short reads.
  • overlaps + short reads + assembly => Racon => polished assembly 1
  • using polished assembly 1 as input; repeat minimap2 + racon => polished assembly 2
  • Racon short-read polished ...

Type: Galaxy

Creator: Anna Syme

Submitter: Anna Syme

DOI: 10.48546/workflowhub.workflow.228.1

Assembly polishing; can run alone or as part of a combined workflow for large genome assembly.

  • What it does: Polishes (corrects) an assembly, using long reads (with the tools Racon and Medaka) and short reads (with the tool Racon). (Note: medaka is only for nanopore reads, not PacBio reads).
  • Inputs: assembly to be polished: assembly.fasta; long reads - the same set used in the assembly (e.g. may be raw or filtered) fastq.gz format; short reads, R1 only, in fastq.gz format
  • Outputs: ...

Type: Galaxy

Creator: Anna Syme

Submitter: Anna Syme

DOI: 10.48546/workflowhub.workflow.226.1

Work-in-progress

Germline-ShortV @ NCI-Gadi is an implementation of the Broad Institute's best practice workflow for germline short variant discovery. This implementation is optimised for the National Computational Infrastructure's Gadi HPC, utilising scatter-gather parallelism to enable use of multiple nodes with high CPU or memory efficiency. This workflow requires sample BAM files, which can be generated using the Fastq-to-bam @ NCI-Gadi pipeline. Germline-ShortV can be applied ...

Type: Shell Script

Creators: Rosemarie Sadsad, Georgina Samaha, Tracy Chew, Cali Willet

Submitter: Tracy Chew

DOI: 10.48546/workflowhub.workflow.143.1
