Items related to Anna Syme
The Australian BioCommons enhances digital life science research through world-class collaborative distributed infrastructure. It aims to ensure that Australian life science research remains globally competitive through sustained strategic leadership, research community engagement, digital service provision, training and support.
Teams: Australian BioCommons, QCIF Bioinformatics, Pawsey Supercomputing Research Centre, Sydney Informatics Hub, Janis, Melbourne Data Analytics Platform (MDAP), Galaxy Australia
Web page: https://www.biocommons.org.au/
The Australian BioCommons enhances digital life science research through world-class collaborative distributed infrastructure. It aims to ensure that Australian life science research remains globally competitive through sustained strategic leadership, research community engagement, digital service provision, training and support.
Space: Australian BioCommons
Public web page: https://www.biocommons.org.au/
Organisms: Not specified
Galaxy is an open, web-based platform for accessible, reproducible, and transparent computational biological research.
- Accessible: Users can run tools through a user-friendly web interface, without writing code or using the command line.
- Reproducible: Galaxy captures all the metadata from an analysis, making it completely reproducible.
- Transparent: Users share and publish analyses via interactive pages that can enhance analyses with user annotations.
- Scalable: Galaxy ...
Space: Australian BioCommons
Public web page: https://usegalaxy.org.au/
Organisms: Not specified
workflow-partial-gstacks-populations
These workflows are part of a set designed to work for RAD-seq data on the Galaxy platform, using the tools from the Stacks program.
Galaxy Australia: https://usegalaxy.org.au/
Stacks: http://catchenlab.life.illinois.edu/stacks/
This workflow is part of the reference-guided stacks workflow, https://workflowhub.eu/workflows/347
This workflow takes in bam files and a population map.
To generate bam files see: https://workflowhub.eu/workflows/351
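As an illustration of the population map input mentioned above: a Stacks population map is generally a plain-text, tab-separated file with one sample per line (sample name, then population). The sketch below writes and re-reads such a file in Python. It assumes that two-column layout; the sample and population names are hypothetical, and the helper functions are not part of Stacks or Galaxy.

```python
from pathlib import Path


def write_popmap(samples: dict[str, str], path: str) -> None:
    """Write one 'sample<TAB>population' line per sample."""
    lines = [f"{sample}\t{population}" for sample, population in samples.items()]
    Path(path).write_text("\n".join(lines) + "\n")


def read_popmap(path: str) -> dict[str, str]:
    """Read a population map, rejecting malformed or duplicated entries."""
    popmap: dict[str, str] = {}
    for number, line in enumerate(Path(path).read_text().splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        fields = line.split("\t")
        # Only the first two columns are used in this sketch.
        if len(fields) < 2 or not fields[0] or not fields[1]:
            raise ValueError(f"line {number}: expected 'sample<TAB>population'")
        if fields[0] in popmap:
            raise ValueError(f"line {number}: duplicate sample {fields[0]!r}")
        popmap[fields[0]] = fields[1]
    return popmap
```

In practice the sample names would need to correspond to the names of the bam files in the input collection; how that matching is done in the Galaxy tool is not shown in this listing.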
workflow-partial-bwa-mem
These workflows are part of a set designed to work for RAD-seq data on the Galaxy platform, using the tools from the Stacks program.
Galaxy Australia: https://usegalaxy.org.au/
Stacks: http://catchenlab.life.illinois.edu/stacks/
This workflow is part of the reference-guided stacks workflow, https://workflowhub.eu/workflows/347
Inputs
- demultiplexed reads in fastq format, in a collection; these may be the output of the QC workflow.
- reference genome in fasta format ...
workflow-partial-cstacks-sstacks-gstacks
These workflows are part of a set designed to work for RAD-seq data on the Galaxy platform, using the tools from the Stacks program.
Galaxy Australia: https://usegalaxy.org.au/
Stacks: http://catchenlab.life.illinois.edu/stacks/
This workflow takes in ustacks output, and runs cstacks, sstacks and gstacks.
To generate ustacks output see https://workflowhub.eu/workflows/349
For the full de novo workflow see https://workflowhub.eu/workflows/348
workflow-partial-ustacks-only
These workflows are part of a set designed to work for RAD-seq data on the Galaxy platform, using the tools from the Stacks program.
Galaxy Australia: https://usegalaxy.org.au/
Stacks: http://catchenlab.life.illinois.edu/stacks/
For the full de novo workflow see https://workflowhub.eu/workflows/348
You may want to run ustacks with different batches of samples.
- To be able to combine these later, there are some necessary steps - we need to keep track of how many ...
workflow-denovo-stacks
These workflows are part of a set designed to work for RAD-seq data on the Galaxy platform, using the tools from the Stacks program.
Galaxy Australia: https://usegalaxy.org.au/
Stacks: http://catchenlab.life.illinois.edu/stacks/
Inputs
- demultiplexed reads in fastq format, in a collection; these may be the output of the QC workflow.
- population map in text format
Steps and outputs
ustacks:
- input reads go to ustacks.
- ustacks assembles the reads into matching ...
workflow-ref-guided-stacks
These workflows are part of a set designed to work for RAD-seq data on the Galaxy platform, using the tools from the Stacks program.
Galaxy Australia: https://usegalaxy.org.au/
Stacks: http://catchenlab.life.illinois.edu/stacks/
Inputs
- demultiplexed reads in fastq format, in a collection; these may be the output of the QC workflow.
- population map in text format
- reference genome in fasta format
Steps and outputs
BWA MEM 2:
- The reads are mapped to the ...
workflow-qc-of-radseq-reads
These workflows are part of a set designed to work for RAD-seq data on the Galaxy platform, using the tools from the Stacks program.
Galaxy Australia: https://usegalaxy.org.au/
Stacks: http://catchenlab.life.illinois.edu/stacks/
Inputs
- demultiplexed reads in fastq format, in a collection
- two adapter sequences in fasta format, for input into cutadapt
Steps and outputs
The workflow can be modified to suit your own parameters.
The workflow steps are:
- Run ...
Combined workflow for large genome assembly
The tutorial document for this workflow is here: https://doi.org/10.5281/zenodo.5655813
What it does: A workflow for genome assembly, containing subworkflows:
- Data QC
- Kmer counting
- Trim and filter reads
- Assembly with Flye
- Assembly polishing
- Assess genome quality
Inputs:
- long reads and short reads in fastq format
- reference genome for Quast
Outputs:
- Data information - QC, kmers
- Filtered, trimmed reads
- Genome assembly, assembly graph, ...
Assess genome quality; can run alone or as part of a combined workflow for large genome assembly.
- What it does: Assesses the quality of the genome assembly: generates some statistics, determines whether expected genes are present, and aligns contigs to a reference genome.
- Inputs: polished assembly; reference_genome.fasta (e.g. of a closely-related species, if available).
- Outputs: Busco table of genes found; Quast HTML report, and link to Icarus contigs browser, showing contigs aligned to a reference ...
Assembly polishing subworkflow: Racon polishing with long reads
Inputs: long reads and assembly contigs
Workflow steps:
- minimap2: long reads are mapped to the assembly => overlaps.paf.
- overlaps, long reads, assembly => Racon => polished assembly 1
- using polished assembly 1 as input; repeat minimap2 + racon => polished assembly 2
- using polished assembly 2 as input, repeat minimap2 + racon => polished assembly 3
- using polished assembly 3 as input, repeat minimap2 + racon => ...
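The repeated minimap2 + Racon rounds listed above can be sketched as a loop in which each round's polished assembly becomes the next round's target. The Python below only builds the command lines and does not run them. The `map-ont` preset, the file names, and the `Step` and `polishing_steps` helpers are assumptions made for illustration; the settings used in the Galaxy workflow itself are not shown in this listing, so the number of rounds is left as a parameter.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    argv: list[str]        # command line to run
    stdout: Optional[str]  # file that should capture stdout, if any


def polishing_steps(reads: str, assembly: str, rounds: int) -> list[Step]:
    """Plan `rounds` rounds of minimap2 mapping followed by Racon polishing."""
    steps: list[Step] = []
    current = assembly
    for i in range(1, rounds + 1):
        overlaps = f"overlaps_{i}.paf"
        polished = f"polished_assembly_{i}.fasta"
        # minimap2 maps the reads to the current assembly and writes PAF via -o.
        steps.append(Step(["minimap2", "-x", "map-ont", "-o", overlaps, current, reads], None))
        # racon takes <sequences> <overlaps> <target> and prints FASTA to stdout.
        steps.append(Step(["racon", reads, overlaps, current], polished))
        # The polished output is the target of the next round.
        current = polished
    return steps
```

Each planned step could then be executed with `subprocess.run(step.argv, check=True)`, redirecting standard output to `step.stdout` where it is set.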
Assembly with Flye; can run alone or as part of a combined workflow for large genome assembly.
- What it does: Assembles long reads with the tool Flye
- Inputs: long reads (raw, filtered and/or corrected) in fastq.gz format
- Outputs: Flye assembly fasta; Fasta stats on assembly.fasta; Assembly graph image from Bandage; Bar chart of contig sizes; Quast reports of genome assembly
- Tools used: Flye, Fasta statistics, Bandage, Bar chart, Quast
- Input parameters: None required, but recommend ...
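To give a sense of what the contig-size outputs above summarise, the following Python sketch computes contig lengths and N50 from FASTA text. It is a generic illustration under simple assumptions (uncompressed FASTA held in memory), not the implementation of the Fasta statistics tool or Quast; the function names are hypothetical.

```python
def contig_lengths(fasta_text: str) -> dict[str, int]:
    """Return the length of each sequence in FASTA-formatted text."""
    lengths: dict[str, int] = {}
    name = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            # Use the first word of the header as the contig name.
            name = parts[0] if parts else f"contig_{len(lengths) + 1}"
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def n50(lengths: list[int]) -> int:
    """Length L such that contigs of length >= L cover at least half the assembly."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly
```

For example, contigs of lengths 8, 4, 4 and 2 total 18 bases; the two largest together reach 12, which passes the halfway mark of 9, so N50 is 4.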
Trim and filter reads; can run alone or as part of a combined workflow for large genome assembly.
- What it does: Trims and filters raw sequence reads according to specified settings.
- Inputs: Long reads (format fastq); Short reads R1 and R2 (format fastq)
- Outputs: Trimmed and filtered reads: fastp_filtered_long_reads.fastq.gz (note: no trimming or filtering is enabled by default), fastp_filtered_R1.fastq.gz, fastp_filtered_R2.fastq.gz
- Reports: fastp report on long reads, html; fastp report ...
Kmer counting step, can run alone or as part of a combined workflow for large genome assembly.
- What it does: Estimates genome size and heterozygosity based on counts of kmers
- Inputs: One set of short reads: e.g. R1.fq.gz
- Outputs: GenomeScope graphs
- Tools used: Meryl, GenomeScope
- Input parameters: None required
- Workflow steps: The tool meryl counts kmers in the input reads (k=21), then converts the counts into a histogram. GenomeScope fits a model to the histogram and reports estimates. k-mer ...
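The counting and histogram steps described above can be illustrated with a small Python sketch. It is a naive in-memory version for intuition only, not how meryl works internally; counting canonical kmers (a kmer and its reverse complement treated as one) is an assumption here, and the function names are hypothetical.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a kmer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(reads: list[str], k: int = 21) -> Counter:
    """Count canonical kmers across reads, skipping kmers with non-ACGT bases."""
    counts: Counter = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return counts


def kmer_histogram(counts: Counter) -> dict[int, int]:
    """Map each multiplicity to the number of distinct kmers seen that many times."""
    return dict(sorted(Counter(counts.values()).items()))
```

A histogram of this shape (multiplicity against number of distinct kmers) is the kind of input a model such as GenomeScope's is fitted to.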
Data QC step, can run alone or as part of a combined workflow for large genome assembly.
- What it does: Reports statistics from sequencing reads.
- Inputs: long reads (fastq.gz format), short reads (R1 and R2) (fastq.gz format).
- Outputs: For long reads: a Nanoplot report (the HTML report summarizes all the information). For short reads: a MultiQC report.
- Tools used: Nanoplot, FastQC, MultiQC.
- Input parameters: None required.
- Workflow steps: Long reads are analysed by Nanoplot; Short reads ...
Assembly polishing subworkflow: Racon polishing with short reads
Inputs: short reads and assembly (usually pre-polished with other tools first, e.g. Racon + long reads; Medaka)
Workflow steps:
- minimap2: short reads (R1 only) are mapped to the assembly => overlaps.paf. Minimap2 setting is for short reads.
- overlaps + short reads + assembly => Racon => polished assembly 1
- using polished assembly 1 as input; repeat minimap2 + racon => polished assembly 2
- Racon short-read polished ...
Assembly polishing; can run alone or as part of a combined workflow for large genome assembly.
- What it does: Polishes (corrects) an assembly, using long reads (with the tools Racon and Medaka) and short reads (with the tool Racon). (Note: Medaka is only for nanopore reads, not PacBio reads.)
- Inputs: assembly to be polished (assembly.fasta); long reads, the same set used in the assembly (raw or filtered), in fastq.gz format; short reads, R1 only, in fastq.gz format
- Outputs: ...