Sydney Informatics Hub

Overview

The Sydney Informatics Hub is a Core Research Facility of The University of Sydney. We work towards enabling excellence in data- and compute-intensive research. We provide support, training, and expertise in statistics, data science, artificial intelligence, bioinformatics, software engineering, simulation, visualisation, and research computing. As an official node of the Australian BioCommons, we are creating reusable bioinformatics workflows for Australia's national supercomputing resources and commercial cloud.

Space: Australian BioCommons

SEEK ID: https://workflowhub.eu/projects/43

Public web page: https://www.sydney.edu.au/sydney-informatics-hub

Organisms: No Organisms specified

WorkflowHub PALs: No PALs for this Team

Team created: 28th Jun 2021

Annotated Properties

Scientific disciplines

Computer Science

Related items

Advanced People list for this Team with search and filtering

Frederick Jaya

Teams: Sydney Informatics Hub

Organizations: The University of Sydney

https://orcid.org/0000-0002-4019-7026

Mitchell J O'Brien

Teams: Sydney Informatics Hub

Organizations: The University of Sydney

https://orcid.org/0000-0003-0662-9101

Expertise: Bioinformatics, Genomics, Genetics

Senior Bioinformatics Engineer - Sydney Informatics Hub | Australian BioCommons

Georgina Samaha

Teams: Sydney Informatics Hub

Organizations: Australian BioCommons

https://orcid.org/0000-0003-0419-1476

Amarinder Singh Thind

Teams: Sydney Informatics Hub

Organizations: The University of Sydney

https://orcid.org/0000-0003-4592-0380

Expertise: Bioinformatics, Biostatistics, Data science

Dr. Thind is a Senior Research Bioinformatician at the University of Sydney (Australia) and holds an honorary affiliation with the University of Wollongong. He is an experienced bioinformatician and data analyst with a strong background in statistics, multi-omics analysis, and health data science. His research focuses on the integration and interpretation of large-scale biological and clinical datasets to advance understanding of disease mechanisms, diagnostics, and treatment outcomes.

He has ...

Cali Willet

Teams: Sydney Informatics Hub

Organizations: The University of Sydney

https://orcid.org/0000-0001-8449-1502

Advanced Spaces list for this Team with search and filtering

Australian BioCommons

The Australian BioCommons enhances digital life science research through world-class collaborative distributed infrastructure. It aims to ensure that Australian life science research remains globally competitive through sustained strategic leadership, research community engagement, digital service provision, training and support.

Teams: Australian BioCommons, QCIF Bioinformatics, Pawsey Supercomputing Research Centre, Sydney Informatics Hub, Janis, Melbourne Data Analytics Platform (MDAP), Galaxy Australia, National Computational Infrastructure (NCI) WorkflowHub team

Web page: https://www.biocommons.org.au/

Advanced Organizations list for this Team with search and filtering

Australian BioCommons

ROR ID: Not specified

Department: Not specified

Country: Australia

City: Not specified

Web page: Not specified

The University of Sydney

ROR ID: Not specified

Department: Not specified

Country: Australia

City: Not specified

Web page: Not specified

Advanced Workflows list for this Team with search and filtering

ORBiT

Sydney Informatics Hub

Stable

A Nextflow workflow for analysing Oxford Nanopore Technologies (ONT) RNAseq direct read sequencing (DRS) or cDNA data.

This workflow emphasises sensitivity to detect rare and novel features within the data. Multiple aspects of this workflow are tailored to enhance sensitivity:

Alignment to reference genome rather than transcriptome
Multiple tools per analysis type (n = 2 isoforms, n = 3 fusions)
Read quantification tools capable of detecting novel isoforms, and counting at the isoform ...

Type: Nextflow

Creators: Cali Willet, Amarinder Thind, Michael Geaghan, Mitchell O'Brien, Madison Gonebale, Marina Kennerson

Submitter: Georgina Samaha

DOI: 10.48546/workflowhub.workflow.2160.1

Created: 14th Apr 2026 at 10:44, Last updated: 14th Apr 2026 at 10:47

scRNAvigator: Interactive exploration, processing, and analysis of your scRNA-seq data

Sydney Informatics Hub

Stable

This collection of R notebooks has been designed to guide you through processing and analysing your single cell RNA (scRNA) sequencing data. They are designed to be worked through in the following order:

Quality control
Doublet detection
Dataset integration
Cell annotation
Pseudobulking and differential gene expression analysis
Pathway enrichment analyses.

Each notebook explains what is ...

Type: Quarto Markdown

Creators: Michael Geaghan, Frederick Jaya, Mitchell J O'Brien, Georgina Samaha

Acknowledgements: We thank Martyn Bullock and Sumathy Perampalam for their testing support, feedback, and providing data used in developing this workflow.

Submitter: Frederick Jaya

DOI: 10.48546/workflowhub.workflow.2022.2

Created: 27th Nov 2025 at 23:56, Last updated: 28th Nov 2025 at 01:35

Somatic-ShortV @ NCI-Gadi

Sydney Informatics Hub

Work-in-progress

Somatic-ShortV @ NCI-Gadi is a variant calling pipeline that calls somatic short variants (SNPs and indels) from tumour and matched normal BAM files following GATK's Best Practice Workflow. This workflow is designed for the National Computational Infrastructure's (NCI) Gadi supercomputer, leveraging multiple nodes on NCI Gadi to run all stages of the workflow in parallel. ...

Type: Shell Script

Creators: Tracy Chew, Cali Willet, Rosemarie Sadsad

Submitter: Tracy Chew

DOI: 10.48546/workflowhub.workflow.148.1

Created: 19th Aug 2021 at 00:14, Last updated: 25th Jul 2025 at 03:06

Germline-ShortV @ NCI-Gadi

Australian BioCommons, Sydney Informatics Hub

Work-in-progress

Germline-ShortV @ NCI-Gadi is an implementation of the BROAD Institute's best practice workflow for germline short variant discovery. This implementation is optimised for the National Computational Infrastructure's Gadi HPC, utilising scatter-gather parallelism to enable use of multiple nodes with high CPU or memory efficiency. This workflow requires sample BAM files, which can be generated using the Fastq-to-bam @ NCI-Gadi pipeline. Germline-ShortV can be applied ...

Type: Shell Script

Creators: Tracy Chew, Cali Willet, Georgina Samaha, Rosemarie Sadsad

Submitter: Tracy Chew

DOI: 10.48546/workflowhub.workflow.143.1

Created: 17th Aug 2021 at 05:35, Last updated: 25th Jul 2025 at 03:04
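
The scatter-gather parallelism mentioned in the Germline-ShortV @ NCI-Gadi description can be illustrated with a minimal Python sketch. This is not the workflow's implementation, which is a set of shell scripts run across Gadi nodes; it only shows the general pattern of splitting work into intervals, processing them in parallel, and combining the results. The function names, the interval size, and the per-interval task are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def scatter(total_length, chunk_size):
    """Split the half-open range [0, total_length) into contiguous intervals."""
    return [
        (start, min(start + chunk_size, total_length))
        for start in range(0, total_length, chunk_size)
    ]


def process_interval(interval):
    """Hypothetical per-interval task; here it only returns the interval width."""
    start, end = interval
    return end - start


def gather(partial_results):
    """Combine the per-interval results into a single value."""
    return sum(partial_results)


def scatter_gather(total_length, chunk_size, workers=4):
    """Scatter the range, process each interval in parallel, then gather."""
    intervals = scatter(total_length, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the input order of the intervals.
        partials = list(pool.map(process_interval, intervals))
    return gather(partials)


if __name__ == "__main__":
    print(scatter_gather(total_length=1000, chunk_size=300))
```

In a real variant calling run the intervals would be genomic regions and the gather step would merge per-interval output files rather than sum integers; those details are specific to each workflow and are not shown here.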

Bootstrapping-for-BQSR @ NCI-Gadi

Sydney Informatics Hub

Work-in-progress

Bootstrapping-for-BQSR @ NCI-Gadi is a pipeline for bootstrapping a variant resource to enable GATK base quality score recalibration (BQSR) for non-model organisms that lack a publicly available variant resource. This implementation is optimised for the National Computational Infrastructure's Gadi HPC. Multiple rounds of bootstrapping can be performed. Users can use Fastq-to-bam @ NCI-Gadi and Germline-ShortV @ NCI-Gadi to ...

Type: Shell Script

Creators: Cali Willet, Tracy Chew

Submitter: Tracy Chew

DOI: 10.48546/workflowhub.workflow.153.1

Created: 19th Aug 2021 at 00:26, Last updated: 25th Jul 2025 at 03:02

ONT-bacpac-nf

Sydney Informatics Hub

Stable

A rapid and portable workflow for pond-side sequencing of bacterial pathogens for sustainable aquaculture using ONT long-read sequencing.

Type: Nextflow

Creators: Georgina Samaha, Francisca Samsing, Mitchell J O'Brien, Frederick Jaya

Submitter: Georgina Samaha

DOI: 10.48546/workflowhub.workflow.1263.1

Created: 31st Jan 2025 at 04:42

Parabricks-Genomics-nf

Sydney Informatics Hub

Parabricks-Genomics-nf is a GPU-enabled pipeline for alignment and germline short variant calling for short read sequencing data. The pipeline utilises NVIDIA's Clara Parabricks toolkit to dramatically speed up the execution of best practice bioinformatics tools. Currently, this pipeline is configured specifically for NCI's Gadi HPC.

NVIDIA's Clara Parabricks can deliver a significant ...

Type: Nextflow

Creator: Georgina Samaha

Submitter: Georgina Samaha

DOI: 10.48546/workflowhub.workflow.836.1

Created: 26th Apr 2024 at 00:19

Somatic-ShortV-nf

Sydney Informatics Hub, Australian BioCommons

Work-in-progress

This is a Nextflow implementation of the GATK Somatic Short Variant Calling workflow. This workflow can be used to discover somatic short variants (SNVs and indels) from tumour and matched normal BAM files following GATK's Best Practices Workflow. The workflow is currently optimised to run efficiently and at scale on the National Computational Infrastructure's Gadi HPC.

Type: Nextflow

Creators: Nandan Deshpande, Tracy Chew, Cali Willet, Georgina Samaha

Submitter: Georgina Samaha

DOI: 10.48546/workflowhub.workflow.691.1

Created: 20th Dec 2023 at 01:12, Last updated: 20th Dec 2023 at 01:16

GermlineStructuralV-nf

Sydney Informatics Hub, Australian BioCommons

GermlineStructuralV-nf is a pipeline for identifying structural variant events in human Illumina short read whole genome sequence data. GermlineStructuralV-nf identifies structural variant and copy number events from BAM files using Manta, Smoove, and TIDDIT. Variants are then merged using SURVIVOR, ...

Type: Nextflow

Creators: Georgina Samaha, Marina Kennerson, Tracy Chew, Sarah Beecroft

Submitter: Georgina Samaha

DOI: 10.48546/workflowhub.workflow.431.1

Created: 31st Jan 2023 at 23:40, Last updated: 18th Dec 2023 at 05:36

IGVreport-nf

Sydney Informatics Hub, Australian BioCommons

Work-in-progress

Description
Diagram
User guide
Workflow summaries
Metadata
Component tools
Required (minimum) inputs/parameters
Additional notes
Help/FAQ/Troubleshooting
Acknowledgements/citations/credits

Description

Quickly generate [IGV .html ...

Type: Nextflow

Creators: Georgina Samaha, Tracy Chew

Submitter: Georgina Samaha

Created: 21st Mar 2023 at 05:17

IndexReferenceFasta-nf

Sydney Informatics Hub, Australian BioCommons

Stable

Description
Diagram
User guide
Benchmarking
Workflow summaries
Metadata
Component tools
Required (minimum) inputs/parameters
Additional notes
Help/FAQ/Troubleshooting
Acknowledgements/citations/credits ...

Type: Nextflow

Creator: Georgina Samaha

Submitter: Georgina Samaha

DOI: 10.48546/workflowhub.workflow.393.1

Created: 12th Oct 2022 at 03:34

Fastq-to-bam @ NCI-Gadi

Australian BioCommons, Sydney Informatics Hub

Stable

Fastq-to-BAM @ NCI-Gadi is a genome alignment workflow that takes raw FASTQ files, aligns them to a reference genome and outputs analysis-ready BAM files. This workflow is designed for the National Computational Infrastructure's (NCI) Gadi supercomputer, leveraging multiple nodes on NCI Gadi to run all stages of the workflow in parallel, either massively parallel using the scatter-gather approach or parallel by sample. It consists of a number of stages and follows the BROAD Institute's best practice ...

Type: Shell Script

Creators: Cali Willet, Tracy Chew, Georgina Samaha, Rosemarie Sadsad, Andrey Bliznyuk, Ben Menadue, Rika Kobayashi, Matthew Downton, Yue Sun

Submitter: Georgina Samaha

DOI: 10.48546/workflowhub.workflow.146.1

Created: 17th Aug 2021 at 05:45, Last updated: 1st Sep 2022 at 00:23

RNASeq-DE @ NCI-Gadi

Sydney Informatics Hub

Stable

RNASeq-DE @ NCI-Gadi processes RNA sequencing data (single, paired and/or multiplexed) for differential expression (raw FASTQ to counts). This pipeline consists of multiple stages and is designed for the National Computational Infrastructure's (NCI) Gadi supercomputer, leveraging multiple nodes to run each stage in parallel.

Infrastructure_deployment_metadata: Gadi (NCI)

Type: Shell Script

Creators: Tracy Chew, Rosemarie Sadsad, Cali Willet

Submitter: Tracy Chew

DOI: 10.48546/workflowhub.workflow.152.1

Created: 19th Aug 2021 at 00:24, Last updated: 23rd Aug 2022 at 07:09

GermlineShortV_biovalidation

Sydney Informatics Hub

Work-in-progress

Description
Diagram
User guide
Quick start guide
Benchmarking
Workflow summaries
Metadata
Component tools
Required (minimum) inputs/parameters Preparing your own input files
Additional notes
[Understanding your ...

Type: Shell Script

Creators: Georgina Samaha, Tracy Chew, Cali Willet, Nandan Deshpande

Submitter: Georgina Samaha

DOI: 10.48546/workflowhub.workflow.339.1

Created: 5th May 2022 at 06:02

Shotgun-Metagenomics-Analysis

Sydney Informatics Hub, Australian BioCommons

Work-in-progress

Shotgun Metagenomics Analysis

Analysis of metagenomic shotgun sequences including assembly, speciation, ARG discovery and more

Description

The input for this analysis is paired end next generation sequencing data from metagenomic samples. The workflow is designed to be modular, so that individual modules can be run depending on the nature of the metagenomics project at hand. More modules will be added as we develop them - this repo is a work in progress!

These scripts have been written ...

Type: Shell Script

Creators: Cali Willet, Rosemarie Sadsad, Tracy Chew, Smitha Sukumar, Elena Martinez, Christina Adler, Henry Lydecker, Fang Wang

Submitter: Tracy Chew

DOI: 10.48546/workflowhub.workflow.327.1

Created: 7th Apr 2022 at 01:45, Last updated: 7th Apr 2022 at 07:25

Flashlite-Supernova

Australian BioCommons, Sydney Informatics Hub

Work-in-progress

The Flashlite-Supernova pipeline runs Supernova to generate phased whole-genome de novo assemblies from a Chromium-prepared library on the University of Queensland's HPC, Flashlite.

Infrastructure_deployment_metadata: FlashLite (QRISCloud)

Type: Shell Script

Creators: None

Submitter: Tracy Chew

DOI: 10.48546/workflowhub.workflow.151.1

Created: 19th Aug 2021 at 00:21, Last updated: 9th Sep 2021 at 02:31

Flashlite-Trinity

Australian BioCommons, Sydney Informatics Hub

Stable

Flashlite-Trinity contains two workflows that run Trinity on the University of Queensland's HPC, Flashlite. Trinity performs de novo transcriptome assembly of RNA-seq data by combining three independent software modules (Inchworm, Chrysalis and Butterfly) to process RNA-seq reads. The algorithm can detect isoforms and handle paired-end reads, multiple insert sizes and strandedness. Users can run Flashlite-Trinity on single samples, or smaller samples requiring <500Gb ...

Type: Shell Script

Creators: Tracy Chew, Rosemarie Sadsad, Georgina Samaha, Cali Willet

Submitter: Tracy Chew

DOI: 10.48546/workflowhub.workflow.149.1

Created: 19th Aug 2021 at 00:17, Last updated: 7th Sep 2021 at 07:27

Flashlite-Juicer

Sydney Informatics Hub

Stable

Flashlite-Juicer is a PBS implementation of Juicer for the University of Queensland's Flashlite HPC.

Infrastructure_deployment_metadata: FlashLite (QRISCloud)

Type: Shell Script

Creators: Tracy Chew, Rosemarie Sadsad, Nathaniel Butterworth

Submitter: Tracy Chew

DOI: 10.48546/workflowhub.workflow.150.1

Created: 19th Aug 2021 at 00:19, Last updated: 7th Sep 2021 at 07:26

Trinity @ NCI-Gadi

Australian BioCommons, Sydney Informatics Hub

Work-in-progress

Trinity @ NCI-Gadi contains a staged Trinity workflow that can be run on the National Computational Infrastructure’s (NCI) Gadi supercomputer. Trinity performs de novo transcriptome assembly of RNA-seq data by combining three independent software modules (Inchworm, Chrysalis and Butterfly) to process RNA-seq reads. The algorithm can detect isoforms and handle paired-end reads, multiple insert sizes and strandedness. ...

Type: Shell Script

Creators: Georgina Samaha, Rosemarie Sadsad, Tracy Chew, Matthew Downton, Andrey Bliznyuk, Rika Kobayashi, Ben Menadue, Ben Evans

Submitter: Tracy Chew

DOI: 10.48546/workflowhub.workflow.145.1

Created: 17th Aug 2021 at 05:44, Last updated: 7th Sep 2021 at 07:20