Workflows
Assembly Evaluation for ERGA-BGE Reports
One Assembly, Illumina WGS reads + HiC reads
The workflow requires the following:
- Species Taxonomy ID number
- NCBI Genome assembly accession code
- BUSCO Lineage
- WGS accurate reads accession code
- NCBI HiC reads accession code
The workflow will get the data and process it to generate genome profiling (genomescope, smudgeplot -optional-), assembly stats (gfastats), merqury stats (QV, completeness), BUSCO, snailplot, contamination blobplot, and ...
Assembly Evaluation for ERGA-BGE Reports
One Assembly, HiFi WGS reads + HiC reads
The workflow requires the following:
- Species Taxonomy ID number
- NCBI Genome assembly accession code
- BUSCO Lineage
- WGS accurate reads accession code
- NCBI HiC reads accession code
The workflow will get the data and process it to generate genome profiling (genomescope, smudgeplot -optional-), assembly stats (gfastats), merqury stats (QV, completeness), BUSCO, snailplot, contamination blobplot, and HiC ...
The workflow takes trimmed HiC forward and reverse reads, and Pri/Alt assemblies, to produce a scaffolded primary assembly (and alternate contigs) using YaHS. It also runs all the QC analyses (gfastats, BUSCO, and Merqury).
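One of the contiguity statistics commonly reported for scaffolded assemblies is N50. The sketch below illustrates how that figure is defined; it is not gfastats itself, and the function name `n50` and the example lengths are made up for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembly size.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths (in bp), not taken from any real assembly.
scaffold_lengths = [100, 80, 50, 30, 20, 10]
print(n50(scaffold_lengths))  # → 80
```

Comparing N50 before and after scaffolding gives a quick indication of how much contiguity the HiC step added.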
The workflow takes a trimmed HiFi reads collection, Pri/Alt contigs, and the values for transition parameter and max coverage depth (calculated from WF1) to run Purge_Dups. It produces purged Pri and Alt contig assemblies, and runs all the QC analyses (gfastats, BUSCO, and Merqury).
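The Merqury QV reported in these QC steps is a k-mer based consensus quality estimate. As a rough sketch, assuming the formulation published for Merqury (k-mers found only in the assembly are treated as errors), the value can be derived as follows. The function name and arguments are hypothetical and this is not Merqury's own code.

```python
import math


def kmer_based_qv(asm_only_kmers, total_asm_kmers, k=21):
    """Estimate a Phred-scaled consensus QV from k-mer counts.

    asm_only_kmers:  k-mers present in the assembly but absent from the reads
    total_asm_kmers: all k-mers in the assembly
    k:               k-mer length (21 assumed here)
    """
    if total_asm_kmers <= 0:
        raise ValueError("total_asm_kmers must be positive")
    if not 0 <= asm_only_kmers <= total_asm_kmers:
        raise ValueError("asm_only_kmers must be between 0 and total_asm_kmers")
    if asm_only_kmers == 0:
        return math.inf  # no unsupported k-mers observed
    # Probability that a single base is correct, from the shared k-mer fraction.
    p_base_correct = (1 - asm_only_kmers / total_asm_kmers) ** (1 / k)
    error_rate = 1 - p_base_correct
    return -10 * math.log10(error_rate)
```

Fewer assembly-only k-mers give a higher QV, so purging should not lower this value noticeably if only duplicated haplotigs are removed.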
The workflow takes a trimmed HiFi reads collection and max coverage depth (calculated from WF1) to run Hifiasm in HiFi solo mode. It produces a Pri/Alt assembly, and runs all the QC analyses (gfastats, BUSCO, and Merqury).
The workflow takes a trimmed HiFi reads collection, runs Meryl to create a K-mer database, Genomescope2 to estimate genome properties and Smudgeplot to estimate ploidy. The main results are K-mer database and genome profiling plots, tables, and values useful for downstream analysis. Default K-mer length and ploidy for Genomescope are 21 and 2, respectively.
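To make the idea of a K-mer database and its histogram concrete, here is a toy sketch of canonical k-mer counting and the resulting k-mer spectrum (the kind of histogram a profiling tool fits a model to). It is purely illustrative, far too slow for real read sets, and does not reproduce Meryl's implementation; all function names are hypothetical.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(reads, k=21):
    """Count canonical k-mers across reads, skipping k-mers with non-ACGT bases."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return counts


def kmer_spectrum(counts):
    """Map each multiplicity to the number of distinct k-mers seen that many times."""
    return Counter(counts.values())


# Hypothetical toy reads with a small k so the result is easy to check by hand.
toy_counts = count_kmers(["ACGTAC", "ACGTAC"], k=3)
print(kmer_spectrum(toy_counts))
```

In a real diploid dataset the spectrum typically shows a heterozygous peak at about half the coverage of the homozygous peak, which is the signal used to estimate genome size and heterozygosity.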
The workflow takes a HiFi reads collection, runs FastQC and SeqKit, filters with Cutadapt, and creates a MultiQC report. The main outputs are a collection of filtered reads, a report with raw and filtered reads stats, and a table with raw reads stats.
The workflow takes a paired-reads collection (like Illumina WGS or HiC), runs FastQC and SeqKit, trims with Fastp, and creates a MultiQC report. The main outputs are a paired collection of trimmed reads, a report with raw and trimmed reads stats, and a table with raw reads stats.
Swedish Earth Biogenome Project - Genome Assembly Workflow
The primary genome assembly workflow for the Earth Biogenome Project at NBIS.
Workflow overview
General aim:
flowchart LR
hifi[/ HiFi reads /] --> data_inspection
ont[/ ONT reads /] --> data_inspection
hic[/ Hi-C reads /] --> data_inspection
data_inspection[[ Data inspection ]] --> preprocessing
preprocessing[[ Preprocessing ]] --> assemble
assemble[[ Assemble ]] --> validation
validation[[ Assembly
...
HiC scaffolding pipeline
Snakemake pipeline for scaffolding a genome with HiC reads using YaHS.
Prerequisites
This pipeline has been tested using Snakemake v7.32.4 and requires conda for installation of the required tools. To run the pipeline, use the command:
snakemake --use-conda --cores N
where N is the number of cores to use. A set of configuration and run scripts is provided for execution on a Slurm queueing system. After configuring the cluster.json file, run:
./run_cluster
...