Workflows
What is a Workflow?Filters
Assembly Evaluation for ERGA-BGE Reports
One Assembly, HiFi WGS reads + HiC reads
The workflow requires the following:
- Species Taxonomy ID number
- NCBI Genome assembly accession code
- BUSCO Lineage
- WGS accurate reads accession code
- NCBI HiC reads accession code
The workflow will get the data and process it to generate genome profiling (genomescope, smudgeplot -optional-), assembly stats (gfastats), merqury stats (QV, completeness), BUSCO, snailplot, contamination blobplot, and HiC ...
The workflow takes a trimmed HiFi reads collection, Pri/Alt contigs, and the values for transition parameter and max coverage depth (calculated from WF1) to run Purge_Dups. It produces purged Pri and Alt contigs assemblies, and runs all the QC analysis (gfastats, BUSCO, and Merqury).
Purge dups
This snakemake pipeline is designed to be run using as input a contig-level genome and pacbio reads. This pipeline has been tested with snakemake v7.32.4
. Raw long-read sequencing files and the input contig genome assembly must be given in the config.yaml
file. To execute the workflow run:
snakemake --use-conda --cores N
Or configure the cluster.json and run using the ./run_cluster
command