A collection of workflows and pipelines developed as part of the ERGA consortium
Space: ERGA
SEEK ID: https://workflowhub.eu/projects/163
Public web page: https://www.erga-biodiversity.eu/
Organisms: No Organisms specified
WorkflowHub PALs: No PALs for this Team
Team created: 2nd May 2023
Related items
Teams: NBIS, ERGA Assembly
Organizations: NBIS – National Bioinformatics Infrastructure Sweden
https://orcid.org/0000-0003-1675-0677Expertise: Bioinformatics, Genomics, Scientific workflow developement, Workflows
I'm a bioinformatician for the National Bioinformatics Infrastrure Sweden. I specialise in de novo genome assembly and workflow development with Nextflow. I'm also a Nextflow ambassador and nf-core maintainer.
Teams: ERGA Assembly
Organizations: IZW
Teams: ERGA Assembly
Organizations: IZW
Teams: ERGA Assembly, ERGA Annotation
Organizations: University of Lausanne
https://orcid.org/0000-0003-4771-6113Teams: ERGA Assembly, ERGA Annotation
Organizations: Applied Omics Wóycicki
https://orcid.org/0000-0001-7991-7150Welcome to the ERGA Space!
Here we collect, curate and develop pipelines to assemble and annotation reference-quality genomes for all eukaryotic life.
For Genome Assembly pipelines head to our Assembly Team
For Genome Annotation pipelines head to our Annotation Team
Development, discussions and issue tracking takes place in the ERGA github repo
If you would like to join ...
Teams: ERGA Assembly, ERGA Annotation
Web page: https://www.erga-biodiversity.eu/
Assembly Evaluation for ERGA-BGE Reports
One Assembly, Illumina WGS reads + HiC reads
The workflow requires the following:
- Species Taxonomy ID number
- NCBI Genome assembly accession code
- BUSCO Lineage
- WGS accurate reads accession code
- NCBI HiC reads accession code
The workflow will get the data and process it to generate genome profiling (genomescope, smudgeplot -optional-), assembly stats (gfastats), merqury stats (QV, completeness), BUSCO, snailplot, contamination blobplot, and ...
Assembly Evaluation for ERGA-BGE Reports
One Assembly, HiFi WGS reads + HiC reads
The workflow requires the following:
- Species Taxonomy ID number
- NCBI Genome assembly accession code
- BUSCO Lineage
- WGS accurate reads accession code
- NCBI HiC reads accession code
The workflow will get the data and process it to generate genome profiling (genomescope, smudgeplot -optional-), assembly stats (gfastats), merqury stats (QV, completeness), BUSCO, snailplot, contamination blobplot, and HiC ...
The workflow takes trimmed HiC forward and reverse reads, and Pri/Alt assemblies to produce a scaffolded primary assembliy (and alternate contigs) using YaHS. It also runs all the QC analyses (gfastats, BUSCO, and Merqury).
The workflow takes a trimmed HiFi reads collection, Pri/Alt contigs, and the values for transition parameter and max coverage depth (calculated from WF1) to run Purge_Dups. It produces purged Pri and Alt contigs assemblies, and runs all the QC analysis (gfastats, BUSCO, and Merqury).
The workflow takes a trimmed HiFi reads collection, and max coverage depth (calculated from WF1) to run Hifiasm in HiFi solo mode. It produces a Pri/Alt assembly, and runs all the QC analysis (gfastats, BUSCO, and Merqury).
The workflow takes a trimmed HiFi reads collection, runs Meryl to create a K-mer database, Genomescope2 to estimate genome properties and Smudgeplot to estimate ploidy. The main results are K-mer database and genome profiling plots, tables, and values useful for downstream analysis. Default K-mer length and ploidy for Genomescope are 21 and 2, respectively.
The workflow takes a HiFi reads collection, runs FastQC and SeqKit, filters with Cutadapt, and creates a MultiQC report. The main outputs are a collection of filtred reads, a report with raw and filtered reads stats, and a table with raw reads stats.
The workflow takes a paired-reads collection (like illumina WGS or HiC), runs FastQC and SeqKit, trims with Fastp, and creates a MultiQC report. The main outputs are a paired collection of trimmed reads, a report with raw and trimmed reads stats, and a table with raw reads stats.
Swedish Earth Biogenome Project - Genome Assembly Workflow
The primary genome assembly workflow for the Earth Biogenome Project at NBIS.
Workflow overview
General aim:
flowchart LR
hifi[/ HiFi reads /] --> data_inspection
ont[/ ONT reads /] --> data_inspection
hic[/ Hi-C reads /] --> data_inspection
data_inspection[[ Data inspection ]] --> preprocessing
preprocessing[[ Preprocessing ]] --> assemble
assemble[[ Assemble ]] --> validation
validation[[ Assembly
...
HiC scaffolding pipeline
Snakemake pipeline for scaffolding of a genome using HiC reads using yahs.
Prerequisites
This pipeine has been tested using Snakemake v7.32.4
and requires conda for installation of required tools. To run the pipline use the command:
snakemake --use-conda --cores N
where N is number of cores to use. There are provided a set of configuration and running scripts for exectution on a slurm queueing system. After configuring the cluster.json
file run:
./run_cluster
...
Purge dups
This snakemake pipeline is designed to be run using as input a contig-level genome and pacbio reads. This pipeline has been tested with snakemake v7.32.4
. Raw long-read sequencing files and the input contig genome assembly must be given in the config.yaml
file. To execute the workflow run:
snakemake --use-conda --cores N
Or configure the cluster.json and run using the ./run_cluster
command
HiC contact map generation
Snakemake pipeline for the generation of .pretext
and .mcool
files for visualisation of HiC contact maps with the softwares PretextView and HiGlass, respectively.
Prerequisites
This pipeine has been tested using Snakemake v7.32.4
and requires conda for installation of required tools. To run the pipline use the command:
snakemake --use-conda
There are provided a set of configuration and running scripts for exectution on a slurm queueing system. After configuring ...
The workflow takes raw ONT reads and trimmed Illumina WGS paired reads collections, the ONT raw stats table (calculated from WF1) and the estimated genome size (calculated from WF1) to run NextDenovo and subsequently polish the assembly with HyPo. It produces collapsed assemblies (unpolished and polished) and runs all the QC analyses (gfastats, BUSCO, and Merqury).
The workflow takes raw ONT reads and trimmed Illumina WGS paired reads collections, and the estimated genome size and Max depth (both calculated from WF1) to run Flye and subsequently polish the assembly with HyPo. It produces collapsed assemblies (unpolished and polished) and runs all the QC analyses (gfastats, BUSCO, and Merqury).
CLAWS (CNAG's Long-read Assembly Workflow in Snakemake)
Snakemake Pipeline used for de novo genome assembly @CNAG. It has been developed for Snakemake v6.0.5.
It accepts Oxford Nanopore Technologies (ONT) reads, PacBio HFi reads, illumina paired-end data, illumina 10X data and Hi-C reads. It does the preprocessing of the reads, assembly, polishing, purge_dups, scaffodling and different evaluation steps. By default it will preprocess the reads, run Flye + Hypo + purge_dups + yahs and evaluate ...
Type: Snakemake
Creators: Jessica Gomez-Garrido, Fernando Cruz (CNAG), Francisco Camara (CNAG), Tyler Alioto (CNAG)
Submitter: Jessica Gomez-Garrido
The workflow takes trimmed HiC forward and reverse reads, and one assembly (e.g.: Hap1 or Pri or Collapsed) to produce a scaffolded assembly using YaHS. It also runs all the QC analyses (gfastats, BUSCO, and Merqury).
The workflow takes a trimmed Illumina WGS paired-end reads collection, Collapsed contigs, and the values for transition parameter and max coverage depth (calculated from WF1) to run Purge_Dups. It produces purged Collapsed contigs assemblies, and runs all the QC analysis (gfastats, BUSCO, and Merqury).
The workflow takes a trimmed Illumina paired-end reads collection, runs Meryl to create a K-mer database, Genomescope2 to estimate genome properties and Smudgeplot to estimate ploidy. The main results are K-mer ddatabase and genome profiling plots, tables, and values useful for downstream analysis. Default K-mer length and ploidy for Genomescope are 21 and 2, respectively.
The workflow takes ONT reads collection, runs SeqKit and Nanoplot. The main outputs are a table and plots of raw reads stats.
The workflow takes a trimmed HiFi reads collection, Hap1/Hap2 contigs, and the values for transition parameter and max coverage depth (calculated from WF1) to run Purge_Dups. It produces purged Hap1 and Hap2 contigs assemblies, and runs all the QC analysis (gfastats, BUSCO, and Merqury).
Pipelines used by the genomes assembly teams part of the Biodiversity Genomics Europe project
Collection of de-novo genome assembly workflows written for implementation in Galaxy
Input data should be PacBio HiFi reads and Illumina 3-dimensional Chromatin Confirmation Capture (HiC) reads
Executing all workflows will output a scaffolded primary assembliy and alternate contigs, with the complete QC analyses
Please run the workflows in order: WF0 (there are two, one for HiFi and one for Illumina HiC), WF1, WF2, WF3, WF4
Collection of Galaxy workflows for generating results used for creating ERGA-BGE Reports
For a given genome, two workflows should be run: the assembly evaluation (ASM analyses), and the annotation evaluation (ANNOT analyses)
Depending on the kind of data used for the genome assembly, you should choose HiFi or ONT (Illumina) workflows for ASM analyses
Collection of workflows designed to assembled a set of PacBio HiFi and Illumina HiC reads into a chromosome-scale de-novo assembly.
Development versions of these pipelines can be found in the ERGA github and any questions or queries can be raised on the ERGA Discussions Channel
Want to find out more about the work done by ERGA? Become a member ...
Collection of de-novo genome assembly workflows written for implementation in Galaxy
Input data should be Oxford Nanopore raw reads plus Illumina WGS reads and Illumina 3-dimensional Chromatin Confirmation Capture (HiC) reads
Executing all workflows will output one scaffolded collapsed assembly and the complete QC analyses
Please run the workflows in order: WF0 (there are two, one for ONT, and another one for Illumina that can be used independently for the WGS and HiC reads), WF1, WF2, WF3, WF4
Maintainers: Diego De Panis
Number of items: 6
Tags: Assembly, Bioinformatics, Galaxy, Genomics, Genome assembly, ONT, illumina, Hi-C
Collection of de-novo genome assembly workflows written for implementation in Galaxy
Input data should be Oxford Nanopore raw reads plus Illumina WGS reads and Illumina 3-dimensional Chromatin Confirmation Capture (HiC) reads
Executing all workflows will output one scaffolded collapsed assembly and the complete QC analyses
Please run the workflows in order: WF0 (there are two, one for ONT, and another one for Illumina that can be used independently for the WGS and HiC reads), WF1, WF2, WF3, WF4
Maintainers: Diego De Panis
Number of items: 6
Tags: Assembly, Bioinformatics, Galaxy, Genomics, Genome assembly, ONT, illumina, Hi-C
Collection of de-novo genome assembly workflows written for implementation in Galaxy
Input data should be PacBio HiFi reads and Illumina 3-dimensional Chromatin Confirmation Capture (HiC) reads
Executing all workflows will output two scaffolded haplotype assemblies and the complete QC analyses
Please run the workflows in order: WF0 (there are two, one for HiFi and one for Illumina HiC), WF1, WF2, WF3, WF4
Maintainers: Tom Brown, Diego De Panis
Number of items: 6
Tags: Assembly, Bioinformatics, Galaxy, Genomics, Genome assembly, HiFi, Hi-C