GALOP - Genome Assembly using Long reads Pipeline
This repository contains an exact copy of the standard Genoscope long reads assembly pipeline.
At the moment, this is not intended for users to download as it uses grid submission commands that will only work at Genoscope. As time goes on, we intend to make this pipeline available to a broader audience. However, genome assembly and polishing commands are accessible in the lib/assembly.py and lib/polishing.py files.
galop.py -h
Mandatory
...
skim2mito
skim2mito is a Snakemake pipeline for the batch assembly, annotation, and phylogenetic analysis of mitochondrial genomes from low-coverage genome skims. The pipeline was designed to work with sequence data from museum collections, but it should also work with genome skims from recently collected samples.
Contents
- Setup
- Example data
- Input
- Output
- Filtering contaminants
- [Assembly and ...
Assembly Evaluation for ERGA-BGE Reports
One Assembly, HiFi WGS reads + HiC reads
The workflow requires the following:
- Species Taxonomy ID number
- NCBI Genome assembly accession code
- BUSCO Lineage
- WGS accurate reads accession code
- NCBI HiC reads accession code
The workflow will get the data and process it to generate genome profiling (genomescope, smudgeplot -optional-), assembly stats (gfastats), merqury stats (QV, completeness), BUSCO, snailplot, contamination blobplot, and HiC ...
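The five required inputs can be checked before launching the workflow. The following is a minimal sketch, not part of the workflow itself: the class name `EvaluationInputs`, its field names, and the accession patterns are assumptions for illustration, based on the usual GCA_/GCF_ assembly accession format, SRR/ERR/DRR run accessions, and BUSCO lineage names ending in `_odb` plus a version number.

```python
import re
from dataclasses import dataclass
from typing import List

# Assumed patterns (not taken from the workflow): common INSDC/BUSCO naming.
ASSEMBLY_RE = re.compile(r"^GC[AF]_\d{9}\.\d+$")   # e.g. GCA_000001405.29
RUN_RE = re.compile(r"^[SED]RR\d+$")               # e.g. SRR1234567
LINEAGE_RE = re.compile(r"^[a-z0-9_]+_odb\d+$")    # e.g. primates_odb10


@dataclass
class EvaluationInputs:
    """Hypothetical container for the five inputs listed above."""
    taxonomy_id: int
    assembly_accession: str
    busco_lineage: str
    wgs_reads_accession: str
    hic_reads_accession: str

    def problems(self) -> List[str]:
        """Return a list of human-readable problems (empty if all look valid)."""
        found = []
        if self.taxonomy_id <= 0:
            found.append("taxonomy ID must be a positive integer")
        if not ASSEMBLY_RE.match(self.assembly_accession):
            found.append("assembly accession does not look like GCA_/GCF_")
        if not LINEAGE_RE.match(self.busco_lineage):
            found.append("BUSCO lineage does not look like <name>_odb<N>")
        if not RUN_RE.match(self.wgs_reads_accession):
            found.append("WGS reads accession does not look like a run ID")
        if not RUN_RE.match(self.hic_reads_accession):
            found.append("HiC reads accession does not look like a run ID")
        return found
```

An empty list from `problems()` only means the values are well formed; it does not confirm that the accessions exist in the archives.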
Assembly Evaluation for ERGA-BGE Reports
One Assembly, Illumina WGS reads + HiC reads
The workflow requires the following:
- Species Taxonomy ID number
- NCBI Genome assembly accession code
- BUSCO Lineage
- WGS accurate reads accession code
- NCBI HiC reads accession code
The workflow will get the data and process it to generate genome profiling (genomescope, smudgeplot -optional-), assembly stats (gfastats), merqury stats (QV, completeness), BUSCO, snailplot, contamination blobplot, and ...
HiC contact map generation
Snakemake pipeline for the generation of .pretext and .mcool files for visualisation of HiC contact maps with PretextView and HiGlass, respectively.
Prerequisites
This pipeline has been tested using Snakemake v7.32.4 and requires conda to install the required tools. To run the pipeline, use the command:
snakemake --use-conda
A set of configuration and run scripts is provided for execution on a Slurm queueing system. After configuring ...
Purge dups
This Snakemake pipeline takes a contig-level genome assembly and PacBio reads as input. It has been tested with Snakemake v7.32.4. Raw long-read sequencing files and the input contig genome assembly must be given in the config.yaml file. To execute the workflow, run:
snakemake --use-conda --cores N
Or configure cluster.json and run using the ./run_cluster command.
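When the local run is launched from a script, the N in the command above has to be filled in with a concrete core count. A minimal sketch, assuming a hypothetical helper named `snakemake_command` that defaults to the number of CPUs Python reports:

```python
import os
import shlex
from typing import List, Optional


def snakemake_command(cores: Optional[int] = None) -> List[str]:
    """Build the local-run command shown above as an argument list.

    `cores` replaces the N placeholder; when omitted, fall back to the
    CPU count reported by the OS (or 1 if it cannot be determined).
    """
    if cores is None:
        cores = os.cpu_count() or 1
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return ["snakemake", "--use-conda", "--cores", str(cores)]


# Show the command as it would be typed in a shell.
print(shlex.join(snakemake_command(4)))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the directory that contains the Snakefile and config.yaml.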
Collection of de-novo genome assembly workflows written for implementation in Galaxy
Input data should be PacBio HiFi reads and Illumina 3-dimensional Chromatin Conformation Capture (HiC) reads
Executing all workflows will output a scaffolded primary assembly and alternate contigs, with the complete QC analyses
Please run the workflows in order: WF0 (there are two, one for HiFi and one for Illumina HiC), WF1, WF2, WF3, WF4
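The run order can be derived from the WF prefix when the workflows are handled as a list of names. This is a small sketch under stated assumptions: the function `run_order` and the example names are hypothetical, and only the `WF<number>` prefix convention comes from the text above.

```python
import re
from typing import List


def run_order(names: List[str]) -> List[str]:
    """Sort workflow names by their WF stage number (WF0 first, WF4 last).

    sorted() is stable, so the two WF0 workflows keep their input order.
    """
    def stage(name: str) -> int:
        match = re.search(r"WF(\d+)", name)
        if match is None:
            raise ValueError(f"no WF stage number in {name!r}")
        return int(match.group(1))

    return sorted(names, key=stage)


# Hypothetical names, used only to illustrate the ordering.
example = ["WF3_c", "WF0_hic", "WF1_a", "WF0_hifi", "WF4_d", "WF2_b"]
for name in run_order(example):
    print(name)
```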
Collection of Galaxy workflows for generating results used for creating ERGA-BGE Reports
For a given genome, two workflows should be run: the assembly evaluation (ASM analyses), and the annotation evaluation (ANNOT analyses)
Depending on the kind of data used for the genome assembly, you should choose HiFi or ONT (Illumina) workflows for ASM analyses
Collection of workflows designed to assemble a set of PacBio HiFi and Illumina HiC reads into a chromosome-scale de-novo assembly.
Development versions of these pipelines can be found on the ERGA GitHub, and any questions or queries can be raised on the ERGA Discussions Channel
Want to find out more about the work done by ERGA? Become a member ...
Collection of de-novo genome assembly workflows written for implementation in Galaxy
Input data should be Oxford Nanopore raw reads plus Illumina WGS reads and Illumina 3-dimensional Chromatin Conformation Capture (HiC) reads
Executing all workflows will output one scaffolded collapsed assembly and the complete QC analyses
Please run the workflows in order: WF0 (there are two, one for ONT, and another one for Illumina that can be used independently for the WGS and HiC reads), WF1, WF2, WF3, WF4
Maintainers: Diego De Panis
Number of items: 6
Tags: Assembly, Bioinformatics, Galaxy, Genomics, Genome assembly, ONT, illumina, Hi-C
Collection of de-novo genome assembly workflows written for implementation in Galaxy
Input data should be PacBio HiFi reads and Illumina 3-dimensional Chromatin Conformation Capture (HiC) reads
Executing all workflows will output two scaffolded haplotype assemblies and the complete QC analyses
Please run the workflows in order: WF0 (there are two, one for HiFi and one for Illumina HiC), WF1, WF2, WF3, WF4
Maintainers: Tom Brown, Diego De Panis
Number of items: 6
Tags: Assembly, Bioinformatics, Galaxy, Genomics, Genome assembly, HiFi, Hi-C