Research Object Crate for AssemblyQC: A NextFlow pipeline for evaluating assembly quality

Original URL: https://workflowhub.eu/workflows/1058/ro_crate?version=1

[![GitHub Actions CI Status](https://github.com/plant-food-research-open/assemblyqc/actions/workflows/ci.yml/badge.svg)](https://github.com/plant-food-research-open/assemblyqc/actions/workflows/ci.yml)
[![GitHub Actions Linting Status](https://github.com/plant-food-research-open/assemblyqc/actions/workflows/linting.yml/badge.svg)](https://github.com/plant-food-research-open/assemblyqc/actions/workflows/linting.yml)
[![Cite with Zenodo](http://img.shields.io/badge/DOI-10.5281/zenodo.10647870-1073c8?labelColor=000000)](https://doi.org/10.5281/zenodo.10647870)
[![nf-test](https://img.shields.io/badge/unit_tests-nf--test-337ab7.svg)](https://www.nf-test.com)
[![Nextflow](https://img.shields.io/badge/nextflow%20DSL2-%E2%89%A523.04.0-23aa62.svg)](https://www.nextflow.io/)
[![run with conda ❌](http://img.shields.io/badge/run%20with-conda%20❌-3EB049?labelColor=000000&logo=anaconda)](https://docs.conda.io/en/latest/)
[![run with docker](https://img.shields.io/badge/run%20with-docker-0db7ed?labelColor=000000&logo=docker)](https://www.docker.com/)
[![run with singularity](https://img.shields.io/badge/run%20with-singularity-1d355c.svg?labelColor=000000)](https://sylabs.io/docs/)
[![Launch on Seqera Platform](https://img.shields.io/badge/Launch%20%F0%9F%9A%80-Seqera%20Platform-%234256e7)](https://cloud.seqera.io/launch?pipeline=https://github.com/plant-food-research-open/assemblyqc)

## Introduction

**plant-food-research-open/assemblyqc** is a [NextFlow](https://www.nextflow.io/docs/latest/index.html) pipeline which evaluates assembly quality with multiple QC tools and presents the results in a unified HTML report. The tools are shown in the [Pipeline Flowchart](#pipeline-flowchart) and their references are listed in [CITATIONS.md](./CITATIONS.md).

## Pipeline Flowchart

```mermaid
%%{init: {
    'theme': 'base',
    'themeVariables': {
    'fontSize': '52px',
    'primaryColor': '#9A6421',
    'primaryTextColor': '#ffffff',
    'primaryBorderColor': '#9A6421',
    'lineColor': '#B180A8',
    'secondaryColor': '#455C58',
    'tertiaryColor': '#ffffff'
  }
}}%%
flowchart LR
  forEachTag(Assembly) ==> VALIDATE_FORMAT[VALIDATE FORMAT]

  VALIDATE_FORMAT ==> ncbiFCS[NCBI FCS\nADAPTOR]
  ncbiFCS ==> Check{Check}

  VALIDATE_FORMAT ==> ncbiGX[NCBI FCS GX]
  ncbiGX ==> Check
  Check ==> |Clean|Run(Run)

  Check ==> |Contamination|Skip(Skip All)
  Skip ==> REPORT

  VALIDATE_FORMAT ==> GFF_STATS[GENOMETOOLS GT STAT]

  Run ==> ASS_STATS[ASSEMBLATHON STATS]
  Run ==> BUSCO
  Run ==> TIDK
  Run ==> LAI
  Run ==> KRAKEN2
  Run ==> HIC_CONTACT_MAP[HIC CONTACT MAP]
  Run ==> MUMMER
  Run ==> MINIMAP2
  Run ==> MERQURY

  MUMMER ==> CIRCOS
  MUMMER ==> DOTPLOT

  MINIMAP2 ==> PLOTSR

  ASS_STATS ==> REPORT
  GFF_STATS ==> REPORT
  BUSCO ==> REPORT
  TIDK ==> REPORT
  LAI ==> REPORT
  KRAKEN2 ==> REPORT
  HIC_CONTACT_MAP ==> REPORT
  CIRCOS ==> REPORT
  DOTPLOT ==> REPORT
  PLOTSR ==> REPORT
  MERQURY ==> REPORT
```

- [FASTA VALIDATOR](https://github.com/linsalrob/fasta_validator) + [SEQKIT RMDUP](https://github.com/shenwei356/seqkit): FASTA validation
- [GENOMETOOLS GT GFF3VALIDATOR](https://genometools.org/tools/gt_gff3validator.html): GFF3 validation
- [ASSEMBLATHON STATS](https://github.com/PlantandFoodResearch/assemblathon2-analysis/blob/a93cba25d847434f7eadc04e63b58c567c46a56d/assemblathon_stats.pl): Assembly statistics
- [GENOMETOOLS GT STAT](https://genometools.org/tools/gt_stat.html): Annotation statistics
- [NCBI FCS ADAPTOR](https://github.com/ncbi/fcs): Adaptor contamination pass/fail
- [NCBI FCS GX](https://github.com/ncbi/fcs): Foreign organism contamination pass/fail
- [BUSCO](https://gitlab.com/ezlab/busco): Gene-space completeness estimation
- [TIDK](https://github.com/tolkit/telomeric-identifier): Telomere repeat identification
- [LAI](https://github.com/oushujun/LTR_retriever/blob/master/LAI): Continuity of repetitive sequences
- [KRAKEN2](https://github.com/DerrickWood/kraken2): Taxonomy classification
- [HIC CONTACT MAP](https://github.com/igvteam/juicebox.js): Alignment and visualisation of HiC data
- [MUMMER](https://github.com/mummer4/mummer) → [CIRCOS](http://circos.ca/documentation/) + [DOTPLOT](https://plotly.com) & [MINIMAP2](https://github.com/lh3/minimap2) → [PLOTSR](https://github.com/schneebergerlab/plotsr): Synteny analysis
- [MERQURY](https://github.com/marbl/merqury): K-mer completeness, consensus quality and phasing assessment

## Usage

Refer to [usage](./docs/usage.md), [parameters](./docs/parameters.md) and [output](./docs/output.md) documents for details.

> [!NOTE]
> If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.

Prepare an `assemblysheet.csv` file with the following columns representing target assemblies and associated meta-data.

- `tag:` A unique tag which represents the target assembly throughout the pipeline and in the final report
- `fasta:` FASTA file

Now, you can run the pipeline using:

```bash
nextflow run plant-food-research-open/assemblyqc \
  -profile \
  --input assemblysheet.csv \
  --outdir
```

> [!WARNING]
> Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_;
> see [docs](https://nf-co.re/usage/configuration#custom-configuration-files).

### Plant&Food Users

Download the pipeline to your `/workspace/$USER` folder. Change the parameters defined in the [pfr/params.json](./pfr/params.json) file. Submit the pipeline to SLURM for execution.

```bash
sbatch ./pfr_assemblyqc
```

## Credits

plant-food-research-open/assemblyqc was originally written by Usman Rashid ([@gallvp](https://github.com/gallvp)) and Ken Smith ([@hzlnutspread](https://github.com/hzlnutspread)).

Ross Crowhurst ([@rosscrowhurst](https://github.com/rosscrowhurst)), Chen Wu ([@christinawu2008](https://github.com/christinawu2008)) and Marcus Davy ([@mdavy86](https://github.com/mdavy86)) generously contributed their QC scripts.

Mahesh Binzer-Panchal ([@mahesh-panchal](https://github.com/mahesh-panchal)) helped port the pipeline modules and sub-workflows to [nf-core](https://nf-co.re) schema.

We thank the following people for their extensive assistance in the development of this pipeline:

- [Cecilia Deng](https://github.com/CeciliaDeng)
- [Ignacio Carvajal](https://github.com/ignacio3437)
- [Jason Shiller](https://github.com/jasonshiller)
- [Sarah Bailey](https://github.com/SarahBailey1998)
- [Susan Thomson](https://github.com/cflsjt)
- [Ting-Hsuan Chen](https://github.com/ting-hsuan-chen)

The pipeline uses nf-core modules contributed by the following authors:

## Contributions and Support

If you would like to contribute to this pipeline, please see the [contributing guidelines](.github/CONTRIBUTING.md).

## Citations

If you use plant-food-research-open/assemblyqc for your analysis, please cite it as:

> Rashid, U., Wu, C., Shiller, J., Smith, K., Crowhurst, R., Davy, M., Chen, T.-H., Thomson, S., & Deng, C. (2024). AssemblyQC: A NextFlow pipeline for evaluating assembly quality (2.0.0). Zenodo. https://doi.org/10.5281/zenodo.10647870

An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file.

This pipeline uses code and infrastructure developed and maintained by the [nf-core](https://nf-co.re) community, reused here under the [MIT license](https://github.com/nf-core/tools/blob/master/LICENSE).

> **The nf-core framework for community-curated bioinformatics pipelines.**
>
> Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
>
> _Nat Biotechnol._ 2020 Feb 13. doi: [10.1038/s41587-020-0439-x](https://dx.doi.org/10.1038/s41587-020-0439-x).
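
As a minimal sketch of the `assemblysheet.csv` described under Usage, the script below writes a two-row sheet with the `tag` and `fasta` columns and checks that every tag is unique. The tags, the FASTA paths and the presence of a header row are assumptions made for illustration; the usage document remains the authoritative description of the format.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical assemblies: the tags and FASTA paths are placeholders,
# and the header row is an assumption about the expected layout.
cat > assemblysheet.csv <<'EOF'
tag,fasta
assembly_a,/path/to/assembly_a.fasta
assembly_b,/path/to/assembly_b.fasta
EOF

# Each tag must be unique, so list any tag that appears more than once.
duplicates=$(tail -n +2 assemblysheet.csv | cut -d, -f1 | sort | uniq -d)
if [ -n "$duplicates" ]; then
  echo "Duplicate tags found: $duplicates" >&2
  exit 1
fi
echo "assemblysheet.csv has unique tags"
```

The resulting file can then be passed to the pipeline through `--input assemblysheet.csv`.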

License: MIT
