Reference-based assembly with bacpage
Version 1

Workflow Type: Workflow Description Language
Work-in-progress


This repository contains an easy-to-use pipeline for the assembly and analysis of bacterial genomes using ONT long-read or Illumina short-read technology.

Introduction

Advances in sequencing technology during the COVID-19 pandemic have led to massive increases in the generation of sequencing data. Many bioinformatics tools have been developed to analyze these data, but very few can be used by individuals without prior bioinformatics training.

This pipeline was designed to encapsulate pre-existing tools to automate the analysis of bacterial whole-genome sequencing data. Installation is fast and straightforward. The pipeline is easy to set up and ships with rational defaults, but it is highly modular and configurable by more advanced users. A successful run generates consensus sequences, typing information, a phylogenetic tree, and a quality control report.

Features

We anticipate the pipeline will be able to perform the following functions:

  • [x] Reference-based assembly of Illumina paired-end reads
  • [x] De novo assembly of Illumina paired-end reads
  • [ ] De novo assembly of ONT long reads
  • [x] Run quality control checks
  • [x] Variant calling using bcftools
  • [x] Maximum-likelihood phylogenetic inference of processed samples and background dataset using iqtree
  • [x] MLST profiling and virulence factor detection
  • [x] Antimicrobial resistance gene detection
  • [ ] Plasmid detection

Installation

  1. Install Mambaforge (a conda distribution that includes mamba) by running the following two commands:
curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh"
bash Mambaforge-$(uname)-$(uname -m).sh
  2. Clone the repository:
git clone https://github.com/CholGen/bacpage.git
  3. Install and activate the pipeline's conda environment:
cd bacpage/
mamba env create -f environment.yaml
mamba activate bacpage
  4. Install the bacpage command:
pip install .
  5. Test the installation:
bacpage -h
bacpage version

These commands should print the program's help text and version. Please create an issue if this is not the case.
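
Before opening an issue, it can help to confirm that the expected commands are actually on your PATH in the active environment. The following is a minimal sketch; `check_command` is a hypothetical helper written for this example and is not part of bacpage.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of bacpage): report whether a command is on PATH.
check_command() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}
```

For example, running `check_command mamba && check_command bacpage && bacpage version` stops at the first missing command, which shows whether the problem lies with the conda installation or with the `pip install .` step.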

Usage

  1. Navigate to the pipeline's directory.
  2. Copy the example/ directory to create a project directory specific to each batch of samples.
cp example/ 
  3. Place raw sequencing reads in the input/ directory of your project directory.
  4. Record the name and absolute path of each raw sequencing read file in the sample_data.csv found within your project directory.
  5. Replace the two placeholder values in the config.yaml found within your project directory with the absolute paths of your project directory and the pipeline directory, respectively.
  6. Determine how many cores are available on your computer:
cat /proc/cpuinfo | grep processor
  7. From the pipeline's directory, run the entire pipeline on your samples using the following command:
snakemake --configfile /config.yaml --cores 
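
Two of the steps above involve looking up values by hand: the absolute paths of the read files and the number of available cores. The sketch below shows one way to gather both. The function names are hypothetical and not part of bacpage, and the assumption that reads are named `*.fastq` or `*.fastq.gz` may not match your data.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of bacpage.

# Print the absolute path of every FASTQ file directly inside a directory.
# Assumption: read files are named *.fastq or *.fastq.gz.
list_read_paths() {
    local dir
    dir=$(cd "$1" && pwd) || return 1
    find "$dir" -maxdepth 1 -type f \( -name '*.fastq' -o -name '*.fastq.gz' \) | sort
}

# Print the number of logical processors as a single integer.
# Reads /proc/cpuinfo where it exists, otherwise falls back to getconf.
count_cores() {
    if [ -r /proc/cpuinfo ]; then
        grep -c '^processor' /proc/cpuinfo
    else
        getconf _NPROCESSORS_ONLN
    fi
}
```

The paths printed by `list_read_paths input/` (run from the project directory) can be pasted into sample_data.csv, and the number printed by `count_cores` can be passed to the `--cores` option of snakemake.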

This will generate a consensus sequence in FASTA format for each of your samples and place them in /results/consensus_sequences/.masked.fasta. An HTML report containing alignment and quality metrics for your samples can be found at /results/reports/qc_report.html. A phylogeny comparing your sequences to the background dataset can be found at /results/phylogeny/phylogeny.tree
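
After a run, the presence of these outputs can be checked with a short script. This is a sketch under stated assumptions: `check_outputs` is a hypothetical helper, not part of bacpage, and it assumes the paths above are relative to the project directory and that consensus files end in `.masked.fasta`.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of bacpage: confirm the described outputs exist.
# Assumption: the results/ paths are relative to the project directory given as $1.
check_outputs() {
    local project=$1 status=0 path n
    for path in \
        "$project/results/reports/qc_report.html" \
        "$project/results/phylogeny/phylogeny.tree"; do
        if [ -s "$path" ]; then
            echo "ok: $path"
        else
            echo "missing or empty: $path" >&2
            status=1
        fi
    done
    # Count consensus sequences (assumed to be named *.masked.fasta).
    n=$(find "$project/results/consensus_sequences" -maxdepth 1 -name '*.masked.fasta' 2>/dev/null | wc -l | tr -d ' ')
    echo "consensus sequences found: $n"
    [ "$n" -gt 0 ] || status=1
    return "$status"
}
```

Calling `check_outputs` with the absolute path of your project directory returns a non-zero exit status if the report, the tree, or all consensus sequences are missing, which makes it easy to chain after the snakemake command.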


Version History

split_into_command @ ea128c8 (earliest) Created 20th Dec 2023 at 17:45 by Nathaniel Matteson

Created: 20th Dec 2023 at 17:45

Last updated: 20th Dec 2023 at 17:49
