TronFlow alignment pipeline
Version 1

Workflow Type: Nextflow

The TronFlow alignment pipeline is part of a collection of computational workflows for tumor-normal pair somatic variant calling.

This pipeline aligns paired-end and single-end FASTQ files with the BWA aln and mem algorithms and with BWA mem 2. For RNA-seq data, STAR is also supported; to increase the sensitivity of novel junction discovery, use --star_two_pass_mode (recommended for RNA-seq variant calling). The pipeline also includes an initial read trimming step using FASTP.

How to run it

Run it from GitHub as follows:

nextflow run tron-bioinformatics/tronflow-alignment -profile conda --input_files $input --output $output --algorithm aln --library paired

Otherwise download the project and run as follows:

nextflow run main.nf -profile conda --input_files $input --output $output --algorithm aln --library paired

Find the help as follows:

$ nextflow run tron-bioinformatics/tronflow-alignment  --help
N E X T F L O W  ~  version 19.07.0
Launching `` [intergalactic_shannon] - revision: e707c77d7b

    nextflow --input_files input_files [--reference reference.fasta]

Input:
    * input_fastq1: the path to a FASTQ file (incompatible with --input_files)
    * input_files: the path to a tab-separated values file containing in each row the sample name and two paired FASTQs
    when `--library paired`, or a single FASTQ file when `--library single` (incompatible with --input_fastq1 and --input_fastq2)
    Example input file:
    name1	fastq1.1	fastq1.2
    name2	fastq2.1	fastq2.2
    * reference: path to the indexed FASTA genome reference or the star reference folder in case of using star

Optional input:
    * input_fastq2: the path to a second FASTQ file (incompatible with --input_files, incompatible with --library single)
    * output: the folder where to publish output (default: output)
    * algorithm: determines the alignment algorithm, either `aln`, `mem`, `mem2` or `star` (default `aln`)
    * library: determines whether the sequencing library is paired or single end, either `paired` or `single` (default `paired`)
    * cpus: determines the number of CPUs for each job, with the exception of bwa sampe and samse steps which are not parallelized (default: 8)
    * memory: determines the memory required by each job (default: 32g)
    * inception: if enabled it uses an inception, only valid for BWA aln, it requires a fast file system such as flash (default: false)
    * skip_trimming: skips the read trimming step
    * star_two_pass_mode: activates STAR two-pass mode, increasing sensitivity of novel junction discovery, recommended for RNA variant calling (default: false)
    * additional_args: additional alignment arguments, only effective in BWA mem, BWA mem 2 and STAR (default: none) 

Output:
    * A BAM file ${name}.bam and its index
    * FASTP read trimming stats report in HTML format ${name}.fastp_stats.html
    * FASTP read trimming stats report in JSON format ${name}.fastp_stats.json
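
As an illustration of how the options listed in the help combine for RNA-seq data, the sketch below assembles a STAR two-pass run. The wrapper name run_star_alignment and its three positional arguments are hypothetical; the pipeline flags themselves are the ones documented above. As written it only prints the command (a dry run), so nothing is launched until the leading echo is removed.

```shell
# Hypothetical wrapper: prints the Nextflow command for a paired-end STAR
# two-pass run instead of executing it. Remove "echo" to launch the pipeline.
#   $1: tab-separated input table
#   $2: STAR reference folder (passed to --reference)
#   $3: output folder
run_star_alignment() {
    echo nextflow run tron-bioinformatics/tronflow-alignment \
        -profile conda \
        --input_files "$1" \
        --reference "$2" \
        --output "$3" \
        --algorithm star \
        --library paired \
        --star_two_pass_mode
}
```

For example, run_star_alignment input_files.tsv /path/to/star_reference output prints the full command so that it can be reviewed before running.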

Input tables

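The table with FASTQ files expects three tab-separated columns without a header: the sample name and the two paired FASTQ files. With --library single there is only one FASTQ file per sample, hence two columns. The first row below only labels the columns and must not appear in the file.

Sample name   FASTQ 1                     FASTQ 2
sample_1      /path/to/sample_1.1.fastq   /path/to/sample_1.2.fastq
sample_2      /path/to/sample_2.1.fastq   /path/to/sample_2.2.fastq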
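
Writing this table by hand is error-prone when there are many samples. The following is a minimal sketch of how it could be generated from a folder of paired FASTQ files. The function name make_input_table is hypothetical, and the sketch assumes files are named <sample>.1.fastq and <sample>.2.fastq as in the example rows; adapt the suffixes to your own naming scheme.

```shell
# Hypothetical helper: prints one tab-separated row (name, FASTQ 1, FASTQ 2)
# per sample found in the given folder.
# Assumption: files are named <sample>.1.fastq and <sample>.2.fastq.
make_input_table() {
    fastq_dir="$1"
    for fq1 in "$fastq_dir"/*.1.fastq; do
        # Skip the unexpanded pattern when no file matches
        [ -e "$fq1" ] || continue
        fq2="${fq1%.1.fastq}.2.fastq"
        name="$(basename "$fq1" .1.fastq)"
        if [ -e "$fq2" ]; then
            printf '%s\t%s\t%s\n' "$name" "$fq1" "$fq2"
        else
            # Report samples without a mate on stderr and leave them out
            echo "warning: no second FASTQ found for $fq1" >&2
        fi
    done
}
```

A possible use is make_input_table /path/to/fastqs > input_files.tsv, with the resulting file passed to --input_files.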

Reference genome

The reference genome has to be provided in FASTA format and it requires two sets of indexes:

  • FAI index. Create with samtools faidx your.fasta
  • BWA indexes. Create with bwa index your.fasta

For bwa-mem2 a specific index is needed:

bwa-mem2 index your.fasta
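
Before launching the pipeline with aln or mem, it can help to confirm that the FAI and BWA index files sit next to the FASTA. The sketch below checks for the files that samtools faidx (.fai) and bwa index (.amb, .ann, .bwt, .pac, .sa) write alongside the reference. The function name check_bwa_reference is hypothetical, and the check does not cover the bwa-mem2 or STAR references.

```shell
# Hypothetical helper: returns 0 if the FAI and BWA index files exist next to
# the given FASTA, otherwise lists the missing files on stderr and returns 1.
check_bwa_reference() {
    fasta="$1"
    missing=0
    # .fai comes from "samtools faidx", the remaining files from "bwa index"
    for ext in fai amb ann bwt pac sa; do
        if [ ! -e "$fasta.$ext" ]; then
            echo "missing index file: $fasta.$ext" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, check_bwa_reference your.fasta && echo "reference is indexed" prints the message only when all six files are present.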

For STAR, a reference folder prepared with STAR has to be provided. Preparing it requires the reference genome in FASTA format and the gene annotations in GTF format. Run a command as follows:

STAR --runMode genomeGenerate --genomeDir $YOUR_FOLDER --genomeFastaFiles $YOUR_FASTA --sjdbGTFfile $YOUR_GTF


References

  • Li H. and Durbin R. (2010) Fast and accurate long-read alignment with Burrows-Wheeler Transform. Bioinformatics, Epub.
  • Shifu Chen, Yanqing Zhou, Yaru Chen, Jia Gu; fastp: an ultra-fast all-in-one FASTQ preprocessor, Bioinformatics, Volume 34, Issue 17, 1 September 2018, Pages i884–i890.
  • Vasimuddin Md, Sanchit Misra, Heng Li, Srinivas Aluru. Efficient Architecture-Aware Acceleration of BWA-MEM for Multicore Systems. IEEE Parallel and Distributed Processing Symposium (IPDPS), 2019.
  • Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013 Jan 1;29(1):15-21. doi: 10.1093/bioinformatics/bts635. Epub 2012 Oct 25. PMID: 23104886; PMCID: PMC3530905.

