RNA-Seq pipeline

Here we provide the tools to perform paired-end or single-read RNA-Seq analysis, including raw data quality control, differential expression (DE) analysis and functional annotation. As input files you may use either gzip-compressed FASTQ files (.fastq.gz) or mapped read data (.bam files). For paired-end reads, the corresponding FASTQ files should be named using the .R1.fastq.gz and .R2.fastq.gz suffixes.
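The naming convention for paired-end input can be checked before starting a run. The following is a minimal sketch, not part of the pipeline itself: the function name `pair_fastq_files` and the example file names are hypothetical, and it only assumes the .R1.fastq.gz / .R2.fastq.gz suffixes described above.

```python
from collections import defaultdict

# Suffixes expected for paired-end input, as stated in the README.
R1_SUFFIX = ".R1.fastq.gz"
R2_SUFFIX = ".R2.fastq.gz"


def pair_fastq_files(filenames):
    """Group paired-end FASTQ file names by their shared sample prefix.

    Returns a dict {prefix: (r1_file, r2_file)} for complete pairs and a
    sorted list of prefixes for which only one mate was found.
    Hypothetical helper for illustration; not provided by the pipeline.
    """
    mates = defaultdict(dict)
    for name in filenames:
        if name.endswith(R1_SUFFIX):
            mates[name[: -len(R1_SUFFIX)]]["R1"] = name
        elif name.endswith(R2_SUFFIX):
            mates[name[: -len(R2_SUFFIX)]]["R2"] = name
        # Any other file (e.g. .bam) is ignored by this check.
    complete = {p: (m["R1"], m["R2"]) for p, m in mates.items() if len(m) == 2}
    orphans = sorted(p for p, m in mates.items() if len(m) != 2)
    return complete, orphans
```

Calling `pair_fastq_files` on a directory listing returns the complete pairs plus any sample prefixes that are missing a mate, which is usually a sign of a misnamed file.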

Pipeline Workflow

All analysis steps are illustrated in the pipeline flowchart. Specify the desired analysis details for your data in the essential.vars.groovy file (see below) and run the pipeline rnaseq.pipeline.groovy as described here. An R Markdown file, DEreport.Rmd, will be generated in the output reports folder after running the pipeline. The DEreport.Rmd file can then be converted to a final HTML report using the knitr R package.

The pipeline includes

Quality control of raw data with FastQC and MultiQC
Read mapping to the reference genome using STAR
Generation of bigWig tracks for visualisation of alignments with deeptools
Characterization of insert size for paired-end libraries
Read quantification with featureCounts (Subread)
Library complexity assessment with dupRadar
RNA class representation
Check for strand specificity
Visualization of gene body coverage
Illustration of sample relatedness with MDS plots and heatmaps
Differential expression analysis for the specified group comparisons with DESeq2
Enrichment analysis for DE results with clusterProfiler and ReactomePA
Additional DE analysis including multimapped reads

Pipeline parameter settings

targets.txt: tab-separated text file giving information about the analysed samples. The following columns are required:
- sample: sample identifier for use in plots and tables
- file: read counts file name (a unique sub-string of the file name is sufficient; this sub-string is grepped against the count file names produced by the pipeline)
- group: variable for sample grouping (e.g. by condition)
- replicate: replicate number of samples belonging to the same group
contrasts.txt: indicates the intended group comparisons for differential expression analysis, e.g. KOvsWT=(KO-WT) if targets.txt contains the groups KO and WT. Give one contrast per line.
essential.vars.groovy: essential parameters describing the experiment, including:
- ESSENTIAL_PROJECT: your project folder name
- ESSENTIAL_STAR_REF: path to STAR indexed reference genome
- ESSENTIAL_GENESGTF: genome annotation file in gtf-format
- ESSENTIAL_PAIRED: either paired end ("yes") or single read ("no") design
- ESSENTIAL_STRANDED: strandedness of library (no|yes|reverse)
- ESSENTIAL_ORG: UCSC organism name
- ESSENTIAL_READLENGTH: read length of library
- ESSENTIAL_THREADS: number of threads for parallel tasks
Additional (more specialized) parameters can be set in the var.groovy files of the individual pipeline modules.
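The structure of targets.txt and contrasts.txt described above can be sanity-checked before launching the pipeline. The sketch below is illustrative only and is not how the pipeline itself reads these files. It assumes that contrasts use only the simple `name=(groupA-groupB)` form shown in the example, that sub-string matching is plain containment, and that all function names and count file names are hypothetical.

```python
import csv
import io
import re

# Columns the README lists as required in targets.txt.
REQUIRED_COLUMNS = ("sample", "file", "group", "replicate")

# Assumption: only the simple form name=(A-B) from the README example.
CONTRAST_RE = re.compile(r"^(\w+)=\((\w+)-(\w+)\)$")


def read_targets(text):
    """Parse tab-separated targets text and check the required columns."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"targets.txt lacks columns: {missing}")
    return list(reader)


def match_count_file(substring, count_files):
    """Return the single count file whose name contains the sub-string."""
    hits = [f for f in count_files if substring in f]
    if len(hits) != 1:
        raise ValueError(f"{substring!r} matches {len(hits)} files, expected 1")
    return hits[0]


def read_contrasts(text, groups):
    """Parse one contrast per line and check that both groups exist."""
    contrasts = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        m = CONTRAST_RE.match(line)
        if m is None:
            raise ValueError(f"unrecognised contrast: {line!r}")
        name, left, right = m.groups()
        for g in (left, right):
            if g not in groups:
                raise ValueError(f"group {g!r} not found in targets.txt")
        contrasts.append((name, left, right))
    return contrasts
```

A typical check reads targets.txt with `read_targets`, collects the set of values in the group column, and passes that set to `read_contrasts` together with the contents of contrasts.txt. `match_count_file` raises an error when a file sub-string is ambiguous or matches nothing, which mirrors the requirement that the sub-string be unique.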

Programs required

Bedtools
DESeq2
deeptools
dupRadar (provided by another project from imbforge)
FastQC
MultiQC
Picard
R packages DESeq2, clusterProfiler, ReactomePA
RSeQC
Samtools
STAR
Subread
UCSC utilities

Version History

Version 1 (earliest) Created 7th Oct 2020 at 08:38 by Sergi Sayols
