A CWL-based pipeline for processing RNA-Seq data (FASTQ format) and performing differential gene/transcript expression analysis.
The corresponding GitHub folder provides:
- The CWL wrappers for the workflow
- A pre-configured YAML template, based on validation analysis of publicly available HTS data
- A table of metadata (mrna_cll_subsets_phenotypes.csv), based on the same validation analysis, to serve as an input example for the design of comparisons during differential expression ...
This workflow takes as input a list of paired-end FASTQ files. Adapters and low-quality bases are removed with cutadapt. Reads are mapped with STAR using ENCODE parameters, and reads per gene are counted in the same run. The counts are then reprocessed to resemble HTSeq-count output. FPKM values are computed with Cufflinks. Coverage (per million mapped reads) is computed with bedtools on uniquely mapped reads (with the R2 orientation inverted).
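The count reprocessing step can be illustrated with a minimal sketch. It assumes the STAR counts come from a `ReadsPerGene.out.tab` file (four tab-separated columns, with the `N_unmapped`, `N_multimapping`, `N_noFeature` and `N_ambiguous` summary rows first) and rewrites them in the two-column layout that htseq-count emits, with the summary counters at the end. The helper name `star_to_htseq`, the choice of column, and the counter mapping are assumptions made for illustration; the workflow's actual reprocessing script may differ.

```python
# Hypothetical helper, not taken from the workflow itself.
# Approximate mapping from STAR summary rows to htseq-count counters.
STAR_TO_HTSEQ = {
    "N_noFeature": "__no_feature",
    "N_ambiguous": "__ambiguous",
    "N_unmapped": "__not_aligned",
    "N_multimapping": "__alignment_not_unique",
}

# Order in which htseq-count prints its special counters.
HTSEQ_COUNTERS = (
    "__no_feature",
    "__ambiguous",
    "__too_low_aQual",
    "__not_aligned",
    "__alignment_not_unique",
)


def star_to_htseq(lines, column=1):
    """Convert STAR ReadsPerGene rows to htseq-count-like rows.

    column selects the STAR count column: 1 = unstranded,
    2 = read 1 on the RNA strand, 3 = read 2 on the RNA strand.
    """
    genes = []
    special = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        name, count = fields[0], int(fields[column])
        if name in STAR_TO_HTSEQ:
            special[STAR_TO_HTSEQ[name]] = count
        else:
            genes.append((name, count))
    out = [f"{name}\t{count}" for name, count in genes]
    # STAR has no low-quality counter, so __too_low_aQual defaults to 0.
    for key in HTSEQ_COUNTERS:
        out.append(f"{key}\t{special.get(key, 0)}")
    return out
```

Which column is appropriate depends on the strandedness of the library, which the description above does not state.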
This workflow takes as input a list of single-read FASTQ files. Adapters and low-quality bases are removed with cutadapt. Reads are mapped with STAR using ENCODE parameters, and reads per gene are counted in the same run. The counts are then reprocessed to resemble HTSeq-count output. FPKM values are computed with Cufflinks. Coverage (per million mapped reads) is computed with bedtools on uniquely mapped reads.
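As a rough illustration of the "per million mapped reads" normalization, the sketch below rescales bedGraph-style coverage lines by 1,000,000 divided by the number of uniquely mapped reads. The function name `scale_bedgraph` and the input layout (chromosome, start, end, depth, tab-separated) are assumptions. `bedtools genomecov` has a `-scale` option that applies a factor like this directly, but whether the workflow uses it is not stated here.

```python
def scale_bedgraph(lines, uniquely_mapped_reads):
    """Rescale bedGraph depths to coverage per million uniquely mapped reads.

    Hypothetical helper for illustration only.
    """
    if uniquely_mapped_reads <= 0:
        raise ValueError("uniquely_mapped_reads must be positive")
    factor = 1_000_000 / uniquely_mapped_reads
    scaled = []
    for line in lines:
        chrom, start, end, depth = line.rstrip("\n").split("\t")
        scaled.append((chrom, int(start), int(end), float(depth) * factor))
    return scaled
```

With 2,000,000 uniquely mapped reads the factor is 0.5, so a raw depth of 4 becomes a normalized coverage of 2.0.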
RNASeq-DE @ NCI-Gadi processes RNA sequencing data (single, paired and/or multiplexed) for differential expression (raw FASTQ to counts). This pipeline consists of multiple stages and is designed for the National Computational Infrastructure's (NCI) Gadi supercomputer, leveraging multiple nodes to run each stage in parallel.
Infrastructure_deployment_metadata: Gadi (NCI)
Description: Trinity @ NCI-Gadi contains a staged Trinity workflow that can be run on the National Computational Infrastructure’s (NCI) Gadi supercomputer. Trinity performs de novo transcriptome assembly of RNA-seq data by combining three independent software modules (Inchworm, Chrysalis and Butterfly) to process RNA-seq reads. The algorithm can detect isoforms and handle paired-end reads, multiple insert sizes and strandedness. ...
A port of the Trinity RNA assembly pipeline (https://trinityrnaseq.github.io) that uses Nextflow to handle the underlying sub-tasks. This enables additional capabilities for making better use of HPC resources, such as packing tasks to fill up nodes and using node-local disks to improve I/O. By design, the pipeline separates the workflow logic (main file) from the cluster-specific configuration (config files), improving portability.
Based on a pipeline by Sydney Informatics Hub: ...
Workflow for spliced RNA-seq data. Steps:
- FastQC (Read Quality Control)
- fastp (Read Trimming)
- STAR (Read mapping)
- featureCounts (transcript read counts)
- kallisto (transcript [pseudo]counts)
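The steps above can be sketched as a sequence of command lines. This is a minimal illustration, not the workflow's own definition: it assumes paired-end, uncompressed FASTQ input, and every file name, directory name and the helper `build_commands` is hypothetical. Only commonly used options of each tool are shown; the real workflow very likely sets more parameters.

```python
def build_commands(sample, r1, r2, star_index, gtf, kallisto_index):
    """Return one argument list per step, in the order listed above.

    Hypothetical helper; paths and output names are placeholders.
    """
    trimmed_r1 = f"{sample}_trimmed_R1.fastq"
    trimmed_r2 = f"{sample}_trimmed_R2.fastq"
    # STAR names its sorted BAM after the chosen output prefix.
    bam = f"{sample}_Aligned.sortedByCoord.out.bam"
    return [
        # Read quality control (the qc directory must already exist)
        ["fastqc", "-o", "qc", r1, r2],
        # Read trimming
        ["fastp", "-i", r1, "-I", r2, "-o", trimmed_r1, "-O", trimmed_r2],
        # Read mapping
        ["STAR", "--genomeDir", star_index,
         "--readFilesIn", trimmed_r1, trimmed_r2,
         "--outSAMtype", "BAM", "SortedByCoordinate",
         "--outFileNamePrefix", f"{sample}_"],
        # Read counts from the alignment (-p marks the input as paired-end)
        ["featureCounts", "-p", "-a", gtf, "-o", f"{sample}_counts.txt", bam],
        # Transcript pseudocounts from the trimmed reads
        ["kallisto", "quant", "-i", kallisto_index,
         "-o", f"{sample}_kallisto", trimmed_r1, trimmed_r2],
    ]
```

Each list can be passed to `subprocess.run(cmd, check=True)` in order, provided the tools are installed and the indexes exist. Keeping the commands as data makes the sequence easy to inspect or log before anything is executed.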