rnaseq-pe/main
This workflow does not specify a "main" workflow file.

Workflow Type: Galaxy

Tests Not available

RNA-seq paired-end Workflow

Inputs dataset

The workflow needs a list of dataset pairs in fastqsanger format.
It also needs a gtf file with gene annotations.
Optional, but recommended: a gtf file with regions to exclude from normalization in Cufflinks.
- For instance, a gtf that masks chrM for the mm10 genome:

chrM	chrM_gene	exon	0	16299	.	+	.	gene_id "chrM_gene_plus"; transcript_id "chrM_tx_plus"; exon_id "chrM_ex_plus";
chrM	chrM_gene	exon	0	16299	.	-	.	gene_id "chrM_gene_minus"; transcript_id "chrM_tx_minus"; exon_id "chrM_ex_minus";
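
A mask like the one above can be generated for any chromosome. The sketch below is only an illustration: the function name `mask_gtf_lines` and its parameters are hypothetical, and it simply reproduces the layout of the two chrM lines shown above (including their start coordinate of 0).

```python
from __future__ import annotations


def mask_gtf_lines(chrom: str, length: int, start: int = 0) -> list[str]:
    """Build one exon entry per strand covering the whole chromosome.

    Hypothetical helper: it mirrors the chrM example of this page,
    it is not part of the workflow itself.
    """
    lines = []
    for strand, suffix in (("+", "plus"), ("-", "minus")):
        attributes = (
            f'gene_id "{chrom}_gene_{suffix}"; '
            f'transcript_id "{chrom}_tx_{suffix}"; '
            f'exon_id "{chrom}_ex_{suffix}";'
        )
        fields = [chrom, f"{chrom}_gene", "exon", str(start), str(length),
                  ".", strand, ".", attributes]
        lines.append("\t".join(fields))
    return lines


# Reproduce the mm10 chrM mask shown above.
mask = mask_gtf_lines("chrM", 16299)
print("\n".join(mask))
```

The printed lines can be saved to a file and used as the optional gtf input.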

Inputs values

adapter sequences: this depends on the library preparation. Classical RNA libraries are usually TruSeq, and ISML (a relatively new Illumina library) is Nextera. If you don't know, use FastQC to determine whether it is TruSeq or Nextera. If the reads are relatively short (50 bp), there is probably no adapter.
reference_genome: this field will be adapted to the genomes available for STAR
strandness: for stranded RNA, reverse means that the read is complementary to the coding sequence, and forward means that the read is in the same orientation as the coding sequence. This lets you get from STAR only the counts that correspond to your library preparation. It is also used for the stranded coverage and for the FPKM computation with Cufflinks.
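
If FastQC is not at hand, a rough guess of the adapter family can be made by searching a few R1 reads for the start of each adapter. This is only a sketch: the adapter sequences are the ones listed in the Inputs table of this page, while the function name `guess_library` and the 12-base seed length are assumptions made for illustration.

```python
# Adapter sequences as listed in the Inputs table of this page.
ADAPTERS = {
    "Nextera": {
        "forward": "CTGTCTCTTATACACATCTCCGAGCCCACGAGAC",
        "reverse": "CTGTCTCTTATACACATCTGACGCTGCCGACGA",
    },
    "TruSeq": {
        "forward": "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC",
        "reverse": "GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT",
    },
}


def guess_library(r1_reads, seed_length=12):
    """Return the adapter family whose forward seed occurs in most R1 reads.

    Hypothetical helper: returns None when no read contains any seed,
    which is expected for short reads (see above).
    """
    hits = {name: 0 for name in ADAPTERS}
    for read in r1_reads:
        for name, sequences in ADAPTERS.items():
            if sequences["forward"][:seed_length] in read:
                hits[name] += 1
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else None


example_reads = ["ACGTACGTACGTACGTACGT" + "CTGTCTCTTATACACATCT", "TTTTTTTT"]
library = guess_library(example_reads)
print(library, ADAPTERS[library] if library else "no adapter found")
```

The returned sequences can then be used as forward_adapter and reverse_adapter.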

Processing

The workflow removes adapters and low-quality bases and filters out any read shorter than 15 bp.
The filtered reads are mapped with STAR using ENCODE parameters (intended for long RNA-seq, but I also use them for short reads). STAR is also used to count reads per gene.
MultiQC is run to give an overview of the QC. Its report can also be used to determine the strandness.
FPKM values for genes and transcripts are computed with Cufflinks, with correction for multi-mapped reads.
The BAM is filtered to keep only uniquely mapped reads (tag NH:i:1).
Unstranded coverage and the coverage of each strand separately are computed with bedtools and normalized to the number of millions of uniquely mapped reads. To compute stranded coverage, the BAM is modified so that the second mate of each pair matches the orientation of the first mate.
The three coverage files are converted to bigwig.
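
The unique-read filter and the coverage normalization can be illustrated on plain SAM text. This is a minimal sketch under assumptions: the workflow itself uses bamtools and an awk step, and the function names below are hypothetical. Whether the count is expressed in reads or in pairs depends on where the number is taken from, so the example value is arbitrary.

```python
def is_uniquely_mapped(sam_line: str) -> bool:
    """True when the alignment carries the optional tag NH:i:1.

    Optional tags start at the 12th column of a SAM record.
    """
    fields = sam_line.rstrip("\n").split("\t")
    return "NH:i:1" in fields[11:]


def scaling_factor(n_unique: int) -> float:
    """Return 1 / (millions of uniquely mapped reads)."""
    if n_unique <= 0:
        raise ValueError("need at least one uniquely mapped read")
    return 1.0 / (n_unique / 1_000_000)


unique = "r1\t99\tchr1\t100\t255\t50M\t=\t200\t150\tACGT\tFFFF\tNH:i:1\tHI:i:1"
multi = "r2\t99\tchr1\t100\t3\t50M\t=\t200\t150\tACGT\tFFFF\tNH:i:2\tHI:i:1"
print(is_uniquely_mapped(unique), is_uniquely_mapped(multi))

# With 25 million uniquely mapped reads, each raw coverage value is
# multiplied by 1 / 25.
factor = scaling_factor(25_000_000)
print(factor)
```

With this factor, a position covered by 50 uniquely mapped reads gets a normalized coverage of 2.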

Warning

The stranded coverage output depends on the strandness of the library:
- If you have an unstranded library, the stranded coverages are not meaningful.
- If you have a forward stranded library, the label matches the orientation of the first read in pairs.
- If you have a reverse stranded library, the label matches the orientation of the second read in pairs.
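
The rules above can be written down using the SAM flag of each read. This is one reading of the text, not the code of revertR2orientationInBam or of the bedtools steps: the function name `coverage_label` and the accepted values "forward" and "reverse" are assumptions for illustration.

```python
REVERSE_STRAND = 0x10  # SAM flag bit: read aligned to the reverse strand
SECOND_IN_PAIR = 0x80  # SAM flag bit: read is the second mate


def coverage_label(flag: int, strandness: str) -> str:
    """Return which stranded coverage ("positive" or "negative") a read feeds.

    Hypothetical helper following the warning above.
    """
    on_reverse = bool(flag & REVERSE_STRAND)
    # Express the second mate in the orientation of the first mate.
    if flag & SECOND_IN_PAIR:
        on_reverse = not on_reverse
    if strandness == "reverse":
        # The label follows the second read, i.e. the opposite orientation.
        on_reverse = not on_reverse
    elif strandness != "forward":
        raise ValueError("stranded coverage needs a stranded library")
    return "negative" if on_reverse else "positive"


# Flags 99 and 147 describe the two mates of one properly paired fragment:
# first mate on the forward strand, second mate on the reverse strand.
for strandness in ("forward", "reverse"):
    print(strandness, coverage_label(99, strandness), coverage_label(147, strandness))
```

Both mates of a pair always end up under the same label, which is why the second mate is reoriented before the coverage is computed.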

SEEK ID: https://workflowhub.eu/workflows/401?version=1

Inputs

ID	Name	Description	Type
PE fastq input	PE fastq input	Should be a list of paired-end RNA-seq fastqs	File[]
forward_adapter	forward_adapter	Please use: For R1: - For Nextera: CTGTCTCTTATACACATCTCCGAGCCCACGAGAC - For TruSeq: GATCGGAAGAGCACACGTCTGAACTCCAGTCAC or AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC	string
gtf	gtf	gtf compatible with the reference_genome. Mind the UCSC/Ensembl differences in chromosome naming	File
gtf with regions to exclude from FPKM normalization	gtf with regions to exclude from FPKM normalization	Could be a gtf with for example one entry for the chrM forward and one entry for the chrM reverse	File?
reference_genome	reference_genome	reference_genome	string
reverse_adapter	reverse_adapter	Please use: For R2: - For Nextera: CTGTCTCTTATACACATCTGACGCTGCCGACGA - For TruSeq: GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT or AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT	string
strandness	strandness	For stranded RNA, reverse means that the read is complementary to the coding sequence, forward means that the read is in the same orientation as the coding sequence	string

Steps

ID	Name	Description
7	Cutadapt (remove adapter + bad quality bases)	toolshed.g2.bx.psu.edu/repos/lparsons/cutadapt/cutadapt/4.0+galaxy1
8	get reference_genome as text parameter	toolshed.g2.bx.psu.edu/repos/iuc/compose_text_param/compose_text_param/0.1.1
9	awk command from strand	toolshed.g2.bx.psu.edu/repos/iuc/map_param_value/map_param_value/0.1.1
10	bedtools orientation for forward coverage	toolshed.g2.bx.psu.edu/repos/iuc/map_param_value/map_param_value/0.1.1
11	bedtools orientation for reverse coverage	toolshed.g2.bx.psu.edu/repos/iuc/map_param_value/map_param_value/0.1.1
12	Get cufflinks strandess parameter	toolshed.g2.bx.psu.edu/repos/iuc/map_param_value/map_param_value/0.1.1
13	STAR: map and count	toolshed.g2.bx.psu.edu/repos/iuc/rgrnastar/rna_star/2.7.8a+galaxy0
14	MultiQC	toolshed.g2.bx.psu.edu/repos/iuc/multiqc/multiqc/1.11+galaxy0
15	Extract gene counts	toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_awk_tool/1.1.2
16	Keep only uniquely mapped reads	toolshed.g2.bx.psu.edu/repos/devteam/bamtools_filter/bamFilter/2.5.1+galaxy0
17	get scaling factor	This step gets 1 / (millions of uniquely mapped reads); toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_awk_tool/1.1.2
18	Compute FPKM	toolshed.g2.bx.psu.edu/repos/devteam/cufflinks/cufflinks/2.2.1.3
19	revertR2orientationInBam	toolshed.g2.bx.psu.edu/repos/lldelisle/revertr2orientationinbam/revertR2orientationInBam/0.0.2
20	convert dataset to parameter	param_value_from_file
21	Scaled Coverage both strands combined	toolshed.g2.bx.psu.edu/repos/iuc/bedtools/bedtools_genomecoveragebed/2.30.0
22	Scaled Coverage positive	toolshed.g2.bx.psu.edu/repos/iuc/bedtools/bedtools_genomecoveragebed/2.30.0
23	Scaled Coverage negative	toolshed.g2.bx.psu.edu/repos/iuc/bedtools/bedtools_genomecoveragebed/2.30.0
24	convert both strands coverage to bigwig	wig_to_bigWig
25	convert positive coverage to bigwig	wig_to_bigWig
26	convert negative coverage to bigwig	wig_to_bigWig

Outputs

ID	Name	Description	Type
mapped-reads	mapped-reads	n/a	File
output_log	output_log	n/a	File
reads_per_gene from STAR	reads_per_gene from STAR	n/a	File
MultiQC webpage	MultiQC webpage	n/a	File
MultiQC on input dataset(s): Stats	MultiQC on input dataset(s): Stats	n/a	File
HTS count like output	HTS count like output	n/a	File
genes_expression	genes_expression	n/a	File
transcripts_expression	transcripts_expression	n/a	File
both strands coverage	both strands coverage	n/a	File
positive strand coverage	positive strand coverage	n/a	File
negative strand coverage	negative strand coverage	n/a	File
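
The relation between "reads_per_gene from STAR" and "HTS count like output" can be sketched as a column selection. In the STAR reads-per-gene table, column 2 holds unstranded counts, column 3 the counts for the first read on the RNA strand, and column 4 the counts for the second read on the RNA strand; the first four rows are summary rows starting with N_. The function name and the strandness keys below are assumptions, and the workflow itself does this with an awk step.

```python
# 0-based column index in the STAR reads-per-gene table.
# The keys are hypothetical labels for the strandness input.
COLUMN_BY_STRANDNESS = {"unstranded": 1, "forward": 2, "reverse": 3}


def extract_gene_counts(reads_per_gene_text: str, strandness: str) -> dict:
    """Keep the count column matching the library, skipping the N_ rows."""
    column = COLUMN_BY_STRANDNESS[strandness]
    counts = {}
    for line in reads_per_gene_text.splitlines():
        if not line or line.startswith("N_"):
            continue
        fields = line.split("\t")
        counts[fields[0]] = int(fields[column])
    return counts


# Made-up table for illustration only.
example_table = "\n".join([
    "N_unmapped\t10\t10\t10",
    "N_multimapping\t5\t5\t5",
    "N_noFeature\t3\t20\t4",
    "N_ambiguous\t1\t0\t1",
    "geneA\t100\t2\t98",
    "geneB\t50\t49\t1",
])
counts = extract_gene_counts(example_table, "reverse")
print(counts)
```

Comparing the totals of the forward and reverse columns is also a quick way to check the strandness of a library.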

Version History

v1.2 (latest) Created 29th Jan 2025 at 03:02 by WorkflowHub Bot

Updated to v1.2

Frozen v1.2 785f822

v1.1 Created 19th Nov 2024 at 03:02 by WorkflowHub Bot

Updated to v1.1

Frozen v1.1 9cea532

v1.0 Created 14th Nov 2024 at 03:02 by WorkflowHub Bot

Updated to v1.0

Frozen v1.0 2b092a3

v0.9 Created 7th Oct 2024 at 16:33 by WorkflowHub Bot

Updated to v0.9

Frozen v0.9 5b0324e

v0.6 Created 7th Oct 2024 at 16:33 by WorkflowHub Bot

Updated to v0.6

Frozen v0.6 a7e0a3d

v0.8 Created 16th Jul 2024 at 03:03 by WorkflowHub Bot

Updated to v0.8

Frozen v0.8 bbbad57

v0.7 Created 29th Jun 2024 at 03:02 by WorkflowHub Bot

Updated to v0.7

Frozen v0.7 ddf092f

v0.5 Created 16th Sep 2023 at 03:01 by WorkflowHub Bot

Updated to v0.5

Frozen v0.5 93a3560

v0.4.1 Created 15th Sep 2023 at 03:01 by WorkflowHub Bot

Updated to v0.4.1

Frozen v0.4.1 7acbe8e

v0.4 Created 17th Jan 2023 at 03:01 by WorkflowHub Bot

Updated to v0.4

Frozen v0.4 387b362

v0.3 Created 14th Jan 2023 at 03:01 by WorkflowHub Bot

Updated to v0.3

Frozen v0.3 a0b796b

v0.2 Created 1st Dec 2022 at 03:01 by WorkflowHub Bot

Updated to v0.2

Frozen v0.2 61f6547

v0.1 (earliest) Created 25th Oct 2022 at 03:01 by WorkflowHub Bot

Updated to v0.1

Frozen v0.1 4c67dcd

Creators and Submitter

Creator

Lucille Delisle

Additional credit

Lucille Delisle

Submitter

WorkflowHub Bot

Tools

STAR

MultiQC

Cufflinks

License

MIT License

Activity

Views: 10219 Downloads: 3203 Runs: 1

Created: 25th Oct 2022 at 03:01

Last updated: 17th Jan 2023 at 03:01
