Workflow Type: Galaxy
RNA-seq paired-end Workflow
Inputs dataset
- The workflow needs a list of dataset pairs in fastqsanger format.
- It also needs a gtf file with gene annotations.
- Optional, but recommended: a gtf file with regions to exclude from normalization in Cufflinks.
  - For instance, a gtf that masks chrM for the mm10 genome:
chrM chrM_gene exon 0 16299 . + . gene_id "chrM_gene_plus"; transcript_id "chrM_tx_plus"; exon_id "chrM_ex_plus";
chrM chrM_gene exon 0 16299 . - . gene_id "chrM_gene_minus"; transcript_id "chrM_tx_minus"; exon_id "chrM_ex_minus";
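A mask like the one above can be produced for any chromosome with a few lines of code. The following is a minimal sketch, not part of the workflow: the function name `mask_gtf_lines` and its parameters are hypothetical. GTF coordinates are conventionally 1-based, while the example above starts at 0, so the start is left as a parameter that defaults to 0 to reproduce the example.

```python
def mask_gtf_lines(chrom: str, end: int, start: int = 0) -> list[str]:
    """Return two tab-separated GTF exon records, one per strand,
    spanning the region [start, end] of the given chromosome."""
    lines = []
    for strand, label in (("+", "plus"), ("-", "minus")):
        attributes = (
            f'gene_id "{chrom}_gene_{label}"; '
            f'transcript_id "{chrom}_tx_{label}"; '
            f'exon_id "{chrom}_ex_{label}";'
        )
        fields = [
            chrom,             # seqname
            f"{chrom}_gene",   # source
            "exon",            # feature
            str(start),
            str(end),
            ".",               # score
            strand,
            ".",               # frame
            attributes,
        ]
        lines.append("\t".join(fields))
    return lines


# Reproduce the chrM mask for mm10 shown above.
mask = mask_gtf_lines("chrM", 16299)
print("\n".join(mask))
```

The chromosome name must follow the same naming scheme (UCSC or Ensembl) as the reference genome, otherwise the mask will not match any region.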
Inputs values
- adapter sequences: these depend on the library preparation. Classical RNA libraries are usually TruSeq, and ISML (a relatively new Illumina library) is Nextera. If you don't know, use FastQC to determine whether it is TruSeq or Nextera. If the reads are relatively short (50 bp), there is probably no adapter.
- reference_genome: this field is populated with the genomes available for STAR.
- strandness: For stranded RNA, reverse means that the read is complementary to the coding sequence; forward means that the read is in the same orientation as the coding sequence. This setting is used to get from STAR only the counts corresponding to your library preparation. It is also used for the stranded coverage and for the FPKM computation with Cufflinks.
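To illustrate how strandness selects the counts, here is a minimal sketch that reads a STAR `ReadsPerGene.out.tab` table. That file has four columns: gene ID, unstranded counts, counts for read 1 on the RNA strand, and counts for read 2 on the RNA strand. The keys `unstranded`, `forward`, and `reverse` and the function name are assumptions made for this example; the workflow itself does this step with an awk command, which is not reproduced here.

```python
# 0-based column index in STAR's ReadsPerGene.out.tab for each library type.
# The dictionary keys are hypothetical labels chosen for this sketch.
COUNT_COLUMN = {"unstranded": 1, "forward": 2, "reverse": 3}


def extract_gene_counts(lines, strandness):
    """Return {gene_id: count} using the column that matches the library."""
    col = COUNT_COLUMN[strandness]
    counts = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # The first rows (N_unmapped, N_multimapping, ...) are summaries,
        # not genes, and are skipped in this sketch.
        if fields[0].startswith("N_"):
            continue
        counts[fields[0]] = int(fields[col])
    return counts


# Made-up table for illustration only.
example = [
    "N_unmapped\t10\t10\t10",
    "N_multimapping\t5\t5\t5",
    "N_noFeature\t3\t40\t4",
    "N_ambiguous\t1\t0\t1",
    "GeneA\t100\t2\t98",
    "GeneB\t50\t1\t49",
]
reverse_counts = extract_gene_counts(example, "reverse")
print(reverse_counts)
```

In this made-up table almost all reads fall in the last column, which is the pattern expected from a reverse stranded library.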
Processing
- The workflow removes adapters and low-quality bases, and filters out any read shorter than 15 bp.
- The filtered reads are mapped with STAR using ENCODE parameters (intended for long RNA-seq, but I use them for short reads as well). STAR is also used to count reads per gene.
- MultiQC is run to give an overview of the QC. This can also be used to determine the strandness.
- FPKM values for genes and transcripts are computed with Cufflinks, using correction for multi-mapped reads.
- The BAM is filtered to keep only uniquely mapped reads (tag NH:i:1).
- Coverage is computed with bedtools for both strands combined and for each strand independently, and is normalized to the number of million uniquely mapped reads. (To compute stranded coverage, the BAM is modified so that the second mate in each pair matches the orientation of the first mate.)
- The three coverage files are converted to bigwig.
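The unique-read filter in the list above can be sketched on SAM text records. This is only an illustration of the `NH:i:1` criterion: the workflow uses bamtools on the BAM file, and the function names and example records below are hypothetical.

```python
def is_uniquely_mapped(sam_line: str) -> bool:
    """True if the alignment record carries the tag NH:i:1."""
    fields = sam_line.rstrip("\n").split("\t")
    # A SAM record has 11 mandatory columns; optional tags come after them.
    # Exact matching means NH:i:10 is not mistaken for NH:i:1.
    return "NH:i:1" in fields[11:]


def keep_unique(sam_lines):
    """Yield header lines and uniquely mapped alignment records."""
    for line in sam_lines:
        if line.startswith("@") or is_uniquely_mapped(line):
            yield line


# Made-up records for illustration only.
records = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t99\tchr1\t100\t255\t4M\t=\t300\t204\tACGT\tFFFF\tNH:i:1\tHI:i:1",
    "r2\t99\tchr1\t100\t3\t4M\t=\t300\t204\tACGT\tFFFF\tNH:i:10\tHI:i:1",
]
kept = list(keep_unique(records))
print(len(kept))
```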
Warning
- The stranded coverage output depends on the strandness of the library:
  - If you have an unstranded library, stranded coverages are useless.
  - If you have a forward stranded library, the label matches the orientation of the first read in pairs.
  - If you have a reverse stranded library, the label matches the orientation of the second read in pairs.
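The labels behave this way because, before the stranded coverage is computed, the second mate is given the orientation of the first mate. The idea can be sketched on SAM flags as follows. This is an assumption-laden illustration, not the code of the revertR2orientationInBam tool: it toggles the strand bit of each second mate and, to keep the pair consistent, the mate-strand bit of each first mate.

```python
# Standard SAM flag bits.
READ_REVERSE = 0x10    # the read itself maps to the reverse strand
MATE_REVERSE = 0x20    # the read's mate maps to the reverse strand
FIRST_IN_PAIR = 0x40
SECOND_IN_PAIR = 0x80


def revert_r2_orientation(flag: int) -> int:
    """Return a flag in which the second mate has the first mate's strand."""
    if flag & SECOND_IN_PAIR:
        # Flip the strand of the second mate.
        return flag ^ READ_REVERSE
    if flag & FIRST_IN_PAIR:
        # Keep the first mate's view of its mate in agreement.
        return flag ^ MATE_REVERSE
    # Unpaired records are left untouched.
    return flag


# A typical proper pair: first mate forward (99), second mate reverse (147).
new_r1 = revert_r2_orientation(99)
new_r2 = revert_r2_orientation(147)
print(new_r1, new_r2)
```

After this change both mates of a pair report the same strand, so a per-strand coverage counts each fragment on the strand of its first read.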
Inputs
ID | Name | Description | Type |
---|---|---|---|
PE fastq input | PE fastq input | Should be a list of paired-end RNA-seq fastqs | |
forward_adapter | forward_adapter | For R1, please use: for Nextera, CTGTCTCTTATACACATCTCCGAGCCCACGAGAC; for TruSeq, GATCGGAAGAGCACACGTCTGAACTCCAGTCAC or AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC | |
gtf | gtf | gtf compatible with the reference_genome. Mind the UCSC/Ensembl differences in chromosome naming | |
gtf with regions to exclude from FPKM normalization | gtf with regions to exclude from FPKM normalization | Could be a gtf with, for example, one entry for the chrM forward and one entry for the chrM reverse | |
reference_genome | reference_genome | reference_genome | |
reverse_adapter | reverse_adapter | For R2, please use: for Nextera, CTGTCTCTTATACACATCTGACGCTGCCGACGA; for TruSeq, GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT or AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT | |
strandness | strandness | For stranded RNA, reverse means that the read is complementary to the coding sequence, forward means that the read is in the same orientation as the coding sequence | |
Steps
ID | Name | Description |
---|---|---|
7 | Cutadapt (remove adapter + bad quality bases) | toolshed.g2.bx.psu.edu/repos/lparsons/cutadapt/cutadapt/4.0+galaxy1 |
8 | get reference_genome as text parameter | toolshed.g2.bx.psu.edu/repos/iuc/compose_text_param/compose_text_param/0.1.1 |
9 | awk command from strand | toolshed.g2.bx.psu.edu/repos/iuc/map_param_value/map_param_value/0.1.1 |
10 | bedtools orientation for forward coverage | toolshed.g2.bx.psu.edu/repos/iuc/map_param_value/map_param_value/0.1.1 |
11 | bedtools orientation for reverse coverage | toolshed.g2.bx.psu.edu/repos/iuc/map_param_value/map_param_value/0.1.1 |
12 | Get cufflinks strandess parameter | toolshed.g2.bx.psu.edu/repos/iuc/map_param_value/map_param_value/0.1.1 |
13 | STAR: map and count | toolshed.g2.bx.psu.edu/repos/iuc/rgrnastar/rna_star/2.7.8a+galaxy0 |
14 | MultiQC | toolshed.g2.bx.psu.edu/repos/iuc/multiqc/multiqc/1.11+galaxy0 |
15 | Extract gene counts | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_awk_tool/1.1.2 |
16 | Keep only uniquely mapped reads | toolshed.g2.bx.psu.edu/repos/devteam/bamtools_filter/bamFilter/2.5.1+galaxy0 |
17 | get scaling factor | This step gets 1 / (millions of uniquely mapped reads). toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_awk_tool/1.1.2 |
18 | Compute FPKM | toolshed.g2.bx.psu.edu/repos/devteam/cufflinks/cufflinks/2.2.1.3 |
19 | revertR2orientationInBam | toolshed.g2.bx.psu.edu/repos/lldelisle/revertr2orientationinbam/revertR2orientationInBam/0.0.2 |
20 | convert dataset to parameter | param_value_from_file |
21 | Scaled Coverage both strands combined | toolshed.g2.bx.psu.edu/repos/iuc/bedtools/bedtools_genomecoveragebed/2.30.0 |
22 | Scaled Coverage positive | toolshed.g2.bx.psu.edu/repos/iuc/bedtools/bedtools_genomecoveragebed/2.30.0 |
23 | Scaled Coverage negative | toolshed.g2.bx.psu.edu/repos/iuc/bedtools/bedtools_genomecoveragebed/2.30.0 |
24 | convert both strands coverage to bigwig | wig_to_bigWig |
25 | convert positive coverage to bigwig | wig_to_bigWig |
26 | convert negative coverage to bigwig | wig_to_bigWig |
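The scaling factor of step 17 can be illustrated with a short sketch. It assumes that the number of uniquely mapped reads is taken from the `Uniquely mapped reads number` line of STAR's `Log.final.out`; whether the workflow's awk step reads exactly this file is an assumption, and the function name and example log are hypothetical.

```python
def scaling_factor_from_star_log(log_text: str) -> float:
    """Return 1 / (millions of uniquely mapped reads) from a STAR final log."""
    for line in log_text.splitlines():
        if "Uniquely mapped reads number" in line:
            # STAR separates the label from the value with a '|' character.
            unique_reads = int(line.split("|")[1].strip())
            return 1_000_000 / unique_reads
    raise ValueError("no 'Uniquely mapped reads number' line found")


# Made-up excerpt of a Log.final.out for illustration only.
example_log = (
    "                          Number of input reads |\t25000000\n"
    "                   Uniquely mapped reads number |\t20000000\n"
    "                        Uniquely mapped reads % |\t80.00%\n"
)
factor = scaling_factor_from_star_log(example_log)
print(factor)
```

A factor obtained this way is the kind of value that can be passed to the `-scale` option of bedtools genomecov, so that coverage is expressed per million uniquely mapped reads.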
Outputs
ID | Name | Description | Type |
---|---|---|---|
mapped-reads | mapped-reads | n/a | |
output_log | output_log | n/a | |
reads_per_gene from STAR | reads_per_gene from STAR | n/a | |
MultiQC webpage | MultiQC webpage | n/a | |
MultiQC on input dataset(s): Stats | MultiQC on input dataset(s): Stats | n/a | |
HTS count like output | HTS count like output | n/a | |
genes_expression | genes_expression | n/a | |
transcripts_expression | transcripts_expression | n/a | |
both strands coverage | both strands coverage | n/a | |
positive strand coverage | positive strand coverage | n/a | |
negative strand coverage | negative strand coverage | n/a | |
Version History
v0.1 (earliest) Created 25th Oct 2022 at 03:01 by WorkflowHub Bot
Updated to v0.1
Frozen
v0.1
4c67dcd
Creators and Submitter
Creator
Additional credit
Lucille Delisle
Last updated: 17th Jan 2023 at 03:01