Workflow Type: Galaxy

RNA-seq single-read Workflow

Input datasets

  • The workflow needs a list of fastqsanger datasets.

  • It also needs a gtf file with gene annotations.

  • Optional, but recommended: a gtf file with regions to exclude from normalization in Cufflinks.

    • For instance a gtf that masks chrM for the mm10 genome:
chrM	chrM_gene	exon	0	16299	.	+	.	gene_id "chrM_gene_plus"; transcript_id "chrM_tx_plus"; exon_id "chrM_ex_plus";
chrM	chrM_gene	exon	0	16299	.	-	.	gene_id "chrM_gene_minus"; transcript_id "chrM_tx_minus"; exon_id "chrM_ex_minus";
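The two lines above use the nine tab-separated GTF columns (seqname, source, feature, start, end, score, strand, frame, attributes), with one entry per strand. The sketch below builds the same kind of two-line mask for any chromosome of known length. `make_mask_gtf_lines` is a hypothetical helper, not part of the workflow, and the `<chrom>_gene_plus`-style identifiers simply mirror the example. GTF coordinates are 1-based, so the sketch starts the interval at 1.

```python
def make_mask_gtf_lines(chrom, length):
    """Return two GTF lines (plus and minus strand) covering chrom from 1 to length."""
    lines = []
    for strand, suffix in (("+", "plus"), ("-", "minus")):
        # Identifiers follow the naming used in the chrM example (hypothetical convention).
        attributes = (
            f'gene_id "{chrom}_gene_{suffix}"; '
            f'transcript_id "{chrom}_tx_{suffix}"; '
            f'exon_id "{chrom}_ex_{suffix}";'
        )
        fields = [chrom, f"{chrom}_gene", "exon", "1", str(length), ".", strand, ".", attributes]
        lines.append("\t".join(fields))
    return lines


if __name__ == "__main__":
    # mm10 chrM is 16,299 bp long.
    for line in make_mask_gtf_lines("chrM", 16299):
        print(line)
```

Redirecting the printed lines to a file gives a gtf that can be supplied as the optional "regions to exclude" input.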

Input values

  • forward adapter sequence: this depends on the library preparation. Classical RNA libraries are usually TruSeq, while ISML (a relatively new Illumina library) uses Nextera adapters. If you are unsure, use FastQC to determine whether the library is TruSeq or Nextera. If the reads are relatively short (50 bp), they probably contain no adapter.
  • reference_genome: the available choices depend on the genomes indexed for STAR on the Galaxy server.
  • strandness: for stranded RNA, reverse means that the read is complementary to the coding sequence, and forward means that the read is in the same orientation as the coding sequence. This setting keeps, from the STAR gene counts, only the counts that match your library preparation. It is also used for the stranded coverage and for the FPKM computation with Cufflinks.
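STAR writes its per-gene counts (ReadsPerGene.out.tab) with four columns: gene ID, counts for unstranded data, counts for reads on the same strand as the RNA, and counts for reads on the opposite strand; its first rows (N_unmapped, N_multimapping, N_noFeature, N_ambiguous) are summary rows. A minimal Python sketch of how the strandness value could pick the matching column and the corresponding Cufflinks `--library-type`, assuming the input takes the hypothetical values `unstranded`, `forward` and `reverse`. The workflow itself does this with awk and map_param_value steps; the function and dictionary names are illustrative.

```python
# Hypothetical labels for the strandness input; the workflow's actual values may differ.
STAR_COUNT_COLUMN = {"unstranded": 2, "forward": 3, "reverse": 4}  # 1-based columns

# Cufflinks --library-type for single-read data:
# reads on the transcript strand -> fr-secondstrand, antisense reads -> fr-firststrand.
CUFFLINKS_LIBRARY_TYPE = {
    "unstranded": "fr-unstranded",
    "forward": "fr-secondstrand",
    "reverse": "fr-firststrand",
}


def extract_gene_counts(reads_per_gene_lines, strandness):
    """Return {gene_id: count} from STAR ReadsPerGene.out.tab lines."""
    column = STAR_COUNT_COLUMN[strandness] - 1  # convert to a 0-based index
    counts = {}
    for line in reads_per_gene_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0].startswith("N_"):
            continue  # skip STAR summary rows such as N_unmapped
        counts[fields[0]] = int(fields[column])
    return counts


if __name__ == "__main__":
    toy = [  # toy input, not real data
        "N_unmapped\t10\t10\t10",
        "geneA\t100\t5\t95",
        "geneB\t40\t3\t37",
    ]
    print(extract_gene_counts(toy, "reverse"))  # {'geneA': 95, 'geneB': 37}
    print(CUFFLINKS_LIBRARY_TYPE["reverse"])  # fr-firststrand
```

Choosing the wrong column does not raise an error; it silently returns mostly antisense counts, which is why the strandness input matters.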

Processing

  • The workflow removes adapters and low-quality bases and filters out any read shorter than 15 bp.
  • The filtered reads are mapped with STAR using the ENCODE parameters (intended for long RNA-seq, but I also use them for short RNA-seq). STAR is also used to count reads per gene.
  • MultiQC is run to give an overview of the QC. Its report can also be used to determine the strandness of the library.
  • FPKM values for genes and transcripts are computed with Cufflinks, using its correction for multi-mapped reads.
  • The BAM is filtered to keep only uniquely mapped reads (tag NH:i:1).
  • Coverage is computed with bedtools for both strands combined and for each strand separately, then normalized by the number of uniquely mapped reads in millions.
  • The three coverage files are converted to bigWig.
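The trimming and length filter in the first processing step can be sketched in Python. This follows the BWA-style 3' quality-trimming algorithm described in the cutadapt documentation and applies quality trimming before adapter removal, but it is a simplification: adapter matching here is exact, whereas cutadapt tolerates mismatches and indels. The quality cutoff of 20 and the helper names are assumptions; only the 15 bp minimum length comes from the workflow description.

```python
def quality_trim_index(quals, cutoff):
    """BWA-style 3' trimming: return the index at which to cut the read."""
    total = 0
    best = 0
    cut = len(quals)
    # Walk from the 3' end summing (cutoff - quality); cut where the sum peaks
    # and stop as soon as it becomes negative.
    for i in range(len(quals) - 1, -1, -1):
        total += cutoff - quals[i]
        if total < 0:
            break
        if total > best:
            best, cut = total, i
    return cut


def adapter_start(seq, adapter, min_overlap=3):
    """Start of the adapter in seq (exact match only), or len(seq) if absent."""
    full = seq.find(adapter)
    if full != -1:
        return full
    # Partial adapter at the 3' end: longest adapter prefix that is a suffix of the read.
    for overlap in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:overlap]):
            return len(seq) - overlap
    return len(seq)


def trim_read(seq, qual_string, adapter, cutoff=20, min_length=15, phred_offset=33):
    """Return (seq, qual) after trimming, or None if the read becomes too short."""
    quals = [ord(c) - phred_offset for c in qual_string]
    cut = quality_trim_index(quals, cutoff)
    seq, qual_string = seq[:cut], qual_string[:cut]
    cut = adapter_start(seq, adapter)
    seq, qual_string = seq[:cut], qual_string[:cut]
    if len(seq) < min_length:
        return None  # discarded, as with cutadapt's --minimum-length
    return seq, qual_string


if __name__ == "__main__":
    adapter = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"  # TruSeq R1 adapter
    read = "ACGTACGTACGTACGTACGT" + "AGATCGGAAG"  # insert followed by a partial adapter
    print(trim_read(read, "I" * len(read), adapter))
```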

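Keeping only uniquely mapped reads relies on the NH tag, which STAR sets to the number of loci a read maps to. The workflow applies this rule with bamtools filter on the BAM file; the sketch below shows the same rule on plain-text SAM lines in pure Python, so that it needs no BAM library. `keep_uniquely_mapped` and `filter_unique` are hypothetical names, and dropping records that lack an NH tag is an assumption.

```python
def keep_uniquely_mapped(sam_line):
    """True for header lines and for alignments carrying the tag NH:i:1."""
    if sam_line.startswith("@"):
        return True  # keep the SAM header
    fields = sam_line.rstrip("\n").split("\t")
    # Optional TAG:TYPE:VALUE fields follow the 11 mandatory SAM columns.
    for tag in fields[11:]:
        if tag.startswith("NH:i:"):
            return int(tag[len("NH:i:"):]) == 1
    return False  # assumption: records without an NH tag are dropped


def filter_unique(sam_lines):
    return [line for line in sam_lines if keep_uniquely_mapped(line)]


if __name__ == "__main__":
    sam = [
        "@HD\tVN:1.4",
        "r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1\tHI:i:1",
        "r2\t0\tchr1\t200\t1\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:3\tHI:i:1",
    ]
    for line in filter_unique(sam):
        print(line.split("\t")[0])
```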
Warning

  • The stranded coverage outputs depend on the strandness of the library:
    • If you have an unstranded library, the stranded coverages are not informative.
    • If you have a forward stranded library, the strand label matches the orientation of the reads.
    • If you have a reverse stranded library, the positive strand coverage should correspond to genes on the forward strand and uses the reads mapped on the reverse strand. The negative strand coverage should correspond to genes on the reverse strand and uses the reads mapped on the forward strand.
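The rule in this warning can be written as a small lookup that picks which read strand feeds each stranded coverage track, plus the matching bedtools genomecov call. `-ibam`, `-bg`, `-strand` and `-scale` are real genomecov options, but the exact options the workflow passes (for example whether `-split` is used for spliced reads) are not stated here; `coverage_read_strand`, `genomecov_command` and `unique.bam` are hypothetical names, and the strandness labels are assumed.

```python
def coverage_read_strand(strandness, gene_strand):
    """Read strand to count for the coverage track labelled gene_strand ('+' or '-')."""
    if gene_strand not in ("+", "-"):
        raise ValueError("gene_strand must be '+' or '-'")
    if strandness == "reverse":
        # Reads are complementary to the RNA: genes on '+' are covered by '-' reads.
        return "-" if gene_strand == "+" else "+"
    # forward: reads match the gene orientation.
    # unstranded: the label is only the read orientation (track not informative).
    return gene_strand


def genomecov_command(bam_path, read_strand, scale):
    """Build (not run) a bedtools genomecov command line as a list of arguments."""
    cmd = ["bedtools", "genomecov", "-ibam", bam_path, "-bg", "-scale", str(scale)]
    if read_strand is not None:
        cmd += ["-strand", read_strand]  # only count reads on this strand
    return cmd


if __name__ == "__main__":
    for label in ("+", "-"):
        strand = coverage_read_strand("reverse", label)
        print(label, " ".join(genomecov_command("unique.bam", strand, 0.5)))
```

Passing `None` as the read strand gives the command for the "both strands combined" track.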

Contribution

@lldelisle wrote the workflow and the tests.

@nagoue updated the tools, made the workflow work on usegalaxy.org, and fixed some best-practice issues.

Inputs

  • SR fastq input: should be a list of single-read RNA-seq fastqs.
  • forward_adapter: please use, for R1: for Nextera, CTGTCTCTTATACACATCTCCGAGCCCACGAGAC; for TruSeq, GATCGGAAGAGCACACGTCTGAACTCCAGTCAC or AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC.
  • reference_genome: the reference genome.
  • gtf: gtf compatible with the reference_genome. Mind the UCSC/Ensembl differences in chromosome naming.
  • strandness: for stranded RNA, reverse means that the read is complementary to the coding sequence, and forward means that the read is in the same orientation as the coding sequence.
  • gtf with regions to exclude from FPKM normalization: could be a gtf with, for example, one entry for the chrM forward strand and one entry for the chrM reverse strand.
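To choose between the TruSeq and Nextera sequences listed for forward_adapter, FastQC's adapter content report is the recommended route. As a rough stand-in (a heuristic, not FastQC's implementation), one can count the fraction of reads containing the first 12 bases of each candidate adapter. `adapter_fractions` and `fastq_sequences` are hypothetical helpers, and the sketch assumes an uncompressed FASTQ file with four lines per record.

```python
# Adapter sequences as listed for the forward_adapter input (R1).
ADAPTERS = {
    "TruSeq": "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC",
    "Nextera": "CTGTCTCTTATACACATCTCCGAGCCCACGAGAC",
}


def adapter_fractions(sequences, probe_len=12):
    """Fraction of reads containing the first probe_len bases of each adapter."""
    probes = {name: seq[:probe_len] for name, seq in ADAPTERS.items()}
    hits = {name: 0 for name in probes}
    total = 0
    for read in sequences:
        total += 1
        for name, probe in probes.items():
            if probe in read:
                hits[name] += 1
    if total == 0:
        return {name: 0.0 for name in probes}
    return {name: count / total for name, count in hits.items()}


def fastq_sequences(path):
    """Yield the sequence line of each record in an uncompressed FASTQ file."""
    with open(path) as handle:
        for index, line in enumerate(handle):
            if index % 4 == 1:
                yield line.strip()
```

For example, `adapter_fractions(fastq_sequences("sample.fastq"))` (hypothetical file name) returns one fraction per adapter; a clearly higher value for one of them suggests which sequence to enter. With short reads (around 50 bp), both fractions are expected to be low.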

Steps

  • Step 0: SR fastq input. Should be a list of single-read RNA-seq fastqs.
  • Step 1: forward_adapter. Please use, for R1: for Nextera, CTGTCTCTTATACACATCTCCGAGCCCACGAGAC; for TruSeq, GATCGGAAGAGCACACGTCTGAACTCCAGTCAC or AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC.
  • Step 2: reference_genome.
  • Step 3: gtf. gtf compatible with the reference_genome. Mind the UCSC/Ensembl differences in chromosome naming.
  • Step 4: strandness. For stranded RNA, reverse means that the read is complementary to the coding sequence, and forward means that the read is in the same orientation as the coding sequence.
  • Step 5: gtf with regions to exclude from FPKM normalization. Could be a gtf with, for example, one entry for the chrM forward strand and one entry for the chrM reverse strand.
  • Step 6: Cutadapt (remove adapter + bad quality bases). Tool: toolshed.g2.bx.psu.edu/repos/lparsons/cutadapt/cutadapt/4.0+galaxy1
  • Step 7: get reference_genome as text parameter. Tool: toolshed.g2.bx.psu.edu/repos/iuc/compose_text_param/compose_text_param/0.1.1
  • Step 8: awk command from strand. Tool: toolshed.g2.bx.psu.edu/repos/iuc/map_param_value/map_param_value/0.1.1
  • Step 9: bedtools orientation for forward coverage. Tool: toolshed.g2.bx.psu.edu/repos/iuc/map_param_value/map_param_value/0.1.1
  • Step 10: bedtools orientation for reverse coverage. Tool: toolshed.g2.bx.psu.edu/repos/iuc/map_param_value/map_param_value/0.1.1
  • Step 11: Get cufflinks strandness parameter. Tool: toolshed.g2.bx.psu.edu/repos/iuc/map_param_value/map_param_value/0.1.1
  • Step 12: STAR: map and count. Tool: toolshed.g2.bx.psu.edu/repos/iuc/rgrnastar/rna_star/2.7.8a+galaxy1
  • Step 13: MultiQC. Tool: toolshed.g2.bx.psu.edu/repos/iuc/multiqc/multiqc/1.11+galaxy0
  • Step 14: Extract gene counts. Tool: toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_awk_tool/1.1.2
  • Step 15: Keep only uniquely mapped reads. Tool: toolshed.g2.bx.psu.edu/repos/devteam/bamtools_filter/bamFilter/2.5.1+galaxy0
  • Step 16: get scaling factor. This step gets 1 / (millions of uniquely mapped reads). Tool: toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_awk_tool/1.1.2
  • Step 17: Compute FPKM. Tool: toolshed.g2.bx.psu.edu/repos/devteam/cufflinks/cufflinks/2.2.1.3
  • Step 18: convert dataset to parameter. Tool: param_value_from_file
  • Step 19: Scaled Coverage both strands combined. Tool: toolshed.g2.bx.psu.edu/repos/iuc/bedtools/bedtools_genomecoveragebed/2.30.0
  • Step 20: Scaled Coverage positive. Tool: toolshed.g2.bx.psu.edu/repos/iuc/bedtools/bedtools_genomecoveragebed/2.30.0
  • Step 21: Scaled Coverage negative. Tool: toolshed.g2.bx.psu.edu/repos/iuc/bedtools/bedtools_genomecoveragebed/2.30.0
  • Step 22: convert both strands coverage to bigwig. Tool: wig_to_bigWig
  • Step 23: convert positive coverage to bigwig. Tool: wig_to_bigWig
  • Step 24: convert negative coverage to bigwig. Tool: wig_to_bigWig
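Step 16 computes 1 / (millions of uniquely mapped reads), the factor used to scale the coverage tracks. A minimal sketch, assuming the count is read from the "Uniquely mapped reads number" line of STAR's Log.final.out (the workflow's awk step may take it from elsewhere). The resulting factor is the kind of value passed to bedtools genomecov through its `-scale` option; `scaling_factor_from_star_log` and `scale_depth` are hypothetical names.

```python
def scaling_factor_from_star_log(log_lines):
    """Return 1 / (uniquely mapped reads in millions) from STAR Log.final.out lines."""
    for line in log_lines:
        if "Uniquely mapped reads number" in line:
            n_unique = int(line.split("|")[1].strip())
            return 1_000_000 / n_unique
    raise ValueError("'Uniquely mapped reads number' not found in the STAR log")


def scale_depth(raw_depth, factor):
    """Convert raw per-base depth to depth per million uniquely mapped reads."""
    return [depth * factor for depth in raw_depth]


if __name__ == "__main__":
    log = ["                   Uniquely mapped reads number |\t2000000"]
    factor = scaling_factor_from_star_log(log)
    print(factor)  # 0.5
    print(scale_depth([4, 0, 2], factor))  # [2.0, 0.0, 1.0]
```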

Outputs

  • out1
  • report
  • out1
  • output_param_text
  • output_param_text
  • output_param_text
  • output_param_text
  • output_log
  • splice_junctions
  • mapped_reads
  • reads_per_gene
  • stats
  • plots
  • html_report
  • outfile
  • out_file2
  • out_file1
  • outfile
  • genes_expression
  • transcripts_expression
  • assembled_isoforms
  • total_map_mass
  • skipped
  • float_param
  • output
  • output
  • output
  • out_file1
  • out_file1
  • out_file1

Version History

v0.2 (latest) Created 1st Dec 2022 at 03:01 by WorkflowHub Bot

Updated to v0.2


Frozen v0.2 28b9493

v0.1 (earliest) Created 22nd Oct 2022 at 03:01 by WorkflowHub Bot

Updated to v0.1


Frozen v0.1 30a19f3
Creators and Submitter
Creators
Not specified
Additional credit

Lucille Delisle

Submitter
License
Activity

Views: 92

Created: 22nd Oct 2022 at 03:01

Last updated: 1st Dec 2022 at 03:01

Last used: 7th Dec 2022 at 00:20

Attributions

None

Total size: 65 KB