CWL-based RNA-Seq workflow
Version 1

Workflow Type: Common Workflow Language
Stable

A CWL-based pipeline for processing RNA-Seq data (FASTQ format) and performing differential gene/transcript expression analysis.

The following are available in the corresponding GitHub folder:

  • The CWL wrappers for the workflow
  • A pre-configured YAML template, based on validation analysis of publicly available HTS data
  • A table of metadata (mrna_cll_subsets_phenotypes.csv), based on the same validation analysis, to serve as an input example for the design of comparisons during differential expression analysis

Briefly, the workflow performs the following steps:

  1. Quality control of Illumina reads (FastQC)
  2. Trimming of the reads (e.g., removal of adapter and/or low-quality sequences) (Trim Galore)
  3. (Optional) custom processing of the reads using FASTA/Q Trimmer (part of the FASTX-toolkit)
  4. Mapping to reference genome (HISAT2)
  5. Conversion of mapped reads from SAM (Sequence Alignment Map) to BAM (Binary Alignment Map) format (samtools)
  6. Sorting of mapped reads based on chromosomal coordinates (samtools)
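
The steps above can be sketched as a list of command lines for one single-end sample. This is a minimal illustration and not the workflow's own CWL wrappers: the helper name preprocessing_commands, the file names, the default values, and the assumed name of the trimmed output are all hypothetical, the optional FASTX step is left out, and only widely documented options of each tool are used.

```python
from pathlib import Path


def preprocessing_commands(fastq, index_prefix, outdir="results",
                           quality=20, min_length=20, threads=4):
    """Return command lines (argument lists) for one single-end sample.

    Nothing is executed; each list could be handed to subprocess.run.
    All paths and defaults are illustrative assumptions.
    """
    out = Path(outdir)
    sample = Path(fastq).name.split(".")[0]
    # Assumed name of the Trim Galore output for a gzipped single-end input
    trimmed = out / f"{sample}_trimmed.fq.gz"
    sam = out / f"{sample}.sam"
    bam = out / f"{sample}.bam"
    sorted_bam = out / f"{sample}.sorted.bam"
    return [
        # 1. Quality control of the raw reads
        ["fastqc", "-o", str(out), str(fastq)],
        # 2. Adapter and quality trimming
        ["trim_galore", "--quality", str(quality), "--length", str(min_length),
         "-o", str(out), str(fastq)],
        # 4. Mapping; --dta reports alignments tailored for transcript assemblers
        ["hisat2", "-p", str(threads), "--dta", "-x", index_prefix,
         "-U", str(trimmed), "-S", str(sam)],
        # 5. SAM to BAM conversion
        ["samtools", "view", "-b", "-o", str(bam), str(sam)],
        # 6. Coordinate sorting (the samtools sort default)
        ["samtools", "sort", "-o", str(sorted_bam), str(bam)],
    ]


if __name__ == "__main__":
    for command in preprocessing_commands("reads/s1.fastq.gz", "index/genome"):
        print(" ".join(command))
```

Building the commands as data keeps the sketch free of side effects; in the actual workflow each of these calls is a separate CWL step with its own inputs.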

Subsequently, two independent workflows are implemented for differential expression analysis at the transcript and gene level.

First, following the reference protocol for transcript expression analysis with HISAT, StringTie and Ballgown, StringTie is used together with a reference transcript annotation file in GTF (Gene Transfer Format), if one is available, to:

  • Assemble transcripts for each RNA-Seq sample using the previous read alignments (BAM files)
  • Generate a global, non-redundant set of transcripts observed in any of the RNA-Seq samples
  • Estimate transcript abundances and generate read coverage tables for each RNA-Seq sample, based on the global, merged set of transcripts (rather than the reference) which is observed across all samples
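
The three StringTie stages above can be sketched as command lines. This is an illustrative sketch under stated assumptions, not the workflow's CWL wrappers: the helper name stringtie_commands, the output file names, and the ballgown/ directory layout are hypothetical.

```python
from pathlib import Path


def stringtie_commands(sorted_bams, guide_gtf, merged_gtf="merged.gtf", cpus=4):
    """Build (not run) StringTie command lines for the three stages.

    Output names and the ballgown/ layout are illustrative assumptions.
    """
    assemble, sample_gtfs, estimate = [], [], []
    for bam in sorted_bams:
        sample = Path(bam).stem
        gtf = f"{sample}.gtf"
        sample_gtfs.append(gtf)
        # Stage 1: per-sample assembly, guided by the reference annotation
        assemble.append(["stringtie", bam, "-G", guide_gtf,
                         "-o", gtf, "-p", str(cpus)])
        # Stage 3: -e restricts estimation to the transcripts given with -G
        # (here the merged set); -B writes Ballgown table files next to
        # the output GTF
        estimate.append(["stringtie", bam, "-e", "-B", "-G", merged_gtf,
                         "-o", f"ballgown/{sample}/{sample}.gtf",
                         "-p", str(cpus)])
    # Stage 2: merge the per-sample assemblies into one non-redundant set
    merge = ["stringtie", "--merge", "-G", guide_gtf,
             "-o", merged_gtf, *sample_gtfs]
    return assemble, merge, estimate


if __name__ == "__main__":
    stages = stringtie_commands(["aln/s1.bam", "aln/s2.bam"], "reference.gtf")
    assemble, merge, estimate = stages
    for command in [*assemble, merge, *estimate]:
        print(" ".join(command))
```

Note that the estimation stage passes the merged GTF, not the reference annotation, to -G; this is what allows novel transcripts to be quantified across all samples.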

The Ballgown program is then used to load the coverage tables generated in the previous step and perform statistical analyses for differential expression at the transcript level. Notably, the StringTie-Ballgown protocol applied here was selected to include potentially novel transcripts in the analysis.

Second, featureCounts is used to count reads that are mapped to selected genomic features (genes by default) and to generate a table of read counts per gene and sample. This table is passed as input to DESeq2 to perform differential expression analysis at the gene level. Both the Ballgown and DESeq2 R scripts, along with their respective CWL wrappers, were designed to receive various parameters as input, such as the experimental design, contrasts of interest, numeric thresholds, and hidden batch effects.
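
To illustrate the hand-off between featureCounts and DESeq2, the sketch below reads a featureCounts table into a per-gene count dictionary and drops genes with few reads. It assumes the standard featureCounts layout (a leading comment line, then six annotation columns followed by one column per BAM file). The function name read_counts is hypothetical, and the filter is only a guess at what a threshold such as deseq2_min_sum_of_reads might do; the workflow's R script may filter differently.

```python
import csv
import io

# Geneid, Chr, Start, End, Strand, Length precede the per-sample columns
ANNOTATION_COLUMNS = 6


def read_counts(handle, min_sum_of_reads=0):
    """Parse a featureCounts table into {gene: {sample: count}}.

    Genes whose counts summed over all samples fall below
    min_sum_of_reads are dropped (an assumed filtering rule).
    """
    lines = (line for line in handle if not line.startswith("#"))
    rows = csv.reader(lines, delimiter="\t")
    header = next(rows)
    samples = header[ANNOTATION_COLUMNS:]
    counts = {}
    for row in rows:
        if not row:
            continue  # skip blank lines
        values = [int(value) for value in row[ANNOTATION_COLUMNS:]]
        if sum(values) >= min_sum_of_reads:
            counts[row[0]] = dict(zip(samples, values))
    return samples, counts


if __name__ == "__main__":
    # Made-up two-gene table, for illustration only
    example = (
        "# featureCounts comment line\n"
        "Geneid\tChr\tStart\tEnd\tStrand\tLength\ts1.bam\ts2.bam\n"
        "geneA\tchr1\t1\t100\t+\t100\t5\t7\n"
        "geneB\tchr1\t200\t300\t-\t101\t0\t1\n"
    )
    samples, counts = read_counts(io.StringIO(example), min_sum_of_reads=10)
    print(samples, counts)
```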

Inputs

ID Name Description Type
raw_files_directory n/a n/a Directory
input_file_split n/a n/a string?
input_file_split_fwd_single n/a n/a string?
input_file_split_rev n/a n/a string?
input_qc_check n/a n/a boolean?
input_trimming_check n/a n/a boolean?
premapping_input_check n/a n/a string
tg_quality n/a n/a int
tg_length n/a n/a int
tg_compression n/a n/a boolean
tg_do_not_compress n/a n/a boolean
tg_trim_suffix n/a n/a string
tg_strigency n/a n/a int
fastx_first_base_to_keep n/a n/a int?
fastx_last_base_to_keep n/a n/a int?
hisat2_num_of_threads n/a n/a int
hisat2_alignments_tailored_trans_assemb n/a n/a boolean
hisat2_idx_directory n/a n/a Directory
hisat2_idx_basename n/a n/a string
hisat2_known_splicesite_infile n/a n/a File?
samtools_view_isbam n/a n/a boolean
samtools_view_collapsecigar n/a n/a boolean
samtools_view_uncompressed n/a n/a boolean
samtools_view_fastcompression n/a n/a boolean
samtools_view_samheader n/a n/a boolean
samtools_view_count n/a n/a boolean
samtools_view_readswithoutbits n/a n/a int?
samtools_view_readsingroup n/a n/a string?
samtools_view_readtagtostrip n/a n/a string[]?
samtools_view_readsquality n/a n/a int?
samtools_view_readswithbits n/a n/a int?
samtools_view_cigar n/a n/a int?
samtools_view_iscram n/a n/a boolean
samtools_view_threads n/a n/a int?
samtools_view_randomseed n/a n/a float?
samtools_view_region n/a n/a string?
samtools_view_readsinlibrary n/a n/a string?
samtools_sort_compression_level n/a n/a int?
samtools_sort_threads n/a n/a int?
samtools_sort_memory n/a n/a string?
samtools_sort_sort_by_name n/a n/a boolean?
stringtie_guide_gff n/a n/a File
stringtie_transcript_merge_mode n/a n/a boolean
stringtie_out_gtf n/a n/a string
stringtie_expression_estimation_mode n/a n/a boolean
stringtie_ballgown_table_files n/a n/a boolean
stringtie_cpus n/a n/a int?
stringtie_verbose n/a n/a boolean?
stringtie_min_isoform_abundance n/a n/a float?
stringtie_junction_coverage n/a n/a float?
stringtie_min_read_coverage n/a n/a float?
stringtie_conservative_mode n/a n/a boolean?
bg_phenotype_file n/a n/a File
bg_phenotype n/a n/a string
bg_samples n/a n/a string
bg_timecourse n/a n/a boolean?
bg_feature n/a n/a string?
bg_measure n/a n/a string?
bg_confounders n/a n/a string?
bg_custom_model n/a n/a boolean?
bg_mod n/a n/a string?
bg_mod0 n/a n/a string?
featureCounts_number_of_threads n/a n/a int?
featureCounts_annotation_file n/a n/a File
featureCounts_output_file n/a n/a string
featureCounts_read_meta_feature_overlap n/a n/a boolean?
deseq2_metadata n/a n/a File
deseq2_design n/a n/a string
deseq2_samples n/a n/a string
deseq2_min_sum_of_reads n/a n/a int?
deseq2_reference_level n/a n/a string?
deseq2_phenotype n/a n/a string?
deseq2_contrast n/a n/a boolean?
deseq2_numerator n/a n/a string?
deseq2_denominator n/a n/a string?
deseq2_lfcThreshold n/a n/a float?
deseq2_pAdjustMethod n/a n/a string?
deseq2_alpha n/a n/a float?
deseq2_parallelization n/a n/a boolean?
deseq2_cores n/a n/a int?
deseq2_transformation n/a n/a string?
deseq2_blind n/a n/a boolean?
deseq2_hypothesis n/a n/a string?
deseq2_reduced n/a n/a string?
deseq2_hidden_batch_effects n/a n/a boolean?
deseq2_hidden_batch_row_means n/a n/a int?
deseq2_hidden_batch_method n/a n/a string?
deseq2_variables n/a n/a int?
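
In CWL type shorthand, a trailing ? marks an input as optional, so the Type column above shows which inputs a job file may omit. The sketch below checks a job against a small excerpt of the table. The helper names and the example job values (including the fastq/ path) are hypothetical, and defaults that the workflow itself may define for non-optional inputs are ignored here.

```python
def is_optional(cwl_type):
    """Return True if a CWL shorthand type (e.g. 'int?') is optional."""
    return cwl_type.endswith("?")


def missing_required(input_types, job):
    """List non-optional inputs that the job does not provide.

    Workflow-side default values are not considered in this sketch.
    """
    return sorted(
        name
        for name, cwl_type in input_types.items()
        if not is_optional(cwl_type) and name not in job
    )


# Excerpt of the inputs table: ID -> Type
INPUT_TYPES = {
    "raw_files_directory": "Directory",
    "input_qc_check": "boolean?",
    "tg_quality": "int",
    "hisat2_idx_basename": "string",
    "hisat2_known_splicesite_infile": "File?",
}

if __name__ == "__main__":
    # Hypothetical job values, for illustration only
    job = {
        "raw_files_directory": {"class": "Directory", "path": "fastq/"},
        "tg_quality": 20,
    }
    print(missing_required(INPUT_TYPES, job))
```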

Steps

ID Name Description
get_raw_files n/a n/a
split_single_paired n/a n/a
trim_galore_single n/a n/a
trim_galore_paired n/a n/a
fastqc_raw n/a n/a
fastqc_single_trimmed n/a n/a
fastqc_paired_trimmed n/a n/a
cp_fastqc_raw_zip n/a n/a
cp_fastqc_single_zip n/a n/a
cp_fastqc_paired_zip n/a n/a
rename_fastqc_raw_html n/a n/a
rename_fastqc_single_html n/a n/a
rename_fastqc_paired_html n/a n/a
fastx_trimmer_single n/a n/a
fastx_trimmer_paired n/a n/a
check_for_fastx_and_produce_names n/a n/a
hisat2_for_single_reads n/a n/a
hisat2_for_paired_reads n/a n/a
collect_hisat2_sam_files n/a n/a
samtools_view n/a n/a
samtools_sort n/a n/a
stringtie_transcript_assembly n/a n/a
stringtie_merge n/a n/a
stringtie_expression n/a n/a
ballgown_de n/a n/a
featureCounts n/a n/a
DESeq2_analysis n/a n/a

Outputs

ID Name Description Type
o_trim_galore_single_fq n/a n/a File[]
o_trim_galore_single_reports n/a n/a File[]
o_trim_galore_paired_fq n/a n/a File[]
o_trim_galore_paired_reports n/a n/a File[]
o_fastqc_raw_html n/a n/a File[]?
o_fastqc_single_html n/a n/a File[]?
o_fastqc_paired_html n/a n/a File[]?
o_fastqc_raw_zip n/a n/a Directory?
o_fastqc_single_zip n/a n/a Directory?
o_fastqc_paired_zip n/a n/a Directory?
o_fastx_trimmer_single n/a n/a File[]
o_fastx_trimmer_paired n/a n/a File[]
o_hisat2_for_single_reads_reports n/a n/a File[]
o_hisat2_for_paired_reads_reports n/a n/a File[]
o_collect_hisat2_sam_files n/a n/a File[]
o_samtools_view n/a n/a File[]
o_samtools_sort n/a n/a File[]
o_stringtie_transcript_assembly_gtf n/a n/a File[]
o_stringtie_merge n/a n/a File
o_stringtie_expression_gtf n/a n/a File[]
o_stringtie_expression_outdir n/a n/a Directory[]
o_ballgown_de_results n/a n/a File
o_ballgown_object n/a n/a File
o_ballgown_de_custom_model n/a n/a File?
o_featureCounts n/a n/a File
o_deseq2_de_results n/a n/a File
o_deseq2_dds_object n/a n/a File
o_deseq2_res_lfcShrink_object n/a n/a File
o_deseq2_transformed_object n/a n/a File?

Version History

Version 1 (earliest) Created 5th Jul 2023 at 09:44 by Konstantinos Kyritsis

Initial commit


Frozen Version-1 a80a6c7
Creators
  • Konstantinos Kyritsis
  • Nikolaos Pechlivanis
  • Fotis Psomopoulos
Citation
Kyritsis, K., Pechlivanis, N., & Psomopoulos, F. (2023). CWL-based RNA-Seq workflow. WorkflowHub. https://doi.org/10.48546/WORKFLOWHUB.WORKFLOW.524.1

Created: 5th Jul 2023 at 09:44

Last updated: 5th Jul 2023 at 10:15
