CWL-based ChIP-Seq workflow
Version 1

Workflow Type: Common Workflow Language
Stable

A CWL-based pipeline for processing ChIP-Seq data (FASTQ format) and performing:

  • Peak calling
  • Consensus peak count table generation
  • Detection of super-enhancer regions
  • Differential binding analysis

The respective GitHub folder provides:

  • The CWL wrappers for the workflow
  • A pre-configured YAML template, based on validation analysis of publicly available HTS data
  • Tables of metadata (EZH2_metadata_CLL.csv and H3K27me3_metadata_CLL.csv), based on the same validation analysis, to serve as input examples for the design of comparisons during differential binding analysis
  • A list of ChIP-Seq blacklisted regions (human genome version 38; hg38) from the ENCODE project, provided in BED format (hg38-blacklist.v2.bed), which can be used as input for the workflow

Briefly, the workflow performs the following steps:

  1. Quality control of short reads (FastQC)
  2. Trimming of the reads (e.g., removal of adapter and/or low quality sequences) (Trimmomatic)
  3. Mapping to reference genome (HISAT2)
  4. Conversion of mapped reads from SAM (Sequence Alignment Map) to BAM (Binary Alignment Map) format (samtools)
  5. Sorting mapped reads by read name, as required before mate information can be added (samtools; see the samtools_sort_by_name step)
  6. Adding information regarding paired-end reads (e.g., CIGAR field information) (samtools)
  7. Re-sorting based on chromosomal coordinates (samtools)
  8. Removal of duplicate reads (samtools)
  9. Index creation for coordinate-sorted BAM files to enable fast random access (samtools)
  10. Production of quality metrics and files for the inspection of the mapped ChIP-Seq reads, taking into consideration the experimental design (deeptools2):
      • Read coverages for genomic regions of two or more BAM files are computed (multiBamSummary). The results are produced in compressed numpy array (NPZ) format and are used to calculate and visualize pairwise correlation values between the read coverages (plotCorrelation).
      • Estimation of sequencing depth, through genomic position (base pair) sampling, and visualization is performed for multiple BAM files (plotCoverage).
      • Cumulative read coverages for each indexed BAM file are plotted by counting and sorting all reads overlapping a “window” of specified length (plotFingerprint).
      • Production of coverage track files (bigWig), with the coverage calculated as the number of reads per consecutive windows of predefined size (bamCoverage), and normalized through various available methods (e.g., Reads Per Kilobase per Million mapped reads; RPKM). The coverage track files are used to calculate scores per selected genomic regions (computeMatrix), typically genes, and a heatmap, based on the scores associated with these genomic regions, is produced (plotHeatmap).
  11. Calling potential binding positions (peaks) to the genome (peak calling) (MACS2)
  12. Generation of consensus peak count table for the application of custom analyses on MACS2 peak calling results (bedtools)
  13. Detection of super-enhancer regions (Rank Ordering of Super-Enhancers; ROSE)
  14. Differential binding analyses (DiffBind) for:
      • MACS2 peak calling results
      • ROSE-detected super-enhancer regions
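
To make the RPKM normalization mentioned for the coverage tracks concrete, the sketch below computes a per-window RPKM value as reads in the window divided by (total mapped reads in millions × window length in kilobases). It is an illustration only and not the workflow's code: the function name `rpkm_per_bin` and the toy data are hypothetical, and reads are assigned to a window by start position, which is a simplification of how bamCoverage counts overlapping reads.

```python
from collections import Counter


def rpkm_per_bin(read_starts, bin_size, total_mapped_reads):
    """Return {bin_index: RPKM} for fixed-size, consecutive windows.

    Simplifying assumption: each read is counted once, in the window
    that contains its start position.
    """
    if bin_size <= 0 or total_mapped_reads <= 0:
        raise ValueError("bin_size and total_mapped_reads must be positive")
    # Count reads per window (window index = start position // window size).
    counts = Counter(pos // bin_size for pos in read_starts)
    per_million = total_mapped_reads / 1_000_000  # library size in millions
    bin_kb = bin_size / 1_000                     # window length in kilobases
    return {b: n / (per_million * bin_kb) for b, n in sorted(counts.items())}


# Toy example: seven read start positions on one chromosome, 1 kb windows,
# and a hypothetical library of two million mapped reads.
starts = [5, 120, 980, 1010, 1600, 1999, 2500]
track = rpkm_per_bin(starts, bin_size=1000, total_mapped_reads=2_000_000)
for bin_index, value in track.items():
    print(f"window {bin_index}: RPKM = {value}")
```

Because the value is scaled by both library size and window length, tracks from samples sequenced to different depths become comparable, which is why a normalized track is the usual input for the later computeMatrix and plotHeatmap steps.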

Inputs

ID Name Description Type
raw_files_directory n/a n/a Directory
input_file_split n/a n/a string?
input_file_split_fwd_single n/a n/a string?
input_file_split_rev n/a n/a string?
input_qc_check n/a n/a boolean?
input_trimming_check n/a n/a boolean?
input_treatment_samples n/a n/a string[]
input_control_samples n/a n/a string[]?
trimmomatic_se_threads n/a n/a int?
trimmomatic_se_illuminaClip n/a n/a string?
trimmomatic_se_slidingWindow n/a n/a string?
trimmomatic_se_leading n/a n/a int?
trimmomatic_se_trailing n/a n/a int?
trimmomatic_se_minlen n/a n/a int?
trimmomatic_pe_threads n/a n/a int?
trimmomatic_pe_illuminaClip n/a n/a string?
trimmomatic_pe_slidingWindow n/a n/a string?
trimmomatic_pe_leading n/a n/a int?
trimmomatic_pe_trailing n/a n/a int?
trimmomatic_pe_minlen n/a n/a int?
hisat2_num_of_threads n/a n/a int?
hisat2_idx_directory n/a n/a Directory
hisat2_idx_basename n/a n/a string
samtools_readswithoutbits n/a n/a int
samtools_view_threads n/a n/a int?
samtools_fixmate_threads n/a n/a int?
samtools_fixmate_output_format n/a n/a string
samtools_sort_compression_level n/a n/a int?
samtools_sort_threads n/a n/a int
samtools_sort_memory n/a n/a string?
samtools_markdup_threads n/a n/a int?
blackListFile n/a n/a File?
multiBamSummary_threads n/a n/a int?
plotCorrelation_numbers n/a n/a boolean?
plotCorrelation_method n/a n/a string?
plotCorrelation_color n/a n/a string?
plotCorrelation_title n/a n/a string?
plotCorrelation_plotType n/a n/a string?
plotCorrelation_outFileName n/a n/a string?
plotCoverage_threads n/a n/a int?
plotCoverage_plotFileFormat n/a n/a string?
plotCoverage_outFileName n/a n/a string?
plotFingerprint_plotFileFormat n/a n/a string?
plotFingerprint_threads n/a n/a int?
plotFingerprint_outFileName n/a n/a string?
bamCoverage_normalizeUsing n/a n/a string?
bamCoverage_effective_genome_size n/a n/a long?
bamCoverage_extendReads n/a n/a int?
bamCoverage_threads n/a n/a int?
computeMatrix_regions n/a n/a File
computeMatrix_threads n/a n/a int?
computeMatrix_upstream n/a n/a int?
computeMatrix_downstream n/a n/a int?
computeMatrix_outputFile n/a n/a string?
computeMatrix_outFileSortedRegions n/a n/a string?
plotHeatmap_plotFileFormat n/a n/a string?
plotHeatmap_outputFile n/a n/a string?
macs2_callpeak_bdg n/a n/a boolean?
macs2_callpeak_gsize n/a n/a string?
macs2_callpeak_format n/a n/a string?
macs2_callpeak_broad n/a n/a boolean?
macs2_callpeak_nomodel n/a n/a boolean?
macs2_shift n/a n/a int?
macs2_extsize n/a n/a int?
macs2_pvalue n/a n/a float?
macs2_qvalue n/a n/a float?
metadata_table n/a n/a File
ChIPQC_blacklist n/a n/a File?
ChIPQC_annotation n/a n/a string?
ChIPQC_consensus n/a n/a boolean?
ChIPQC_bCount n/a n/a boolean?
ChIPQC_facetBy n/a n/a string[]?
DiffBind_consensus n/a n/a string[]?
DiffBind_minOverlap n/a n/a int or float
DiffBind_blacklist n/a n/a string, boolean or File
DiffBind_greylist n/a n/a string, boolean or File
DiffBind_cores n/a n/a int?
DiffBind_bParallel n/a n/a boolean?
DiffBind_normalization n/a n/a string?
DiffBind_library n/a n/a string?
DiffBind_background n/a n/a boolean?
DiffBind_design n/a n/a string
DiffBind_reorderMeta_factor n/a n/a string[]?
DiffBind_reorderMeta_value n/a n/a string[]?
DiffBind_retrieve_consensus n/a n/a boolean?
DiffBind_low_read_count_filter n/a n/a int?
DiffBind_filterFun n/a n/a string?
rose_genome_build n/a n/a string
rose_stitch_distance n/a n/a int?
rose_tss_distance n/a n/a int?
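
As a rough illustration of how the inputs above translate into a job file, the sketch below builds a CWL input object for the inputs whose type carries no trailing `?` and serializes it as JSON, which CWL runners accept alongside YAML. It assumes the workflow defines no defaults for these inputs. Every value is a placeholder chosen for illustration (paths, sample names, the design formula, the genome build string and the numeric settings are not taken from the repository's pre-configured YAML template), and the helper names `cwl_file` and `cwl_dir` are hypothetical.

```python
import json


def cwl_file(path):
    """CWL File object for a job file."""
    return {"class": "File", "path": path}


def cwl_dir(path):
    """CWL Directory object for a job file."""
    return {"class": "Directory", "path": path}


# Inputs listed without a trailing "?" in the table above.
# All values below are placeholders, not recommended settings.
job = {
    "raw_files_directory": cwl_dir("data/fastq"),
    "input_treatment_samples": ["sample_A", "sample_B"],
    "hisat2_idx_directory": cwl_dir("reference/hisat2_index"),
    "hisat2_idx_basename": "genome",
    "samtools_readswithoutbits": 4,
    "samtools_fixmate_output_format": "bam",
    "samtools_sort_threads": 4,
    "computeMatrix_regions": cwl_file("annotation/genes.bed"),
    "metadata_table": cwl_file("EZH2_metadata_CLL.csv"),
    "DiffBind_minOverlap": 2,        # typed int or float
    "DiffBind_blacklist": True,      # typed string, boolean or File
    "DiffBind_greylist": False,      # typed string, boolean or File
    "DiffBind_design": "~Condition",
    "rose_genome_build": "hg38",
}

job_json = json.dumps(job, indent=2)
print(job_json)
```

Saving the printed text to a file and passing it to a CWL runner together with the workflow document would be the usual next step. Optional inputs, such as `blackListFile` pointing at hg38-blacklist.v2.bed, can be added to the same dictionary in the same way.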

Steps

ID Name Description
get_raw_files n/a n/a
split_single_paired n/a n/a
trimmomatic_single_end n/a n/a
trimmomatic_paired_end n/a n/a
fastqc_raw n/a n/a
fastqc_single_trimmed n/a n/a
fastqc_paired_trimmed_fwd n/a n/a
fastqc_paired_trimmed_rev n/a n/a
cp_fastqc_raw_zip n/a n/a
cp_fastqc_single_zip n/a n/a
cp_fastqc_paired_zip n/a n/a
rename_fastqc_raw_html n/a n/a
rename_fastqc_single_html n/a n/a
rename_fastqc_paired_html_fwd n/a n/a
rename_fastqc_paired_html_rev n/a n/a
check_trimming n/a n/a
hisat2_for_single_reads n/a n/a
hisat2_for_paired_reads n/a n/a
collect_hisat2_sam_files n/a n/a
samtools_view n/a n/a
samtools_sort_by_name n/a n/a
samtools_fixmate n/a n/a
samtools_sort n/a n/a
samtools_markdup n/a n/a
samtools_index n/a n/a
multiBamSummary_file n/a n/a
plotCorrelation_file n/a n/a
plotCoverage_file n/a n/a
plotFingerprint_file n/a n/a
bamCoverage_norm n/a n/a
computeMatrix n/a n/a
plotHeatmap n/a n/a
separate_control_treatment_files n/a n/a
macs2_call_peaks n/a n/a
total_peaks_table n/a n/a
sort_peaks_table n/a n/a
bedtools_merge n/a n/a
exclude_black_list_regions n/a n/a
bedtools_coverage n/a n/a
extract_counts n/a n/a
extract_peaks n/a n/a
printf_header_samples n/a n/a
paste_content_1 n/a n/a
paste_content_2 n/a n/a
append_files n/a n/a
ChIPQC_macs n/a n/a
DiffBind_macs n/a n/a
exclude_black_list_regions_narrowPeak n/a n/a
bed_to_rose_gff_conversion n/a n/a
rose_main n/a n/a
enhancer_bed_processing n/a n/a
ChIPQC_rose n/a n/a
DiffBind_rose n/a n/a
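
The consensus peak count table produced by the steps from sort_peaks_table to bedtools_coverage can be pictured with the pure-Python sketch below: pool the peaks of all samples, sort and merge overlapping or touching intervals, drop merged regions that overlap a blacklisted interval, and count the reads of each sample overlapping every remaining region. This is an illustration of the idea only. The workflow itself calls bedtools, whether it discards whole regions that overlap the blacklist is an assumption, and all function names and toy coordinates here are hypothetical.

```python
def merge_intervals(peaks):
    """Sort (chrom, start, end) intervals and merge overlapping or touching ones."""
    merged = []
    for chrom, start, end in sorted(peaks):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1][2] = max(merged[-1][2], end)  # extend the current region
        else:
            merged.append([chrom, start, end])       # start a new region
    return [tuple(region) for region in merged]


def overlaps(a, b):
    """True if two half-open (chrom, start, end) intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def consensus_counts(peaks_by_sample, reads_by_sample, blacklist=()):
    """Return {consensus_region: {sample: read_count}}."""
    pooled = [p for peaks in peaks_by_sample.values() for p in peaks]
    consensus = [
        region for region in merge_intervals(pooled)
        if not any(overlaps(region, bad) for bad in blacklist)
    ]
    return {
        region: {
            sample: sum(overlaps(region, read) for read in reads)
            for sample, reads in reads_by_sample.items()
        }
        for region in consensus
    }


# Toy data: two samples, one blacklisted interval.
peaks_by_sample = {
    "s1": [("chr1", 100, 200), ("chr1", 500, 600)],
    "s2": [("chr1", 150, 250), ("chr2", 10, 50)],
}
reads_by_sample = {
    "s1": [("chr1", 120, 170), ("chr1", 240, 290), ("chr2", 0, 20)],
    "s2": [("chr1", 300, 350), ("chr2", 40, 90)],
}
blacklist = [("chr1", 550, 560)]

table = consensus_counts(peaks_by_sample, reads_by_sample, blacklist)
for region, counts in table.items():
    print(region, counts)
```

The quadratic overlap scan keeps the sketch short. Real peak sets are large enough that sorted sweeps or interval trees, as used by dedicated tools, are the practical choice.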

Outputs

ID Name Description Type
o_trimmomatic_single_end_stderr n/a n/a File[]
o_trimmomatic_single_end_fastq n/a n/a File[]
o_trimmomatic_paired_end_stderr n/a n/a File[]
o_trimmomatic_paired_end_fwd_paired n/a n/a File[]
o_trimmomatic_paired_end_fwd_unpaired n/a n/a File[]
o_trimmomatic_paired_end_rev_paired n/a n/a File[]
o_trimmomatic_paired_end_rev_unpaired n/a n/a File[]
o_fastqc_raw_html n/a n/a File[]?
o_fastqc_single_html n/a n/a File[]?
o_fastqc_paired_html_fwd n/a n/a File[]?
o_fastqc_paired_html_rev n/a n/a File[]?
o_fastqc_raw_zip n/a n/a Directory?
o_fastqc_single_zip n/a n/a Directory?
o_fastqc_paired_zip n/a n/a Directory?
o_hisat2_for_single_reads_sam n/a n/a File[]
o_hisat2_for_single_reads_stderr n/a n/a File[]
o_hisat2_for_paired_reads_sam n/a n/a File[]
o_hisat2_for_paired_reads_stderr n/a n/a File[]
o_samtools_sort_by_name n/a n/a File[]
o_samtools_fixmate n/a n/a File[]
o_samtools_sort n/a n/a File[]
o_samtools_markdup n/a n/a File[]
o_samtools_index n/a n/a File[]
o_multiBamSummary_file n/a n/a File
o_plotCorrelation_file n/a n/a File
o_plotCoverage_file n/a n/a File
o_plotFingerprint_file n/a n/a File
o_bamCoverage_norm n/a n/a File[]
o_computeMatrix_matrix n/a n/a File
o_computeMatrix_regions n/a n/a File
o_plotHeatmap n/a n/a File
o_macs2_call_peaks_narrowPeak n/a n/a File[]?
o_macs2_call_peaks_xls n/a n/a File[]?
o_macs2_call_peaks_bed n/a n/a File[]?
o_macs2_call_peaks_lambda n/a n/a File[]?
o_macs2_call_peaks_pileup n/a n/a File[]?
o_macs2_call_peaks_broadPeak n/a n/a File[]?
o_macs2_call_peaks_gappedPeak n/a n/a File[]?
o_macs2_call_peaks_model_r n/a n/a File[]?
o_macs2_call_peaks_cutoff n/a n/a File[]?
o_total_peaks_table n/a n/a File
o_sort_peaks_table n/a n/a File
o_bedtools_merge n/a n/a File
o_bedtools_intersect n/a n/a File
o_exclude_black_list_regions n/a n/a File
o_bedtools_coverage n/a n/a File[]
o_printf_header_samples n/a n/a File
o_paste_content_1 n/a n/a File
o_paste_content_2 n/a n/a File
o_append_files n/a n/a File
o_ChIPQC_macs_ChIPQCexperiment n/a n/a File
o_ChIPQC_macs_outdir n/a n/a Directory
o_ChIPQC_macs_ChIPQCreport n/a n/a File?
o_DiffBind_macs_diffbind_results n/a n/a File
o_DiffBind_macs_correlation_heatmap n/a n/a File
o_DiffBind_macs_diffbind_consensus n/a n/a File?
o_DiffBind_macs_diffbind_normalized_counts n/a n/a File?
o_DiffBind_macs_diffbind_dba_object n/a n/a File?
o_exclude_black_list_regions_narrowPeak n/a n/a File[]
o_bed_to_rose_gff_conversion n/a n/a File[]
o_rose_main_gff_dir_outputs n/a n/a File[][]
o_rose_main_mappedGFF_dir_outputs n/a n/a File[][]
o_rose_main_STITCHED_ENHANCER_REGION_MAP n/a n/a File[]?
o_rose_main_AllEnhancers_table n/a n/a File[]?
o_rose_main_SuperEnhancers_table n/a n/a File[]?
o_rose_main_Plot_points n/a n/a File[]?
o_rose_main_Enhancers_withSuper n/a n/a File[]?
o_enhancer_bed_processing n/a n/a File[]?
o_ChIPQC_rose_ChIPQCexperiment n/a n/a File
o_ChIPQC_rose_outdir n/a n/a Directory?
o_ChIPQC_rose_ChIPQCreport n/a n/a File
o_DiffBind_rose_diffbind_results n/a n/a File
o_DiffBind_rose_correlation_heatmap n/a n/a File
o_DiffBind_rose_diffbind_consensus n/a n/a File?
o_DiffBind_rose_diffbind_normalized_counts n/a n/a File?
o_DiffBind_rose_diffbind_dba_object n/a n/a File?

Version History

Version 1 (earliest) Created 5th Jul 2023 at 10:39 by Konstantinos Kyritsis

Initial commit


Frozen Version-1 c05e175
Citation
Kyritsis, K., Pechlivanis, N., & Psomopoulos, F. (2023). CWL-based ChIP-Seq workflow. WorkflowHub. https://doi.org/10.48546/WORKFLOWHUB.WORKFLOW.525.1