Workflow Type: Common Workflow Language
Frozen
Stable
A CWL-based pipeline for processing ChIP-Seq data (FASTQ format) and performing:
- Peak calling
- Consensus peak count table generation
- Detection of super-enhancer regions
- Differential binding analysis
The corresponding GitHub folder provides:
- The CWL wrappers for the workflow
- A pre-configured YAML template, based on validation analysis of publicly available HTS data
- Tables of metadata (EZH2_metadata_CLL.csv and H3K27me3_metadata_CLL.csv), based on the same validation analysis, to serve as input examples for the design of comparisons during differential binding analysis
- A list of ChIP-Seq blacklisted regions (human genome version 38; hg38) from the ENCODE project, provided in BED format (hg38-blacklist.v2.bed), which can be used as input for the workflow
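As an illustration of how a blacklist in BED format can be applied, the sketch below drops every peak that overlaps a blacklisted region. It is a minimal pure-Python stand-in, not the workflow's own implementation (the output named o_bedtools_intersect suggests the workflow relies on bedtools for this). The function names and the example coordinates are hypothetical, and removing a whole peak on any overlap is an assumption.

```python
from typing import List, Tuple

# (chrom, start, end) using BED conventions: 0-based start, exclusive end.
Interval = Tuple[str, int, int]


def parse_bed(text: str) -> List[Interval]:
    """Read the first three tab-separated columns of BED-formatted text."""
    intervals = []
    for line in text.splitlines():
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and header/comment lines
        chrom, start, end = line.split("\t")[:3]
        intervals.append((chrom, int(start), int(end)))
    return intervals


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def exclude_blacklisted(peaks: List[Interval],
                        blacklist: List[Interval]) -> List[Interval]:
    """Keep only peaks that overlap no blacklisted region (assumed policy)."""
    return [p for p in peaks if not any(overlaps(p, b) for b in blacklist)]


# Hypothetical coordinates, for illustration only.
peaks = parse_bed("chr1\t100\t200\nchr1\t500\t600\nchr2\t100\t200\n")
blacklist = parse_bed("chr1\t150\t300\tLow Mappability\n")
kept = exclude_blacklisted(peaks, blacklist)
```

The quadratic scan is fine for a small example; for genome-scale peak sets a sorted sweep or an interval tree would be the usual choice.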
Briefly, the workflow performs the following steps:
- Quality control of short reads (FastQC)
- Trimming of the reads (e.g., removal of adapter and/or low quality sequences) (Trimmomatic)
- Mapping to reference genome (HISAT2)
- Conversion of mapped reads from SAM (Sequence Alignment Map) to BAM (Binary Alignment Map) format (samtools)
- Sorting mapped reads based on chromosomal coordinates (samtools)
- Adding information regarding paired-end reads (e.g., CIGAR field information) (samtools)
- Re-sorting based on chromosomal coordinates (samtools)
- Removal of duplicate reads (samtools)
- Index creation for coordinate-sorted BAM files to enable fast random access (samtools)
- Production of quality metrics and files for the inspection of the mapped ChIP-Seq reads, taking into consideration the experimental design (deeptools2):
  - Read coverages for genomic regions of two or more BAM files are computed (multiBamSummary). The results are produced in compressed numpy array (NPZ) format and are used to calculate and visualize pairwise correlation values between the read coverages (plotCorrelation).
  - Estimation of sequencing depth, through genomic position (base pair) sampling, and visualization is performed for multiple BAM files (plotCoverage).
  - Cumulative read coverages for each indexed BAM file are plotted by counting and sorting all reads overlapping a “window” of specified length (plotFingerprint).
  - Production of coverage track files (bigWig), with the coverage calculated as the number of reads per consecutive windows of predefined size (bamCoverage), and normalized through various available methods (e.g., Reads Per Kilobase per Million mapped reads; RPKM). The coverage track files are used to calculate scores per selected genomic regions (computeMatrix), typically genes, and a heatmap, based on the scores associated with these genomic regions, is produced (plotHeatmap).
- Calling of potential binding positions (peaks) on the genome (peak calling) (MACS2)
- Generation of consensus peak count table for the application of custom analyses on MACS2 peak calling results (bedtools)
- Detection of super-enhancer regions (Rank Ordering of Super-Enhancers; ROSE)
- Differential binding analyses (DiffBind) for:
  - MACS2 peak calling results
  - ROSE-detected super-enhancer regions
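To make the RPKM normalization mentioned for the coverage tracks concrete, the sketch below applies the per-bin definition given in the deepTools documentation: reads in the bin divided by the product of mapped reads (in millions) and bin length (in kilobases). The function name, the read counts, the library size and the 50 bp bin length are hypothetical values chosen for illustration.

```python
from typing import List


def rpkm(reads_in_bin: int, total_mapped_reads: int, bin_length_bp: int) -> float:
    """Reads Per Kilobase per Million mapped reads for a single bin."""
    mapped_millions = total_mapped_reads / 1_000_000
    bin_length_kb = bin_length_bp / 1_000
    return reads_in_bin / (mapped_millions * bin_length_kb)


# Hypothetical raw counts for four consecutive 50 bp bins of one sample
# with two million mapped reads.
raw_counts: List[int] = [0, 5, 20, 10]
track = [rpkm(c, total_mapped_reads=2_000_000, bin_length_bp=50) for c in raw_counts]
```

Because both denominators are constant within a sample, RPKM rescales the whole track by one factor; it makes samples with different sequencing depths comparable but does not change the shape of an individual track.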
Inputs
ID | Name | Description | Type |
---|---|---|---|
raw_files_directory | n/a | n/a | |
input_file_split | n/a | n/a | |
input_file_split_fwd_single | n/a | n/a | |
input_file_split_rev | n/a | n/a | |
input_qc_check | n/a | n/a | |
input_trimming_check | n/a | n/a | |
input_treatment_samples | n/a | n/a | |
input_control_samples | n/a | n/a | |
trimmomatic_se_threads | n/a | n/a | |
trimmomatic_se_illuminaClip | n/a | n/a | |
trimmomatic_se_slidingWindow | n/a | n/a | |
trimmomatic_se_leading | n/a | n/a | |
trimmomatic_se_trailing | n/a | n/a | |
trimmomatic_se_minlen | n/a | n/a | |
trimmomatic_pe_threads | n/a | n/a | |
trimmomatic_pe_illuminaClip | n/a | n/a | |
trimmomatic_pe_slidingWindow | n/a | n/a | |
trimmomatic_pe_leading | n/a | n/a | |
trimmomatic_pe_trailing | n/a | n/a | |
trimmomatic_pe_minlen | n/a | n/a | |
hisat2_num_of_threads | n/a | n/a | |
hisat2_idx_directory | n/a | n/a | |
hisat2_idx_basename | n/a | n/a | |
samtools_readswithoutbits | n/a | n/a | |
samtools_view_threads | n/a | n/a | |
samtools_fixmate_threads | n/a | n/a | |
samtools_fixmate_output_format | n/a | n/a | |
samtools_sort_compression_level | n/a | n/a | |
samtools_sort_threads | n/a | n/a | |
samtools_sort_memory | n/a | n/a | |
samtools_markdup_threads | n/a | n/a | |
blackListFile | n/a | n/a | |
multiBamSummary_threads | n/a | n/a | |
plotCorrelation_numbers | n/a | n/a | |
plotCorrelation_method | n/a | n/a | |
plotCorrelation_color | n/a | n/a | |
plotCorrelation_title | n/a | n/a | |
plotCorrelation_plotType | n/a | n/a | |
plotCorrelation_outFileName | n/a | n/a | |
plotCoverage_threads | n/a | n/a | |
plotCoverage_plotFileFormat | n/a | n/a | |
plotCoverage_outFileName | n/a | n/a | |
plotFingerprint_plotFileFormat | n/a | n/a | |
plotFingerprint_threads | n/a | n/a | |
plotFingerprint_outFileName | n/a | n/a | |
bamCoverage_normalizeUsing | n/a | n/a | |
bamCoverage_effective_genome_size | n/a | n/a | |
bamCoverage_extendReads | n/a | n/a | |
bamCoverage_threads | n/a | n/a | |
computeMatrix_regions | n/a | n/a | |
computeMatrix_threads | n/a | n/a | |
computeMatrix_upstream | n/a | n/a | |
computeMatrix_downstream | n/a | n/a | |
computeMatrix_outputFile | n/a | n/a | |
computeMatrix_outFileSortedRegions | n/a | n/a | |
plotHeatmap_plotFileFormat | n/a | n/a | |
plotHeatmap_outputFile | n/a | n/a | |
macs2_callpeak_bdg | n/a | n/a | |
macs2_callpeak_gsize | n/a | n/a | |
macs2_callpeak_format | n/a | n/a | |
macs2_callpeak_broad | n/a | n/a | |
macs2_callpeak_nomodel | n/a | n/a | |
macs2_shift | n/a | n/a | |
macs2_extsize | n/a | n/a | |
macs2_pvalue | n/a | n/a | |
macs2_qvalue | n/a | n/a | |
metadata_table | n/a | n/a | |
ChIPQC_blacklist | n/a | n/a | |
ChIPQC_annotation | n/a | n/a | |
ChIPQC_consensus | n/a | n/a | |
ChIPQC_bCount | n/a | n/a | |
ChIPQC_facetBy | n/a | n/a | |
DiffBind_consensus | n/a | n/a | |
DiffBind_minOverlap | n/a | n/a | |
DiffBind_blacklist | n/a | n/a | |
DiffBind_greylist | n/a | n/a | |
DiffBind_cores | n/a | n/a | |
DiffBind_bParallel | n/a | n/a | |
DiffBind_normalization | n/a | n/a | |
DiffBind_library | n/a | n/a | |
DiffBind_background | n/a | n/a | |
DiffBind_design | n/a | n/a | |
DiffBind_reorderMeta_factor | n/a | n/a | |
DiffBind_reorderMeta_value | n/a | n/a | |
DiffBind_retrieve_consensus | n/a | n/a | |
DiffBind_low_read_count_filter | n/a | n/a | |
DiffBind_filterFun | n/a | n/a | |
rose_genome_build | n/a | n/a | |
rose_stitch_distance | n/a | n/a | |
rose_tss_distance | n/a | n/a | |
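Since the table lists input IDs without types or example values, the sketch below shows one way to assemble a job file for a CWL runner while guarding against misspelled IDs. Only the IDs come from the table; every value, the assumed types (for example a CWL Directory object for raw_files_directory) and the helper name `build_job` are hypothetical and should be checked against the pre-configured YAML template.

```python
import json

# A subset of the input IDs from the Inputs table above.
KNOWN_INPUT_IDS = {
    "raw_files_directory",
    "input_treatment_samples",
    "input_control_samples",
    "hisat2_num_of_threads",
    "hisat2_idx_directory",
    "hisat2_idx_basename",
    "blackListFile",
    "macs2_callpeak_gsize",
    "metadata_table",
}


def build_job(values: dict) -> str:
    """Serialize input values as JSON, rejecting IDs that are not listed."""
    unknown = sorted(set(values) - KNOWN_INPUT_IDS)
    if unknown:
        raise KeyError("not a listed workflow input: " + ", ".join(unknown))
    return json.dumps(values, indent=2, sort_keys=True)


# Hypothetical values and types, for illustration only.
job = build_job({
    "raw_files_directory": {"class": "Directory", "path": "/path/to/fastq"},
    "hisat2_idx_basename": "genome",
    "hisat2_num_of_threads": 8,
    "macs2_callpeak_gsize": "hs",
})
print(job)
```

CWL runners accept input objects in JSON as well as YAML, so the resulting text could be saved and passed alongside the workflow file once the real types are confirmed.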
Steps
ID | Name | Description |
---|---|---|
get_raw_files | n/a | n/a |
split_single_paired | n/a | n/a |
trimmomatic_single_end | n/a | n/a |
trimmomatic_paired_end | n/a | n/a |
fastqc_raw | n/a | n/a |
fastqc_single_trimmed | n/a | n/a |
fastqc_paired_trimmed_fwd | n/a | n/a |
fastqc_paired_trimmed_rev | n/a | n/a |
cp_fastqc_raw_zip | n/a | n/a |
cp_fastqc_single_zip | n/a | n/a |
cp_fastqc_paired_zip | n/a | n/a |
rename_fastqc_raw_html | n/a | n/a |
rename_fastqc_single_html | n/a | n/a |
rename_fastqc_paired_html_fwd | n/a | n/a |
rename_fastqc_paired_html_rev | n/a | n/a |
check_trimming | n/a | n/a |
hisat2_for_single_reads | n/a | n/a |
hisat2_for_paired_reads | n/a | n/a |
collect_hisat2_sam_files | n/a | n/a |
samtools_view | n/a | n/a |
samtools_sort_by_name | n/a | n/a |
samtools_fixmate | n/a | n/a |
samtools_sort | n/a | n/a |
samtools_markdup | n/a | n/a |
samtools_index | n/a | n/a |
multiBamSummary_file | n/a | n/a |
plotCorrelation_file | n/a | n/a |
plotCoverage_file | n/a | n/a |
plotFingerprint_file | n/a | n/a |
bamCoverage_norm | n/a | n/a |
computeMatrix | n/a | n/a |
plotHeatmap | n/a | n/a |
separate_control_treatment_files | n/a | n/a |
macs2_call_peaks | n/a | n/a |
total_peaks_table | n/a | n/a |
sort_peaks_table | n/a | n/a |
bedtools_merge | n/a | n/a |
exclude_black_list_regions | n/a | n/a |
bedtools_coverage | n/a | n/a |
extract_counts | n/a | n/a |
extract_peaks | n/a | n/a |
printf_header_samples | n/a | n/a |
paste_content_1 | n/a | n/a |
paste_content_2 | n/a | n/a |
append_files | n/a | n/a |
ChIPQC_macs | n/a | n/a |
DiffBind_macs | n/a | n/a |
exclude_black_list_regions_narrowPeak | n/a | n/a |
bed_to_rose_gff_conversion | n/a | n/a |
rose_main | n/a | n/a |
enhancer_bed_processing | n/a | n/a |
ChIPQC_rose | n/a | n/a |
DiffBind_rose | n/a | n/a |
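The samtools steps in the table (samtools_view through samtools_index) resemble the duplicate-marking recipe from the samtools manual, which can be sketched as a list of command lines. This is an assumption-laden illustration, not the workflow's CWL wrappers: the file names, the helper name `markdup_commands` and the choice of `-r` (remove rather than only mark duplicates) are hypothetical. The sketch follows the step IDs in sorting by read name before fixmate, because `samtools fixmate` expects name-ordered input.

```python
import shlex
from typing import List


def markdup_commands(sam: str, prefix: str, threads: int = 4) -> List[List[str]]:
    """Build the samtools command lines for SAM -> deduplicated, indexed BAM."""
    t = str(threads)
    bam = f"{prefix}.bam"
    name_sorted = f"{prefix}.name_sorted.bam"
    fixmate = f"{prefix}.fixmate.bam"
    coord_sorted = f"{prefix}.sorted.bam"
    dedup = f"{prefix}.dedup.bam"
    return [
        # Convert SAM to BAM.
        ["samtools", "view", "-b", "-@", t, "-o", bam, sam],
        # Sort by read name so that mates are adjacent.
        ["samtools", "sort", "-n", "-@", t, "-o", name_sorted, bam],
        # Fill in mate information; -m adds the mate score tags used by markdup.
        ["samtools", "fixmate", "-m", "-@", t, name_sorted, fixmate],
        # Sort by chromosomal coordinates, as markdup requires.
        ["samtools", "sort", "-@", t, "-o", coord_sorted, fixmate],
        # Identify duplicates; -r removes them from the output.
        ["samtools", "markdup", "-r", "-@", t, coord_sorted, dedup],
        # Index the final BAM for fast random access.
        ["samtools", "index", dedup],
    ]


for cmd in markdup_commands("sample.sam", "sample", threads=2):
    print(shlex.join(cmd))
```

Returning argument lists rather than one shell string keeps each command safe to hand to `subprocess.run` without quoting problems.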
Outputs
ID | Name | Description | Type |
---|---|---|---|
o_trimmomatic_single_end_stderr | n/a | n/a | |
o_trimmomatic_single_end_fastq | n/a | n/a | |
o_trimmomatic_paired_end_stderr | n/a | n/a | |
o_trimmomatic_paired_end_fwd_paired | n/a | n/a | |
o_trimmomatic_paired_end_fwd_unpaired | n/a | n/a | |
o_trimmomatic_paired_end_rev_paired | n/a | n/a | |
o_trimmomatic_paired_end_rev_unpaired | n/a | n/a | |
o_fastqc_raw_html | n/a | n/a | |
o_fastqc_single_html | n/a | n/a | |
o_fastqc_paired_html_fwd | n/a | n/a | |
o_fastqc_paired_html_rev | n/a | n/a | |
o_fastqc_raw_zip | n/a | n/a | |
o_fastqc_single_zip | n/a | n/a | |
o_fastqc_paired_zip | n/a | n/a | |
o_hisat2_for_single_reads_sam | n/a | n/a | |
o_hisat2_for_single_reads_stderr | n/a | n/a | |
o_hisat2_for_paired_reads_sam | n/a | n/a | |
o_hisat2_for_paired_reads_stderr | n/a | n/a | |
o_samtools_sort_by_name | n/a | n/a | |
o_samtools_fixmate | n/a | n/a | |
o_samtools_sort | n/a | n/a | |
o_samtools_markdup | n/a | n/a | |
o_samtools_index | n/a | n/a | |
o_multiBamSummary_file | n/a | n/a | |
o_plotCorrelation_file | n/a | n/a | |
o_plotCoverage_file | n/a | n/a | |
o_plotFingerprint_file | n/a | n/a | |
o_bamCoverage_norm | n/a | n/a | |
o_computeMatrix_matrix | n/a | n/a | |
o_computeMatrix_regions | n/a | n/a | |
o_plotHeatmap | n/a | n/a | |
o_macs2_call_peaks_narrowPeak | n/a | n/a | |
o_macs2_call_peaks_xls | n/a | n/a | |
o_macs2_call_peaks_bed | n/a | n/a | |
o_macs2_call_peaks_lambda | n/a | n/a | |
o_macs2_call_peaks_pileup | n/a | n/a | |
o_macs2_call_peaks_broadPeak | n/a | n/a | |
o_macs2_call_peaks_gappedPeak | n/a | n/a | |
o_macs2_call_peaks_model_r | n/a | n/a | |
o_macs2_call_peaks_cutoff | n/a | n/a | |
o_total_peaks_table | n/a | n/a | |
o_sort_peaks_table | n/a | n/a | |
o_bedtools_merge | n/a | n/a | |
o_bedtools_intersect | n/a | n/a | |
o_exclude_black_list_regions | n/a | n/a | |
o_bedtools_coverage | n/a | n/a | |
o_printf_header_samples | n/a | n/a | |
o_paste_content_1 | n/a | n/a | |
o_paste_content_2 | n/a | n/a | |
o_append_files | n/a | n/a | |
o_ChIPQC_macs_ChIPQCexperiment | n/a | n/a | |
o_ChIPQC_macs_outdir | n/a | n/a | |
o_ChIPQC_macs_ChIPQCreport | n/a | n/a | |
o_DiffBind_macs_diffbind_results | n/a | n/a | |
o_DiffBind_macs_correlation_heatmap | n/a | n/a | |
o_DiffBind_macs_diffbind_consensus | n/a | n/a | |
o_DiffBind_macs_diffbind_normalized_counts | n/a | n/a | |
o_DiffBind_macs_diffbind_dba_object | n/a | n/a | |
o_exclude_black_list_regions_narrowPeak | n/a | n/a | |
o_bed_to_rose_gff_conversion | n/a | n/a | |
o_rose_main_gff_dir_outputs | n/a | n/a | |
o_rose_main_mappedGFF_dir_outputs | n/a | n/a | |
o_rose_main_STITCHED_ENHANCER_REGION_MAP | n/a | n/a | |
o_rose_main_AllEnhancers_table | n/a | n/a | |
o_rose_main_SuperEnhancers_table | n/a | n/a | |
o_rose_main_Plot_points | n/a | n/a | |
o_rose_main_Enhancers_withSuper | n/a | n/a | |
o_enhancer_bed_processing | n/a | n/a | |
o_ChIPQC_rose_ChIPQCexperiment | n/a | n/a | |
o_ChIPQC_rose_outdir | n/a | n/a | |
o_ChIPQC_rose_ChIPQCreport | n/a | n/a | |
o_DiffBind_rose_diffbind_results | n/a | n/a | |
o_DiffBind_rose_correlation_heatmap | n/a | n/a | |
o_DiffBind_rose_diffbind_consensus | n/a | n/a | |
o_DiffBind_rose_diffbind_normalized_counts | n/a | n/a | |
o_DiffBind_rose_diffbind_dba_object | n/a | n/a | |
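The outputs o_sort_peaks_table, o_bedtools_merge and o_bedtools_coverage point to a sort, merge, count sequence behind the consensus peak count table. The sketch below reproduces that idea in plain Python under stated assumptions: overlapping or book-ended peaks from all samples are merged into consensus regions (the default behaviour of `bedtools merge`), and each sample contributes the number of its reads overlapping each region. Function names, sample names and coordinates are hypothetical, and the workflow's actual table layout may differ.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), half-open like BED


def merge_intervals(peaks: List[Interval]) -> List[Interval]:
    """Merge overlapping or book-ended intervals after sorting them."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(peaks):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def count_table(regions: List[Interval],
                reads_by_sample: Dict[str, List[Interval]]) -> List[dict]:
    """One row per consensus region, one read-count column per sample."""
    table = []
    for chrom, start, end in regions:
        row = {"region": f"{chrom}:{start}-{end}"}
        for sample, reads in reads_by_sample.items():
            row[sample] = sum(
                1 for c, s, e in reads if c == chrom and s < end and start < e
            )
        table.append(row)
    return table


# Hypothetical peaks pooled from two samples, and hypothetical read alignments.
pooled_peaks = [("chr1", 100, 200), ("chr1", 400, 500),
                ("chr1", 150, 250), ("chr2", 10, 50)]
reads = {
    "sampleA": [("chr1", 120, 170), ("chr1", 240, 290), ("chr1", 450, 500)],
    "sampleB": [("chr2", 0, 20)],
}
consensus = merge_intervals(pooled_peaks)
table = count_table(consensus, reads)
```

A table in this shape (regions as rows, samples as columns) is what custom downstream analyses would typically consume.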
Version History
Version 1 (earliest) Created 5th Jul 2023 at 10:39 by Konstantinos Kyritsis
Initial commit
Frozen
Version-1
c05e175
Citation
Kyritsis, K., Pechlivanis, N., & Psomopoulos, F. (2023). CWL-based ChIP-Seq workflow. WorkflowHub. https://doi.org/10.48546/WORKFLOWHUB.WORKFLOW.525.1