ATAC-Seq data processing workflow
master @ f6ad72e

Workflow Type: Common Workflow Language
Stable

This workflow processes bulk ATAC-Seq data from raw reads to genome-wide accessibility tracks (bigWig) and ATAC peaks. The main steps are read trimming with Trim Galore, alignment with bowtie2, coverage generation with samtools, and peak calling with MACS2.

Inputs

ID Name Description Type
sample_id n/a Sample ID used for naming the output files.
  • string
fastq1 n/a List of fastq files containing the first mate of raw reads. Multiple files are provided if the same library has been multiplexed across multiple lanes. Reads coming from different fastq files are pooled after alignment. See also parameter "fastq2".
  • array containing
    • File
fastq2 n/a List of fastq files containing the second mate of raw reads. Important: this list must have the same length as parameter "fastq1".
  • array containing
    • File
adapter1 n/a Adapter sequence for first reads. If not specified (set to "null"), trim_galore will try to autodetect which of the following was used: Illumina universal adapter (AGATCGGAAGAGC), Nextera adapter (CTGTCTCTTATA), or Illumina Small RNA 3-prime Adapter (TGGAATTCTCGG). You can directly choose one of these configurations by setting the string to "illumina", "nextera", or "small_rna", or specify the adapter string manually (e.g. "AGATCGGAAGAGC").
  • string?
adapter2 n/a Adapter sequence for second reads. If not specified (set to "null"), trim_galore will try to autodetect which of the following was used: Illumina universal adapter (AGATCGGAAGAGC), Nextera adapter (CTGTCTCTTATA), or Illumina Small RNA 3-prime Adapter (TGGAATTCTCGG). You can directly choose one of these configurations by setting the string to "illumina", "nextera", or "small_rna", or specify the adapter string manually (e.g. "AGATCGGAAGAGC").
  • string?
genome n/a Path to the reference genome in fasta format. Bowtie2 index files (".1.bt2", ".2.bt2", ...) as well as a samtools index (".fai") have to be located in the same directory. All of these files can be downloaded for the most common genome builds at https://support.illumina.com/sequencing/sequencing_software/igenome.html. Alternatively, you can create them yourself using "bowtie2-build" and "samtools faidx".
  • File
genome_info n/a Path to a tab-delimited file listing chromosome sizes in the following format: "chromosome_name<TAB>total_number_of_bp". For the most common UCSC genome builds, you can find corresponding files at: https://github.com/CompEpigen/ATACseq_workflows/tree/master/chrom_sizes. Or you can generate them yourself using the UCSC script fetchChromSizes (http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/fetchChromSizes) as follows: "fetchChromSizes hg38 > hg38.chrom.sizes". If you are dealing with a non-UCSC build, you can generate such a file from a samtools index using: "awk -v OFS='\t' '{print $1,$2}' hg38.fa.fai > hg38.chrom.sizes".
  • File
max_mapping_insert_length n/a Maximum insert length between two reads of a pair. ATAC-seq can produce very long inserts, so a value of at least 1500 is recommended. Note, however, that alignment takes significantly longer with higher values. The default is 2500.
  • long
macs2_qvalue n/a Q-value cutoff used for peak calling by MACS2. The default is 0.05.
  • float
effective_genome_size n/a The effectively mappable genome size. See: https://deeptools.readthedocs.io/en/latest/content/feature/effectiveGenomeSize.html
  • long
bin_size n/a Bin size used for generation of coverage tracks. A larger bin size yields smaller coverage track files but a less precise signal. For single bp resolution set to 1.
  • int
ignoreForNormalization n/a Chromosome names to be ignored when calculating the scaling factor, given as a single space-delimited string. Default: "chrX chrY chrM"
  • string?
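
The inputs above are typically supplied to a CWL runner as a job file. The following Python sketch is not part of the workflow; it only illustrates one way to derive the "genome_info" file from a samtools ".fai" index and to assemble the inputs into a JSON job file, which CWL runners accept alongside YAML. The function names, all file paths, the bin_size default of 10, and the effective genome size in the usage example are illustrative assumptions; the other defaults are the ones stated in the table.

```python
import json


def fai_to_chrom_sizes(fai_lines):
    """Yield 'name<TAB>length' rows from the lines of a samtools .fai index."""
    for line in fai_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        yield f"{fields[0]}\t{fields[1]}"


def write_chrom_sizes(fai_path, out_path):
    """Write a chromosome sizes file (the genome_info input) from a .fai index."""
    with open(fai_path) as fai, open(out_path, "w") as out:
        for row in fai_to_chrom_sizes(fai):
            out.write(row + "\n")


def cwl_file(path):
    """Return a CWL File object for a local path."""
    return {"class": "File", "path": path}


def build_job(sample_id, fastq1, fastq2, genome, genome_info,
              effective_genome_size, bin_size=10,
              adapter1=None, adapter2=None):
    """Assemble the workflow inputs as a dict ready to be dumped to a job file."""
    # fastq1 and fastq2 must pair up one-to-one (one entry per lane).
    if len(fastq1) != len(fastq2):
        raise ValueError(
            f"fastq1 has {len(fastq1)} files but fastq2 has {len(fastq2)}"
        )
    return {
        "sample_id": sample_id,
        "fastq1": [cwl_file(p) for p in fastq1],
        "fastq2": [cwl_file(p) for p in fastq2],
        "adapter1": adapter1,  # None is serialized as null (autodetection)
        "adapter2": adapter2,
        "genome": cwl_file(genome),
        "genome_info": cwl_file(genome_info),
        "max_mapping_insert_length": 2500,  # default stated in the table
        "macs2_qvalue": 0.05,               # default stated in the table
        "effective_genome_size": effective_genome_size,
        "bin_size": bin_size,               # assumed default, adjust as needed
        "ignoreForNormalization": "chrX chrY chrM",
    }


if __name__ == "__main__":
    # Hypothetical paths; the genome size is an example value only and should
    # be taken from the deepTools table for the genome build actually used.
    job = build_job(
        sample_id="sample1",
        fastq1=["lane1_R1.fastq.gz", "lane2_R1.fastq.gz"],
        fastq2=["lane1_R2.fastq.gz", "lane2_R2.fastq.gz"],
        genome="hg38.fa",
        genome_info="hg38.chrom.sizes",
        effective_genome_size=2913022398,
    )
    print(json.dumps(job, indent=2))
```

Checking the list lengths before submission catches a mismatched "fastq1"/"fastq2" pair early, before any trimming or alignment has been run.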

Steps

ID Name Description
trim_and_map n/a n/a
merge_duprem_filter n/a n/a
name_sorting_filtered_bam n/a samtools sort - sorting of filtered bam file by read name
converting_bam_to_bedpe n/a bedtools bamtobed
generating_atac_signal_tags n/a n/a
generating_coverage_tracks n/a n/a
peak_calling_macs2_broad n/a peak calling using macs2
peak_calling_macs2_narrow n/a peak calling using macs2
plot_fragment_size_distribution n/a n/a
qc_plot_fingerprint n/a n/a
qc_phantompeakqualtools n/a n/a
create_summary_qc_report n/a multiqc summarizes the qc results from fastqc and other tools

Outputs

ID Name Description Type
raw_fastqc_zip n/a n/a
  • array containing
    • array containing
      • File
raw_fastqc_html n/a n/a
  • array containing
    • array containing
      • File
trim_galore_log n/a n/a
  • array containing
    • array containing
      • File
trimmed_fastqc_html n/a n/a
  • array containing
    • array containing
      • File
trimmed_fastqc_zip n/a n/a
  • array containing
    • array containing
      • File
bowtie2_log n/a n/a
  • array containing
    • File
duprem_fastqc_zip n/a n/a
  • array containing
    • File
duprem_fastqc_html n/a n/a
  • array containing
    • File
merged_flagstat_output n/a n/a
  • File
filtered_flagstat_output n/a n/a
  • File
duprem_flagstat_output n/a n/a
  • File
bam n/a n/a
  • File
picard_markdup_log n/a n/a
  • File
frag_size_stats_tsv n/a n/a
  • File
filtering_stats_tsv n/a n/a
  • File
fragment_sizes_tsv n/a n/a
  • File
irreg_mappings_bedpe n/a n/a
  • File
bam_signal_tags n/a n/a
  • array containing
    • File
bigwig_signal_tags n/a n/a
  • array containing
    • File
peaks_bed_macs2_broad n/a n/a
  • array containing
    • array containing
      • File
peaks_xls_macs2_broad n/a n/a
  • array containing
    • File
peaks_bed_macs2_narrow n/a n/a
  • array containing
    • File
peaks_xls_macs2_narrow n/a n/a
  • File
frag_size_distr_plot n/a n/a
  • File
frag_size_distr_tsv n/a n/a
  • File
qc_plot_fingerprint_plot n/a n/a
  • File?
qc_plot_fingerprint_tsv n/a n/a
  • File?
qc_plot_fingerprint_stderr n/a n/a
  • File
qc_crosscorr_summary n/a n/a
  • File?
qc_crosscorr_plot n/a n/a
  • File?
qc_phantompeakqualtools_stderr n/a n/a
  • File?
multiqc_zip n/a n/a
  • File
multiqc_html n/a n/a
  • File

Version History

master @ f6ad72e (earliest) Created 27th Jun 2025 at 14:53 by Pavlo Lutsik

added PDF of cwl-viewer


Creators and Submitter
Creator
  • Kersten Breuer

Created: 27th Jun 2025 at 14:52

Total size: 9.21 MB