CWL + RO-Crate Workflow Descriptions
This repository stores computational workflows described using the Common Workflow Language (CWL) and enriched with metadata using Research Object Crate (RO-Crate) conforming to the Workflow Run RO-Crate profile.
Each workflow is contained in its own directory (e.g., WF5201, WF6101, ...). Inside each workflow directory you will typically find at least:
- The CWL workflow definition (with the same name as the directory, e.g., WF5201.cwl).
- The RO-Crate metadata file (ro-crate-metadata.json).
Additional files supporting the workflow may also be included.
Overview
This document explains how to represent workflows by combining:
- CWL (Common Workflow Language): Used to define the computational steps, data flows, and tools.
- RO-Crate: Used to capture associated metadata (e.g., authorship, licenses, software, datasets) for the workflow.
By separating the abstract workflow definition from its metadata description, you can leverage existing tools for visualization, editing, and validation of your workflows while maintaining a clear structure.
Our Approach
We represent workflows using a combination of CWL and RO-Crate:
- CWL: Captures the abstract definition of the workflow, detailing its computational steps, data flows, and the tools utilized. It does not include the implementation details of each operation.
- RO-Crate: Provides rich metadata for the overall repository, the workflow file(s), software, and datasets. This metadata allows you to understand the context, provenance, and related details of the workflow components.
This separation provides flexibility by keeping the abstract workflow definition (CWL) distinct from the descriptive metadata (RO-Crate), yet the two remain tightly connected.
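To make the RO-Crate side concrete, here is a minimal sketch of what a metadata file pointing at a CWL workflow could contain. It is an illustration under assumptions, not the exact content used in this repository: the function name `minimal_crate`, the local `#cwl` identifier, and the `name` values are hypothetical, and a crate conforming to the Workflow Run RO-Crate profile would carry more entities (authors, license, software, datasets).

```python
import json


def minimal_crate(workflow_file: str, name: str) -> dict:
    """Build a minimal RO-Crate 1.1 graph whose main entity is a CWL file."""
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {   # Metadata descriptor: marks this JSON file as an RO-Crate.
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {   # Root dataset: the workflow directory itself.
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "mainEntity": {"@id": workflow_file},
                "hasPart": [{"@id": workflow_file}],
            },
            {   # The CWL definition, typed as a computational workflow.
                "@id": workflow_file,
                "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
                "name": name,
                "programmingLanguage": {"@id": "#cwl"},
            },
            {   # Placeholder language entity (identifier is an assumption).
                "@id": "#cwl",
                "@type": "ComputerLanguage",
                "name": "Common Workflow Language",
            },
        ],
    }


crate = minimal_crate("WF5201.cwl", "WF5201")
print(json.dumps(crate, indent=2))
```

The link between the two halves is the `mainEntity` reference: the metadata names the CWL file, while the CWL file itself stays free of descriptive metadata.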
Describing a Workflow using CWL + RO-Crate
To fully describe a workflow, you must separate the workflow definition (using CWL) from the metadata description (using RO-Crate).
Defining the CWL Workflow
- Identify Global Inputs and Outputs:
  Decide on the data that enters the workflow (inputs) and the final results (outputs). Optionally, include intermediate outputs if they are of interest.
- Create the CWL File:
  Write a CWL file in YAML format. Start with file metadata such as:

      cwlVersion: v1.2
      class: Workflow
      requirements:
        MultipleInputFeatureRequirement: {}
        SubworkflowFeatureRequirement: {}

  [NOTE] The requirements section may vary depending on your workflow. For example, if you use sub-workflows, you must include the SubworkflowFeatureRequirement.
- Declare Global Inputs and Outputs:

      inputs:
        DT5210: Directory
        DT5211: Directory
      outputs:
        DT5208:
          type: Directory
          outputSource: SS5213/DT5208

  [NOTE] Although Directory is commonly used to represent a dataset, you might choose a different type. Refer to the CWL documentation for additional types.
Defining Workflow Steps
Each workflow step (or subworkflow) follows a consistent structure:
SS5205:
  in:
    DT5210: DT5210
  run:
    class: Operation
    inputs:
      DT5210: Directory
    outputs:
      DT5201: File
      DT5203: Directory
  out:
    - DT5201
    - DT5203
Key elements are:
- in: Defines which data this step requires.
- run:
  - For operations: Uses the Operation class to abstract away the underlying execution details.
  - For subworkflows: Points to another CWL file.
- out: Lists the output data produced by the step.
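Because the same identifiers appear both in the step's in/out fields and inside the inline Operation, it is easy for them to drift apart. The following sketch shows one way to check a step for that kind of mismatch. It assumes the step has already been loaded into a Python dict (for example from YAML) and uses the simple map form of `in`; the helper name `check_step` is hypothetical, and this check does not replace validation with a real CWL tool.

```python
def check_step(step_id: str, step: dict) -> list[str]:
    """Return a list of mismatches between a step and its inline Operation."""
    problems = []
    run = step.get("run")
    # Only inline Operations are checked; a string would name another CWL file.
    if isinstance(run, dict):
        declared_inputs = set(run.get("inputs", {}))
        declared_outputs = set(run.get("outputs", {}))
        for name in step.get("in", {}):
            if name not in declared_inputs:
                problems.append(f"{step_id}: input {name} missing from run.inputs")
        for name in step.get("out", []):
            if name not in declared_outputs:
                problems.append(f"{step_id}: output {name} missing from run.outputs")
    return problems


# The SS5205 step from the example above, written as a Python dict.
ss5205 = {
    "in": {"DT5210": "DT5210"},
    "run": {
        "class": "Operation",
        "inputs": {"DT5210": "Directory"},
        "outputs": {"DT5201": "File", "DT5203": "Directory"},
    },
    "out": ["DT5201", "DT5203"],
}

print(check_step("SS5205", ss5205))
```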
Connecting Steps via Data Dependencies
CWL does not require an explicit execution order. Instead, dependencies are determined by connecting outputs to inputs:
ST520102:
  in:
    DT5201: ST520101/DT5201
  run: ST520102.cwl
  out:
    - DT5255
This connection means ST520102 depends on the output (DT5201) of ST520101 and will execute after it, while still allowing independent steps to run in parallel.
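The way an execution order falls out of these data links can be sketched with a topological sort. The example below is an illustration rather than how any particular CWL runner schedules work: it assumes the steps are available as a Python dict, treats any source of the form step/output as a dependency on that step, and invents the in/out fields of ST520101 for the sake of a complete example.

```python
from graphlib import TopologicalSorter


def step_dependencies(steps: dict) -> dict:
    """Map each step ID to the set of step IDs whose outputs it consumes."""
    deps = {}
    for step_id, step in steps.items():
        deps[step_id] = set()
        for source in step.get("in", {}).values():
            # CWL also allows {"source": ...} objects and lists of sources.
            if isinstance(source, dict):
                source = source.get("source", [])
            sources = source if isinstance(source, list) else [source]
            for item in sources:
                if "/" in item:  # "STEP/OUTPUT" refers to another step
                    deps[step_id].add(item.split("/", 1)[0])
    return deps


steps = {
    # ST520101's fields are hypothetical; only its DT5201 output is from the text.
    "ST520101": {"in": {"DT5210": "DT5210"}, "out": ["DT5201"]},
    "ST520102": {"in": {"DT5201": "ST520101/DT5201"}, "out": ["DT5255"]},
}

order = list(TopologicalSorter(step_dependencies(steps)).static_order())
print(order)  # ['ST520101', 'ST520102']
```

Steps that do not depend on each other have no edge between them in this graph, which is what leaves a runner free to execute them in parallel.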
Validating Your Workflow and Metadata
- CWL Validation:
  Use cwltool to check your CWL files for syntax errors and to generate a graphical visualization (using Graphviz dot format) for verifying the workflow structure.
- RO-Crate Validation:
  Validate your ro-crate-metadata.json file with tools such as the RO-Crate Validator (Python) and explore your RO-Crate interactively with ro-crate-html-js.
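The CWL checks can be scripted. The sketch below wraps two cwltool options, `--validate` and `--print-dot`, in small helpers; the helper names and the WF5201.cwl path are illustrative, and the code assumes cwltool is installed and on the PATH (it returns None otherwise).

```python
import shutil
import subprocess
from typing import Optional


def validate_cmd(cwl_path: str) -> list[str]:
    """Command that checks a CWL file without executing it."""
    return ["cwltool", "--validate", cwl_path]


def dot_cmd(cwl_path: str) -> list[str]:
    """Command that prints the workflow graph in Graphviz dot format."""
    return ["cwltool", "--print-dot", cwl_path]


def run_cwltool(cmd: list[str]) -> Optional[subprocess.CompletedProcess]:
    """Run a cwltool command; return None if cwltool is not installed."""
    if shutil.which(cmd[0]) is None:
        return None
    return subprocess.run(cmd, capture_output=True, text=True, check=False)


def main() -> None:
    result = run_cwltool(validate_cmd("WF5201.cwl"))
    if result is None:
        print("cwltool not found on PATH")
    elif result.returncode != 0:
        print(result.stderr)
    else:
        print("WF5201.cwl is valid")
```

Calling `main()` from a workflow directory runs the validation; the dot output from `dot_cmd` can be piped to Graphviz to render the diagram.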
Additional Resources
Inputs
ID | Name | Description | Type |
---|---|---|---|
DT6102 | n/a | NEAMTHM18 | |
DT6103 | n/a | EIDA seismic data archive | |
DT6104 | n/a | SLSMF sea level data | |
DT6105 | n/a | GNSS displacements | |
DT6109 | n/a | topo-bathymetric grids | |
Steps
ID | Name | Description |
---|---|---|
ST610101 | n/a | SS6101 |
ST610102 | n/a | SS6102 |
ST610103 | n/a | SS6103 |
ST610104 | n/a | SS6104 |
ST610105 | n/a | SS6105 |
ST610106 | n/a | n/a |
ST610107 | n/a | SS6113 |
ST610108 | n/a | SS6114 |
ST610109 | n/a | n/a |
ST610110 | n/a | SS6117 |
ST610111 | n/a | SS6118 |
Outputs
ID | Name | Description | Type |
---|---|---|---|
DT6101 | n/a | Scenario Library | |
DT6106 | n/a | list of earthquake scenarios | |
DT6107 | n/a | list of scenario probabilities | |
DT6108 | n/a | list of landslide scenarios | |
DT6110 | n/a | Tsunami intensities | |
DT6111 | n/a | Ground deformation | |
DT6112 | n/a | Tsunami hazard curves | |
DT6113 | n/a | Hazard visual products | |
Version History
main @ c324ab2 (latest) Created 21st Feb 2025 at 13:26 by Raül Sirvent
update preview
main @ 3923678 (earliest) Created 20th Dec 2024 at 14:54 by Raül Sirvent
add ro-crate-preview

Creators
Not specified
Additional credit
Marco Salvi
Created: 20th Dec 2024 at 14:54
Last updated: 21st Feb 2025 at 13:26
