The HealthOmics workflow definition files must meet the following requirements:
-
Declare all parameters in the workflow definition file. Parameters include input and output locations, Amazon ECR container repositories, and runtime parameters such as allocated memory or CPU.
-
Your workflow tasks can't access resources over the public internet. Make sure that the workflow can access all input data from AWS resources such as Amazon S3.
-
Declare the output files in the workflow definition file. If you want to copy intermediate run files to the output location, declare them as workflow outputs.
-
The input and output locations must be in the same Region as the workflow run.
-
HealthOmics storage workflow inputs must be in ACTIVE status. HealthOmics won't import inputs with an ARCHIVED status, causing the workflow to fail. For information about Amazon S3 object inputs, see HealthOmics run inputs.
-
HealthOmics provides the following methods to specify the main entrypoint for the workflow:
-
If the workflow definition consists of one file, that file is the main entrypoint for the workflow.
-
If the workflow definition consists of multiple files, you can name the entrypoint file main.ext, where ext is either wdl, nf, or cwl for WDL, Nextflow, or CWL, respectively.
-
When you create the workflow, you can specify a main entrypoint that isn't named main.
-
Before you create a workflow, create a zip archive of the workflow definition files and any dependencies, such as subworkflows.
-
We recommend that you declare Amazon ECR containers in the workflow as input parameters for validation of the Amazon ECR permissions.
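The packaging and entrypoint rules above can be sketched in Python using only the standard library. This is a minimal illustration, not part of HealthOmics: the helper names `resolve_main` and `build_definition_zip` are hypothetical, and the sketch assumes the workflow definition files sit under a single local directory.

```python
import io
import zipfile
from pathlib import Path

# Engine-specific extensions for the default entrypoint name main.ext.
MAIN_EXTENSIONS = {"WDL": "wdl", "NEXTFLOW": "nf", "CWL": "cwl"}


def resolve_main(files, engine, main=None):
    """Pick the entrypoint following the rules listed above.

    files: relative POSIX-style paths of the files in the archive.
    main:  optional explicit entrypoint, for a file not named main.ext.
    """
    if main is not None:
        if main not in files:
            raise ValueError(f"entrypoint {main!r} is not in the definition")
        return main
    if len(files) == 1:
        # A single-file definition is its own entrypoint.
        return files[0]
    default = f"main.{MAIN_EXTENSIONS[engine]}"
    if default in files:
        return default
    raise ValueError(f"multiple files and no {default}; pass main explicitly")


def build_definition_zip(root):
    """Zip every file under root (including subworkflows) into bytes."""
    root = Path(root)
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                # Store paths relative to root so the entrypoint name matches.
                archive.write(path, path.relative_to(root).as_posix())
    return buffer.getvalue()
```

If you use boto3, the resulting bytes and entrypoint name would typically be passed as the `definitionZip` and `main` arguments of the `create_workflow` call on the `omics` client.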
Additional Nextflow considerations:
-
/bin
Nextflow workflow definitions may include a /bin folder with executable scripts. Tasks have read-only and execute access to this path. Tasks that rely on these scripts should use a container built with the appropriate script interpreters. As a best practice, call the interpreter directly. For example:
    process my_bin_task {
        ...
        script:
        """
        python3 my_python_script.py
        """
    }
-
includeConfig
Nextflow-based workflow definitions can include nextflow.config files that help to abstract parameter definitions or process resource profiles. To support development and execution of Nextflow pipelines on multiple environments, use a HealthOmics-specific configuration that you add to the global config using the includeConfig directive. To maintain portability, configure the workflow to include the file only when running on HealthOmics by using the following code:
    // at the end of the nextflow.config file
    if ("$AWS_WORKFLOW_RUN") {
        includeConfig 'conf/omics.config'
    }
-
Reports
HealthOmics doesn't support engine-generated dag, trace, and execution reports. You can generate alternatives to the trace and execution reports using a combination of GetRun and GetRunTask API calls.
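As a rough illustration of the trace alternative mentioned above, the following Python sketch assembles a CSV table from GetRun and GetRunTask responses. It assumes `omics` is a boto3 `omics` client (or any object exposing the same `get_run`, `list_run_tasks`, and `get_run_task` methods), and it uses ListRunTasks to enumerate task IDs, which is an addition not named in the text above. The function name `build_trace` and the column layout are hypothetical.

```python
import csv
import io

HEADER = ["run_id", "run_status", "task_id", "name",
          "status", "cpus", "memory", "duration_s"]


def build_trace(omics, run_id):
    """Return a CSV trace (as a string) for one run.

    omics is assumed to behave like a boto3 "omics" client.
    """
    run = omics.get_run(id=run_id)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(HEADER)

    token = None
    while True:
        kwargs = {"id": run_id}
        if token:
            kwargs["startingToken"] = token
        page = omics.list_run_tasks(**kwargs)
        for item in page.get("items", []):
            task = omics.get_run_task(id=run_id, taskId=item["taskId"])
            start, stop = task.get("startTime"), task.get("stopTime")
            # Duration is left blank for tasks that have not finished.
            duration = (stop - start).total_seconds() if start and stop else ""
            writer.writerow([
                run_id, run.get("status", ""), task["taskId"],
                task.get("name", ""), task.get("status", ""),
                task.get("cpus", ""), task.get("memory", ""), duration,
            ])
        token = page.get("nextToken")
        if not token:
            break
    return out.getvalue()
```

Passing the client in as an argument keeps the function easy to exercise with a stub object before pointing it at a real account.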
Additional CWL considerations:
-
Container image URI interpolation
HealthOmics allows the dockerPull property of the DockerRequirement to be an inline JavaScript expression. For example:
    requirements:
      DockerRequirement:
        dockerPull: "$(inputs.container_image)"
This allows you to specify container image URIs as input parameters to the workflow.
-
JavaScript expressions
JavaScript expressions must be strict mode compliant.
-
Operation process
HealthOmics doesn't support CWL Operation processes.