To create a private workflow, you require the following inputs:
A workflow definition file written in WDL, Nextflow, or CWL. For more information, see Workflow definition files in HealthOmics.
A parameter template file written in JSON. For more information, see Parameter template files for HealthOmics workflows.
If your workflow definition file is larger than 4 MiB (zipped), upload it to an Amazon S3 folder, and specify the Amazon S3 location when you create the workflow.
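The size rule above can be sketched in Python. This is a minimal sketch, assuming the 4 MiB limit applies to the size of the zipped file on disk. The helper name `definition_argument` is hypothetical; `definitionZip` and `definitionUri` are the boto3 `create_workflow` parameter names.

```python
import os

MAX_INLINE_ZIP_BYTES = 4 * 1024 * 1024  # 4 MiB limit for inline definitions


def definition_argument(zip_path, s3_uri=None):
    """Return the create_workflow keyword argument for a zipped definition.

    Archives up to 4 MiB are passed inline as bytes (definitionZip).
    Larger archives must already be uploaded to Amazon S3 and are
    passed by their s3:// URI (definitionUri).
    """
    size = os.path.getsize(zip_path)
    if size <= MAX_INLINE_ZIP_BYTES:
        with open(zip_path, "rb") as f:
            return {"definitionZip": f.read()}
    if s3_uri is None:
        raise ValueError(
            f"{zip_path} is {size} bytes (over 4 MiB); upload it to Amazon S3 "
            "and pass its s3:// URI"
        )
    return {"definitionUri": s3_uri}
```

The returned dictionary can be merged into the other `create_workflow` arguments with `**`. The helper does not upload anything itself.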
Creating a workflow using the console
To create a workflow
- Open the HealthOmics console at https://console.aws.amazon.com/omics/.
- In the left navigation pane, choose Private workflows.
- On the Private workflows page, choose Create workflow.
- On the Create workflow page, provide the following information:
  - Workflow name - A distinctive name for this workflow.
  - Description (optional) - A description of this workflow.
  - Default run storage capacity (optional) - The default amount of run storage required for this workflow. You can override this default when you start a workflow run. The default value for this parameter is 1.2 TB.
  - Under Workflow definition, if you choose Select definition folder from S3, enter the Amazon S3 location that contains the zipped workflow definition folder.
  - If you choose Select definition folder from a local source, enter the location of the zipped workflow definition folder.
  - For Workflow language, select the specification language of the workflow.
  - Tags (optional) - Provide up to 50 tags for this workflow.
- Choose Next.
- On the Add workflow parameters page, provide the workflow parameters. You can either upload a JSON file that specifies the parameters or manually enter your workflow parameters.
- Choose Next.
- Review the workflow configuration, then choose Create workflow.
Creating a workflow using the CLI
After you define your workflow and the parameters, you can create a workflow using the CLI as shown.
aws omics create-workflow --name my_workflow --definition-zip fileb://my-definition.zip \
    --parameter-template file://my-parameter-template.json
If your workflow definition file is located in an Amazon S3 folder, enter the location using the --definition-uri parameter instead of --definition-zip. For more information, see CreateWorkflow in the AWS HealthOmics API Reference.
You receive the following response to the create-workflow request.
{
"arn": "arn:aws:omics:us-west-2:....",
"id": "1234567",
"status": "CREATING",
"tags": {
"resourceArn": "arn:aws:omics:us-west-2:...."
}
}
Creating a workflow using an SDK
You can create a workflow using one of the SDKs.
The following example shows how to create a workflow using the Python SDK.
import boto3

omics = boto3.client('omics')

# Read the zipped workflow definition as bytes
with open('definition.zip', 'rb') as f:
    definition = f.read()

response = omics.create_workflow(
    name='my_workflow',
    definitionZip=definition,
    parameterTemplate={ ... }
)
Optional parameters to use when creating a workflow
You can specify one or more optional parameters when you create a workflow. For more information, see CreateWorkflow in the AWS HealthOmics API Reference.
If you are including multiple workflow definition files, use the --main parameter to specify which file is the main definition file for your workflow, as shown in the following example.
aws omics create-workflow --name Test --main multi_workflow/workflow2.wdl \
    --definition-zip fileb://definition.zip \
    --parameter-template file://params_sample_description.json
If you uploaded your workflow definition file to an Amazon S3 folder, specify the location using the --definition-uri parameter instead of --definition-zip.
Use the accelerators parameter to create a workflow that runs on an accelerated-compute instance.
The following example shows how to use the --accelerators parameter.
aws omics create-workflow --name <workflow name> \
    --definition-uri s3://amzn-s3-demo-bucket1/GPUWorkflow.zip \
    --accelerators GPU
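The optional parameters described above can also be assembled in Python before calling the SDK. This is a minimal sketch: the helper `build_create_workflow_kwargs` is hypothetical, and it assumes that exactly one of the two definition sources is supplied. The keys it produces (`definitionZip`, `definitionUri`, `main`, `accelerators`, `parameterTemplate`) are the boto3 `create_workflow` parameter names.

```python
def build_create_workflow_kwargs(name, *, definition_zip=None, definition_uri=None,
                                 main=None, accelerators=None,
                                 parameter_template=None):
    """Assemble keyword arguments for omics.create_workflow().

    Assumption: exactly one of definition_zip (bytes) or definition_uri
    (an s3:// URI) is given; the remaining parameters are optional.
    """
    if (definition_zip is None) == (definition_uri is None):
        raise ValueError("provide exactly one of definition_zip or definition_uri")

    kwargs = {"name": name}
    if definition_zip is not None:
        kwargs["definitionZip"] = definition_zip
    else:
        kwargs["definitionUri"] = definition_uri
    if main is not None:
        kwargs["main"] = main  # main definition file inside the archive
    if accelerators is not None:
        kwargs["accelerators"] = accelerators  # for example "GPU"
    if parameter_template is not None:
        kwargs["parameterTemplate"] = parameter_template
    return kwargs


kwargs = build_create_workflow_kwargs(
    "Test",
    definition_uri="s3://amzn-s3-demo-bucket1/GPUWorkflow.zip",
    accelerators="GPU",
)
# With AWS credentials configured, the request could then be sent with:
#   import boto3
#   response = boto3.client("omics").create_workflow(**kwargs)
```

Building the arguments separately from the call keeps the validation testable without network access.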
Referencing genome files from a workflow definition
A HealthOmics reference store object can be referred to with a URI like the following. Use your own account ID, reference store ID, and reference ID where indicated.
omics://<account ID>.storage.us-west-2.amazonaws.com/<reference store id>/reference/<id>
Some workflows will require both the SOURCE and INDEX files for the reference genome. The previous URI is the default short form and will default to the SOURCE file. In order to specify either file, you can use the long URI form, as follows.
omics://<account ID>.storage.us-west-2.amazonaws.com/<reference store id>/reference/<id>/source
omics://<account ID>.storage.us-west-2.amazonaws.com/<reference store id>/reference/<id>/index
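The short and long reference URI forms described above can be generated with a small helper. This is a minimal sketch: the function name `reference_uri` and its parameters are hypothetical, the IDs in the usage line are made-up placeholders, and the Region defaults to us-west-2 only because the examples here use it.

```python
def reference_uri(account_id, reference_store_id, reference_id,
                  file=None, region="us-west-2"):
    """Build an omics:// URI for a reference store object.

    file=None gives the short form, which defaults to the SOURCE file.
    file="source" or file="index" gives the long form.
    """
    if file not in (None, "source", "index"):
        raise ValueError("file must be None, 'source', or 'index'")
    uri = (f"omics://{account_id}.storage.{region}.amazonaws.com/"
           f"{reference_store_id}/reference/{reference_id}")
    return uri if file is None else f"{uri}/{file}"


# Placeholder IDs for illustration only
index_uri = reference_uri("123456789012", "1111111111", "2222222222", file="index")
```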
Using a sequence read set would have a similar pattern, as shown.
aws omics create-workflow \
    --name <workflow name> \
    --main <sample workflow.wdl> \
    --definition-uri omics://<account ID>.storage.us-west-2.amazonaws.com/<sequence_store_id>/readSet/<id> \
    --parameter-template file://parameters_sample_description.json
Some read sets, such as those based on FASTQ, can contain paired reads. In the following examples, they’re referred to as SOURCE1 and SOURCE2. Formats such as BAM and CRAM will only have a SOURCE1 file. Some read sets will contain INDEX files such as bai or crai files. The preceding URI is the default short form and will default to the SOURCE1 file. To specify the exact file or index, you can use the long URI form, as follows.
omics://123456789012.storage.us-west-2.amazonaws.com/<sequence_store_id>/readSet/<id>/source1
omics://123456789012.storage.us-west-2.amazonaws.com/<sequence_store_id>/readSet/<id>/source2
omics://123456789012.storage.us-west-2.amazonaws.com/<sequence_store_id>/readSet/<id>/index
The following is an example of an input JSON file that uses two Omics Storage URIs.
{
"input_fasta": "omics://123456789012.storage.us-west-2.amazonaws.com/<reference_store_id>/reference/<id>",
"input_cram": "omics://123456789012.storage.us-west-2.amazonaws.com/<sequence_store_id>/readSet/<id>"
}
Reference the input JSON file in the AWS CLI by adding --inputs file://<input_file.json> to your start-run request.
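An input file like the one above can also be generated from Python. This is a minimal sketch: the helpers `read_set_uri` and `write_inputs_file` are hypothetical, and the input names (`input_fasta`, `input_cram`) come from the example JSON and must match whatever your workflow definition expects.

```python
import json


def read_set_uri(account_id, sequence_store_id, read_set_id,
                 file=None, region="us-west-2"):
    """Build an omics:// URI for a read set.

    file=None gives the short form, which defaults to SOURCE1.
    file may also be "source1", "source2", or "index" for the long form.
    """
    if file not in (None, "source1", "source2", "index"):
        raise ValueError("file must be None, 'source1', 'source2', or 'index'")
    uri = (f"omics://{account_id}.storage.{region}.amazonaws.com/"
           f"{sequence_store_id}/readSet/{read_set_id}")
    return uri if file is None else f"{uri}/{file}"


def write_inputs_file(path, inputs):
    """Write the inputs mapping as JSON and return the matching --inputs value."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(inputs, f, indent=2)
    return f"file://{path}"
```

For example, `write_inputs_file("inputs.json", {...})` returns `file://inputs.json`, which can be passed to `--inputs` on the start-run request.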
Verifying the status of a workflow
After you create your workflow, you can verify the status and view other details of the workflow using get-workflow, as shown.
aws omics get-workflow --id 1234567
The response gives you your workflow details, including the status, as shown.
{
"arn": "arn:aws:omics:us-west-2:....",
"id": "1234567",
"status": "ACTIVE",
"type": "PRIVATE",
"name": "workflow_name",
"creationTime": "2022-07-06T00:27:05.542459"
}
Before a run can be started, the status must be listed as ACTIVE.
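The status check can be automated by polling until the workflow becomes ACTIVE. This is a minimal sketch: the helper `wait_until_active` is hypothetical, and treating FAILED and DELETED as terminal statuses is an assumption. It accepts any callable with the shape of the boto3 `get_workflow` method, so it can be exercised without AWS access.

```python
import time


def wait_until_active(get_workflow, workflow_id, delay=5.0, max_attempts=60):
    """Poll get_workflow until the workflow status is ACTIVE.

    get_workflow is any callable that accepts id=... and returns a dict
    with a "status" key, such as boto3.client("omics").get_workflow.
    """
    for _ in range(max_attempts):
        status = get_workflow(id=workflow_id)["status"]
        if status == "ACTIVE":
            return status
        # Assumption: these statuses mean the workflow will never become ACTIVE
        if status in ("FAILED", "DELETED"):
            raise RuntimeError(f"workflow {workflow_id} ended in status {status}")
        time.sleep(delay)
    raise TimeoutError(
        f"workflow {workflow_id} was not ACTIVE after {max_attempts} attempts"
    )
```

With credentials configured, a call might look like `wait_until_active(boto3.client("omics").get_workflow, "1234567")`.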