Starting a workflow run
When you start a run, you can set the run storage type and storage amount (for static storage). For additional information, see Run storage types.
You also set the run priority. How priority impacts the run depends on whether the run is associated with a run group. For additional information, see Run priority.
Specifying run retention mode
When runs complete, HealthOmics archives the run metadata to CloudWatch. By default, CloudWatch keeps the run data indefinitely, unless you change the CloudWatch retention policy. Run outputs are also stored in Amazon S3 until you delete them.
HealthOmics retains the metadata for up to 5000 runs for use by the console and API operations (ListRuns and GetRun). When you start a run, you can set the run retention mode parameter to indicate the retention behavior for the run. The parameter supports the values REMOVE and RETAIN.
You can set the retention mode of the run to REMOVE. When HealthOmics tries to add a run but it has already saved 5000 runs, it automatically removes the metadata for the oldest run with REMOVE mode. This removal doesn't affect the data stored in CloudWatch or Amazon S3.
RETAIN is the default value for run retention mode. For runs in this mode, the system doesn't delete the run metadata. If HealthOmics reaches the maximum number of runs, all set to RETAIN, you won't be able to create additional runs until you delete some runs.
If you're planning to run a batch of more than 5,000 runs at the same time, make sure to set the run retention mode to REMOVE. Otherwise, the batch fails when HealthOmics tries to start the 5001st run.
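The retention rules above can be sketched as a small model. This is an illustration of the described behavior only, not HealthOmics code: the class name RunMetadataStore and its methods are hypothetical, and the limit is lowered to 3 in the demo so the effect is easy to see.

```python
from collections import OrderedDict

MAX_RUNS = 5000  # limit on retained run metadata, as stated in the text


class RunMetadataStore:
    """Toy model of the retention rules (hypothetical, for illustration)."""

    def __init__(self, max_runs=MAX_RUNS):
        self.max_runs = max_runs
        self.runs = OrderedDict()  # run ID -> retention mode, oldest first

    def add_run(self, run_id, retention_mode="RETAIN"):
        if retention_mode not in ("RETAIN", "REMOVE"):
            raise ValueError(f"unknown retention mode: {retention_mode}")
        if len(self.runs) >= self.max_runs:
            # Look for the oldest run that was started with REMOVE mode.
            oldest = next(
                (rid for rid, mode in self.runs.items() if mode == "REMOVE"),
                None,
            )
            if oldest is None:
                raise RuntimeError("run limit reached and every stored run uses RETAIN")
            del self.runs[oldest]
        self.runs[run_id] = retention_mode


store = RunMetadataStore(max_runs=3)
for i in range(5):
    store.add_run(f"run-{i}", "REMOVE")
print(list(store.runs))  # the three most recent runs remain
```

If every stored run uses RETAIN, add_run raises an error instead of evicting anything, which mirrors the case where you must delete runs before you can start new ones.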
Additional considerations for using REMOVE retention mode:
-
When you first start using REMOVE as the retention mode, consider deleting one or more runs that use RETAIN mode, to free up slots. As you start additional REMOVE runs, the automatic removal takes over, so enough slots are available for new runs.
-
We recommend that you configure a unique name for each run. After HealthOmics removes a run, you cannot use the console or API to find the run name or ID. However, you can use CloudWatch to search for the run name, so use unique names to get the best search results.
-
If you want to re-run an archived run (or a set of runs), use the HealthOmics rerun CLI tool. For more information and examples of how to use this tool, see Omics rerun in the HealthOmics tools GitHub repository.
Specifying Amazon S3 input parameters
For an input parameter that accepts an Amazon S3 location, the parameter can specify the location of one file or a whole directory of files. Using a directory has the following advantages:
-
Convenience – You specify the directory name as the parameter. You don't list each file name.
-
Compactness – The input parameter maximum file size is 50 KB. If you provide a long list of input file names, you can exceed this maximum.
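To see why a directory is more compact, you can compare the serialized size of the two forms. The following is a minimal sketch under stated assumptions: the parameter name reads and the sample file names are hypothetical, and "50 KB" is taken to mean 50 * 1024 bytes.

```python
import json

MAX_PARAMETER_BYTES = 50 * 1024  # assumes "50 KB" means 50 * 1024 bytes


def parameter_size(parameters):
    """Return the size in bytes of the parameters serialized as JSON."""
    return len(json.dumps(parameters).encode("utf-8"))


# Hypothetical inputs: 2000 files listed one by one versus one directory.
file_list = {
    "reads": [
        f"s3://myfiles/runs/inputs/a/sample_{i:05d}.fastq.gz" for i in range(2000)
    ]
}
directory = {"reads": "s3://myfiles/runs/inputs/a/"}

for label, params in (("file list", file_list), ("directory", directory)):
    size = parameter_size(params)
    print(f"{label}: {size} bytes, within limit: {size <= MAX_PARAMETER_BYTES}")
```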
Amazon S3 is a flat object-storage system, so it doesn't support directories. You group files into a "directory" by giving each file the same object key prefix. For more information about Amazon S3 object key prefixes, see Organizing objects using prefixes.
HealthOmics interprets the input parameter value as follows:
-
If the Amazon S3 location doesn't end with a forward slash or use a glob pattern, HealthOmics expects the parameter value to be the key for one Amazon S3 object. For example, you specify s3://myfiles/runs/inputs/a/file1.fastq to input file1.fastq.
-
If the Amazon S3 location ends with a forward slash, HealthOmics interprets the parameter value as an Amazon S3 prefix. It loads all the Amazon S3 objects with that prefix. For example, you can specify s3://myfiles/runs/inputs/a/ to load all objects whose keys start with this prefix.
-
For Nextflow, HealthOmics supports the glob pattern for Amazon S3 URIs in input parameters. For example, you can specify "s3://myfiles/runs/inputs/a/*.gz" to input all .gz files whose keys start with this prefix.
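The three cases above can be summarized in a short helper. This is only a sketch of the rules as described, not the logic HealthOmics uses internally: the function name classify_s3_input is hypothetical, and treating *, ?, and [ as the characters that mark a glob pattern is an assumption.

```python
def classify_s3_input(uri):
    """Classify an S3 input value as 'glob', 'prefix', or 'object'."""
    if not uri.startswith("s3://"):
        raise ValueError(f"not an Amazon S3 URI: {uri}")
    if any(ch in uri for ch in "*?["):
        return "glob"    # supported for Nextflow, per the text
    if uri.endswith("/"):
        return "prefix"  # all objects whose keys start with this prefix
    return "object"      # the key of a single object


for example in (
    "s3://myfiles/runs/inputs/a/file1.fastq",
    "s3://myfiles/runs/inputs/a/",
    "s3://myfiles/runs/inputs/a/*.gz",
):
    print(example, "->", classify_s3_input(example))
```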
Starting a run (console)
To start a workflow run
-
Open the HealthOmics console at https://console.aws.amazon.com/omics/. In the left navigation pane, choose Runs.
-
On the Runs page, choose Create run.
-
On the Create run page, provide the following information:
-
Workflow ID - The workflow ID associated with this run.
-
Run name - A distinctive name for this run.
-
Run priority - The priority of this run. Higher numbers specify a higher priority, and the highest priority tasks are run first.
-
Run storage capacity - The amount of temporary storage needed for the run. By default, the run storage capacity that was set for the workflow will be selected. You can select a different run storage capacity for your run.
-
Select S3 output destination - The S3 location where the run outputs will be saved.
-
Under Service role, you can use an existing service role or create a new one.
-
(Optional) For Tags, you can assign up to 50 tags to the run.
-
Choose Next.
-
On the Add parameter values page, provide the workflow parameters. You can either upload a JSON file that specifies the parameters or manually enter your workflow parameters.
-
Choose Next.
-
On the Add run groups page, provide the run group details.
-
On the Run cache page, provide the run cache details.
-
Choose Review and start run.
-
On the Review and start run page, choose Start run.
Starting a run (API)
Use the start-run API operation with the IAM role and Amazon S3 bucket that you created. Although the default retention mode is RETAIN, this example sets the retention mode to REMOVE. If the quota for maximum runs has been met, the earliest runs with REMOVE retention mode are deleted first. This makes room for new runs to start, even if the maximum runs limit is met, as long as there are runs with REMOVE retention mode that can be removed.
When the parameter is set to REMOVE, the run metadata is removed after the run completes and the metadata has been sent to Amazon CloudWatch.
aws omics start-run --workflow-id <workflow_id> \
    --role-arn arn:aws:iam::123456789012:role/service-role/OmicsWorkflow-20221004T164236 \
    --name <workflow_name> \
    --retention-mode REMOVE
In response, you get the following output. The uuid is unique to the run and, along with runOutputUri, can be used to track where output data is written.
{
    "arn": "arn:aws:omics:us-west-2:....:run/1234567",
    "id": "1234567",
    "uuid": "96c57683-74bf-9d6d-ae7e-f09b097db14a",
    "runOutputUri": "s3://bucket/folder/8405154/96c57683-74bf-9d6d-ae7e-f09b097db14a",
    "status": "PENDING"
}
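If you use the AWS SDK for Python instead of the AWS CLI, the same request might look like the following sketch. The function name start_run_with_remove is hypothetical, and the client argument is assumed to behave like boto3.client("omics"). The sketch also passes an output location, which this first CLI example doesn't show.

```python
def start_run_with_remove(client, workflow_id, role_arn, name, output_uri):
    """Start a run whose metadata can be removed automatically.

    `client` is assumed to behave like boto3.client("omics").
    """
    response = client.start_run(
        workflowId=workflow_id,
        roleArn=role_arn,
        name=name,
        outputUri=output_uri,
        retentionMode="REMOVE",  # the default is RETAIN
    )
    # The run ID is what you pass to later status checks.
    return response["id"], response["status"]
```

Passing the client as an argument keeps the helper independent of how you configure credentials and Regions, and makes it easy to exercise with a stand-in client.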
If the parameter template for a workflow declares any required parameters, you can provide a local JSON file of the inputs when you start a workflow run. The JSON file contains the exact name of each input parameter and a value for the parameter.
Reference the input JSON file in the AWS CLI by adding --parameters file://<input_file.json> to your start-run request.
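One way to produce the local JSON file is to write it from a dictionary, as in the following sketch. The helper name write_parameters_file and the parameter name input_fastq are hypothetical. Use the exact parameter names from your workflow's parameter template.

```python
import json
import os
import tempfile


def write_parameters_file(parameters, directory=None):
    """Write the parameters to a JSON file and return a file:// argument."""
    fd, path = tempfile.mkstemp(suffix=".json", dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(parameters, handle, indent=2)
    return f"file://{path}"


# Hypothetical parameter name and value, for illustration only.
argument = write_parameters_file(
    {"input_fastq": "s3://myfiles/runs/inputs/a/file1.fastq"}
)
print(argument)
```

The returned string is the value that you supply to the AWS CLI when you reference the input file in your start-run request.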
You can also use the start-run API with a GPU workflow ID, as shown.
aws omics start-run --workflow-id <workflow_id> \
    --role-arn arn:aws:iam::123456789012:role/service-role/OmicsWorkflow-20221004T164236 \
    --name GPUTestRunModel \
    --output-uri s3://amzn-s3-demo-bucket1
Get information about a workflow run
You can use the ID in the response with the get-run API to check the status of a run, as shown.
aws omics get-run --id <run_id>
The response from this API operation tells you the status of the workflow run. Possible statuses include PENDING, STARTING, RUNNING, and COMPLETED. When a run is COMPLETED, you can find an output file called outfile.txt in your output Amazon S3 bucket, in a folder named after the run ID.
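Because a run moves through several statuses before it finishes, you might poll get-run until the status stops changing. The following is a minimal sketch, assuming a client that behaves like boto3.client("omics"). The function name wait_for_run and the polling defaults are hypothetical. Only COMPLETED is taken from the text above. FAILED, CANCELLED, and DELETED are included on the assumption that they are the other states in which a run stops changing.

```python
import time

# COMPLETED comes from the text; the other three values are assumptions
# about the remaining states in which a run stops changing.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "CANCELLED", "DELETED"}


def wait_for_run(client, run_id, poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Poll get_run until the run reaches a terminal status or polls run out."""
    for _ in range(max_polls):
        status = client.get_run(id=run_id)["status"]
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"run {run_id} did not finish after {max_polls} polls")
```

The sleep argument is injectable so that you can test the loop without waiting.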
The get-run API operation also returns other details, such as whether the workflow is Ready2Run or PRIVATE, the workflow engine, and accelerator details. The following example shows the response for get-run for a run of a private workflow, described in WDL, with a GPU accelerator and no tags assigned to the run.
{
    "arn": "arn:aws:omics:us-west-2:123456789012:run/7830534",
    "id": "7830534",
    "uuid": "96c57683-74bf-9d6d-ae7e-f09b097db14a",
    "runOutputUri": "s3://bucket/folder/8405154/96c57683-74bf-9d6d-ae7e-f09b097db14a",
    "status": "COMPLETED",
    "workflowId": "4074992",
    "workflowType": "PRIVATE",
    "roleArn": "arn:aws:iam::123456789012:role/service-role/OmicsWorkflow-20221004T164236",
    "name": "RunGroupMaxGpuTest",
    "runGroupId": "9938959",
    "digest": "sha256:a23a6fc54040d36784206234c02147302ab8658bed89860a86976048f6cad5ac",
    "accelerators": "GPU",
    "outputUri": "s3://amzn-s3-demo-bucket1",
    "startedBy": "arn:aws:sts::123456789012:assumed-role/Admin/<role_name>",
    "creationTime": "2023-04-07T16:44:22.262471+00:00",
    "startTime": "2023-04-07T16:56:12.504000+00:00",
    "stopTime": "2023-04-07T17:22:29.908813+00:00",
    "tags": {}
}
You can see the status of all runs with the list-runs API operation, as shown.
aws omics list-runs
To see all the tasks completed for a specific run, use the list-run-tasks API.
aws omics list-run-tasks --id <run_id>
To get the details of any specific task, use the get-run-task API.
aws omics get-run-task --id <run_id> --task-id <task_id>
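The two task operations are often used together: list the tasks for a run, then fetch the details of each one. The following sketch assumes a client that behaves like boto3.client("omics"). The function names list_all_tasks and describe_tasks are hypothetical.

```python
def list_all_tasks(client, run_id):
    """Return every task summary for a run, following pagination tokens."""
    tasks, token = [], None
    while True:
        kwargs = {"id": run_id}
        if token:
            kwargs["startingToken"] = token
        page = client.list_run_tasks(**kwargs)
        tasks.extend(page.get("items", []))
        token = page.get("nextToken")
        if not token:
            return tasks


def describe_tasks(client, run_id):
    """Map each task ID to the details returned by get_run_task."""
    return {
        task["taskId"]: client.get_run_task(id=run_id, taskId=task["taskId"])
        for task in list_all_tasks(client, run_id)
    }
```

For runs with many tasks, describe_tasks makes one request per task, so you might prefer to fetch details only for the tasks that you need.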
After the run completes, the metadata is sent to CloudWatch under the stream manifest/run/<run ID>/<run UUID>.
The following is an example of the manifest.
{
  "arn": "arn:aws:omics:us-east-1:123456789012:run/1695324",
  "creationTime": "2022-08-24T19:53:55.284Z",
  "resourceDigests": {
    "s3://omics-data/broad-references/hg38/v0/Homo_sapiens_assembly38.dict": "etag:3884c62eb0e53fa92459ed9bff133ae6",
    "s3://omics-data/broad-references/hg38/v0/Homo_sapiens_assembly38.fasta": "etag:e307d81c605fb91b7720a08f00276842-388",
    "s3://omics-data/broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.fai": "etag:f76371b113734a56cde236bc0372de0a",
    "s3://omics-data/intervals/hg38-mjs-whole-chr.500M.intervals": "etag:27fdd1341246896721ec49a46a575334",
    "s3://omics-data/workflow-input-lists/dragen-gvcf-list.txt": "etag:e22f5aeed0b350a66696d8ffae453227"
  },
  "digest": "sha256:a5baaff84dd54085eb03f78766b0a367e93439486bc3f67de42bb38b93304964",
  "engine": "WDL",
  "main": "gatk4-basic-joint-genotyping-v2.wdl",
  "name": "1044-gvcfs",
  "outputUri": "s3://omics-data/workflow-output",
  "parameters": {
    "callset_name": "cohort",
    "input_gvcf_uris": "s3://omics-data/workflow-input-lists/dragen-gvcf-list.txt",
    "interval_list": "s3://omics-data/intervals/hg38-mjs-whole-chr.500M.intervals",
    "ref_dict": "s3://omics-data/broad-references/hg38/v0/Homo_sapiens_assembly38.dict",
    "ref_fasta": "s3://omics-data/broad-references/hg38/v0/Homo_sapiens_assembly38.fasta",
    "ref_fasta_index": "s3://omics-data/broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.fai"
  },
  "roleArn": "arn:aws:iam::123456789012:role/OmicsServiceRole",
  "startedBy": "arn:aws:sts::123456789012:assumed-role/admin/ahenroid-Isengard",
  "startTime": "2022-08-24T20:08:22.582Z",
  "status": "COMPLETED",
  "stopTime": "2022-08-24T20:08:22.582Z",
  "storageCapacity": 9600,
  "uuid": "a3b0ca7e-9597-4ecc-94a4-6ed45481aeab",
  "workflow": "arn:aws:omics:us-east-1:123456789012:workflow/1558364",
  "workflowType": "PRIVATE"
},
{
  "arn": "arn:aws:omics:us-east-1:123456789012:task/1245938",
  "cpus": 16,
  "creationTime": "2022-08-24T20:06:32.971290",
  "image": "123456789012.dkr.ecr.us-west-2.amazonaws.com/gatk",
  "imageDigest": "sha256:8051adab0ff725e7e9c2af5997680346f3c3799b2df3785dd51d4abdd3da747b",
  "memory": 32,
  "name": "geno-123",
  "run": "arn:aws:omics:us-east-1:123456789012:run/1695324",
  "startTime": "2022-08-24T20:08:22.278Z",
  "status": "SUCCESS",
  "stopTime": "2022-08-24T20:08:22.278Z",
  "uuid": "44c1a30a-4eee-426d-88ea-1af403858f76"
},
...
Run metadata isn't deleted if it's not present in the CloudWatch logs. You can also use the run ID to rerun workflow runs using the CLI tool. Learn more and download the tool from the HealthOmics tools GitHub repository.
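When you work with a manifest like the one shown earlier in this section, it can help to build the stream name and to separate the run entry from the task entries. The following is a sketch under stated assumptions: the function names are hypothetical, and each manifest entry is assumed to arrive as its own JSON string. Entries are told apart by the resource type at the end of their ARN, as in the example manifest.

```python
import json


def manifest_stream_name(run_id, run_uuid):
    """Build the log stream name in the form manifest/run/<run ID>/<run UUID>."""
    return f"manifest/run/{run_id}/{run_uuid}"


def split_manifest(messages):
    """Separate manifest entries into the run record and the task records.

    `messages` is an iterable of JSON strings, one per manifest entry
    (an assumption about how the entries are delivered).
    """
    run, tasks = None, []
    for message in messages:
        entry = json.loads(message)
        resource = entry["arn"].split(":")[-1]  # for example, "run/1695324"
        if resource.startswith("run/"):
            run = entry
        elif resource.startswith("task/"):
            tasks.append(entry)
    return run, tasks
```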
Re-running a workflow run
The following is an example of using the tool to rerun a workflow run, using the run ID. You can retrieve the ID for a run from the CloudWatch logs.
omics-rerun 9876543 --name <workflow_name> --retention-mode REMOVE
If the run exists in CloudWatch, you receive a response similar to the following.
Original request:
{
  "workflowId": "9679729",
  "roleArn": "arn:aws:iam::123456789012:role/DemoRole",
  "name": "sample_rerun",
  "parameters": {
    "image": "123456789012.dkr.ecr.us-west-2.amazonaws.com/default:latest",
    "file1": "omics://123456789012.storage.us-west-2.amazonaws.com/8647780323/readSet/6389608538"
  },
  "outputUri": "s3://workflow-output-bcf2fcb1"
}
StartRun request:
{
  "workflowId": "9679729",
  "roleArn": "arn:aws:iam::123456789012:role/DemoRole",
  "name": "new test",
  "parameters": {
    "image": "123456789012.dkr.ecr.us-west-2.amazonaws.com/default:latest",
    "file1": "omics://123456789012.storage.us-west-2.amazonaws.com/8647780323/readSet/6389608538"
  },
  "outputUri": "s3://workflow-output-bcf2fcb1"
}
StartRun response:
{
  "arn": "arn:aws:omics:us-west-2:123456789012:run/9171779",
  "id": "9171779",
  "status": "PENDING",
  "tags": {}
}
If the workflow no longer exists, you receive an error message.
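The example output suggests that the new request starts from the original request and changes only the fields that you override, such as the name. The following sketch illustrates that idea. It is not the tool's actual implementation, and the function name build_rerun_request is hypothetical.

```python
import copy


def build_rerun_request(original_request, **overrides):
    """Copy the original request and apply the caller's overrides."""
    request = copy.deepcopy(original_request)  # leave the original untouched
    request.update(overrides)
    return request


original = {
    "workflowId": "9679729",
    "roleArn": "arn:aws:iam::123456789012:role/DemoRole",
    "name": "sample_rerun",
    "parameters": {
        "image": "123456789012.dkr.ecr.us-west-2.amazonaws.com/default:latest",
    },
    "outputUri": "s3://workflow-output-bcf2fcb1",
}
rerun = build_rerun_request(original, name="new test")
print(rerun["name"], "|", original["name"])
```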