The main purpose of a run cache is to optimize computation of tasks in the run. If there is a valid matching cache entry for a task, HealthOmics uses the cache entry instead of recomputing the task. Otherwise, HealthOmics reverts to the default service behavior, which is to recompute the task and its dependent tasks. With this approach, a cache miss doesn't cause the run to fail.
We recommend that you manage the run cache size. Over time, cache entries may no longer be valid because of HealthOmics service updates or because of changes you made in the run or the run tasks. The following sections provide additional details.
Manage manifest version updates
Periodically, the HealthOmics service may introduce new features or updates that invalidate some or all run cache entries. In this situation, your runs can experience a one-time cache miss.
HealthOmics creates a JSON manifest file for each cache entry. For runs started after February 12, 2025, the manifest file includes a version parameter. If a service update invalidates any cache entries, HealthOmics increments the version number so that you can identify the legacy cache entries for removal.
The following example shows a manifest file with the version set to 2:
{
  "arn": "arn:aws:omics:us-west-2:12345678901:runCache/0123456/cacheEntry/1234567-195f-3921-a1fa-ffffcef0a6a4",
  "s3uri": "s3://example/1234567-d0d1-e230-d599-10f1539f4a32/1348677/4795326/7e8c69b1-145f-3991-a1fa-ffffcef0a6a4",
  "taskArn": "arn:aws:omics:us-west-2:12345678901:task/4567891",
  "workDir": "/mnt/workflow/1234567-d0d1-e230-d599-10f1539f4a32/workdir/call-TxtFileCopyTask/5w6tn5feyga7noasjuecdeoqpkltrfo3/wxz2fuddlo6hc4uh5s2lreaayczduxdm",
  "files": [
    {
      "name": "output_txt_file",
      "path": "out/output_txt_file/outfile.txt",
      "etag": "ajdhyg9736b9654673b9fbb486753bc8"
    }
  ],
  "nextflowContext": {},
  "otherOutputs": {},
  "version": 2
}
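As a rough illustration of how the version parameter could be used to find legacy entries, the following Python sketch parses manifest contents and lists the entries whose version is lower than a version you supply. The function name `find_legacy_manifests`, the `current_version` argument, and the choice to treat a manifest without a version as legacy are assumptions made for this example. HealthOmics doesn't provide this helper.

```python
import json


def find_legacy_manifests(manifests, current_version):
    """Return the ARNs of cache entries whose manifest version is older.

    manifests: iterable of manifest file contents (JSON strings).
    current_version: the version carried by valid entries (assumed known).
    """
    legacy = []
    for text in manifests:
        manifest = json.loads(text)
        # Assumption: a manifest without a version predates versioning,
        # so it is treated as older than any numbered version.
        version = manifest.get("version", 0)
        if version < current_version:
            legacy.append(manifest["arn"])
    return legacy


# Hypothetical manifest contents, reduced to the two fields used here.
old_entry = '{"arn": "arn:example:entry-a", "version": 1}'
new_entry = '{"arn": "arn:example:entry-b", "version": 2}'
print(find_legacy_manifests([old_entry, new_entry], current_version=2))
```

In practice, you would read each manifest file from the cache Amazon S3 bucket and pass its contents to the function.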
For runs with cache entries that are no longer valid, rebuild the cache to create new valid entries. Perform the following steps for each run:
- Start the run once with cache retention set to CACHE ALWAYS. This run creates the new cache entries.
- For subsequent runs, set the cache retention to its former setting (CACHE ALWAYS or CACHE ON FAILURE).
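The rebuild steps above could be scripted along the following lines. This is a minimal sketch, not a definitive implementation: it assumes that the StartRun request accepts a `cacheBehavior` argument with the values `CACHE_ALWAYS` and `CACHE_ON_FAILURE`, so check the API reference before relying on those names. The helper names `with_cache_behavior` and `rebuild_cache_once` are hypothetical.

```python
def with_cache_behavior(start_run_args, cache_behavior):
    """Return a copy of the StartRun arguments with cacheBehavior overridden."""
    # Assumed spellings of the two cache behavior values.
    allowed = {"CACHE_ALWAYS", "CACHE_ON_FAILURE"}
    if cache_behavior not in allowed:
        raise ValueError(f"unsupported cache behavior: {cache_behavior}")
    args = dict(start_run_args)  # copy, so the caller's settings are preserved
    args["cacheBehavior"] = cache_behavior
    return args


def rebuild_cache_once(omics_client, start_run_args):
    """Start one run with CACHE_ALWAYS so that new cache entries are created."""
    args = with_cache_behavior(start_run_args, "CACHE_ALWAYS")
    return omics_client.start_run(**args)
```

With boto3, `omics_client` would be `boto3.client("omics")` and `start_run_args` would be the arguments you already pass to `start_run`. Because the helper copies the arguments, later runs can reuse the original dictionary with its former cache setting.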
To clean up cache entries that are no longer valid, you can delete them from the cache Amazon S3 bucket. HealthOmics never reuses these cache entries. If you choose to retain entries that aren't valid, there is no impact on your runs.
Control run cache size
HealthOmics doesn't delete or auto-archive any run cache data or apply Amazon S3 clean-up rules for managing the cache data. We recommend that you perform regular cache clean-ups to save on Amazon S3 storage costs and to keep your run cache size manageable. You can delete files directly or set data retention/replication policies on the run cache bucket.
For example, you can configure an Amazon S3 lifecycle policy to expire objects after 90 days, or you can manually clean up the cache data at the end of each development project.
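A 90-day expiration rule like the one mentioned above might be built as follows. This sketch only constructs the lifecycle configuration in the shape that the Amazon S3 `put_bucket_lifecycle_configuration` call expects. The function name `expiration_lifecycle`, the rule ID, and the bucket name in the comment are placeholders chosen for this example.

```python
def expiration_lifecycle(days, prefix=""):
    """Build an S3 lifecycle configuration that expires objects after `days`.

    prefix: limit the rule to keys under this prefix ("" means all objects).
    """
    if not isinstance(days, int) or days < 1:
        raise ValueError("days must be a positive integer")
    return {
        "Rules": [
            {
                "ID": f"expire-run-cache-after-{days}-days",  # placeholder ID
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Expiration": {"Days": days},
            }
        ]
    }


config = expiration_lifecycle(90)

# To apply the rule (requires boto3 and AWS credentials; the bucket name is
# a placeholder):
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="amzn-s3-demo-bucket", LifecycleConfiguration=config)
```

Applying a lifecycle configuration replaces any existing lifecycle configuration on the bucket, so merge this rule with rules you already have.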
The following information can help you manage cache data size:
- You can view how much data is in the cache by checking Amazon S3. HealthOmics doesn't monitor or report on cache size.
- If you delete a valid cache entry, the subsequent run doesn't fail. HealthOmics recomputes the task and its dependent tasks.
- If you modify cache names or directory structures such that HealthOmics can't find a matching entry for a task, HealthOmics recomputes the task.
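Because HealthOmics doesn't report cache size, one way to check it is to sum the object sizes that Amazon S3 returns when you list the cache location. The following sketch assumes response pages in the shape returned by the S3 `list_objects_v2` operation. The function name `total_cache_bytes` is hypothetical.

```python
def total_cache_bytes(pages):
    """Sum object sizes (in bytes) across ListObjectsV2 response pages."""
    total = 0
    for page in pages:
        # A page with no matching objects has no "Contents" key.
        for obj in page.get("Contents", []):
            total += obj["Size"]
    return total


# Hypothetical pages standing in for real S3 responses.
example_pages = [
    {"Contents": [{"Key": "cache/a", "Size": 1024}, {"Key": "cache/b", "Size": 2048}]},
    {},  # an empty page
]
print(total_cache_bytes(example_pages))
```

With boto3, the pages could come from `boto3.client("s3").get_paginator("list_objects_v2").paginate(Bucket=..., Prefix=...)`, where the bucket and prefix identify your run cache location.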