Run storage types - AWS HealthOmics

Run storage types

For a given workflow or workflow run, you can choose static or dynamic run storage. By default, HealthOmics provides static run storage. Consider the following factors when deciding which run storage type to use:

  • Static

    • HealthOmics allocates a fixed amount of run storage.

    • You can specify the storage size in the StartRun API request (the storageCapacity parameter, in GiB). The system rounds the value up to the nearest multiple of 1200 GiB. If that storage size isn't available, it rounds up to the nearest multiple of 2400 GiB.

    • If you don't specify a value, the default run storage is 1200 GiB.

    • If the specified storage size is too low, the run fails with an Out of storage for file system error.

    • Static run storage is suitable for large workflows. It provides higher file system throughput per GiB and lower cost per GiB than dynamic run storage.

    • Use static run storage for burst workloads that scale out wide and quickly (for example, a large volume of RNA-Seq samples processed in parallel).

  • Dynamic

    • You don’t need to estimate the required storage for the run. HealthOmics allocates a starting amount of run storage. The storage size dynamically scales up and down, based on file system utilization during the run. A run never fails due to an Out of storage for file system error.

    • Dynamic run storage provisions and deprovisions faster than static run storage. Faster setup benefits smaller workflows that run frequently, as well as development and test cycles.

    • Dynamic run storage uses burst credits to control burst throughput, so don't use it for workflows that require a peak burst throughput of 50 MiB/s or higher.

    • When burst credits expire, increases in dynamic run storage capacity can slow down. The system writes a warning to the logs when a burst credit expires. If your workflow frequently runs out of burst credits, consider using static run storage.

    • After the run completes (whether it succeeds or fails), the GetRun API operation returns the maximum storage used by the run in the storageCapacity field. You can also find this value in the run manifest logs in the omics log group.

      • For a dynamic storage run that completes within 2 hours, the maximum storage value may not be available.
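The static sizing rules above can be sketched in Python. This is a minimal, non-authoritative sketch: effective_static_capacity and start_run_kwargs are hypothetical helpers, size_available is a hypothetical flag standing in for the availability check that HealthOmics performs itself, and the request keys assume the parameter names of the boto3 omics client's start_run call (workflowId, roleArn, outputUri, storageType, storageCapacity). The identifiers in the example are placeholders.

```python
from typing import Optional

STATIC_INCREMENT_GIB = 1200    # static storage is rounded up to multiples of this
FALLBACK_INCREMENT_GIB = 2400  # used when the 1200-GiB multiple isn't available
DEFAULT_STATIC_GIB = 1200      # applied when no size is specified


def round_up(value_gib: int, increment_gib: int) -> int:
    """Round value_gib up to the next multiple of increment_gib (integer math)."""
    return ((value_gib + increment_gib - 1) // increment_gib) * increment_gib


def effective_static_capacity(requested_gib: Optional[int],
                              size_available: bool = True) -> int:
    """Predict the static run storage size provisioned for a requested value.

    size_available is a hypothetical flag; the service makes this decision itself.
    """
    if requested_gib is None:
        return DEFAULT_STATIC_GIB
    if requested_gib <= 0:
        raise ValueError("requested_gib must be a positive number of GiB")
    if size_available:
        return round_up(requested_gib, STATIC_INCREMENT_GIB)
    return round_up(requested_gib, FALLBACK_INCREMENT_GIB)


def start_run_kwargs(workflow_id: str, role_arn: str, output_uri: str,
                     storage_type: str = "STATIC",
                     storage_capacity_gib: Optional[int] = None) -> dict:
    """Build keyword arguments for a StartRun request (hypothetical helper)."""
    if storage_type not in ("STATIC", "DYNAMIC"):
        raise ValueError("storage_type must be 'STATIC' or 'DYNAMIC'")
    kwargs = {
        "workflowId": workflow_id,
        "roleArn": role_arn,
        "outputUri": output_uri,
        "storageType": storage_type,
    }
    # Assumption: a size is only sent for static storage; dynamic storage
    # scales on its own, so this sketch omits storageCapacity for it.
    if storage_type == "STATIC" and storage_capacity_gib is not None:
        kwargs["storageCapacity"] = storage_capacity_gib
    return kwargs


if __name__ == "__main__":
    print(effective_static_capacity(None))          # 1200 (default)
    print(effective_static_capacity(1500))          # 2400
    print(effective_static_capacity(2500))          # 3600
    print(effective_static_capacity(2500, False))   # 4800
    # Placeholder identifiers; replace with your own workflow, role, and bucket.
    print(start_run_kwargs("1234567",
                           "arn:aws:iam::111122223333:role/ExampleOmicsRole",
                           "s3://amzn-s3-demo-bucket/output/",
                           storage_capacity_gib=1500))
```

You could pass the resulting dictionary to boto3.client("omics").start_run(**kwargs). The sketch sends the requested value unchanged and leaves rounding to the service; effective_static_capacity only predicts the size you should expect to be provisioned.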

Note

Run storage usage incurs charges to your account. For pricing information about static and dynamic run storage, see HealthOmics pricing.

Calculating required static run storage

A workflow needs more capacity with static run storage than with dynamic run storage, because the base file system installation uses 7% of the static file system capacity.

If you have run the workflow with dynamic run storage to measure the maximum storage used by the run, use the following calculation to determine the minimum amount of static run storage required:

static storage required = maximum storage in GiB used by the dynamic run storage + (total static file system size in GiB * 0.07)

For example:

  • Maximum storage measured from a dynamic run storage workflow run: 500 GiB

  • File system size: 1200 GiB

  • 7% of the file system size: 84 GiB

  • 500 + 84 = 584 GiB of static run storage required for this run

Because 584 GiB is less than 1200 GiB, 1200 GiB (the minimum capacity for static run storage) is sufficient for this run.
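The calculation above can be sketched in Python, including the case where the measured maximum doesn't fit in 1200 GiB. This is a minimal sketch under stated assumptions: max_storage_from_get_run, static_storage_required, and smallest_static_size are hypothetical helpers, the input is assumed to be a GetRun-style response dictionary whose storageCapacity value is in GiB, and a missing field is assumed to mean the value isn't available (as can happen for a dynamic storage run that completes within 2 hours).

```python
from typing import Any, Mapping, Optional

BASE_INSTALL_PERCENT = 7      # base file system installation uses 7% of static capacity
STATIC_INCREMENT_GIB = 1200   # static run storage is provisioned in 1200-GiB steps


def max_storage_from_get_run(run: Mapping[str, Any]) -> Optional[int]:
    """Return storageCapacity from a GetRun-style response dict, or None.

    Assumption: the field is absent when the maximum isn't available.
    """
    value = run.get("storageCapacity")
    return None if value is None else int(value)


def static_storage_required(max_dynamic_gib: float, fs_size_gib: int) -> float:
    """Maximum storage used by the dynamic run + 7% of the static file system size."""
    # Integer percent avoids floating-point noise such as 1200 * 0.07 != 84.
    return max_dynamic_gib + fs_size_gib * BASE_INSTALL_PERCENT / 100


def smallest_static_size(max_dynamic_gib: float) -> int:
    """Smallest multiple of 1200 GiB that covers the required static storage."""
    size = STATIC_INCREMENT_GIB
    while static_storage_required(max_dynamic_gib, size) > size:
        size += STATIC_INCREMENT_GIB
    return size


if __name__ == "__main__":
    run = {"storageType": "DYNAMIC", "storageCapacity": 500}  # made-up response
    used = max_storage_from_get_run(run)
    print(static_storage_required(used, 1200))  # 584.0
    print(smallest_static_size(used))           # 1200
    print(smallest_static_size(1150))           # 2400, since 1150 + 84 > 1200
```

With boto3, the run dictionary could come from boto3.client("omics").get_run(id=run_id). Because the 7% overhead grows with the file system size, the sketch rechecks the requirement for each candidate size instead of computing it once against 1200 GiB.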