/AWS1/CL_SGMCLUSTTIEREDSTRGCFG
Defines the configuration for managed tier checkpointing in a HyperPod cluster. Managed tier checkpointing uses multiple storage tiers, including cluster CPU memory, to provide faster checkpoint operations and improved fault tolerance for large-scale model training. The system automatically saves checkpoints to memory at high frequency and periodically persists them to durable storage, such as Amazon S3.
CONSTRUCTOR
IMPORTING
Required arguments:
iv_mode TYPE /AWS1/SGMCLUSTERCONFIGMODE
Specifies whether managed tier checkpointing is enabled or disabled for the HyperPod cluster. When set to Enable, the system installs a memory management daemon that provides disaggregated memory as a service for checkpoint storage. When set to Disable, the feature is turned off and the memory management daemon is removed from the cluster.
Optional arguments:
iv_instmemoryallocpercentage TYPE /AWS1/SGMCLSTINSTMEMALLOCPER00
The percentage of cluster memory, as an integer, to allocate for checkpointing.
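A minimal sketch of calling the constructor from a standalone ABAP report, assuming the AWS SDK for SAP ABAP SageMaker package is installed. The report name ZDEMO_SGM_TIERED_CFG and the value 20 are illustrative only. Passing |Enable| as a string template assumes /AWS1/SGMCLUSTERCONFIGMODE is a string-based type, as is typical in the SDK.

```abap
REPORT zdemo_sgm_tiered_cfg.

START-OF-SELECTION.
  " Enable managed tier checkpointing and reserve part of the memory for it.
  " The value 20 is an arbitrary example, not a recommended setting.
  DATA(lo_cfg) = NEW /aws1/cl_sgmclusttieredstrgcfg(
    iv_mode                      = |Enable|
    iv_instmemoryallocpercentage = 20 ).

  " Read the values back to confirm what the object holds.
  DATA(lv_mode) = lo_cfg->get_mode( ).
  DATA(lv_pct)  = lo_cfg->get_instmemoryallocpercage( ).
  WRITE: / 'Mode:', lv_mode,
         / 'Memory allocation %:', lv_pct.
```

On its own the object only holds configuration; presumably it takes effect once passed in a cluster create or update request, which is not shown here.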
Queryable Attributes
Mode
Specifies whether managed tier checkpointing is enabled or disabled for the HyperPod cluster. When set to Enable, the system installs a memory management daemon that provides disaggregated memory as a service for checkpoint storage. When set to Disable, the feature is turned off and the memory management daemon is removed from the cluster.
Accessible with the following methods
Method | Description |
---|---|
GET_MODE() | Getter for MODE, with configurable default |
ASK_MODE() | Getter for MODE; raises an exception if the field has no value |
HAS_MODE() | Determines whether MODE has a value |
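The three MODE accessors differ mainly in how they treat a missing value. The hypothetical report below checks HAS_MODE( ) before calling GET_MODE( ), and wraps ASK_MODE( ) in a TRY block. The exception class /AWS1/CX_RT_VALUE_MISSING is an assumption based on the SDK's usual convention; confirm it against the ASK_MODE( ) signature in your system.

```abap
REPORT zdemo_sgm_mode_getters.

START-OF-SELECTION.
  " Built locally for the demo; in practice the object would more likely
  " come from a service response.
  DATA(lo_cfg) = NEW /aws1/cl_sgmclusttieredstrgcfg( iv_mode = |Disable| ).

  " HAS_MODE( ) checks for a value without raising anything.
  IF lo_cfg->has_mode( ) = abap_true.
    DATA(lv_mode) = lo_cfg->get_mode( ).
    WRITE: / 'Mode:', lv_mode.
  ENDIF.

  " ASK_MODE( ) raises an exception instead of returning an initial value.
  " Assumption: the exception class is /AWS1/CX_RT_VALUE_MISSING.
  TRY.
      DATA(lv_required_mode) = lo_cfg->ask_mode( ).
      WRITE: / 'Required mode:', lv_required_mode.
    CATCH /aws1/cx_rt_value_missing.
      WRITE: / 'Mode is not set.'.
  ENDTRY.
```

HAS_ followed by GET_ suits optional data, while ASK_ fits code paths where a missing value should be treated as an error.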
InstanceMemoryAllocationPercentage
The percentage of cluster memory, as an integer, to allocate for checkpointing.
Accessible with the following methods
Method | Description |
---|---|
GET_INSTMEMORYALLOCPERCAGE() | Getter for INSTMEMORYALLOCPERCENTAGE, with configurable default |
ASK_INSTMEMORYALLOCPERCAGE() | Getter for INSTMEMORYALLOCPERCENTAGE; raises an exception if the field has no value |
HAS_INSTMEMORYALLOCPERCAGE() | Determines whether INSTMEMORYALLOCPERCENTAGE has a value |
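Because InstanceMemoryAllocationPercentage is optional, code that reads it may need a fallback. GET_INSTMEMORYALLOCPERCAGE( ) offers a configurable default, but the name of that parameter is not listed on this page, so this hypothetical sketch uses HAS_INSTMEMORYALLOCPERCAGE( ) instead. The fallback of 20 percent (lc_fallback_pct) is an arbitrary placeholder, not a service default, and the sketch assumes HAS_ reports abap_false when the optional argument was never supplied.

```abap
REPORT zdemo_sgm_mem_pct.

" Hypothetical fallback used when no percentage has been set.
CONSTANTS lc_fallback_pct TYPE i VALUE 20.

START-OF-SELECTION.
  " One object with the optional percentage, one without it.
  DATA(lo_with_pct) = NEW /aws1/cl_sgmclusttieredstrgcfg(
    iv_mode                      = |Enable|
    iv_instmemoryallocpercentage = 30 ).
  DATA(lo_without_pct) = NEW /aws1/cl_sgmclusttieredstrgcfg( iv_mode = |Enable| ).

  DATA lv_pct_with    TYPE i.
  DATA lv_pct_without TYPE i.

  " HAS_ distinguishes "not set" from an explicit value, so the local
  " fallback is applied only when nothing was supplied.
  lv_pct_with = COND #( WHEN lo_with_pct->has_instmemoryallocpercage( ) = abap_true
                        THEN lo_with_pct->get_instmemoryallocpercage( )
                        ELSE lc_fallback_pct ).
  lv_pct_without = COND #( WHEN lo_without_pct->has_instmemoryallocpercage( ) = abap_true
                           THEN lo_without_pct->get_instmemoryallocpercage( )
                           ELSE lc_fallback_pct ).

  WRITE: / 'With value:', lv_pct_with,
         / 'Without value:', lv_pct_without.
```

Checking HAS_ explicitly avoids confusing an unset field with an explicit 0, which an initial integer returned by GET_ alone could not distinguish.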