Release notes for Slurm versions in AWS PCS

This topic describes important changes for each Slurm version currently supported in AWS PCS. We recommend you review the changes between the old and new versions when you upgrade your cluster.

For more information about Slurm 24.11, see the following publications:

Changes implemented in AWS PCS
  • The new Slurm Step Manager module is now enabled by default in AWS PCS. This module offloads step management from the central controller to compute nodes, which substantially improves concurrency in environments with heavy step usage. To support this configuration and better isolate Prolog and Epilog process execution, the PrologFlags values Contain and Alloc are now enabled.

  • Hierarchical communication from the controller to compute nodes is enabled to optimize Slurm communication between nodes, which improves scalability and performance. Additionally, the routing configuration now uses partition node lists for communications from the controller instead of the plugin's default routing algorithm, which improves system resiliency.

  • A new hash plugin, hash/sha3 (HashPlugin=hash/sha3), replaces the previous hash/k12 plugin. It is now enabled by default in AWS PCS clusters.

  • Slurm controller logs now include enhanced auditing for all inbound remote procedure calls (RPCs) to slurmctld. For each RPC, the logs record the source address, authenticated user, and RPC type before connection processing.
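
To confirm that a cluster reflects the defaults described above, one option is to compare the output of scontrol show config against the expected values. The following Python sketch is illustrative only: it assumes the output uses "Key = Value" lines, it checks only HashPlugin and PrologFlags, and the helper names (parse_config, missing_values) and the sample text are hypothetical rather than captured from a real AWS PCS cluster.

```python
# Sketch: check scontrol-style "Key = Value" text for expected settings.
# Assumption: the expected values mirror the changes listed above; the
# exact settings that AWS PCS applies might differ.
EXPECTED = {
    "HashPlugin": {"hash/sha3"},
    "PrologFlags": {"alloc", "contain"},
}


def parse_config(text):
    """Parse 'Key = Value' lines into a dict; lines without '=' are skipped."""
    config = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            config[key.strip()] = value.strip()
    return config


def missing_values(config, expected=EXPECTED):
    """Return {key: [missing items]} for keys that lack an expected item.

    Values are treated as comma-separated lists and compared
    case-insensitively.
    """
    problems = {}
    for key, wanted in expected.items():
        raw = config.get(key, "")
        present = {item.strip().lower() for item in raw.split(",") if item.strip()}
        missing = wanted - present
        if missing:
            problems[key] = sorted(missing)
    return problems


# Hypothetical sample text, for illustration only.
sample = """
HashPlugin              = hash/sha3
PrologFlags             = Alloc,Contain
"""
print(missing_values(parse_config(sample)))
```

In practice, you would pass the captured output of scontrol show config to parse_config instead of the sample text. An empty result means that every expected value was found.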

For more information about Slurm 24.05, see the following publications:

Slurm settings you can change in AWS PCS
  • SuspendTime defaults to 60 seconds. Use the AWS PCS scaleDownIdleTimeInSeconds configuration parameter to set it. For more information, see the scaleDownIdleTimeInSeconds parameter of the ClusterSlurmConfiguration data type in the AWS PCS API Reference.

  • MaxJobCount and MaxArraySize are based on the size you choose for the cluster. For more information, see the size parameter of the CreateCluster API action in the AWS PCS API Reference.

  • The SelectTypeParameters Slurm setting defaults to CR_CPU. You can provide it as a value for slurmCustomSettings to set it when you create a cluster. For more information, see the slurmCustomSettings parameter of the CreateCluster API action and SlurmCustomSetting in the AWS PCS API Reference.

  • You can set Prolog and Epilog at the cluster level. You can provide them as values for slurmCustomSettings to set them when you create a cluster. For more information, see CreateCluster and SlurmCustomSetting in the AWS PCS API Reference.

  • You can set Weight and RealMemory at the compute node group level. You can provide them as values for slurmCustomSettings to set them when you create a compute node group. For more information, see CreateComputeNodeGroup and SlurmCustomSetting in the AWS PCS API Reference.
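
The settings above can be sketched as request fragments for CreateCluster and CreateComputeNodeGroup. The following Python example only builds plain dictionaries and does not call AWS. It assumes that each SlurmCustomSetting is a parameterName and parameterValue pair; verify the field names against the AWS PCS API Reference. The helper names and the example values are hypothetical, and the allowed-name sets include only the settings mentioned in this topic.

```python
# Sketch: build Slurm configuration fragments as plain dicts (no AWS calls).
# Assumption: a SlurmCustomSetting has "parameterName" and "parameterValue".
# The sets below list only the settings named in this topic; the service
# might accept others.
CLUSTER_LEVEL = {"SelectTypeParameters", "Prolog", "Epilog"}
NODE_GROUP_LEVEL = {"Weight", "RealMemory"}


def custom_settings(settings, allowed):
    """Convert {name: value} into a list of SlurmCustomSetting-style dicts."""
    unknown = sorted(set(settings) - allowed)
    if unknown:
        raise ValueError("not settable at this level: " + ", ".join(unknown))
    return [
        {"parameterName": name, "parameterValue": str(value)}
        for name, value in settings.items()
    ]


def cluster_slurm_configuration(idle_seconds=60, settings=None):
    """Fragment for the cluster-level Slurm configuration.

    idle_seconds maps to scaleDownIdleTimeInSeconds (Slurm SuspendTime).
    """
    config = {"scaleDownIdleTimeInSeconds": int(idle_seconds)}
    if settings:
        config["slurmCustomSettings"] = custom_settings(settings, CLUSTER_LEVEL)
    return config


def node_group_slurm_configuration(settings):
    """Fragment for the compute node group Slurm configuration."""
    return {"slurmCustomSettings": custom_settings(settings, NODE_GROUP_LEVEL)}


# Hypothetical example values.
cluster_fragment = cluster_slurm_configuration(
    idle_seconds=120,
    settings={"SelectTypeParameters": "CR_CPU_Memory"},
)
node_group_fragment = node_group_slurm_configuration(
    {"Weight": 10, "RealMemory": 64000}
)
```

Keeping the cluster-level and node-group-level names in separate sets lets the sketch reject a setting passed at the wrong level before any request is sent.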