Troubleshooting custom Slurm settings in AWS PCS
If you encounter errors when creating or updating AWS PCS resources with Slurm custom settings, you can use logging to diagnose and resolve the issues.
Troubleshooting incompatible Slurm custom settings
Problem: You receive an error message similar to the following when performing cluster, compute node group, or queue operations:
{OPERATION} failed. The Slurm custom settings of the cluster might be incompatible. Check the settings and try again.
This error can occur with the following operations:
- CreateCluster
- CreateComputeNodeGroup
- UpdateComputeNodeGroup
- CreateQueue
- UpdateQueue
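If you call these operations from a script, it can help to recognize this particular failure and route it to the logging workflow instead of retrying blindly. The following is a minimal sketch under stated assumptions. It assumes the returned message follows the wording shown above, which the guide only says is "similar to" the real text. The function name `incompatible_settings_operation` is hypothetical.

```python
import re
from typing import Optional

# Operations this troubleshooting guide lists as able to fail this way.
AFFECTED_OPERATIONS = {
    "CreateCluster",
    "CreateComputeNodeGroup",
    "UpdateComputeNodeGroup",
    "CreateQueue",
    "UpdateQueue",
}

# Assumption: the message starts with the operation name followed by the
# sentence quoted in the guide. The real wording may differ slightly.
_INCOMPATIBLE_RE = re.compile(
    r"^(?P<operation>\w+) failed\. The Slurm custom settings of the cluster "
    r"might be incompatible\."
)


def incompatible_settings_operation(message: str) -> Optional[str]:
    """Return the failed operation name if `message` looks like the
    incompatible-Slurm-settings error for a listed operation, else None."""
    match = _INCOMPATIBLE_RE.match(message.strip())
    if match and match.group("operation") in AFFECTED_OPERATIONS:
        return match.group("operation")
    return None
```

For example, `incompatible_settings_operation("UpdateQueue failed. The Slurm custom settings of the cluster might be incompatible. Check the settings and try again.")` returns `"UpdateQueue"`. Operations outside the documented list return `None` by design, so unrelated failures are not misclassified.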
Solution: Enable logging to understand the specific issue and troubleshoot the incompatible settings.
To troubleshoot incompatible Slurm custom settings
- Create the cluster if it doesn't exist yet, or ensure your existing cluster is in a state where logging can be enabled.
- Enable logging for your cluster. For detailed instructions, see Logging and monitoring for AWS PCS.
Note
You can enable logging as soon as the cluster is being created. You don't have to wait for creation to finish.
- Review the logs to identify the specific Slurm configuration issue causing the incompatibility.
- Correct the incompatible custom settings based on the log information and retry the operation.
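The enable-logging and review-logs steps can be scripted against the CloudWatch Logs API. The following is a minimal sketch, not the documented AWS PCS procedure. It uses the real boto3 CloudWatch Logs methods `put_delivery_source`, `put_delivery_destination`, `create_delivery`, and `filter_log_events`. It assumes that the cluster's scheduler logs are delivered under the log type `PCS_SCHEDULER_LOGS`, so confirm that value in Logging and monitoring for AWS PCS. The function names, the `prefix` naming scheme, and the log group you pass in are hypothetical.

```python
from typing import Any, Dict, List

# Assumption: log type used for AWS PCS scheduler (Slurm) logs. Verify it
# against the AWS PCS logging documentation before relying on it.
PCS_LOG_TYPE = "PCS_SCHEDULER_LOGS"


def enable_scheduler_logging(logs, cluster_arn: str, log_group_arn: str, prefix: str) -> str:
    """Deliver a cluster's scheduler logs to an existing CloudWatch Logs group.

    `logs` is a CloudWatch Logs client such as boto3.client("logs").
    Returns the ID of the new delivery.
    """
    source_name = f"{prefix}-source"  # hypothetical naming scheme
    # Register the cluster as a source of vended logs.
    logs.put_delivery_source(
        name=source_name, resourceArn=cluster_arn, logType=PCS_LOG_TYPE
    )
    # Register the log group as the destination.
    destination = logs.put_delivery_destination(
        name=f"{prefix}-destination",
        deliveryDestinationConfiguration={"destinationResourceArn": log_group_arn},
    )
    # Connect the source to the destination.
    delivery = logs.create_delivery(
        deliverySourceName=source_name,
        deliveryDestinationArn=destination["deliveryDestination"]["arn"],
    )
    return delivery["delivery"]["id"]


def find_log_lines(logs, log_group_name: str, pattern: str, start_ms: int) -> List[str]:
    """Collect messages matching a CloudWatch Logs filter pattern, following pagination."""
    messages: List[str] = []
    kwargs: Dict[str, Any] = {
        "logGroupName": log_group_name,
        "filterPattern": pattern,
        "startTime": start_ms,  # milliseconds since the epoch
    }
    while True:
        page = logs.filter_log_events(**kwargs)
        messages.extend(event["message"] for event in page.get("events", []))
        token = page.get("nextToken")
        if not token:
            return messages
        kwargs["nextToken"] = token
```

With real credentials, you would pass `boto3.client("logs")` as `logs`. A filter pattern such as `"?error ?fatal"` is only a starting guess, because the exact wording of Slurm's messages in the scheduler logs isn't given here. Passing the client in, rather than creating it inside each function, also lets you test the functions without calling AWS.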
For information about supported Slurm custom settings, see: