I configured On-Demand Capacity Reservations (ODCRs) or zonal Reserved Instances - AWS ParallelCluster

ODCRs that include instances that have multiple network interfaces, such as P4d, P4de, and AWS Trainium (Trn)

In the cluster configuration file, check that the HeadNode is in a public subnet and that the compute nodes are in a private subnet.

ODCRs are targeted ODCRs

Seeing `Unable to read file '/opt/slurm/etc/pcluster/run_instances_overrides.json'.` even though I already have `/opt/slurm/etc/pcluster/run_instances_overrides.json` in place by following the instructions given in Launch instances with On-Demand Capacity Reservations (ODCR)

If you are using AWS ParallelCluster versions 3.1.1 to 3.2.1 with targeted ODCRs, and you are also using the run instances override JSON file, it's possible that you don’t have the JSON file formatted correctly. You could see an error in clustermgtd.log, such as the following:


Unable to read file '/opt/slurm/etc/pcluster/run_instances_overrides.json'. 
Using default: {} in  /var/log/parallelcluster/clustermgtd.

Validate that the JSON file format is correct by running the following:


$ jq . /opt/slurm/etc/pcluster/run_instances_overrides.json
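Where jq isn't installed, a similar check can be sketched in Python with the standard json module. This is only an illustration: the helper name check_json_file is hypothetical and isn't part of AWS ParallelCluster.

```python
import json


def check_json_file(path):
    """Return None if the file parses as JSON, otherwise a short error message.

    Hypothetical helper for illustration; not part of AWS ParallelCluster.
    """
    try:
        with open(path, encoding="utf-8") as f:
            json.load(f)
    except OSError as err:
        # The file is missing or not readable by the current user.
        return f"cannot read {path}: {err}"
    except json.JSONDecodeError as err:
        # The file exists but its content is not valid JSON.
        return f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"
    return None
```

Calling check_json_file("/opt/slurm/etc/pcluster/run_instances_overrides.json") on the head node returns None for a well-formed file. Otherwise it returns a message that points at the first problem it found.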

Seeing `Found RunInstances parameters override.` in `clustermgtd.log` when cluster creation failed, or in `slurm_resume.log` when a job run failed

If you are using the run instances override JSON file, check that the queue name and the compute resource name in the /opt/slurm/etc/pcluster/run_instances_overrides.json file match the names in the cluster configuration.
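As a rough illustration of this check, the following Python sketch compares the names used in the override file with the names from the cluster configuration. It assumes the override file nests three levels: queue name, then compute resource name, then the RunInstances parameters. The function name find_override_mismatches and the sample names queue1, cr1, and cr-typo are hypothetical.

```python
import json


def find_override_mismatches(overrides, cluster_queues):
    """List override entries that don't match the cluster configuration.

    overrides: dict parsed from run_instances_overrides.json, assumed to be
        shaped as {queue_name: {compute_resource_name: {...parameters...}}}.
    cluster_queues: dict mapping each queue name from the cluster
        configuration to the set of its compute resource names.
    """
    problems = []
    for queue, resources in overrides.items():
        if queue not in cluster_queues:
            problems.append(f"unknown queue: {queue}")
            continue
        if not isinstance(resources, dict):
            problems.append(f"queue entry is not a JSON object: {queue}")
            continue
        for resource in resources:
            if resource not in cluster_queues[queue]:
                problems.append(f"unknown compute resource: {queue}/{resource}")
    return problems


# Hypothetical example: the override file misspells the compute resource name.
overrides = json.loads('{"queue1": {"cr-typo": {}}}')
for problem in find_override_mismatches(overrides, {"queue1": {"cr1"}}):
    print(problem)
```

An empty result means every queue and compute resource named in the override file also exists in the names you supplied. It doesn't validate the RunInstances parameters themselves.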

Seeing `An error occurred (InsufficientInstanceCapacity)` in `slurm_resume.log` when I fail to run a job, or in `clustermgtd.log` when I fail to create a cluster

Using PG-ODCR (Placement Group ODCR)

When you create an ODCR with an associated placement group, you must set the same placement group name in the cluster configuration file.

Using zonal Reserved Instances

If you are using zonal Reserved Instances with PlacementGroup / Enabled set to true in the cluster configuration, you might see an error, such as the following:


We currently do not have sufficient trn1.32xlarge capacity in the Availability Zone you requested (us-east-1d). Our system will be working on provisioning additional capacity. 
You can currently get trn1.32xlarge capacity by not specifying an Availability Zone in your request or choosing us-east-1a, us-east-1b, us-east-1c, us-east-1e, us-east-1f.

You might see this error because the zonal Reserved Instances aren't placed in the same UC (or spine), which can cause insufficient capacity errors (ICEs) when placement groups are used. To check whether this is the case, disable the PlacementGroup setting in the cluster configuration and see whether the cluster can then allocate the instances.
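When many such messages appear in slurm_resume.log or clustermgtd.log, it can help to pull out the instance type and Availability Zone that each one refers to. The Python sketch below does this with a regular expression based only on the wording of the sample message above. The function name parse_insufficient_capacity is hypothetical, and the pattern is an assumption that might not match every variant of the error text.

```python
import re

# Pattern derived from the sample error message; assumed, not an official format.
_ICE_PATTERN = re.compile(
    r"do not have sufficient (?P<instance_type>\S+) capacity "
    r"in the Availability Zone you requested \((?P<zone>[^)]+)\)"
)


def parse_insufficient_capacity(message):
    """Return (instance_type, availability_zone), or None if there is no match."""
    match = _ICE_PATTERN.search(message)
    if match is None:
        return None
    return match.group("instance_type"), match.group("zone")


sample = (
    "We currently do not have sufficient trn1.32xlarge capacity in the "
    "Availability Zone you requested (us-east-1d)."
)
print(parse_insufficient_capacity(sample))
```

For the sample message, the function returns the pair trn1.32xlarge and us-east-1d. You can compare these values against the instance type and Availability Zone of your zonal Reserved Instances.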
