Updating compute environments - AWS Batch

Updating compute environments

After you create a compute environment that uses EC2 resources, you can update many of the settings of the compute environment directly. However, changing some of the settings requires that AWS Batch replace the instances in the compute environment.

For compute environments that use Fargate resources, you can update the following.

  • securityGroupIds

  • subnets

  • desiredvCpus

  • maxvCpus

  • minvCpus

AWS Batch has two update mechanisms. The first is a scaling update where instances are added or removed from the compute environment. The second is an infrastructure update where the instances in the compute environment are replaced. An infrastructure update takes much longer than a scaling update.

If you update compute environments with AWS Batch, changing only these settings causes a scaling update: desired vCPUs (desiredvCpus), maximum vCPUs (maxvCpus), minimum vCPUs (minvCpus), service role (serviceRole), and state (state).

Note

When you update the desiredvCpus setting, the value must be between the minvCpus and maxvCpus values.

Additionally, the updated desiredvCpus value must be greater than or equal to the current desiredvCpus value. For more information, see Error message when you update the desiredvCpus setting.
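The two rules in the preceding note can be checked before sending an update. The following is a minimal sketch, not part of AWS Batch: the function name validate_desired_vcpus is hypothetical, and treating "between" as inclusive of the minvCpus and maxvCpus bounds is an assumption.

```python
def validate_desired_vcpus(new_desired, current_desired, min_vcpus, max_vcpus):
    """Return a list of rule violations; an empty list means the value looks acceptable.

    Hypothetical helper. Inclusive bounds are an assumption.
    """
    problems = []
    # Rule 1: the new value must lie between minvCpus and maxvCpus.
    if not (min_vcpus <= new_desired <= max_vcpus):
        problems.append("desiredvCpus must be between minvCpus and maxvCpus")
    # Rule 2: the new value must not be lower than the current desiredvCpus.
    if new_desired < current_desired:
        problems.append("desiredvCpus must be >= the current desiredvCpus")
    return problems


if __name__ == "__main__":
    print(validate_desired_vcpus(8, 4, 0, 16))
    print(validate_desired_vcpus(2, 4, 0, 16))
```

A value that breaks both rules, such as lowering desiredvCpus below both the current value and minvCpus, produces two entries in the returned list.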

If any of the settings listed under Infrastructure updates are changed in an UpdateComputeEnvironment API action, AWS Batch initiates an infrastructure update. An infrastructure update requires that the service role is set to AWSServiceRoleForBatch (the default) and that the allocation strategy is BEST_FIT_PROGRESSIVE, SPOT_CAPACITY_OPTIMIZED, or SPOT_PRICE_CAPACITY_OPTIMIZED. BEST_FIT isn't supported. Except for service role, all of the settings that can be changed for a scaling update can also be changed for an infrastructure update.

Note

We recommend that you use SPOT_PRICE_CAPACITY_OPTIMIZED rather than SPOT_CAPACITY_OPTIMIZED in most cases.
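The two prerequisites for an infrastructure update (the service role and the allocation strategy) can be expressed as a simple pre-check. This is an illustrative sketch only: the function name supports_infrastructure_update is hypothetical, and identifying the service-linked role by the last path segment of its name or ARN is an assumption.

```python
# Allocation strategies that the text lists as supporting infrastructure updates.
SUPPORTED_STRATEGIES = {
    "BEST_FIT_PROGRESSIVE",
    "SPOT_CAPACITY_OPTIMIZED",
    "SPOT_PRICE_CAPACITY_OPTIMIZED",
}
SERVICE_LINKED_ROLE = "AWSServiceRoleForBatch"


def supports_infrastructure_update(service_role, allocation_strategy):
    """Return True if both prerequisites described above appear to be met.

    Hypothetical helper. Assumption: the role is recognized by the final
    path segment of the role name or ARN.
    """
    role_name = service_role.rsplit("/", 1)[-1]
    return (
        role_name == SERVICE_LINKED_ROLE
        and allocation_strategy in SUPPORTED_STRATEGIES
    )


if __name__ == "__main__":
    print(supports_infrastructure_update("AWSServiceRoleForBatch", "BEST_FIT"))
```

With BEST_FIT as the allocation strategy, the check fails regardless of the service role.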

During an infrastructure update, the status of the compute environment changes to UPDATING. New instances are launched using the updated settings. New jobs are scheduled on the new instances. Jobs that are currently running are dispatched according to the infrastructure update policy. For more information, see UpdateComputeEnvironment and UpdatePolicy in the AWS Batch API Reference.

In the UpdatePolicy data type, consider the following scenarios:

Note

In these scenarios, the following is true. When an instance is terminated, running jobs are stopped. By default, these jobs aren't retried. To retry one of these jobs after an instance is terminated, configure a job retry strategy. For more information, see Automated job retries in the AWS Batch User Guide.

  • If the terminateJobsOnUpdate setting is set to true, running jobs are terminated during an infrastructure update. The jobExecutionTimeoutMinutes setting is ignored.

  • If the terminateJobsOnUpdate setting is set to false, jobs can run for additional time after the infrastructure update occurs. This additional time is configured in the jobExecutionTimeoutMinutes setting. By default, the jobExecutionTimeoutMinutes setting is 30 minutes.
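The interaction between the two UpdatePolicy settings can be sketched as follows. The helper name grace_period_minutes is hypothetical, and treating an omitted terminateJobsOnUpdate as false is an assumption. The 30-minute default comes from the text above.

```python
DEFAULT_TIMEOUT_MINUTES = 30  # default for jobExecutionTimeoutMinutes, per the text


def grace_period_minutes(update_policy=None):
    """Return how long running jobs may continue after the update begins.

    Hypothetical helper. Assumption: a missing terminateJobsOnUpdate key
    is treated as False.
    """
    policy = update_policy or {}
    if policy.get("terminateJobsOnUpdate", False):
        # Jobs are terminated; jobExecutionTimeoutMinutes is ignored.
        return 0
    return policy.get("jobExecutionTimeoutMinutes", DEFAULT_TIMEOUT_MINUTES)


if __name__ == "__main__":
    print(grace_period_minutes({"terminateJobsOnUpdate": False}))
```

The dictionary keys mirror the field names of the UpdatePolicy data type, so the same dictionary shape could be reused when composing an update request.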

As capacity becomes available in the compute environment, new instances are launched with the updated settings and jobs are started on the new instances. After all of the jobs on instances with the old settings complete, the old instances are terminated. Capacity is considered available when the desired number of vCPUs is below the maximum number of vCPUs by at least as many vCPUs as the smallest instance type requires.
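The capacity condition described above is a single comparison. The sketch below is illustrative only: the function name capacity_available and the instance-type-to-vCPU mapping passed to it are assumptions made for the example.

```python
def capacity_available(desired_vcpus, max_vcpus, instance_type_vcpus):
    """Return True if there is headroom for at least the smallest instance type.

    Hypothetical helper. instance_type_vcpus maps an instance type name to
    its vCPU count; the caller supplies it.
    """
    smallest = min(instance_type_vcpus.values())
    return max_vcpus - desired_vcpus >= smallest


if __name__ == "__main__":
    types = {"c5.large": 2, "c5.xlarge": 4}
    print(capacity_available(12, 16, types))  # headroom of 4 vCPUs
    print(capacity_available(15, 16, types))  # headroom of 1 vCPU
```

In the second call the headroom of 1 vCPU is smaller than the 2 vCPUs of the smallest instance type, so no new instance could be launched until desiredvCpus drops or maxvCpus is raised.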

Infrastructure updates

An infrastructure update is required to change some settings for a compute environment. If any of the following settings are changed, an infrastructure update is started:

Important

The compute environment must use the AWSServiceRoleForBatch service-linked role to make changes that require an infrastructure update.

If the compute environment uses a service-linked role, it can't be changed to use a regular IAM role. Likewise, if the compute environment has a regular IAM role, it can't be changed to use a service-linked role. Therefore, you can only perform infrastructure updates on compute environments that were created by using a service-linked role.

  • Allocation strategy (allocationStrategy, must be either BEST_FIT_PROGRESSIVE, SPOT_CAPACITY_OPTIMIZED, or SPOT_PRICE_CAPACITY_OPTIMIZED. If the original allocation strategy is BEST_FIT, infrastructure updates aren't supported.)

    Note

    We recommend that you use SPOT_PRICE_CAPACITY_OPTIMIZED rather than SPOT_CAPACITY_OPTIMIZED in most cases.

  • Bid percentage (bidPercentage)

  • EC2 configuration (ec2Configuration)

  • Key pair (ec2KeyPair)

  • Image ID (imageId)

  • Instance role (instanceRole)

  • Instance types (instanceTypes)

  • Launch template (launchTemplate)

  • Placement group (placementGroup)

  • Security groups (securityGroupIds)

  • VPC subnets (subnets)

  • EC2 tags (tags)

  • Compute environment type (type, can be one of EC2 or SPOT)

  • Whether to update to the latest AMI that's supported by AWS Batch during an infrastructure update (updateToLatestImageVersion)
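Given the settings listed above and the scaling settings named earlier on this page, the kind of update that a change triggers can be predicted with a small lookup. This is a sketch under stated assumptions: classify_update is a hypothetical name, and rejecting a serviceRole change that is combined with infrastructure settings is an interpretation of the statement that service role is the one scaling setting that can't be changed in an infrastructure update.

```python
SCALING_SETTINGS = {"desiredvCpus", "maxvCpus", "minvCpus", "serviceRole", "state"}
INFRASTRUCTURE_SETTINGS = {
    "allocationStrategy", "bidPercentage", "ec2Configuration", "ec2KeyPair",
    "imageId", "instanceRole", "instanceTypes", "launchTemplate",
    "placementGroup", "securityGroupIds", "subnets", "tags", "type",
    "updateToLatestImageVersion",
}


def classify_update(changed_settings):
    """Return "scaling", "infrastructure", or "none" for a set of changed settings.

    Hypothetical helper that mirrors the lists in this page.
    """
    changed = set(changed_settings)
    unknown = changed - SCALING_SETTINGS - INFRASTRUCTURE_SETTINGS
    if unknown:
        raise ValueError(f"unrecognized settings: {sorted(unknown)}")
    if not changed:
        return "none"
    if changed & INFRASTRUCTURE_SETTINGS:
        # Assumption: serviceRole can't change as part of an infrastructure update.
        if "serviceRole" in changed:
            raise ValueError("serviceRole can't be changed in an infrastructure update")
        return "infrastructure"
    return "scaling"


if __name__ == "__main__":
    print(classify_update({"maxvCpus"}))
    print(classify_update({"maxvCpus", "instanceTypes"}))
```

Changing even one infrastructure setting alongside scaling settings classifies the whole request as an infrastructure update.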

Updating the AMI ID

During an infrastructure update, the compute environment's AMI ID might change, depending on whether AMIs are specified in any of three settings: imageId (in computeResources), imageIdOverride (in ec2Configuration), or the launch template specified in launchTemplate. If no AMI IDs are specified in any of those settings and the updateToLatestImageVersion setting is true, the latest Amazon ECS optimized AMI supported by AWS Batch is used for any infrastructure update.

If an AMI ID is specified in at least one of these settings, the update depends on which setting provided the AMI ID used before the update. When you create a compute environment, the priority for selecting an AMI ID is first the launch template, then the imageId setting, and finally the imageIdOverride setting. However, if the AMI ID that's used came from the launch template, updating either the imageId or imageIdOverride settings doesn't update the AMI ID. The only way to update an AMI ID selected from the launch template is to update the launch template. If the version parameter of the launch template is $Default or $Latest, the default or latest version of the specified launch template is evaluated. If the default or latest version of the launch template selects a different AMI ID, that AMI ID is used in the update.

If the launch template was not used to select the AMI ID, the AMI ID that's specified in the imageId or imageIdOverride parameters is used. If both are specified, the AMI ID specified in the imageIdOverride parameter is used.
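The update-time rules in the preceding paragraphs can be summarized as a small decision function. This is a sketch of one reading of those rules, not AWS Batch's implementation: the name resolve_ami_for_update and the returned labels are hypothetical, and the final "unchanged" branch (no AMI specified and updateToLatestImageVersion false) is an assumption because the text doesn't describe that case.

```python
def resolve_ami_for_update(launch_template_ami=None, image_id=None,
                           image_id_override=None, update_to_latest=False):
    """Return (source, ami_id) describing which AMI an update would use.

    Hypothetical helper. Empty strings count as "not specified".
    """
    if launch_template_ami:
        # An AMI from the launch template can only be changed via the template.
        return ("launchTemplate", launch_template_ami)
    if image_id_override:
        # If both imageId and imageIdOverride are set, the override is used.
        return ("imageIdOverride", image_id_override)
    if image_id:
        return ("imageId", image_id)
    if update_to_latest:
        return ("latestEcsOptimized", None)
    # Assumption: with nothing specified and no opt-in, the AMI is left as is.
    return ("unchanged", None)


if __name__ == "__main__":
    print(resolve_ami_for_update(image_id="ami-aaaa", image_id_override="ami-bbbb"))
```

The AMI IDs in the example call are placeholders.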

Suppose that the compute environment uses an AMI ID specified by the imageId, imageIdOverride, or launchTemplate parameters, and you want to use the latest Amazon ECS optimized AMI supported by AWS Batch. Then, the update must remove the settings that provided AMI IDs. For imageId, this requires specifying an empty string for that parameter. For imageIdOverride, this requires specifying an empty string for the ec2Configuration parameter.

If the AMI ID came from the launch template, you can change to the latest Amazon ECS optimized AMI that's supported by AWS Batch in either of the following ways:

  • Remove the launch template by specifying an empty string for the launchTemplateId or launchTemplateName parameter. This removes the entire launch template, rather than the AMI ID alone.

  • Update the launch template to a version that doesn't specify an AMI ID, and set the updateToLatestImageVersion parameter to true.
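As a rough illustration of clearing an AMI setting, the sketch below builds a request body for two of the cases described above: clearing imageId and removing the launch template. The function name build_latest_ami_request and the ami_source labels are hypothetical, and the key names follow the UpdateComputeEnvironment parameters mentioned in this page. Setting updateToLatestImageVersion to true in both cases is an assumption drawn from the rule that the latest AMI is used only when no AMI ID is specified and that setting is true. The imageIdOverride case is deliberately left out.

```python
def build_latest_ami_request(compute_environment, ami_source):
    """Build keyword arguments for an update that switches to the latest AMI.

    Hypothetical helper. ami_source names the setting that currently
    provides the AMI ID: "imageId" or "launchTemplate".
    """
    compute_resources = {"updateToLatestImageVersion": True}
    if ami_source == "imageId":
        # An empty string removes the previously specified AMI ID.
        compute_resources["imageId"] = ""
    elif ami_source == "launchTemplate":
        # An empty ID removes the entire launch template, not only its AMI ID.
        compute_resources["launchTemplate"] = {"launchTemplateId": ""}
    else:
        raise ValueError(f"unsupported ami_source: {ami_source!r}")
    return {
        "computeEnvironment": compute_environment,
        "computeResources": compute_resources,
    }


if __name__ == "__main__":
    print(build_latest_ami_request("my-compute-env", "imageId"))
```

With the AWS SDK for Python, a dictionary like this could be passed as keyword arguments to the Batch client's update_compute_environment method. The environment name my-compute-env is a placeholder.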