Compute Resource Memory Management - AWS Batch

Compute Resource Memory Management

When the Amazon ECS container agent registers a compute resource into a compute environment, the agent must determine how much memory the compute resource has available to reserve for your jobs. Because of platform memory overhead and memory occupied by the system kernel, this number differs from the installed memory amount for Amazon EC2 instances. For example, an m4.large instance has 8 GiB of installed memory. However, this doesn't translate to exactly 8192 MiB of memory available for jobs when the compute resource registers.

Suppose that you specify 8192 MiB for a job, and none of your compute resources have at least 8192 MiB of memory available to satisfy this requirement. In that case, the job can't be placed in your compute environment. If you're using a managed compute environment, AWS Batch must launch a larger instance type to accommodate the request.
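The placement rule described above can be sketched as a simple comparison. This is an illustration only, not the AWS Batch scheduler: the function name can_place and the sample memory figures are hypothetical, and real placement also depends on other requirements such as vCPUs.

```python
from typing import Iterable


def can_place(job_memory_mib: int, available_mib: Iterable[int]) -> bool:
    """Return True if at least one compute resource has enough memory.

    available_mib holds the memory (in MiB) that each compute resource
    still has available for jobs. The values are illustrative.
    """
    return any(avail >= job_memory_mib for avail in available_mib)


# Hypothetical environment: two resources that each registered
# slightly less than 8192 MiB.
resources = [7953, 7953]

fits_8192 = can_place(8192, resources)  # False: no resource has 8192 MiB
fits_7953 = can_place(7953, resources)  # True: the request fits
```

In this sketch, a job that requests the full installed memory of the instance doesn't fit, while a request that matches the memory actually available does.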

The default AWS Batch compute resource AMI also reserves 32 MiB of memory for the Amazon ECS container agent and other critical system processes. This memory isn't available for job allocation. For more information, see Reserving System Memory.

The Amazon ECS container agent uses the Docker ReadMemInfo() function to query the total memory available to the operating system. Linux provides command line utilities to determine the total memory.

Example - Determine Linux total memory

The free command returns the total memory that's recognized by the operating system.

$ free -b

The following is example output for an m4.large instance that's running the Amazon ECS-optimized Amazon Linux AMI.

             total       used       free     shared    buffers     cached
Mem:    8373026816  348180480 8024846336      90112   25534464  205418496
-/+ buffers/cache:  117227520 8255799296

This instance has 8373026816 bytes of total memory. Dividing by 1048576 (1024 x 1024) and rounding down gives 7985 MiB available for jobs, before any reserved system memory is subtracted.
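The conversion from bytes to MiB can be checked with a short script. The following is a minimal sketch, not the container agent's implementation. It assumes that the total is taken from the MemTotal line of Linux /proc/meminfo-style text, which the kernel reports in KiB. The function names are hypothetical.

```python
MIB = 1024 * 1024  # bytes per MiB


def mem_total_bytes(meminfo_text: str) -> int:
    """Return MemTotal in bytes from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # Example line: "MemTotal:        8176784 kB" (value is in KiB)
            return int(line.split()[1]) * 1024
    raise ValueError("MemTotal line not found")


def bytes_to_mib(total_bytes: int) -> int:
    """Convert bytes to whole MiB, rounding down."""
    return total_bytes // MIB


# Sample text that matches the m4.large example output above.
sample = "MemTotal:        8176784 kB\nMemFree:         7836764 kB\n"

total_bytes = mem_total_bytes(sample)
total_mib = bytes_to_mib(total_bytes)
print(total_bytes, total_mib)  # 8373026816 7985
```

To try this on a Linux host, pass the contents of /proc/meminfo to mem_total_bytes instead of the sample text.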

Reserving System Memory

If you occupy all of the memory on a compute resource with your jobs, it's possible that your jobs contend with critical system processes for memory and possibly cause a system failure. The Amazon ECS container agent provides a configuration variable that's called ECS_RESERVED_MEMORY. You can use this configuration variable to remove a specified number of MiB of memory from the pool that's allocated to your jobs. This effectively reserves that memory for critical system processes.

The default AWS Batch compute resource AMI reserves 32 MiB of memory for the Amazon ECS container agent and other critical system processes.
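The effect of the reservation is a subtraction from the memory pool. The following sketch shows the arithmetic and builds the corresponding configuration line. The function names are hypothetical, and the 7985 MiB total is the figure from the m4.large example. On the Amazon ECS-optimized AMI, agent configuration variables are commonly set in /etc/ecs/ecs.config. Verify the location for your own AMI.

```python
def schedulable_mib(total_mib: int, reserved_mib: int) -> int:
    """Memory left for jobs after reserving reserved_mib for the system."""
    if reserved_mib < 0 or reserved_mib > total_mib:
        raise ValueError("reserved_mib must be between 0 and total_mib")
    return total_mib - reserved_mib


def ecs_config_line(reserved_mib: int) -> str:
    """Render the agent configuration line for a given reservation."""
    return f"ECS_RESERVED_MEMORY={reserved_mib}"


# With the default 32 MiB reservation on the example instance:
pool = schedulable_mib(7985, 32)
print(pool)                   # 7953
print(ecs_config_line(256))   # ECS_RESERVED_MEMORY=256
```

A larger reservation, such as the 256 MiB shown in the last line, leaves less memory for jobs but gives system processes more headroom.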

Viewing Compute Resource Memory

You can view how much memory a compute resource registered in the Amazon ECS console or by using the DescribeContainerInstances API operation. To maximize resource utilization by giving your jobs as much memory as possible for a particular instance type, observe the memory available for that compute resource and then assign your jobs that much memory.

To view compute resource memory
  1. Open the console at https://console.aws.amazon.com/ecs/v2.

  2. Choose Clusters, and then choose the cluster that hosts the compute resources that you want to view.

    The cluster name for your compute environment begins with your compute environment name.

  3. Choose Infrastructure.

  4. Under Container instances, choose the container instance.

  5. The Resources and networking section shows the registered and available memory for the compute resource.

    The Registered memory value is what the compute resource registered with Amazon ECS when it was first launched, and the Available memory value is what hasn't already been allocated to jobs.
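The same values can be read with the DescribeContainerInstances API operation. The following is a hedged sketch that uses the AWS SDK for Python (boto3), which must be installed separately. The helper names memory_mib and iter_cluster_memory are hypothetical. The cluster name is an assumption that you supply, and credentials and Region come from your environment.

```python
from typing import Iterator, Tuple


def memory_mib(container_instance: dict) -> Tuple[int, int]:
    """Return (registered, remaining) MEMORY in MiB for one container instance."""
    def pick(resources: list) -> int:
        for resource in resources:
            if resource["name"] == "MEMORY":
                return resource["integerValue"]
        raise KeyError("MEMORY resource not found")

    return (
        pick(container_instance["registeredResources"]),
        pick(container_instance["remainingResources"]),
    )


def iter_cluster_memory(cluster: str) -> Iterator[Tuple[str, int, int]]:
    """Yield (ARN, registered MiB, remaining MiB) for each container instance."""
    import boto3  # imported here so that memory_mib works without the SDK

    ecs = boto3.client("ecs")
    paginator = ecs.get_paginator("list_container_instances")
    for page in paginator.paginate(cluster=cluster):
        arns = page["containerInstanceArns"]
        if not arns:
            continue
        response = ecs.describe_container_instances(
            cluster=cluster, containerInstances=arns
        )
        for instance in response["containerInstances"]:
            registered, remaining = memory_mib(instance)
            yield instance["containerInstanceArn"], registered, remaining
```

For example, calling iter_cluster_memory with the name of the cluster for your compute environment and iterating over the result prints one line of values per compute resource. The registered value corresponds to Registered memory in the console, and the remaining value corresponds to Available memory.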