Compute Architecture Selection

The optimal compute choice for a particular workload can vary based on application design, usage patterns, and configuration settings. Architectures may use different compute choices for various components and enable different features to improve performance. Selecting the wrong compute choice for an architecture can lead to lower performance efficiency.

Evaluate the available compute options: Understand the performance characteristics of the compute-related options available to you. Know how instances, containers, and functions work, and what advantages, or disadvantages, they bring to your workload.

In AWS, compute is available in three forms: instances, containers, and functions.

Instances

Instances are virtualized servers, allowing you to change their capabilities with a button or an API call. Because resource decisions in the cloud aren’t fixed, you can experiment with different server types.

Amazon Elastic Compute Cloud (Amazon EC2) virtual server instances come in different families and sizes. They offer a wide variety of capabilities, including solid-state drives (SSDs) and graphics processing units (GPUs). When you launch an EC2 instance, the instance type that you specify determines the hardware of the host computer used for your instance. Each instance type offers different compute, memory, and storage capabilities. Instance types are grouped in instance families based on these capabilities.

Use data to select the optimal EC2 instance type for your workload, ensure that you have the correct networking and storage options, and consider operating system settings that can improve the performance for your workload.
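
To ground this choice in data, the EC2 API can report the hardware characteristics of candidate instance types. The following is a minimal boto3 sketch; the instance types compared are illustrative:

    import boto3

    ec2 = boto3.client("ec2")

    # Compare the hardware characteristics of a few candidate instance types.
    response = ec2.describe_instance_types(
        InstanceTypes=["m5.large", "c5.large", "r5.large"]
    )

    for itype in response["InstanceTypes"]:
        print(
            itype["InstanceType"],
            itype["VCpuInfo"]["DefaultVCpus"], "vCPUs,",
            itype["MemoryInfo"]["SizeInMiB"], "MiB memory",
        )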

Containers

Containers are a method of operating system virtualization that allow you to run an application and its dependencies in resource-isolated processes.

When running containers on AWS, you have two choices to make. First, choose whether or not you want to manage servers: AWS Fargate provides serverless compute for containers, while Amazon EC2 gives you control over the installation, configuration, and management of your compute environment. Second, choose which container orchestrator to use: Amazon Elastic Container Service (Amazon ECS) or Amazon Elastic Kubernetes Service (Amazon EKS).

Amazon Elastic Container Service (Amazon ECS) is a fully managed container orchestration service that allows you to run and manage containers on a cluster of EC2 instances or on serverless infrastructure with AWS Fargate. You can natively integrate Amazon ECS with other services such as Amazon Route 53, AWS Secrets Manager, AWS Identity and Access Management (IAM), and Amazon CloudWatch.

Amazon Elastic Kubernetes Service (Amazon EKS) is a fully managed Kubernetes service. You can choose to run your EKS clusters on AWS Fargate, removing the need to provision and manage servers. Amazon EKS is deeply integrated with services such as Amazon CloudWatch, Auto Scaling groups, AWS Identity and Access Management (IAM), and Amazon Virtual Private Cloud (Amazon VPC).

When using containers, you must use data to select the optimal configuration for your workload, just as you use data to select your EC2 instance types or AWS Fargate task sizes. Consider container configuration options such as memory, CPU, and tenancy. To enable network access between container services, consider using a service mesh such as AWS App Mesh, which standardizes how your services communicate. A service mesh gives you end-to-end visibility and helps ensure high availability for your applications.
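
For example, with Amazon ECS on AWS Fargate, CPU and memory are declared on the task definition. The following boto3 sketch registers a small Fargate task definition; the family name, container image, and sizes are illustrative assumptions:

    import boto3

    ecs = boto3.client("ecs")

    # Register a Fargate task definition with explicit CPU and memory sizing.
    # "web-app" and the container image are placeholder values.
    ecs.register_task_definition(
        family="web-app",
        requiresCompatibilities=["FARGATE"],
        networkMode="awsvpc",  # required for Fargate tasks
        cpu="256",             # 0.25 vCPU
        memory="512",          # 512 MiB
        containerDefinitions=[
            {
                "name": "web",
                "image": "public.ecr.aws/nginx/nginx:latest",
                "portMappings": [{"containerPort": 80}],
            }
        ],
    )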

Functions

Functions abstract the execution environment from the code you want to execute. For example, AWS Lambda allows you to execute code without running an instance.

You can use AWS Lambda to run code for any type of application or backend service with zero administration. Simply upload your code, and AWS Lambda will manage everything required to run and scale that code. You can set up your code to be triggered automatically by other AWS services, call it directly, or use it with Amazon API Gateway.
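
A Lambda function is simply a handler that the service invokes with an event. A minimal Python handler might look like the following; the event shape depends on the trigger you configure:

    import json

    def lambda_handler(event, context):
        # "event" carries the trigger payload; "context" carries runtime metadata.
        name = event.get("name", "world")
        return {
            "statusCode": 200,
            "body": json.dumps({"message": f"Hello, {name}!"}),
        }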

Amazon API Gateway is a fully managed service that makes it easy for developers to create, publish, maintain, monitor, and secure APIs at any scale. You can create an API that acts as a “front door” to your Lambda function. API Gateway handles all the tasks involved in accepting and processing up to hundreds of thousands of concurrent API calls, including traffic management, authorization and access control, monitoring, and API version management.
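
As a sketch of this “front door” pattern, the following boto3 calls create a REST API whose root resource proxies every request to an existing Lambda function. The region and function ARN are placeholders, and the function must separately grant API Gateway permission to invoke it (for example, with the Lambda add_permission call):

    import boto3

    apigw = boto3.client("apigateway")

    # Placeholder values for illustration.
    region = "us-east-1"
    lambda_arn = "arn:aws:lambda:us-east-1:123456789012:function:hello"

    api = apigw.create_rest_api(name="hello-api")
    root_id = apigw.get_resources(restApiId=api["id"])["items"][0]["id"]

    # Accept any HTTP method on the root resource and proxy it to Lambda.
    apigw.put_method(restApiId=api["id"], resourceId=root_id,
                     httpMethod="ANY", authorizationType="NONE")
    apigw.put_integration(
        restApiId=api["id"], resourceId=root_id, httpMethod="ANY",
        type="AWS_PROXY", integrationHttpMethod="POST",
        uri=(f"arn:aws:apigateway:{region}:lambda:path/2015-03-31"
             f"/functions/{lambda_arn}/invocations"),
    )
    apigw.create_deployment(restApiId=api["id"], stageName="prod")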

To deliver optimal performance with AWS Lambda, choose the amount of memory you want for your function. You are allocated proportional CPU power and other resources. For example, choosing 256 MB of memory allocates approximately twice as much CPU power to your Lambda function as requesting 128 MB of memory. You can control the amount of time each function is allowed to run (up to a maximum of 900 seconds).
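
Memory, the CPU share allocated with it, and the timeout are per-function settings. A minimal boto3 sketch of adjusting them, assuming a function named "hello" already exists:

    import boto3

    lam = boto3.client("lambda")

    # Raising MemorySize also raises the CPU share allocated to the function.
    lam.update_function_configuration(
        FunctionName="hello",  # placeholder function name
        MemorySize=256,        # MB; roughly twice the CPU of a 128 MB function
        Timeout=30,            # seconds, up to the maximum of 900
    )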

Understand the available compute configuration options: Understand how various options complement your workload, and which configuration options are best for your system. Examples of these options include instance family, sizes, features (GPU, I/O), function sizes, container instances, and single versus multi-tenancy.

When selecting instance families and types, you must also consider the configuration options available to meet your workload’s needs:

  • Graphics Processing Units (GPU): Using general purpose computing on GPUs (GPGPU), you can build applications that benefit from the high degree of parallelism that GPUs provide by leveraging platforms (such as CUDA) in the development process. If your workload requires 3D rendering or video compression, GPUs enable hardware-accelerated computation and encoding, making your workload more efficient.

  • Field Programmable Gate Arrays (FPGA): Using FPGAs, you can optimize your workloads by running custom hardware-accelerated logic for your most demanding computations. You can define your algorithms by leveraging supported general programming languages such as C or Go, or hardware-oriented languages such as Verilog or VHDL.

  • AWS Inferentia (Inf1): Inf1 instances are built to support machine learning inference applications. Using Inf1 instances, customers can run large-scale machine learning inference applications such as image recognition, speech recognition, natural language processing, personalization, and fraud detection. You can build a model in one of the popular machine learning frameworks, such as TensorFlow, PyTorch, or MXNet, and use GPU instances such as P3 or P3dn to train it. After your machine learning model is trained to meet your requirements, you can deploy it on Inf1 instances using AWS Neuron, a specialized software development kit (SDK) consisting of a compiler, runtime, and profiling tools that optimize the machine learning inference performance of Inferentia chips.

  • Burstable instance families: Burstable instances are designed to provide moderate baseline performance and the capability to burst to significantly higher performance when required by your workload. These instances are intended for workloads that do not use the full CPU often or consistently, but occasionally need to burst. They are well suited for general-purpose workloads, such as web servers, developer environments, and small databases. These instances provide CPU credits that are consumed when the instance must deliver higher performance and accumulate when the instance doesn’t need them; a sketch of monitoring this credit balance follows this list.

  • Advanced computing features: Amazon EC2 gives you access to advanced computing features, such as managing C-state and P-state registers and controlling the turbo boost of processors. Access to coprocessors allows offloading of cryptographic operations through AES-NI and advanced computation through AVX extensions.
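
As referenced in the burstable instance item above, T family instances publish their credit balance to CloudWatch. A minimal boto3 sketch of tracking it; the instance ID is a placeholder:

    import boto3
    from datetime import datetime, timedelta, timezone

    cw = boto3.client("cloudwatch")
    now = datetime.now(timezone.utc)

    # Track the credit balance of a burstable instance over the last day.
    stats = cw.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUCreditBalance",
        Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
        StartTime=now - timedelta(days=1),
        EndTime=now,
        Period=3600,  # one datapoint per hour
        Statistics=["Average"],
    )

    for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
        print(point["Timestamp"], point["Average"])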

The AWS Nitro System is a combination of dedicated hardware and a lightweight hypervisor that enables faster innovation and enhanced security. Use Nitro-based instances where available to enable full consumption of the compute and memory resources of the host hardware. Additionally, dedicated Nitro Cards enable high-speed networking, high-speed EBS, and I/O acceleration.

Collect compute-related metrics: One of the best ways to understand how your compute systems are performing is to record and track the true utilization of various resources. This data can be used to make more accurate determinations about resource requirements.

Workloads (such as those running on microservices architectures) can generate large volumes of data in the form of metrics, logs, and events. Determine if your existing monitoring and observability service can manage the data generated. Amazon CloudWatch can be used to collect, access, and correlate this data on a single platform from across all your AWS resources, applications, and services running on AWS and on-premises servers, so you can easily gain system-wide visibility and quickly resolve issues.
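
For example, a custom application metric can be published to CloudWatch alongside the built-in compute metrics so that both can be correlated in one place. A minimal boto3 sketch; the namespace, metric name, and dimension are hypothetical:

    import boto3

    cw = boto3.client("cloudwatch")

    # Publish a custom application metric next to the built-in EC2 metrics.
    cw.put_metric_data(
        Namespace="MyApp",                   # placeholder namespace
        MetricData=[
            {
                "MetricName": "QueueDepth",  # placeholder metric name
                "Value": 42.0,
                "Unit": "Count",
                "Dimensions": [{"Name": "Service", "Value": "orders"}],
            }
        ],
    )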

Determine the required configuration by right-sizing: Analyze the various performance characteristics of your workload and how these characteristics relate to memory, network, and CPU usage. Use this data to choose resources that best match your workload's profile. For example, a memory-intensive workload, such as a database, could be served best by the r-family of instances. However, a bursting workload can benefit more from an elastic container system.
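
One way to ground right-sizing in data is AWS Compute Optimizer, which analyzes CloudWatch utilization history and recommends instance types. A minimal boto3 sketch, assuming Compute Optimizer has already been opted in for the account:

    import boto3

    co = boto3.client("compute-optimizer")

    # List EC2 right-sizing recommendations derived from utilization history.
    response = co.get_ec2_instance_recommendations()

    for rec in response["instanceRecommendations"]:
        options = [o["instanceType"] for o in rec["recommendationOptions"]]
        print(rec["instanceArn"], rec["currentInstanceType"], "->", options)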

Use the available elasticity of resources: The cloud provides the flexibility to expand or reduce your resources dynamically through a variety of mechanisms to meet changes in demand. Combined with compute-related metrics, a workload can automatically respond to changes and utilize the optimal set of resources to achieve its goal.

Optimally matching supply to demand delivers the lowest cost for a workload, but you also must plan for sufficient supply to allow for provisioning time and individual resource failures. Demand can be fixed or variable, requiring metrics and automation to ensure that management does not become a burdensome and disproportionately large cost.

With AWS, you can use a number of different approaches to match supply with demand. The Cost Optimization Pillar whitepaper describes how to use the following approaches:

  • Demand-based approach

  • Buffer-based approach

  • Time-based approach

You must ensure that workload deployments can handle both scale-up and scale-down events. Create test scenarios for scale-down events to ensure that the workload behaves as expected.
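
As a sketch of the demand-based approach, a target tracking scaling policy adds and removes instances to hold a metric near a target value. The Auto Scaling group name below is a placeholder:

    import boto3

    autoscaling = boto3.client("autoscaling")

    # Scale the group in and out to keep average CPU utilization near 50%.
    autoscaling.put_scaling_policy(
        AutoScalingGroupName="web-asg",  # placeholder group name
        PolicyName="cpu-target-tracking",
        PolicyType="TargetTrackingScaling",
        TargetTrackingConfiguration={
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization"
            },
            "TargetValue": 50.0,
        },
    )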

Re-evaluate compute needs based on metrics: Use system-level metrics to identify the behavior and requirements of your workload over time. Evaluate your workload's needs by comparing the available resources with these requirements and make changes to your compute environment to best match your workload's profile. For example, over time a system might be observed to be more memory-intensive than initially thought, so moving to a different instance family or size could improve both performance and efficiency.
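
For example, when metrics show that a workload is memory-bound, the instance type can often be changed in place. A boto3 sketch; the instance ID and target type are placeholders, and the instance must be stopped before it can be resized:

    import boto3

    ec2 = boto3.client("ec2")
    instance_id = "i-0123456789abcdef0"  # placeholder instance ID

    # An instance must be stopped before its type can be changed.
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

    # Move from a general purpose to a memory optimized instance type.
    ec2.modify_instance_attribute(
        InstanceId=instance_id,
        InstanceType={"Value": "r5.large"},
    )

    ec2.start_instances(InstanceIds=[instance_id])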