Compute Architecture Selection
The optimal compute option for a particular workload can vary based on application design, usage patterns, and configuration settings. Architectures may use different compute options for different components and enable different features to improve performance. Selecting the wrong compute option for an architecture can lead to lower performance efficiency.
Evaluate the available compute options: Understand the performance characteristics of the compute-related options available to you. Know how instances, containers, and functions work, and what advantages and disadvantages they bring to your workload.
In AWS, compute is available in three forms: instances, containers, and functions.
Instances
Instances are virtualized servers, allowing you to change their capabilities with the click of a button or an API call. Because resource decisions in the cloud aren’t fixed, you can experiment with different server types. At AWS, these virtual server instances come in different families and sizes, and they offer a wide variety of capabilities, including solid-state drives (SSDs) and graphics processing units (GPUs).
Amazon Elastic Compute Cloud (Amazon EC2)
Use data to select the optimal EC2 instance type for your workload, ensure that you have the correct networking and storage options, and consider operating system settings that can improve the performance for your workload.
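As a starting point, you can gather these characteristics programmatically. The following sketch (Python with boto3; the candidate instance types and Region are illustrative assumptions, not recommendations) compares a few types by vCPU count, memory, and network performance:

```python
import boto3

# Sketch: compare candidate instance types by vCPU, memory, and network
# performance before selecting one. Candidate types and Region are
# illustrative; verify availability in your own Region.
ec2 = boto3.client("ec2", region_name="us-east-1")

candidates = ["m5.large", "r5.large", "c5.large"]
response = ec2.describe_instance_types(InstanceTypes=candidates)

for it in response["InstanceTypes"]:
    print(
        it["InstanceType"],
        it["VCpuInfo"]["DefaultVCpus"], "vCPUs,",
        it["MemoryInfo"]["SizeInMiB"], "MiB,",
        it["NetworkInfo"]["NetworkPerformance"],
    )
```

Combine output like this with utilization data from your own workload rather than choosing a type on specifications alone.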
Containers
Containers are a method of operating system virtualization that allow you to run an application and its dependencies in resource-isolated processes.
When running containers on AWS, you have two choices to make. First, choose whether or not you want to manage servers: AWS Fargate is a serverless compute engine for containers, while Amazon EC2 gives you control over the installation, configuration, and management of your compute environment. Second, choose which container orchestration platform to use: Amazon Elastic Container Service (Amazon ECS) or Amazon Elastic Kubernetes Service (Amazon EKS).
When using containers, you must use data to select the optimal type for your workload, just as you use data to select your EC2 or AWS Fargate instance types. Consider container configuration options such as memory, CPU, and tenancy configuration. To enable network access between container services, consider using a service mesh such as AWS App Mesh.
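As one illustration of these configuration options, the sketch below (Python with boto3; the task family name and container image are hypothetical placeholders) registers a Fargate task definition with explicit CPU and memory settings:

```python
import boto3

# Sketch: register an ECS task definition that pins task-level CPU and
# memory for Fargate. The family name and image are hypothetical.
ecs = boto3.client("ecs", region_name="us-east-1")

ecs.register_task_definition(
    family="web-app",                      # hypothetical family name
    requiresCompatibilities=["FARGATE"],
    networkMode="awsvpc",                  # required for Fargate tasks
    cpu="512",                             # 0.5 vCPU for the whole task
    memory="1024",                         # 1 GiB for the whole task
    containerDefinitions=[
        {
            "name": "web",
            "image": "public.ecr.aws/nginx/nginx:latest",
            "essential": True,
            "portMappings": [{"containerPort": 80}],
        }
    ],
)
```

Fargate accepts only specific CPU and memory combinations, so treat the values above as one valid pairing rather than a tuning recommendation.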
Functions
Functions abstract the execution environment from the code you want to execute. For example, AWS Lambda allows you to execute code without running an instance.
You can use AWS Lambda to run code without provisioning or managing servers, and you can use Amazon API Gateway to expose your Lambda functions to clients as HTTP endpoints.
To deliver optimal performance with AWS Lambda, choose the amount of memory you want for your function; you are then allocated proportional CPU power and other resources. For example, choosing 256 MB of memory allocates approximately twice as much CPU power to your Lambda function as requesting 128 MB of memory. You can also control the amount of time each function is allowed to run (up to a maximum of 900 seconds).
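A minimal sketch of both settings, assuming a Python environment with boto3 and a hypothetical function name:

```python
import boto3

# Sketch: raise a function's memory (which proportionally raises its CPU
# allocation) and cap its run time. "my-function" is a hypothetical name.
lam = boto3.client("lambda", region_name="us-east-1")

lam.update_function_configuration(
    FunctionName="my-function",  # hypothetical
    MemorySize=256,              # MB; roughly 2x the CPU of a 128 MB function
    Timeout=120,                 # seconds; the service maximum is 900
)
```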
Understand the available compute configuration options: Understand how various options complement your workload, and which configuration options are best for your system. Examples of these options include instance family, sizes, features (GPU, I/O), function sizes, container instances, and single versus multi-tenancy.
When selecting instance families and types, you must also consider the configuration options available to meet your workload’s needs:
- Graphics Processing Units (GPU) — Using general purpose computing on GPUs (GPGPU), you can build applications that benefit from the high degree of parallelism that GPUs provide by leveraging platforms (such as CUDA) in the development process. If your workload requires 3D rendering or video compression, GPUs enable hardware-accelerated computation and encoding, making your workload more efficient.
- Field Programmable Gate Arrays (FPGA) — Using FPGAs, you can optimize your workloads by having custom hardware-accelerated execution for your most demanding workloads. You can define your algorithms by leveraging supported general programming languages such as C or Go, or hardware-oriented languages such as Verilog or VHDL.
- AWS Inferentia (Inf1) — Inf1 instances are built to support machine learning inference applications. Using Inf1 instances, customers can run large-scale machine learning inference applications like image recognition, speech recognition, natural language processing, personalization, and fraud detection. You can build a model in one of the popular machine learning frameworks, such as TensorFlow, PyTorch, or MXNet, and use GPU instances such as P3 or P3dn to train your model. After your machine learning model is trained to meet your requirements, you can deploy your model on Inf1 instances by using AWS Neuron, a specialized software development kit (SDK) consisting of a compiler, runtime, and profiling tools that optimize the machine learning inference performance of Inferentia chips.
- Burstable instance families — Burstable instances are designed to provide moderate baseline performance and the capability to burst to significantly higher performance when required by your workload. These instances are intended for workloads that do not use the full CPU often or consistently, but occasionally need to burst. They are well suited for general-purpose workloads, such as web servers, developer environments, and small databases. These instances provide CPU credits that can be consumed when the instance must provide performance; credits accumulate when the instance doesn’t need them (a sketch of monitoring the credit balance follows this list).
- Advanced computing features — Amazon EC2 gives you access to advanced computing features, such as managing C-state and P-state registers and controlling turbo-boost of processors. Access to coprocessors allows cryptography operations offloading through AES-NI, or advanced computation through AVX extensions.
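For burstable instances in particular, the credit balance tells you whether the workload fits the burst model. A sketch of checking it through CloudWatch (Python with boto3; the instance ID is a hypothetical placeholder):

```python
import boto3
from datetime import datetime, timedelta, timezone

# Sketch: track CPUCreditBalance on a burstable instance to confirm that
# credits accumulate between bursts instead of running dry.
cw = boto3.client("cloudwatch", region_name="us-east-1")

now = datetime.now(timezone.utc)
stats = cw.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUCreditBalance",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    StartTime=now - timedelta(days=1),
    EndTime=now,
    Period=3600,              # one datapoint per hour
    Statistics=["Minimum"],
)

for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Minimum"])
```

A balance that repeatedly approaches zero suggests the workload needs a fixed-performance instance family instead.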
The AWS Nitro System is a combination of dedicated hardware and a lightweight hypervisor, enabling faster innovation and enhanced security while delivering practically all of the host’s compute and memory resources to your instances.
Collect compute-related metrics: One of the best ways to understand how your compute systems are performing is to record and track the true utilization of various resources. This data can be used to make more accurate determinations about resource requirements.
Workloads (such as those running on microservices architectures) can generate large volumes of data in the form of metrics, logs, and events. Determine if your existing monitoring and observability service can manage the data generated. Amazon CloudWatch can be used to collect, access, and correlate this data on a single platform from across all your AWS resources, applications, and services running on AWS and on-premises servers, so you can easily gain system-wide visibility and quickly resolve issues.
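Metrics that EC2 does not emit by default, such as memory utilization, can be published as custom metrics so they can be correlated with CPU and network data. A sketch (Python with boto3; the namespace, instance ID, and value are hypothetical; in practice the CloudWatch agent can collect this for you):

```python
import boto3

# Sketch: publish memory utilization as a custom CloudWatch metric.
# Namespace and dimension values are hypothetical placeholders.
cw = boto3.client("cloudwatch", region_name="us-east-1")

cw.put_metric_data(
    Namespace="MyWorkload",  # hypothetical custom namespace
    MetricData=[
        {
            "MetricName": "MemoryUtilization",
            "Dimensions": [{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
            "Value": 71.3,   # percent, as measured on the host
            "Unit": "Percent",
        }
    ],
)
```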
Determine the required configuration by right-sizing: Analyze the various performance characteristics of your workload and how these characteristics relate to memory, network, and CPU usage. Use this data to choose resources that best match your workload's profile. For example, a memory-intensive workload, such as a database, could be served best by the memory-optimized R instance family, while a bursting workload can benefit more from an elastic container system.
Use the available elasticity of resources: The cloud provides the flexibility to expand or reduce your resources dynamically through a variety of mechanisms to meet changes in demand. Combined with compute-related metrics, a workload can automatically respond to changes and utilize the optimal set of resources to achieve its goal.
Optimally matching supply to demand delivers the lowest cost for a workload, but you also must plan for sufficient supply to allow for provisioning time and individual resource failures. Demand can be fixed or variable, requiring metrics and automation to ensure that management does not become a burdensome and disproportionately large cost.
With AWS, you can use a number of different approaches to match supply with demand. The Cost Optimization Pillar whitepaper describes how to use the following approaches (the demand-based approach is sketched in code after the list):
- Demand-based approach
- Buffer-based approach
- Time-based approach
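As a sketch of the demand-based approach (Python with boto3; the Auto Scaling group name is a hypothetical placeholder), a target tracking policy keeps a group near a CPU utilization target, adding instances as demand rises and removing them as it falls:

```python
import boto3

# Sketch: target tracking policy that holds an Auto Scaling group near
# 50% average CPU. The group name is hypothetical.
autoscaling = boto3.client("autoscaling", region_name="us-east-1")

autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-asg",          # hypothetical
    PolicyName="target-50-percent-cpu",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization",
        },
        "TargetValue": 50.0,
    },
)
```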
You must ensure that workload deployments can handle both scale-up and scale-down events. Create test scenarios for scale-down events to ensure that the workload behaves as expected.
Re-evaluate compute needs based on metrics: Use system-level metrics to identify the behavior and requirements of your workload over time. Evaluate your workload's needs by comparing the available resources with these requirements and make changes to your compute environment to best match your workload's profile. For example, over time a system might be observed to be more memory-intensive than initially thought, so moving to a different instance family or size could improve both performance and efficiency.
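One way to ground that re-evaluation, as a sketch (Python with boto3; the instance ID is a hypothetical placeholder), is to summarize two weeks of CPU utilization and use the averages and peaks as a right-sizing signal:

```python
import boto3
from datetime import datetime, timedelta, timezone

# Sketch: summarize 14 days of CPU utilization as a rough right-sizing
# signal. Sustained low peaks suggest a smaller size; pair with memory
# metrics to detect a memory-bound profile.
cw = boto3.client("cloudwatch", region_name="us-east-1")

now = datetime.now(timezone.utc)
stats = cw.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    StartTime=now - timedelta(days=14),
    EndTime=now,
    Period=3600,
    Statistics=["Average", "Maximum"],
)

points = stats["Datapoints"]
if points:
    avg = sum(p["Average"] for p in points) / len(points)
    peak = max(p["Maximum"] for p in points)
    print(f"14-day CPU: average {avg:.1f}%, peak {peak:.1f}%")
```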