PERF05-BP02 Evaluate available networking features
Evaluate networking features in the cloud that may increase performance. Measure the impact of these features through testing, metrics, and analysis. For example, take advantage of network-level features that are available to reduce latency, packet loss, or jitter.
Some services, such as AWS Global Accelerator and Amazon CloudFront, exist specifically to improve performance, while most other services offer product features that optimize network traffic. Review service features, such as EC2 instance network capability, enhanced networking instance types, Amazon EBS-optimized instances, Amazon S3 Transfer Acceleration, and CloudFront, to improve your workload performance.
Desired outcome: You have documented the inventory of components within your workload and have identified which networking configurations per component will help you meet your performance requirements. After evaluating the networking features, you have experimented and measured the performance metrics to identify how to use the features available to you.
Common anti-patterns:

- You put all your workloads into an AWS Region closest to your headquarters instead of an AWS Region close to your end users.
- You fail to benchmark your workload performance and to continually evaluate your workload against that benchmark.
- You do not review service configurations for performance-improving options.
Benefits of establishing this best practice: Evaluating all service features and options can increase your workload performance, reduce the cost of infrastructure, decrease the effort required to maintain your workload, and increase your overall security posture. You can use the global AWS backbone to ensure that you provide the optimal networking experience for your customers.
Level of risk exposed if this best practice is not established: High
Implementation guidance
Review which network-related configuration options are available to you, and how they could impact your workload. Understanding how these options interact with your architecture and the impact that they will have on both measured performance and the performance perceived by users is critical for performance optimization.
Implementation steps:

- Create a list of workload components.
  - Build, manage, and monitor your organization's network using AWS Cloud WAN.
  - Get visibility into your network using Network Manager. Use an existing configuration management database (CMDB) tool or a tool such as AWS Config to create an inventory of your workload and how it's configured.
- If this is an existing workload, identify and document the benchmark for your performance metrics, focusing on the bottlenecks and areas to improve. Performance-related networking metrics will differ per workload based on business requirements and workload characteristics. As a start, these metrics might be important to review for your workload: bandwidth, latency, packet loss, jitter, and retransmits.
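As an illustration of how such a baseline can be derived, the sketch below summarizes average latency, jitter, and packet loss from a set of ping-style round-trip-time probes. The probe values and the helper name are hypothetical, and jitter is approximated here as the mean difference between consecutive samples.

```python
# Illustrative sketch: deriving latency, jitter, and packet-loss figures from
# raw round-trip-time probes (None = lost probe). Sample data is hypothetical.
def summarize_probes(rtts_ms):
    """Summarize RTT probes into latency, jitter, and loss metrics."""
    received = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(received)) / len(rtts_ms)
    avg_latency = sum(received) / len(received)
    # Jitter as the mean absolute difference between consecutive samples,
    # in the spirit of RFC 3550 interarrival jitter.
    diffs = [abs(b - a) for a, b in zip(received, received[1:])]
    jitter = sum(diffs) / len(diffs) if diffs else 0.0
    return {"avg_latency_ms": avg_latency, "jitter_ms": jitter, "loss_pct": loss_pct}

print(summarize_probes([20.1, 22.3, None, 19.8, 21.0]))
```

Recording these figures over time gives you the benchmark to evaluate any networking change against.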
- If this is a new workload, perform load tests to identify performance bottlenecks.
- For the performance bottlenecks you identify, review the configuration options for your solutions to identify performance improvement opportunities.
  - If you don't know your network path or routes, use Network Access Analyzer to identify them.
  - Review your network protocols to further reduce your latency.
  - If you are using an AWS Site-to-Site VPN across multiple locations to connect to an AWS Region, review accelerated Site-to-Site VPN connections for opportunities to improve networking performance.
  - When your workload traffic is spread across multiple accounts, evaluate your network topology and services to reduce latency.
    - Evaluate your operational and performance tradeoffs between VPC Peering and AWS Transit Gateway when connecting multiple accounts. AWS Transit Gateway can scale AWS Site-to-Site VPN throughput beyond the maximum of a single IPsec tunnel by using equal-cost multi-path routing. Traffic between an Amazon VPC and AWS Transit Gateway remains on the private AWS network and is not exposed to the internet. AWS Transit Gateway simplifies how you interconnect all of your VPCs, which can span thousands of AWS accounts and extend into on-premises networks. Share your AWS Transit Gateway between multiple accounts using AWS Resource Access Manager. To get visibility into your global network traffic, use Network Manager for a central view of your network metrics.
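As a rough illustration of the multi-path scaling mentioned above, assuming the commonly documented ceiling of about 1.25 Gbps per Site-to-Site VPN tunnel (an assumption to verify against current AWS quotas):

```python
# Illustrative arithmetic, not an AWS API: aggregate Site-to-Site VPN
# throughput over AWS Transit Gateway with equal-cost multi-path routing,
# assuming each IPsec tunnel tops out at roughly 1.25 Gbps and traffic
# hashes evenly across tunnels (real traffic rarely balances perfectly,
# so treat this as an upper bound).
TUNNEL_MAX_GBPS = 1.25

def ecmp_aggregate_gbps(num_tunnels):
    """Upper-bound aggregate throughput across ECMP VPN tunnels."""
    return num_tunnels * TUNNEL_MAX_GBPS

print(ecmp_aggregate_gbps(4))  # 5.0
```

Because a single flow still traverses one tunnel, this scaling helps aggregate throughput, not the throughput of any individual flow.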
- Review your user locations and minimize the distance between your users and the workload.
  - AWS Global Accelerator is a networking service that improves the performance of your users' traffic by up to 60% using the Amazon Web Services global network infrastructure. When the internet is congested, AWS Global Accelerator optimizes the path to your application to keep packet loss, jitter, and latency consistently low. It also provides static IP addresses that simplify moving endpoints between Availability Zones or AWS Regions without needing to update your DNS configuration or change client-facing applications.
  - Amazon CloudFront can improve the performance of your workload's content delivery and latency globally. CloudFront has over 410 globally dispersed points of presence that can cache your content and lower the latency to the end user.
  - Amazon Route 53 offers latency-based routing, geolocation routing, geoproximity routing, and IP-based routing options to help you improve your workload's performance for a global audience. Identify which routing option would optimize your workload performance by reviewing your workload traffic and user locations.
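To make the routing evaluation concrete, the minimal sketch below mirrors the comparison that latency-based routing performs: pick the Region with the lowest measured round-trip time. The Region names and RTT figures are hypothetical sample measurements.

```python
# Illustrative only: choosing the Region a latency-based routing policy
# would favor, given RTT measurements collected from test clients in one
# user location. The sample numbers are hypothetical.
def lowest_latency_region(rtt_ms_by_region):
    """Return the Region with the lowest measured round-trip time."""
    return min(rtt_ms_by_region, key=rtt_ms_by_region.get)

measurements = {"us-east-1": 142.0, "eu-west-1": 38.5, "ap-southeast-1": 210.3}
print(lowest_latency_region(measurements))  # eu-west-1
```

Repeating this measurement from each of your major user locations shows whether latency-based routing, geolocation routing, or a single-Region deployment best fits your traffic.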
- Evaluate additional Amazon S3 features to improve storage IOPS.
  - Amazon S3 Transfer Acceleration is a feature that lets external users benefit from the networking optimizations of CloudFront to upload data to Amazon S3. This improves the ability to transfer large amounts of data from remote locations that don't have dedicated connectivity to the AWS Cloud.
  - Amazon S3 Multi-Region Access Points replicate content across multiple Regions and simplify the workload by providing one access point. When a Multi-Region Access Point is used, you can request or write data to Amazon S3 and the service identifies the lowest-latency bucket.
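As a small illustration of how Transfer Acceleration is addressed, accelerated requests use the s3-accelerate endpoint rather than the Regional endpoint. The helper below is a hypothetical sketch of that naming scheme; it assumes a DNS-compliant bucket name with Transfer Acceleration already enabled.

```python
# Illustrative sketch of the S3 Transfer Acceleration endpoint naming scheme:
# <bucket>.s3-accelerate.amazonaws.com (or the dualstack variant for IPv6).
# The bucket must be DNS-compliant and have Transfer Acceleration enabled.
def accelerate_endpoint(bucket, dualstack=False):
    """Build the accelerated endpoint URL for a bucket."""
    suffix = ("s3-accelerate.dualstack.amazonaws.com"
              if dualstack else "s3-accelerate.amazonaws.com")
    return f"https://{bucket}.{suffix}"

print(accelerate_endpoint("example-bucket"))
# https://example-bucket.s3-accelerate.amazonaws.com
```

In practice you would set the equivalent endpoint option in your SDK or CLI configuration rather than build URLs by hand.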
- Review your compute resource network bandwidth.
  - Elastic network interfaces (ENIs) used by EC2 instances, containers, and Lambda functions are limited on a per-flow basis. Review your placement groups to optimize your EC2 networking throughput. To avoid a bottleneck on a single flow, design your application to use multiple flows. To monitor and get visibility into your compute-related networking metrics, use CloudWatch Metrics and ethtool. ethtool is included in the ENA driver and exposes additional network-related metrics that can be published as a custom metric to CloudWatch.
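A minimal sketch of pulling the ENA driver's allowance-exceeded counters out of ethtool -S output before publishing them as custom CloudWatch metrics. The sample statistics text is fabricated for the example; nonzero counters indicate traffic was shaped because an instance-level limit was hit.

```python
# Illustrative sketch: parsing `ethtool -S ethN` output from the ENA driver
# for its allowance-exceeded counters. The sample output is fabricated.
SAMPLE = """\
NIC statistics:
     tx_timeout: 0
     bw_in_allowance_exceeded: 0
     bw_out_allowance_exceeded: 12
     pps_allowance_exceeded: 0
     conntrack_allowance_exceeded: 0
"""

def allowance_exceeded(stats_text):
    """Return {metric: count} for counters ending in 'allowance_exceeded'."""
    metrics = {}
    for line in stats_text.splitlines():
        if ":" in line:
            name, _, value = line.strip().partition(":")
            if name.endswith("allowance_exceeded"):
                metrics[name] = int(value)
    return metrics

print(allowance_exceeded(SAMPLE))
```

Each parsed counter could then be pushed to CloudWatch as a custom metric and alarmed on, giving early warning that an instance is hitting its network allowances.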
  - Newer EC2 instances can leverage enhanced networking. N-series EC2 instances, such as M5n and M5dn, take advantage of the fourth generation of custom Nitro cards to deliver up to 100 Gbps of network throughput to a single instance. These instances offer four times the network bandwidth and packet-processing performance compared to the base M5 instances and are ideal for network-intensive applications.
  - Amazon Elastic Network Adapters (ENA) provide further optimization by delivering better throughput for your instances within a cluster placement group.
  - Elastic Fabric Adapter (EFA) is a network interface for Amazon EC2 instances that lets you run workloads requiring high levels of inter-node communication at scale on AWS. With EFA, High Performance Computing (HPC) applications using the Message Passing Interface (MPI) and Machine Learning (ML) applications using the NVIDIA Collective Communications Library (NCCL) can scale to thousands of CPUs or GPUs.
  - Amazon EBS-optimized instances use an optimized configuration stack and provide additional, dedicated capacity to increase Amazon EBS I/O. This optimization provides the best performance for your EBS volumes by minimizing contention between Amazon EBS I/O and other traffic from your instance.
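As a simple planning illustration (not an AWS API), you can sanity-check whether an instance's dedicated EBS bandwidth covers the combined peak throughput you expect from its attached volumes. All figures below are hypothetical planning inputs; look up your instance type's actual EBS bandwidth in the EC2 documentation.

```python
# Illustrative arithmetic: remaining dedicated EBS bandwidth after the
# attached volumes' expected peak demand. Hypothetical planning inputs.
def ebs_headroom_mbps(instance_ebs_bandwidth_mbps, volume_throughputs_mbps):
    """Return leftover EBS bandwidth; negative means the instance is the bottleneck."""
    return instance_ebs_bandwidth_mbps - sum(volume_throughputs_mbps)

# e.g. an instance with 4,750 Mbps of dedicated EBS bandwidth and
# three volumes expected to peak at 1,000 Mbps each:
print(ebs_headroom_mbps(4750, [1000, 1000, 1000]))  # 1750
```

A negative result suggests moving to a larger instance size (or fewer volumes per instance) before tuning the volumes themselves.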
Level of effort for the implementation plan:
To establish this best practice, you must be aware of your current workload component options that impact network performance. Gathering the components, evaluating network improvement options, experimenting, implementing, and documenting those improvements is a low to moderate level of effort.