AWS Outposts High Availability Design and Architecture Considerations

Networking

An Outpost deployment depends on a resilient connection to its anchor AZ for management, monitoring, and service operations to function properly. You should provision your on-premises network to provide redundant network connections for each Outpost rack and reliable connectivity back to the anchor points in the AWS cloud. Also consider network paths between the application workloads running on the Outpost and the other on-premises and cloud systems they communicate with – how will you route this traffic in your network?

Network attachment

Each AWS Outposts rack is configured with redundant top-of-rack switches called Outpost Networking Devices (ONDs). The compute and storage servers in each rack connect to both ONDs. You should connect each OND to a separate switch called a Customer Networking Device (CND) in your data center to provide diverse physical and logical paths for each Outpost rack. ONDs connect to your CNDs with one or more physical connections using fiber optic cables and optical transceivers. The physical connections are configured in logical link aggregation group (LAG) links.



Multi-rack Outpost with redundant network attachments

The OND-to-CND links are always configured in a LAG, even if the physical connection is a single fiber optic cable. Configuring the links as LAGs allows you to increase link bandwidth by adding physical connections to the logical group. The LAG links are configured as IEEE 802.1Q Ethernet trunks to enable segregated networking between the Outpost and the on-premises network.

Every Outpost has at least two logically segregated networks that need to communicate with or across the customer network:

  • Service Link network – allocates Service Link IP addresses to the Outpost servers and facilitates communication with the on-premises network to allow the servers to connect back to the Outpost anchor points in the Region.

  • Local Gateway network – enables communication between the VPC subnets on the Outpost and the on-premises network via the Outpost Local Gateway (LGW).

These segregated networks attach to the on-premises network through a set of point-to-point IP connections over the LAG links. Each OND-to-CND LAG link is configured with VLAN IDs, point-to-point (/30 or /31) IP subnets, and eBGP peering for each segregated network (Service Link and LGW). Treat the LAG links, with their point-to-point VLANs and subnets, as routed layer-3 connections segmented at layer 2. These routed IP connections provide redundant logical paths between the segregated networks on the Outpost and the on-premises network.
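As an illustration, the point-to-point addressing described above can be sketched with Python's `ipaddress` module. The supernet, VLAN IDs, and device names below are hypothetical; the real values come from the addressing plan you agree with AWS during provisioning.

```python
import ipaddress

# Hypothetical supernet reserved for the OND-CND point-to-point links; the
# real addressing and VLAN IDs come from your Outposts provisioning worksheet.
P2P_SUPERNET = ipaddress.ip_network("172.16.0.0/24")
NETWORKS = ["service-link", "local-gateway"]
ONDS = ["ond-1", "ond-2"]

# Carve one /31 per (OND, segregated network) pair and pair it with a VLAN ID.
subnets = iter(P2P_SUPERNET.subnets(new_prefix=31))
links = []
vlan = 100  # hypothetical starting VLAN ID
for ond in ONDS:
    for net in NETWORKS:
        p2p = next(subnets)
        a, b = list(p2p)  # the two usable addresses in a /31 (RFC 3021)
        links.append({
            "ond": ond, "network": net, "vlan": vlan,
            "ond_ip": str(a), "cnd_ip": str(b), "subnet": str(p2p),
        })
        vlan += 1

for link in links:
    print(f'{link["ond"]} {link["network"]:13s} VLAN {link["vlan"]} '
          f'{link["ond_ip"]} <-> {link["cnd_ip"]} ({link["subnet"]})')
```

Each printed line corresponds to one eBGP peering session you would configure between an OND and its directly attached CND.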



Service Link peering



Local Gateway peering

You should terminate the layer-2 LAG links (and their VLANs) on the directly attached CND switches and configure the IP interfaces and BGP peering on the CND switches. You should not bridge the LAG VLANs between your data center switches. For more information, see Network layer connectivity in the AWS Outposts User Guide.

Inside a logical multi-rack Outpost, the ONDs are redundantly interconnected to provide highly available network connectivity between the racks and the workloads running on the servers. AWS is responsible for network availability within the Outpost.

  • Connect each Outpost Networking Device (OND) in an Outpost rack to a separate Customer Networking Device (CND) in the data center.

  • Terminate the layer-2 links, VLANs, layer-3 IP subnets, and BGP peering on the directly attached Customer Networking Device (CND) switches. Do not bridge the OND to CND VLANs between the CNDs or across the on-premises network.

  • Add links to the Link Aggregation Groups (LAGs) to increase the available bandwidth between the Outpost and the data center. Do not rely on the aggregate bandwidth of the diverse paths through both ONDs.

  • Use the diverse paths through the redundant ONDs to provide resilient connectivity between the Outpost networks and the on-premises network.

Anchor connectivity

An Outpost Service Link connects to either public or private anchors (not both) in a specific Availability Zone (AZ) in the Outpost’s parent Region. Outpost servers initiate outbound Service Link VPN connections from their Service Link IP addresses to the anchor points in the anchor AZ. These connections use UDP and TCP port 443. AWS is responsible for the availability of the anchor points in the Region.

You must ensure the Outpost Service Link IP addresses can connect through your network to the anchor points in the anchor AZ. The Service Link IP addresses do not need to communicate with other hosts on your on-premises network.
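One way to sanity-check firewall and routing policy for this reachability requirement is to test candidate destinations against the parent Region's EC2 address ranges, which AWS publishes in ip-ranges.json. The prefixes below are an illustrative hard-coded sample, not the live data:

```python
import ipaddress

# Outbound TCP/UDP 443 from the Service Link CIDR must be allowed toward the
# parent Region's EC2 ranges. AWS publishes those ranges at
# https://ip-ranges.amazonaws.com/ip-ranges.json; these entries are a sample.
EC2_PREFIXES = {
    "us-east-1": ["3.80.0.0/12", "52.0.0.0/11"],
    "us-west-2": ["34.208.0.0/12"],
}

def allowed_destination(dst_ip: str, region: str) -> bool:
    """True if dst_ip falls inside one of the sampled EC2 prefixes for region."""
    addr = ipaddress.ip_address(dst_ip)
    return any(addr in ipaddress.ip_network(p) for p in EC2_PREFIXES.get(region, []))

# A public anchor point in us-east-1 should match; an on-premises host should not.
print(allowed_destination("3.81.10.10", "us-east-1"))  # True
print(allowed_destination("10.0.0.5", "us-east-1"))    # False
```

In practice you would fetch the current ip-ranges.json and filter entries where the service is EC2 and the region matches the Outpost's parent Region.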

Public anchor points reside in the Region’s public IP ranges (in the EC2 service CIDR blocks) and may be accessed via the internet or AWS Direct Connect (DX) public virtual interfaces (VIFs). The use of public anchor points allows for more flexible path selection as Service Link traffic may be routed over any available path that can successfully reach the anchor points on the public internet.

Private anchor points allow you to use your own IP address ranges for anchor connectivity. Private anchor points are created in a private subnet within a dedicated VPC using customer-assigned IP addresses. The VPC is created in the AWS account that owns the Outpost resource, and you are responsible for ensuring the VPC remains available and properly configured; do not delete it. Private anchor points must be accessed using Direct Connect private VIFs.

You should provision redundant network paths between the Outpost and the anchor points in the Region with connections terminating on separate devices in more than one location. Dynamic routing should be configured to automatically reroute traffic to alternate paths when connections or networking devices fail. You should provision sufficient network capacity to ensure that the failure of one WAN path does not overwhelm the remaining paths.
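The capacity point can be made concrete with a small sizing helper: if the full Service Link demand must survive the loss of any one path, each of N equal paths needs at least demand / (N - 1) of capacity. A minimal sketch, with illustrative figures:

```python
# Size WAN paths so that losing any one path leaves enough capacity for the
# full Service Link demand. Figures here are illustrative examples.
def required_per_path_gbps(total_demand_gbps: float, num_paths: int) -> float:
    """Minimum capacity each path needs so N-1 surviving paths carry the load."""
    if num_paths < 2:
        raise ValueError("redundancy requires at least two paths")
    return total_demand_gbps / (num_paths - 1)

# Example: 3 Gbps of Service Link demand spread over 3 equal paths must
# survive one path failure, so each path needs 1.5 Gbps.
print(required_per_path_gbps(3.0, 3))  # 1.5
```

Note the two-path case: each path must carry the entire demand on its own, which is why adding a third path can substantially reduce the per-circuit capacity you need to provision.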

The following diagram shows three Outposts with redundant network paths to their anchor AZs using AWS Direct Connect as well as public internet connectivity. Outpost A and Outpost B are anchored to different Availability Zones in the same Region. Outpost A connects to private anchor points in AZ 1 of Region 1. Outpost B connects to public anchor points in AZ 2 of Region 1. Outpost C connects to public anchor points in AZ 1 of Region 2.



Highly available anchor connectivity with AWS Direct Connect and public internet access

Outpost A has three redundant network paths to reach its private anchor point. Two paths are available through redundant Direct Connect circuits at a single Direct Connect location. The third path is available through a Direct Connect circuit at a second Direct Connect location. This design keeps Outpost A’s Service Link traffic on private networks and provides path redundancy that allows for failure of any one of the Direct Connect circuits or failure of an entire Direct Connect location.

Outpost B has four redundant network paths to reach its public anchor point. Three paths are available through public VIFs provisioned on the Direct Connect circuits and locations used by Outpost A. The fourth path is available through the customer WAN and the public internet. Outpost B’s Service Link traffic may be routed over any available path that can successfully reach the anchor points on the public internet. Using the Direct Connect paths may provide more consistent latency and higher bandwidth availability, while the public internet path may be used for Disaster Recovery (DR) or bandwidth augmentation scenarios.

Outpost C has two redundant network paths to reach its public anchor point. Outpost C is deployed in a different data center than Outposts A and B. Outpost C’s data center does not have dedicated circuits connecting to the customer WAN. Instead, the data center has redundant internet connections provided by two different Internet Service Providers (ISPs). Outpost C’s Service Link traffic may be routed over either of the ISP networks to reach the anchor points on the public internet. This design allows flexibility to route Service Link traffic over any available public internet connection. However, the end-to-end path is dependent on public third-party networks where bandwidth availability and network latency fluctuate.

The network path between an Outpost and its Service Link anchor points must meet the following bandwidth and latency specifications:

  • 500 Mbps - 1 Gbps of available bandwidth per Outpost rack (for example, 3 racks: 1.5 – 3 Gbps available bandwidth)

  • Less than 300 milliseconds (round-trip) latency

  • Provision redundant network paths between each Outpost and its anchor points in the Region.

  • Use Direct Connect (DX) paths to control latency and bandwidth availability.

  • Ensure that TCP and UDP port 443 are open (outbound) from the Outpost Service Link CIDR blocks to the EC2 IP address ranges in the parent Region. Ensure the ports are open on all network paths.

  • Ensure each path meets the bandwidth availability and latency requirements.

  • Use dynamic routing to automate traffic redirection around network failures.

  • Test routing the Service Link traffic over each planned network path to ensure the path functions as expected.
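The per-rack bandwidth guidance above translates into a simple sizing calculation; this sketch applies the 500 Mbps to 1 Gbps per-rack range from the specification:

```python
def service_link_bandwidth_gbps(num_racks: int) -> tuple[float, float]:
    """Return the (min, max) available-bandwidth range in Gbps for an
    Outpost's Service Link, at 500 Mbps - 1 Gbps per rack."""
    if num_racks < 1:
        raise ValueError("an Outpost has at least one rack")
    return (0.5 * num_racks, 1.0 * num_racks)

# The 3-rack example from the specification: 1.5 - 3 Gbps.
print(service_link_bandwidth_gbps(3))  # (1.5, 3.0)
```

Remember this is the bandwidth that must be *available* on every planned path, including after a path failure, not the total installed capacity.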

Application/workload routing

There are two paths out of the Outpost for application workloads:

  • The Service Link path

  • The Local Gateway (LGW) path

You configure the Outpost subnet route tables to control which path to take to reach destination networks. Routes pointed to the LGW will direct traffic out the Local Gateway and to the on-premises network. Routes pointed to targets in the Region like Internet Gateways, NAT Gateways, Virtual Private Gateways, and VPC peering connections will direct traffic across the Service Link to reach these targets.
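A minimal longest-prefix-match sketch shows how such route table entries steer traffic between the two paths; the CIDRs and target IDs below are illustrative, not drawn from a real VPC:

```python
import ipaddress

# Illustrative Outpost subnet route table: the implicit local route keeps
# intra-VPC traffic on the VPC routers, the LGW route sends traffic
# on-premises, and the default route rides the Service Link to an IGW
# in the Region. Targets and CIDRs are hypothetical.
ROUTE_TABLE = [
    ("10.0.0.0/16",    "local"),     # implicit VPC CIDR route
    ("192.168.0.0/16", "lgw-1234"),  # on-premises networks via the Local Gateway
    ("0.0.0.0/0",      "igw-abcd"),  # default route via IGW in the Region
]

def select_route(dst_ip: str) -> str:
    """Return the target of the most specific (longest-prefix) matching route."""
    addr = ipaddress.ip_address(dst_ip)
    matches = [(ipaddress.ip_network(cidr), target)
               for cidr, target in ROUTE_TABLE
               if addr in ipaddress.ip_network(cidr)]
    return max(matches, key=lambda m: m[0].prefixlen)[1]

print(select_route("10.0.5.9"))      # local     - stays inside the VPC
print(select_route("192.168.20.7"))  # lgw-1234  - out the Local Gateway path
print(select_route("8.8.8.8"))       # igw-abcd  - over the Service Link
```

The same longest-prefix logic explains why intra-VPC traffic always takes the local route: the VPC CIDR is always more specific than the default route.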



Visualization of the Outpost Service Link and LGW network paths

When planning application routing, consider both normal operation and the limited routing and service availability you will have during network failures. The Service Link path is not available while an Outpost is disconnected from the Region.

You should provision diverse paths and configure dynamic routing between the Outpost LGW and your critical on-premises applications, systems and users. Redundant network paths allow the network to route traffic around failures and ensure that on-premises resources will be able to communicate with workloads running on the Outpost during partial network failures.

Outpost VPC route configurations are static. You configure subnet route tables through the AWS Management Console, the CLI and APIs, or Infrastructure as Code (IaC) tools; however, you will not be able to modify the subnet route tables during a disconnect event. You will have to reestablish connectivity between the Outpost and the Region to update the route tables. Use the same routes for normal operations that you plan to use during disconnect events.

Resources on the Outpost can reach the internet via the Service Link and an Internet Gateway (IGW) in the Region or via the Local Gateway (LGW) path. Routing internet traffic over the LGW path and the on-premises network allows you to use existing on-premises internet ingress/egress points and may provide lower latency, higher MTUs, and reduced AWS data egress charges when compared to using the Service Link path to an IGW in the Region.

If your application must run on-premises and it needs to be accessible from the public internet, you should route the application traffic over your on-premises internet connection(s) to the LGW to reach the resources on the Outpost.

While you can configure subnets on an Outpost like public subnets in the Region, doing so is undesirable for most use cases. Inbound internet traffic will enter through the AWS Region and be routed over the Service Link to the resources running on the Outpost.

The response traffic will in turn be routed over the Service Link and back out through the AWS Region’s internet connections. This traffic pattern may add latency and will incur data egress charges as traffic leaves the Region on its way to the Outpost and as return traffic comes back through the Region and egresses out to the internet. If your application can run in the Region, the Region is the best place to run it.

Traffic between VPC resources (in the same VPC) will always follow the local VPC CIDR route and be routed between subnets by the implicit VPC routers.

For example, traffic between an EC2 instance running on the Outpost and a VPC Endpoint in the Region will always be routed over the Service Link.



Local VPC routing through the implicit routers

  • Use the Local Gateway (LGW) path instead of the Service Link path where possible.

  • Route internet traffic over the LGW path.

  • Configure the Outpost subnet routing tables with a standard set of routes – they will be used for both normal operations and during disconnect events.

  • Provision redundant network paths between the Outpost LGW and critical on-premises application resources. Use dynamic routing to automate traffic redirection around on-premises network failures.