Centralized egress to internet

As you deploy applications in your Landing Zone, many applications require outbound-only internet access (for example, to download libraries, patches, or OS updates). You can provide this access by using a network address translation (NAT) gateway (preferably), or alternatively an Amazon EC2 NAT instance, as the next hop for all egress internet traffic. Internal applications reside in private subnets, while NAT gateways and EC2 NAT instances reside in a public subnet. AWS recommends that you use NAT gateways because they provide better availability and bandwidth and require less administration effort. For more information, refer to Compare NAT gateways and NAT instances.

Using the NAT gateway for centralized egress

NAT gateway is a managed network address translation service. Deploying a NAT gateway in every spoke VPC can become cost prohibitive because you pay an hourly charge for every NAT gateway you deploy (refer to Amazon VPC pricing), so centralizing NAT can be a viable option. To centralize, you create a separate egress VPC in the network services account and use Transit Gateway to route all egress traffic from the spoke VPCs through a NAT gateway in this egress VPC, as shown in the following figure.

Note

When you centralize NAT gateways using Transit Gateway, you pay an additional Transit Gateway data processing charge compared to the decentralized approach of running a NAT gateway in every VPC. In some edge cases, when you send large amounts of data through a NAT gateway from a VPC, keeping the NAT gateway local in that VPC to avoid the Transit Gateway data processing charge might be the more cost-effective option.


Centralized NAT gateway using Transit Gateway (overview)


Centralized NAT gateway using Transit Gateway (route table design)
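
Before the route tables shown in the preceding figure can be configured, each spoke VPC and the egress VPC must be attached to the Transit Gateway. The following is a minimal sketch using the AWS SDK for Python (Boto3), simplified to a single account; all VPC, subnet, and Transit Gateway IDs are placeholders, and your Landing Zone automation (for example, infrastructure as code) will typically create these attachments for you.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Placeholder IDs - substitute your own Transit Gateway, VPC, and subnet IDs.
TGW_ID = "tgw-0123456789abcdef0"
SPOKE_VPCS = {
    "vpc-spoke-a": ["subnet-spoke-a-az1", "subnet-spoke-a-az2"],
    "vpc-spoke-b": ["subnet-spoke-b-az1", "subnet-spoke-b-az2"],
}
EGRESS_VPC_ID = "vpc-egress"
EGRESS_TGW_SUBNETS = ["subnet-egress-tgw-az1", "subnet-egress-tgw-az2"]

# Attach each spoke VPC to the Transit Gateway.
for vpc_id, subnet_ids in SPOKE_VPCS.items():
    ec2.create_transit_gateway_vpc_attachment(
        TransitGatewayId=TGW_ID,
        VpcId=vpc_id,
        SubnetIds=subnet_ids,
        TagSpecifications=[{
            "ResourceType": "transit-gateway-attachment",
            "Tags": [{"Key": "Name", "Value": f"spoke-{vpc_id}"}],
        }],
    )

# Attach the egress VPC (in the network services account) to the Transit Gateway.
ec2.create_transit_gateway_vpc_attachment(
    TransitGatewayId=TGW_ID,
    VpcId=EGRESS_VPC_ID,
    SubnetIds=EGRESS_TGW_SUBNETS,
    TagSpecifications=[{
        "ResourceType": "transit-gateway-attachment",
        "Tags": [{"Key": "Name", "Value": "egress-vpc"}],
    }],
)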

In this setup, spoke VPC attachments are associated with Route Table 1 (RT1) and are propagated to Route Table 2 (RT2). A blackhole route in RT1 prevents the two spoke VPCs from communicating with each other. If you want to allow inter-VPC communication, you can remove the 10.0.0.0/8 -> Blackhole route entry from RT1; this allows the VPCs to communicate via the NAT gateway. You can also propagate the spoke VPC attachments to RT1 (or alternatively, use a single route table and associate/propagate everything to it), enabling direct traffic flow between the VPCs through the Transit Gateway.
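
As a hedged illustration of this route table design, the following Boto3 sketch associates the spoke attachments with RT1, propagates them into RT2, and adds the 10.0.0.0/8 blackhole route. The route table and attachment IDs are placeholders, and 10.0.0.0/8 is assumed to be the summarized spoke CIDR range.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Placeholder IDs for the two Transit Gateway route tables and the attachments.
RT1 = "tgw-rtb-spokes"        # associated with the spoke VPC attachments
RT2 = "tgw-rtb-egress"        # associated with the egress VPC attachment
SPOKE_ATTACHMENTS = ["tgw-attach-spoke-a", "tgw-attach-spoke-b"]
EGRESS_ATTACHMENT = "tgw-attach-egress"

for attachment in SPOKE_ATTACHMENTS:
    # Spoke attachments use RT1 for their routing decisions...
    ec2.associate_transit_gateway_route_table(
        TransitGatewayRouteTableId=RT1,
        TransitGatewayAttachmentId=attachment,
    )
    # ...and their CIDRs are propagated into RT2 so that return traffic
    # from the egress VPC can reach them.
    ec2.enable_transit_gateway_route_table_propagation(
        TransitGatewayRouteTableId=RT2,
        TransitGatewayAttachmentId=attachment,
    )

# The egress VPC attachment uses RT2.
ec2.associate_transit_gateway_route_table(
    TransitGatewayRouteTableId=RT2,
    TransitGatewayAttachmentId=EGRESS_ATTACHMENT,
)

# Blackhole route in RT1 that prevents spoke-to-spoke communication.
# Remove this route if you want to allow inter-VPC traffic.
ec2.create_transit_gateway_route(
    DestinationCidrBlock="10.0.0.0/8",
    TransitGatewayRouteTableId=RT1,
    Blackhole=True,
)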

You add a static route in RT1 pointing all traffic to the egress VPC. Because of this static route, Transit Gateway sends all internet-bound traffic through its elastic network interfaces (ENIs) in the egress VPC. Once in the egress VPC, traffic follows the routes defined in the route table of the subnet where these Transit Gateway ENIs reside. You add a route in those subnet route tables pointing all traffic to the NAT gateway in the same Availability Zone to minimize cross-Availability Zone (AZ) traffic. The NAT gateway subnet route table has the internet gateway (IGW) as the next hop. For return traffic to flow back, you must also add a static route entry in the NAT gateway subnet route table pointing all spoke VPC-bound traffic to the Transit Gateway as the next hop.
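
A minimal Boto3 sketch of these routes might look like the following. Only one Availability Zone is shown, the default route in RT1 points at the egress VPC attachment, and all resource IDs (route tables, NAT gateway, internet gateway) are placeholders.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Placeholder IDs (one Availability Zone shown; repeat for the second AZ).
RT1 = "tgw-rtb-spokes"
EGRESS_ATTACHMENT = "tgw-attach-egress"
TGW_ID = "tgw-0123456789abcdef0"
TGW_SUBNET_RTB = "rtb-egress-tgw-az1"   # subnet hosting the Transit Gateway ENI
NAT_SUBNET_RTB = "rtb-egress-nat-az1"   # public subnet hosting the NAT gateway
NAT_GATEWAY_AZ1 = "nat-az1"
IGW_ID = "igw-egress"

# 1. Static default route in RT1 sends all internet-bound traffic to the egress VPC.
ec2.create_transit_gateway_route(
    DestinationCidrBlock="0.0.0.0/0",
    TransitGatewayRouteTableId=RT1,
    TransitGatewayAttachmentId=EGRESS_ATTACHMENT,
)

# 2. The Transit Gateway ENI subnet forwards traffic to the NAT gateway in the same AZ.
ec2.create_route(
    RouteTableId=TGW_SUBNET_RTB,
    DestinationCidrBlock="0.0.0.0/0",
    NatGatewayId=NAT_GATEWAY_AZ1,
)

# 3. The NAT gateway subnet uses the internet gateway as the next hop...
ec2.create_route(
    RouteTableId=NAT_SUBNET_RTB,
    DestinationCidrBlock="0.0.0.0/0",
    GatewayId=IGW_ID,
)

# 4. ...and returns spoke VPC-bound traffic to the Transit Gateway.
ec2.create_route(
    RouteTableId=NAT_SUBNET_RTB,
    DestinationCidrBlock="10.0.0.0/8",
    TransitGatewayId=TGW_ID,
)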

High availability

For high availability, you should use two NAT gateways (one in each Availability Zone). Within an Availability Zone, the NAT gateway has an availability SLA of 99.9%. Redundancy against component failure within an AZ is handled by AWS under the SLA. Traffic is dropped during the small fraction of time (up to 0.1%) when the NAT gateway in an Availability Zone might be unavailable. If an entire Availability Zone fails, the Transit Gateway endpoint and the NAT gateway in that Availability Zone fail with it, and all traffic flows through the Transit Gateway and NAT gateway endpoints in the other Availability Zone.
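
As an illustrative sketch (not the only way to provision this), the following Boto3 snippet allocates an Elastic IP address and creates a NAT gateway in each of two public subnets, one per Availability Zone; the subnet IDs and Region are placeholders.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# One public subnet per Availability Zone in the egress VPC (placeholder IDs).
NAT_SUBNETS = {
    "us-east-1a": "subnet-egress-nat-az1",
    "us-east-1b": "subnet-egress-nat-az2",
}

nat_gateways = {}
for az, subnet_id in NAT_SUBNETS.items():
    # Each NAT gateway needs its own Elastic IP address.
    eip = ec2.allocate_address(Domain="vpc")
    nat = ec2.create_nat_gateway(
        SubnetId=subnet_id,
        AllocationId=eip["AllocationId"],
        TagSpecifications=[{
            "ResourceType": "natgateway",
            "Tags": [{"Key": "Name", "Value": f"egress-nat-{az}"}],
        }],
    )
    nat_gateways[az] = nat["NatGateway"]["NatGatewayId"]

# Wait until both NAT gateways are available before adding routes to them.
ec2.get_waiter("nat_gateway_available").wait(
    NatGatewayIds=list(nat_gateways.values())
)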

Security

For security, you rely on security groups on the source instances, blackhole routes in the Transit Gateway route tables, and the network ACL of the subnet in which the NAT gateway is located.

Scalability

A NAT gateway can support up to 55,000 simultaneous connections to each unique destination. From a throughput standpoint, a NAT gateway can scale from 5 Gbps to 45 Gbps. Transit Gateway generally does not act as a load balancer and will not distribute your traffic evenly across the NAT gateways in the two Availability Zones. Traffic crossing the Transit Gateway stays within an Availability Zone, if possible. If the Amazon EC2 instance initiating the traffic is in Availability Zone 1, traffic leaves through the Transit Gateway elastic network interface in Availability Zone 1 of the egress VPC and flows to the next hop based on the route table of the subnet that the elastic network interface resides in. However, to distribute traffic evenly across both Availability Zones, you can enable appliance mode on the VPC attachment connected to the egress VPC. For a complete list of rules, refer to NAT gateways in the Amazon Virtual Private Cloud documentation.
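
Enabling appliance mode on the egress VPC attachment is a single API call, sketched below with a placeholder attachment ID.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Enable appliance mode on the egress VPC attachment (placeholder ID) so the
# Transit Gateway keeps each flow symmetric while spreading flows across both AZs.
ec2.modify_transit_gateway_vpc_attachment(
    TransitGatewayAttachmentId="tgw-attach-egress",
    Options={"ApplianceModeSupport": "enable"},
)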

For more information, refer to the Creating a single internet exit point from multiple VPCs Using AWS Transit Gateway blog post.