Passive communications design - The Security Design of the AWS Nitro System

Passive communications design

The AWS Nitro System follows a “passive communication” design principle. What that means is that during production operation, components of the system never initiate outbound communication including to any control plane, management, or cloud service. Instead, there is a single hardened trusted service listening on the network, they listen for commands on the network or system bus, take action based on those commands, and then return results—all through well-defined APIs with scoped-down access controls. Both sides of these communication paths also perform parameter validation to ensure that only valid parameters are sent and received.

This pattern begins with the hypervisor itself. It waits for commands from the Nitro Controller on a private channel over PCIe. It never initiates outbound communication with the rest of the Nitro System. It cannot initiate any outbound network connections because, as noted, it has no networking stack. If at any time via an unlikely series of actions, the Nitro Hypervisor should attempt to initiate communications with other Nitro System components, this occurrence would be a clear signal of a firmware flaw or possible system compromise, and the EC2 service is designed to react accordingly to prevent impact and alarm for operator response.

Note

In bare metal mode, there is no hypervisor running on the server processor to await instructions from the Nitro Controller to start, stop, or reset the host server. In that case, the Nitro Controller controls the main board through its private BMC connection and the Nitro Security Chip.

The passive communications model repeats at the next layer out in the Nitro System. The Nitro Controller listens on a secure network channel awaiting authenticated and authorized commands in the form of specific APIs invoked by the EC2 control plane. The Nitro Controller never initiates any outbound communications on the EC2 control plane network. Even logical “push” features such as publishing CloudWatch metrics for the EC2 instances running on the host, or sending off the Nitro API logs to the EC2 control plane are implemented as a “pull” process. The control plane polls the Nitro Controller periodically to retrieve the metrics using well-defined APIs. Any attempt at outbound communication from the Nitro Controller would be clear signal of a firmware bug or possible system compromise, for which the EC2 service is designed to react accordingly to prevent impact and alarm for operator response.

The net effect of the passive communications model is a high degree of isolation and safety. Because normal operation involves listening only for well-defined, parameter-validated messages and responding to them using well-defined, parameter-validated responses, the system is designed to clearly identify and alert on potential aberrant activity. The system is designed such that, in the unlikely event of a firmware bug on the system main board, it would be highly likely that a potential adversary attempting to escape from the system mainboard outwards to the Nitro Cards would be detected, blocked, and alarmed. Moreover, even in the extremely unlikely case that a potential adversary in the former case were able to escape the system mainboard and somehow gain access to the Nitro Cards, the Nitro System design again makes it highly likely that any attempt by that adversary to escape from the Nitro Cards would be detected, blocked, and alarmed for the very same reason. These multiple layers of defense protect not only the EC2 service itself, but also all customers running workloads inside the EC2 system.