Recover your instance
To automatically recover an instance when a system status check failure occurs, you can use the default configuration of the instance or create an Amazon CloudWatch alarm. If an instance becomes unreachable because of an underlying hardware failure or a problem that requires AWS involvement to repair, the instance is automatically recovered.
A recovered instance is identical to the original instance, including the instance ID, private IP addresses, Elastic IP addresses, and all instance metadata. If the impaired instance has a public IPv4 address, the instance retains the public IPv4 address after recovery. If the impaired instance is in a placement group, the recovered instance runs in the placement group. During instance recovery, the instance is migrated as part of an instance reboot, and any data that is in-memory is lost.
Examples of problems that require instance recovery:
- Loss of network connectivity
- Loss of system power
- Software issues on the physical host
- Hardware issues on the physical host that impact network reachability
Simplified automatic recovery based on instance configuration
Instances that support simplified automatic recovery are configured by default to recover a failed instance. The default configuration applies to new instances that you launch and existing instances that you previously launched. Simplified automatic recovery is initiated in response to system status check failures. Simplified automatic recovery doesn't take place during Service Health Dashboard events, or any other events that impact the underlying hardware. For more information, see Troubleshoot instance recovery failures.
When a simplified automatic recovery event succeeds, you are notified by an AWS Health Dashboard event. When a simplified automatic recovery event fails, you are notified by an AWS Health Dashboard event and by email. You can also use Amazon EventBridge rules to monitor for simplified automatic recovery events using the following event codes:
- AWS_EC2_SIMPLIFIED_AUTO_RECOVERY_SUCCESS: successful events
- AWS_EC2_SIMPLIFIED_AUTO_RECOVERY_FAILURE: failed events
For more information, see Amazon EventBridge rules.
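As an illustration of how the two event codes above might be used in a rule, the following minimal Python sketch builds an EventBridge event pattern as a plain dictionary. It assumes that these events are delivered as AWS Health events, so it filters on the `aws.health` source and on `detail.eventTypeCode`; verify the exact event shape in your account before relying on it. The function names `build_event_pattern` and `matches` are hypothetical helpers, and `matches` is a simplified local check, not the EventBridge matching engine.

```python
# Event codes taken from the list above.
RECOVERY_EVENT_CODES = [
    "AWS_EC2_SIMPLIFIED_AUTO_RECOVERY_SUCCESS",
    "AWS_EC2_SIMPLIFIED_AUTO_RECOVERY_FAILURE",
]


def build_event_pattern(codes=RECOVERY_EVENT_CODES):
    """Return an event pattern dict for simplified automatic recovery events.

    Assumption: the events arrive as AWS Health events, so the pattern
    filters on the "aws.health" source and on detail.eventTypeCode.
    """
    return {
        "source": ["aws.health"],
        "detail-type": ["AWS Health Event"],
        "detail": {"service": ["EC2"], "eventTypeCode": list(codes)},
    }


def matches(pattern, event):
    """Simplified offline matcher: supports exact-value matching only."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            # Nested pattern: recurse into the corresponding event field.
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True
```

Serializing the result of `build_event_pattern()` with `json.dumps` gives a string that could be supplied as the event pattern when you create the rule.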
Requirements
Simplified automatic recovery is supported by an instance if the instance has the following characteristics:
- It uses default or dedicated instance tenancy.
- It does not use an Elastic Fabric Adapter.
- It uses one of the following instance types:
  - General purpose: A1 | M3 | M4 | M5 | M5a | M5n | M5zn | M6a | M6g | M6i | M6in | M7a | M7g | M7i | M7i-flex | T1 | T2 | T3 | T3a | T4g
  - Compute optimized: C3 | C4 | C5 | C5a | C5n | C6a | C6g | C6gn | C6i | C6in | C7a | C7g | C7gn | C7i
  - Memory optimized: R3 | R4 | R5 | R5a | R5b | R5n | R6a | R6g | R6i | R6in | R7a | R7g | R7i | R7iz | u-3tb1 | u-6tb1 | u-9tb1 | u-12tb1 | u-18tb1 | u-24tb1 | X1 | X1e | X2iezn
  - Accelerated computing: G3 | G3s | G5g | Inf1 | P2 | P3 | VT1
  - High-performance computing: Hpc6a | Hpc7a | Hpc7g
- It does not have instance store volumes. If a Nitro instance type has instance store volumes, or if a Xen-based instance has mapped instance store volumes in the AMI being used, the instance can't be automatically recovered.
Important
If an instance has instance store volumes attached, stopping and starting the instance causes any data on the instance store volumes to be lost. You should regularly back up your instance store volume data to more persistent storage, such as Amazon EBS, Amazon S3, or Amazon EFS. In the event of a system status check failure, you can stop and start instances with instance store volumes and then restore the instance store volumes using the backed-up data.
Limitations
- Instances with instance store volumes and metal instance types are not supported by simplified automatic recovery.
- Simplified automatic recovery is not initiated for instances in an Auto Scaling group. If your instance is part of an Auto Scaling group with health checks enabled, then the instance is replaced when it becomes impaired.
- Simplified automatic recovery applies to unplanned events only. It does not apply to scheduled events.
- Terminated or stopped instances can't be recovered.
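The requirements and limitations above can be condensed into a small pre-flight check. The following Python sketch is only an illustration of the rules as stated in this section: the function name `simplified_recovery_blockers` and its parameters are hypothetical, the family list is copied from the requirements, and the caller is assumed to already know the instance's tenancy, Elastic Fabric Adapter usage, and instance store configuration.

```python
# Instance families listed under Requirements, in lowercase.
SUPPORTED_FAMILIES = set(
    "a1 m3 m4 m5 m5a m5n m5zn m6a m6g m6i m6in m7a m7g m7i m7i-flex "
    "t1 t2 t3 t3a t4g "
    "c3 c4 c5 c5a c5n c6a c6g c6gn c6i c6in c7a c7g c7gn c7i "
    "r3 r4 r5 r5a r5b r5n r6a r6g r6i r6in r7a r7g r7i r7iz "
    "u-3tb1 u-6tb1 u-9tb1 u-12tb1 u-18tb1 u-24tb1 x1 x1e x2iezn "
    "g3 g3s g5g inf1 p2 p3 vt1 "
    "hpc6a hpc7a hpc7g".split()
)


def simplified_recovery_blockers(instance_type, tenancy="default", uses_efa=False,
                                 has_instance_store=False, state="running",
                                 in_auto_scaling_group=False):
    """Return the reasons simplified automatic recovery would not apply.

    An empty list means none of the documented blockers were found.
    """
    # "m5.large" -> family "m5", size "large"
    family, _, size = instance_type.lower().partition(".")
    blockers = []
    if family not in SUPPORTED_FAMILIES:
        blockers.append("instance family %r is not in the supported list" % family)
    if size.startswith("metal"):
        blockers.append("metal instance types are not supported")
    if tenancy not in ("default", "dedicated"):
        blockers.append("tenancy must be default or dedicated")
    if uses_efa:
        blockers.append("instance uses an Elastic Fabric Adapter")
    if has_instance_store:
        blockers.append("instance has instance store volumes")
    if state in ("stopped", "terminated"):
        blockers.append("stopped or terminated instances can't be recovered")
    if in_auto_scaling_group:
        blockers.append("instances in an Auto Scaling group are not recovered")
    return blockers
```

For example, `simplified_recovery_blockers("m5.large")` returns an empty list, while an `m5.metal` instance is reported as blocked because of its metal size.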
Set the recovery behavior
You can set the automatic recovery behavior to disabled or default during or after launching the instance. The default configuration does not enable simplified automatic recovery for an unsupported instance type.
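As a rough sketch of changing this setting after launch, the following Python code builds the parameters for the EC2 `ModifyInstanceMaintenanceOptions` API, which boto3 exposes as `modify_instance_maintenance_options`. The helper names are hypothetical, and `ec2_client` is assumed to be a boto3 EC2 client (or any object with the same method) that the caller creates and passes in.

```python
# The two recovery behaviors described above.
VALID_BEHAVIORS = ("default", "disabled")


def build_auto_recovery_request(instance_id, behavior):
    """Build keyword arguments for modify_instance_maintenance_options."""
    if behavior not in VALID_BEHAVIORS:
        raise ValueError("behavior must be one of %s" % (VALID_BEHAVIORS,))
    return {"InstanceId": instance_id, "AutoRecovery": behavior}


def set_auto_recovery(ec2_client, instance_id, behavior):
    """Apply the setting using a caller-supplied EC2 client.

    Assumption: ec2_client is a boto3 EC2 client, for example
    boto3.client("ec2"); it is not created here so that the sketch
    stays free of third-party imports.
    """
    request = build_auto_recovery_request(instance_id, behavior)
    return ec2_client.modify_instance_maintenance_options(**request)
```

Validating the value before the call keeps a typo such as `enabled` from reaching the API.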
Amazon CloudWatch action based recovery
Use Amazon CloudWatch action based recovery if you want to customize when to recover your instance.
When the StatusCheckFailed_System alarm is triggered, and the recovery action is initiated, you're notified by the Amazon SNS topic that you selected when you created the alarm and associated the recovery action. When the recovery action is complete, information is published to the Amazon SNS topic you configured for the alarm. Anyone who is subscribed to this Amazon SNS topic receives an email notification that includes the status of the recovery attempt and any further instructions. As a last step in the recovery action, the recovered instance reboots.
You can use Amazon CloudWatch alarms to recover an instance even if simplified automatic recovery is not disabled. For information about creating an Amazon CloudWatch alarm to recover an instance, see Add recover actions to Amazon CloudWatch alarms.
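To make the alarm setup more concrete, the following Python sketch assembles the parameters for a CloudWatch alarm on the StatusCheckFailed_System metric with a recover action. The dictionary is shaped for the CloudWatch `PutMetricAlarm` API (boto3 `put_metric_alarm`), and the recover action uses the ARN form `arn:aws:automate:<region>:ec2:recover`. The function name, the alarm name, and the period, evaluation, and threshold values are illustrative assumptions, not recommended settings.

```python
def build_recover_alarm(instance_id, region, sns_topic_arn=None,
                        period_seconds=60, evaluation_periods=2):
    """Build keyword arguments for a CloudWatch put_metric_alarm call.

    The alarm fires when StatusCheckFailed_System is above 0 for the
    configured number of periods, then runs the EC2 recover action.
    """
    actions = ["arn:aws:automate:%s:ec2:recover" % region]
    if sns_topic_arn:
        # Optional notification target, as described in the text above.
        actions.append(sns_topic_arn)
    return {
        "AlarmName": "recover-%s" % instance_id,  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": period_seconds,
        "EvaluationPeriods": evaluation_periods,
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": actions,
    }
```

With a boto3 CloudWatch client, the result could be passed as `cloudwatch.put_metric_alarm(**build_recover_alarm(...))`.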
Supported instance types
All of the instance types supported by simplified automatic recovery are also supported by Amazon CloudWatch action based recovery. CloudWatch action based recovery also supports the bare metal variants of those instance types and the following additional instance families:
- Memory optimized: X2idn | X2iedn
Important
For supported instance types that have instance store volumes, any data on these volumes is lost during a recovery. Stopping and starting the instance also causes any data on the instance store volumes to be lost. You should regularly back up your instance store volume data to more persistent storage, such as Amazon EBS, Amazon S3, or Amazon EFS. In the event of a system status check failure, you can stop and start instances with instance store volumes and then restore the instance store volumes using the backed-up data.
CloudWatch action based recovery does not support recovery for instances with Dedicated Host tenancy. For Amazon EC2 Dedicated Hosts, you can use Dedicated Host Auto Recovery to automatically recover unhealthy instances.
You can use the AWS Management Console or the AWS CLI to view the instance types that support CloudWatch action based recovery.
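The same lookup can also be scripted. The following Python sketch assumes a boto3 EC2 client passed in by the caller and uses the `auto-recovery-supported` filter of the `DescribeInstanceTypes` API; the function name is hypothetical, and the filter name should be confirmed against the current API reference.

```python
def list_recovery_supported_types(ec2_client):
    """Return a sorted list of instance types that report auto recovery support.

    Assumption: ec2_client is a boto3 EC2 client, for example
    boto3.client("ec2"), supplied by the caller.
    """
    paginator = ec2_client.get_paginator("describe_instance_types")
    pages = paginator.paginate(
        Filters=[{"Name": "auto-recovery-supported", "Values": ["true"]}]
    )
    # Each page carries an "InstanceTypes" list of instance type descriptions.
    return sorted(
        info["InstanceType"] for page in pages for info in page["InstanceTypes"]
    )
```

Passing the client in rather than creating it inside the function keeps the sketch independent of credentials and Region configuration.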
Troubleshoot instance recovery failures
The following issues can cause the recovery of your instance to fail:
- During Service Health Dashboard events, simplified automatic recovery might not recover your instance. You might not receive recovery failure notifications for such events. Any ongoing Service Health Dashboard events might also prevent CloudWatch action based recovery from successfully recovering an instance. For the latest service availability information, see http://status.aws.amazon.com/.
- Temporary, insufficient capacity of replacement hardware.
- The instance has reached the maximum daily allowance of three recovery attempts.
The automatic recovery process attempts to recover your instance for up to three separate failures per day. If the instance system status check failure persists, we recommend that you manually stop and start the instance. Data on instance store volumes is lost when the instance is stopped. For more information, see Stop and start your instance.
Your instance might subsequently be retired if automatic recovery fails and a hardware degradation is determined to be the root cause for the original system status check failure.