
Replication performance errors

This topic covers replication errors related to performance, convergence, and lag in AWS Elastic Disaster Recovery.

Topics

Error: Replication not converging
Replication lag increasing
Error: Unknown data replication error
Error: Failed to configure replication software
Error: Failed to download replication software
Error: Failed to establish communication with recovery instance
Error: Failed to pair replication agent with replication server

Error: Replication not converging

Error: NOT_CONVERGING status

Cause: The rate of data changes on the source server exceeds the available replication bandwidth. Elastic Disaster Recovery cannot catch up with the ongoing writes.

Resolution:

Consider changing the staging disk type to one with higher throughput.

Additionally:

Calculate required bandwidth. For more information, see bandwidth requirements.
Run a bandwidth test. For more information, see network bandwidth test.

Note

Changing the staging disk type might affect replication costs. Review Amazon EBS pricing before making changes.
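
The comparison described in the cause above (write rate against available bandwidth) can be sketched as simple arithmetic. The following Python is a rough illustration, not an official Elastic Disaster Recovery formula. The function names and sample figures are hypothetical, and the sketch assumes no compression or protocol overhead, so treat the result as an estimate only.

```python
def required_mbps(write_mb_per_sec_by_server):
    """Sum average write rates (megabytes/s) and convert to megabits/s."""
    total_mb_per_sec = sum(write_mb_per_sec_by_server.values())
    return total_mb_per_sec * 8  # 8 bits per byte


def can_converge(write_mb_per_sec_by_server, available_mbps):
    """Replication can only catch up if bandwidth exceeds the write rate."""
    return available_mbps > required_mbps(write_mb_per_sec_by_server)


def hours_to_drain(backlog_gb, write_mb_per_sec_by_server, available_mbps):
    """Estimate hours to flush a backlog; None if it would never drain."""
    spare_mbps = available_mbps - required_mbps(write_mb_per_sec_by_server)
    if spare_mbps <= 0:
        return None
    backlog_megabits = backlog_gb * 1000 * 8  # decimal GB -> MB -> megabits
    return backlog_megabits / spare_mbps / 3600


# Hypothetical sample figures, not measurements.
servers = {"app-1": 5.0, "db-1": 10.0}
print(required_mbps(servers))            # combined write rate in Mbps
print(can_converge(servers, 100))        # is a 100 Mbps link enough?
print(hours_to_drain(45, servers, 200))  # 45 GB backlog on a 200 Mbps link
```

If can_converge returns False for your measured write rates, the link cannot keep up regardless of how long replication runs.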

Replication lag increasing

Symptom: Replication lag grows over time or spikes unexpectedly.

Causes:

Source server is down or the agent is not running.
TCP port 1500 is blocked outbound from the source server to the replication server.
Source server MAC address changed. This requires agent reinstallation.
Source server recently rebooted or Elastic Disaster Recovery services restarted. Disks are re-read, and lag grows temporarily until the process completes.
Source server experienced a spike in write operations. Lag grows until Elastic Disaster Recovery flushes the backlog.
Insufficient bandwidth for the combined write throughput of all source servers.

Resolution:

Verify the agent is running and the source server is connected.
Verify TCP port 1500 connectivity from the source server to the replication server.
If the lag is temporary (post-reboot or write spike), wait for replication to converge.
If lag persists, check bandwidth. For more information, see bandwidth requirements.
Consider upgrading the staging disk type for higher throughput.
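
One way to verify the TCP port 1500 step above is a plain socket connection attempt run from the source server. This is a minimal sketch that uses only the Python standard library. The function name is hypothetical, and the check confirms only that a TCP handshake completes, not that the replication protocol itself is healthy.

```python
import socket


def tcp_port_open(host, port=1500, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, tcp_port_open("10.0.0.25") tests port 1500 on a replication server at that address (a placeholder; substitute the address of your own replication server). A False result points to a firewall, security group, or routing problem between the two hosts.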

Error: Unknown data replication error

Error: unknown_error

Cause: An unclassified replication error. Multiple root causes are possible.

Resolution:

Check connectivity between the source server and the replication server.
Check AWS CloudTrail for API throttling errors.
Monitor replication server performance (CPU, memory, disk I/O) in Amazon CloudWatch.
Verify network bandwidth is adequate. For more information, see bandwidth requirements.
Check agent logs. For more information, see Agent logs and diagnostics.
If the error persists, contact AWS Support with the agent logs and source server ID.
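
The CloudTrail check above can be partly automated by scanning exported log records for throttling error codes. The sketch below assumes records in the standard CloudTrail log file shape (a "Records" list whose entries may carry an errorCode field). The marker list and the sample records are assumptions made for illustration, not an exhaustive set of throttling codes.

```python
import json

# Assumption: substrings that commonly appear in throttling error codes.
THROTTLE_MARKERS = ("Throttl", "RequestLimitExceeded", "TooManyRequests")


def throttled_events(records):
    """Return (eventTime, eventSource, eventName, errorCode) for throttled calls."""
    hits = []
    for record in records:
        code = record.get("errorCode", "")
        if any(marker in code for marker in THROTTLE_MARKERS):
            hits.append((record.get("eventTime"), record.get("eventSource"),
                         record.get("eventName"), code))
    return hits


# Made-up sample in the shape of a CloudTrail log file ({"Records": [...]}).
sample = json.loads("""
{"Records": [
  {"eventTime": "2024-01-01T00:00:00Z", "eventSource": "ec2.amazonaws.com",
   "eventName": "CreateSnapshot", "errorCode": "Client.RequestLimitExceeded"},
  {"eventTime": "2024-01-01T00:00:05Z", "eventSource": "ec2.amazonaws.com",
   "eventName": "DescribeInstances"}
]}
""")
for hit in throttled_events(sample["Records"]):
    print(hit)
```

Repeated hits for the same API call around the time the error appeared suggest throttling as a contributing cause.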

Error: Failed to configure replication software

Error: FAILED_TO_CONFIGURE_REPLICATION_SOFTWARE

Cause: An internal error occurred during replication software configuration. This is typically transient.

Resolution: Retry the operation. If the error persists, contact AWS Support.
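
Because the error is typically transient, a retry with increasing delays is usually enough. The helper below is a generic sketch of that pattern. The name retry and its parameters are hypothetical, and operation stands in for whatever call or script step you are retrying.

```python
import time


def retry(operation, attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call operation(); on failure wait base_delay * 2**n, then try again."""
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

With the defaults, the helper waits 2 seconds and then 4 seconds between tries. If the final attempt also fails, the original exception propagates, which is the point at which to contact AWS Support.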

Error: Failed to download replication software

Error: FAILED_TO_DOWNLOAD_REPLICATION_SOFTWARE_TO_FAILBACK_CLIENT

Cause: The Failback Client cannot download replication software from Amazon S3. This indicates either a connectivity problem with the Amazon S3 endpoint or a proxy or network security appliance that is filtering the traffic.

Resolution:

Verify connectivity to the Amazon S3 endpoint. For more information, see TCP port 443 troubleshooting.
Check for a proxy or network security appliance intercepting or blocking the download.
Retry the operation.
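
Before retrying, it can help to confirm which endpoint the download targets and whether a proxy is configured in the environment. The sketch below makes two assumptions: the Region uses the standard s3.<region>.amazonaws.com endpoint pattern, and the proxy, if any, is set through the conventional https_proxy or HTTPS_PROXY environment variables. It cannot detect a transparent proxy or inline security appliance. Both function names are hypothetical.

```python
import os


def s3_endpoint(region):
    """Regional Amazon S3 endpoint hostname (standard AWS partition assumed)."""
    return f"s3.{region}.amazonaws.com"


def https_proxy(environ=os.environ):
    """Return the HTTPS proxy set through environment variables, if any."""
    for name in ("https_proxy", "HTTPS_PROXY"):
        value = environ.get(name)
        if value:
            return value
    return None


print(s3_endpoint("us-east-1"))
# Placeholder proxy URL passed explicitly for illustration.
print(https_proxy({"HTTPS_PROXY": "http://proxy.example.com:3128"}))
```

If https_proxy() returns a value when run on the Failback Client machine, check that the proxy allows traffic to the hostname returned by s3_endpoint for your Region.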

Error: Failed to establish communication with recovery instance

Error: FAILED_TO_ESTABLISH_RECOVERY_INSTANCE_COMMUNICATION

Cause: The Failback Client cannot communicate with the recovery instance.

Resolution:

If you use a public network (that is, no VPN or other private connection), ensure the recovery instance has a public IP address. By default, Elastic Disaster Recovery launch templates disable public IP assignment.
If you use a private network, verify routing between the Failback Client and the recovery instance.
Check that the security group on the recovery instance allows inbound traffic on the required ports.

Error: Failed to pair replication agent with replication server

Error: Failed to pair replication agent with replication server

Cause: The replication agent, replication server, and Elastic Disaster Recovery endpoint cannot establish a three-way communication channel.

Resolution:

Verify connectivity between the agent, the replication server, and the Elastic Disaster Recovery endpoint.
If the error persists, contact AWS Support.
