Troubleshooting
This section provides resolutions for known issues that can occur when deploying the solution. If these instructions don’t address your issue, see the Contact AWS Support section for instructions on opening an AWS Support case for this solution.
Problem: Requests against the target cluster are failing
If the analytics dashboard shows that requests that succeeded against the source cluster are failing when replayed against the target cluster, there are several possible causes.
Resolution
First, check the specific HTTP codes being returned by looking at the status code metrics in the analytics cluster.
- 403 errors - A 403 ("Forbidden") response indicates a problem with how authorization is configured, on either the target cluster side or the Replayer side. Ensure that the authorization strategy selected for each side matches (for example, basic auth versus SigV4). For basic auth, ensure that the correct credentials are provided through the command-line argument or Secrets Manager. If the target cluster does not have any form of authorization enabled but the source cluster does, ensure that the --remove-auth-header flag is provided. For more information about authorization header related flags, see Authorization header for Replayer requests. To verify that authorization headers are being applied as expected, check the tuples exported to the EFS volume to compare the original and replayed requests and responses. To view the tuples, refer to Understanding data from the Replayer.
- 404 errors - 404 errors generally occur when a document that’s being queried or modified can’t be found. If you haven’t performed a historical backfill through snapshot and restore, or the backfill is still in progress, a query that returned results on the source cluster might not find the same documents on the target cluster because they haven’t been backfilled yet. The resolution depends on your use case. If you don’t intend to backfill and your queries rely on recent documents, the errors will become less frequent over time. If a backfill is planned or already in progress, the errors will likely clear up once it has completed and all documents are present.
If none of these cases apply and the documents should be present, the cause might be an issue with the backfill or with the request that wrote the document when it was replayed against the target cluster. This needs to be debugged case by case, but as a starting point, search the tuples for the document ID in question to check whether the request was replayed and whether it succeeded. Refer to Understanding data from the Replayer for more information.
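As a rough illustration of the tuple checks described above, the following Python sketch scans tuple output for a document ID and summarizes the status codes and authorization schemes found in each matching record. It assumes that tuples are stored as one JSON object per line. The key names "Status-Code" and "Authorization", and the function names, are illustrative assumptions rather than a documented format, so adjust them to match the tuples in your deployment.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


def find_values(node: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key` (case-insensitive) in nested JSON."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name.lower() == key.lower():
                yield value
            yield from find_values(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, key)


def auth_scheme(header: str) -> str:
    """Classify an Authorization header by scheme, without printing credentials."""
    scheme = header.split(" ", 1)[0]
    if scheme.startswith("AWS4-HMAC-SHA256"):
        return "SigV4"
    if scheme.lower() == "basic":
        return "basic"
    return scheme or "none"


def search_tuples(
    lines: Iterable[str],
    doc_id: str,
    status_key: str = "Status-Code",   # assumed key name
    auth_key: str = "Authorization",   # assumed key name
) -> List[Dict[str, list]]:
    """Return a summary for every JSON line that mentions `doc_id`."""
    matches = []
    for line in lines:
        if doc_id not in line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        schemes = []
        for value in find_values(record, auth_key):
            # Header values may be a single string or a list of strings.
            for item in (value if isinstance(value, list) else [value]):
                schemes.append(auth_scheme(str(item)))
        matches.append({
            "status_codes": list(find_values(record, status_key)),
            "auth_schemes": schemes,
        })
    return matches
```

To use it, open a tuple file and pass the file object as `lines`, together with a specific search string such as the document's request path (for example, "/my-index/_doc/42") rather than a bare ID, which can match unrelated records. The match is a plain substring search on each line, so it will not find IDs that appear only inside encoded request bodies.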