Troubleshooting - AWS Security Hub Automated Response and Remediation

Troubleshooting

Solution logs

AWS Security Hub Automated Response and Remediation (SHARR) collects output from remediation runbooks, which run under AWS Systems Manager, and logs the results to the SO0111-SHARR CloudWatch Logs group in the AWS Security Hub Admin account. There is one log stream per control per day.

The Orchestrator Step Function logs all step transitions to the SO0111-SHARR-Orchestrator CloudWatch Logs Group in the AWS Security Hub Admin account. This log is an audit trail to record state transitions for each instance of the Step Function. There is one log stream per Step Function execution.

Both log groups are encrypted using an AWS KMS customer managed key (CMK).

The following troubleshooting information uses the SO0111-SHARR log group. Use this log, together with the AWS Systems Manager Automation console, Automation execution logs, the Step Functions console, and Lambda logs, to troubleshoot problems.

If a remediation fails, a message similar to the following will be logged to SO0111-SHARR in the log stream for the standard, control, and date. For example: CIS-2.9-2021-08-12

ERROR: a4cbb9bb-24cc-492b-a30f-1123b407a6253: Remediation failed for CIS control 2.9 in account 123412341234: See Automation Execution output for details (AwsEc2Vpc vpc-0e92bbe911cf08acb)
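When scanning many log lines, it can help to build the expected log stream name and pull the account and resource out of a failure message programmatically. The following is a minimal sketch, not part of the solution: the function names are hypothetical, the stream name format is assumed from the single example above (standard, control, date), and the regular expression is inferred from the one sample message, so other messages may not match it.

```python
import re
from datetime import date
from typing import Dict, Optional


def log_stream_name(standard: str, control: str, day: date) -> str:
    """Build a stream name in the assumed form <standard>-<control>-<YYYY-MM-DD>."""
    return f"{standard}-{control}-{day.isoformat()}"


# Pattern inferred from the single sample message in this guide (an assumption).
FAILURE_RE = re.compile(
    r"ERROR: (?P<event_id>[0-9a-f-]+): Remediation failed for "
    r"(?P<standard>\S+) control (?P<control>\S+) "
    r"in account (?P<account>\d{12}): "
    r"(?P<detail>.*) \((?P<resource_type>\w+) (?P<resource_id>[^)]+)\)"
)


def parse_failure(line: str) -> Optional[Dict[str, str]]:
    """Return the named fields of a failure line, or None if it does not match."""
    match = FAILURE_RE.search(line)
    return match.groupdict() if match else None
```

Applied to the sample message above, `parse_failure` yields the member account ID and the affected resource, which identify where to continue troubleshooting.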

The following messages provide additional detail. This output is from the SHARR runbook for the security standard and control. For example: SHARR-CIS_1.2.0_2.9

Step fails when it is Execution complete: verified. Failed to run automation with executionId: eecdef79-9111-4532-921a-e098549f5259 Failed : {Status=[Failed], Output=[No output available yet because the step is not successfully executed], ExecutionId=[eecdef79-9111-4532-921a-e098549f5259]}. Please refer to Automation Service Troubleshooting Guide for more diagnosis details.

This information points you to the failure, which in this case was a child automation running in the member account. To troubleshoot this issue, sign in to the AWS Management Console in the member account (the account named in the message above), go to AWS Systems Manager, navigate to Automation, and examine the log output for Execution ID eecdef79-9111-4532-921a-e098549f5259.
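The same lookup can be done with the AWS SDK instead of the console. The sketch below is an illustration under assumptions: the helper names are hypothetical, the execution ID pattern is inferred from the sample message above, and the client passed in is expected to be a boto3 Systems Manager client created with credentials for the member account. It relies on the `get_automation_execution` call and the `StepExecutions` field of its response.

```python
import re
from typing import Any, List, Optional, Tuple

# Pattern inferred from the sample message in this guide (an assumption).
EXECUTION_ID_RE = re.compile(r"executionId: (?P<id>[0-9a-f-]{36})")


def extract_execution_id(message: str) -> Optional[str]:
    """Return the Automation execution ID found in a runbook message, if any."""
    match = EXECUTION_ID_RE.search(message)
    return match.group("id") if match else None


def failed_steps(ssm_client: Any, execution_id: str) -> List[Tuple[str, str]]:
    """Return (step name, failure message) pairs for steps that failed.

    ssm_client is assumed to be boto3.client("ssm") for the member account.
    """
    response = ssm_client.get_automation_execution(
        AutomationExecutionId=execution_id
    )
    steps = response["AutomationExecution"].get("StepExecutions", [])
    return [
        (step["StepName"], step.get("FailureMessage", ""))
        for step in steps
        if step.get("StepStatus") == "Failed"
    ]
```

Passing the client as a parameter keeps the helper independent of how member-account credentials are obtained, for example through a named profile or an assumed role.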

Issues and resolutions

  • Issue: The solution deployment fails with an error stating that the resources are already available in Amazon CloudWatch.

    Resolution: Check for an error message in the CloudFormation resources/events section indicating log groups already exist. The SHARR deployment templates allow reuse of existing log groups. Verify that you have selected reuse.

  • Issue: I run Security Hub in multiple Regions in the same account. I want to deploy this solution in multiple Regions.

    Resolution: You must deploy the Admin stack in the same account and Region as your Security Hub Admin. Install the Member template into each account and Region where you have a Security Hub Member configured. Enable aggregation in Security Hub.

  • Issue: Immediately after deploying, the SO0111-SHARR-Orchestrator is failing in the Get Automation Document State with a 502 error: “Lambda was unable to decrypt the environment variables because KMS access was denied. Please check the function's KMS key settings. KMS Exception: UnrecognizedClientExceptionKMS Message: The security token included in the request is invalid. (Service: AWSLambda; Status Code: 502; Error Code: KMSAccessDeniedException; Request ID: …”

    Resolution: Allow the solution about 10 minutes to stabilize before running remediations. If the problem continues, open a support ticket or GitHub issue.

  • Issue: I attempted to remediate a finding but nothing happened.

    Resolution: Check the notes of the finding for reasons why it was not remediated. A common cause is that the finding has no automated remediation. At this time there is no way to provide direct feedback to the user when no remediation exists other than via the notes.

    Review the solution logs. Open CloudWatch Logs in the console. Find the SO0111-SHARR CloudWatch Logs group. Sort the list so that the most recently updated streams appear first. Select the log stream for the finding you attempted to remediate. Any errors should appear there. Possible reasons for the failure include a mismatch between the finding control and the remediation control, cross-account remediation (not yet supported), or a finding that has already been remediated. If you cannot determine the reason for the failure, collect the logs and open a support ticket.

  • Issue: After starting a remediation, the status in the Security Hub console has not updated.

    Resolution: The Security Hub console does not update automatically. Refresh the current view. The status of the finding should update.

    It might take several hours for the finding to transition from Failed to Passed. Findings are created from event data sent by other services, such as AWS Config, to AWS Security Hub. The time before a rule is reevaluated depends on the underlying service.

    If this does not resolve the issue, refer to the resolution above for “I attempted to remediate a finding but nothing happened.”

  • Issue: Orchestrator step function fails in Get Automation Document State: An error occurred (AccessDenied) when calling the AssumeRole operation.

    Resolution: The member template has not been installed in the member account where SHARR is attempting to remediate a finding. Follow instructions for deployment of the member template.

  • Issue: Config.1 runbook fails because Recorder or Delivery Channel already exists.

    Resolution: Inspect your AWS Config settings carefully to ensure Config is properly set up. The automated remediation is not able to fix existing AWS Config settings in some cases.

  • Issue: Remediation is successful but returns the message "No output available yet because the step is not successfully executed."

    Resolution: This is a known issue in this release where certain remediation runbooks do not return a response. The remediation runbooks will properly fail and signal the solution if they do not work.

  • Issue: The remediation failed and sent a stack trace.

    Resolution: Occasionally, an unhandled error condition results in a stack trace rather than an error message. Attempt to troubleshoot the problem from the trace data. Open a support ticket if you need assistance.

  • Issue: Removal of the v1.3.0 stack failed on the Custom Action resource.

    Resolution: Removal of the admin template may fail on the Custom Action removal. This is a known issue that will be fixed in the next release. If this occurs:

    1. Sign in to the AWS Security Hub console.

    2. In the Admin account, go to Settings.

    3. Select the Custom actions tab.

    4. Manually delete the entry Remediate with SHARR.

    5. Delete the stack again.

  • Issue: After redeploying the Admin stack the step function is failing on AssumeRole.

    Resolution: Redeploying the Admin stack breaks the trust connection between the Admin role in the Admin account and the Member role in the Member accounts. You must redeploy the Member Roles stack in all member accounts.

  • Issue: CIS 3.x remediations are not showing PASSED after more than 24 hours.

    Resolution: This is a common occurrence if you have no subscriptions to the SO0111-SHARR_LocalAlarmNotification SNS topic in the Member account.
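The log review described under "I attempted to remediate a finding but nothing happened" can also be scripted. The following is a minimal sketch under assumptions: the function name is hypothetical, the client passed in is expected to be a boto3 CloudWatch Logs client for the Security Hub Admin account, and only the first page of results is read from each call. It uses the `describe_log_streams` and `filter_log_events` operations.

```python
from typing import Any, List

LOG_GROUP = "SO0111-SHARR"  # log group name given in this guide


def recent_error_messages(logs_client: Any, max_streams: int = 5) -> List[str]:
    """Collect ERROR messages from the most recently updated log streams.

    logs_client is assumed to be boto3.client("logs") in the Admin account.
    Pagination is ignored for brevity, so long streams may be truncated.
    """
    streams = logs_client.describe_log_streams(
        logGroupName=LOG_GROUP,
        orderBy="LastEventTime",  # most recently updated streams first
        descending=True,
        limit=max_streams,
    )["logStreams"]

    messages: List[str] = []
    for stream in streams:
        events = logs_client.filter_log_events(
            logGroupName=LOG_GROUP,
            logStreamNames=[stream["logStreamName"]],
            filterPattern="ERROR",
        )["events"]
        messages.extend(event["message"] for event in events)
    return messages
```

Because both log groups are encrypted with a KMS key, the caller presumably also needs permission to use that key in addition to CloudWatch Logs read permissions.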