AWSSupport-TroubleshootEKSALBControllerIssues
Description
The AWSSupport-TroubleshootEKSALBControllerIssues automation runbook helps diagnose common issues that prevent the AWS Load Balancer Controller from provisioning and managing Application Load Balancers (ALBs) and Network Load Balancers (NLBs) for Kubernetes ingresses and services.
This runbook performs end-to-end validation of the essential components, including the OIDC identity provider setup, IAM roles for service accounts (IRSA) configuration, networking prerequisites, ingress and service configuration, and resource quotas. It also captures controller logs and the relevant Kubernetes resource configurations to help identify misconfigurations or operational issues.
Important
This automation runbook is designed for Amazon EKS clusters using Amazon Elastic Compute Cloud (Amazon EC2) node groups and does not currently support clusters running on AWS Fargate.
How does it work?
The runbook AWSSupport-TroubleshootEKSALBControllerIssues performs the following high-level steps:
-
Validates the Amazon EKS cluster status, access entry configuration, and OIDC provider setup.
-
Creates a temporary Lambda proxy for Kubernetes API communication.
-
Checks the AWS Load Balancer Controller deployment and service account configuration.
-
Verifies the pod identity webhook and IAM role injection.
-
Validates subnet configuration and tagging for Application Load Balancer and Network Load Balancer provisioning.
-
Checks Application Load Balancer and Network Load Balancer account quotas against current usage.
-
Validates ingress and service resource annotations.
-
Checks worker node security group tagging for load balancer integration.
-
Collects controller pod logs for diagnostics.
-
Cleans up temporary authentication resources.
-
Generates a diagnostic report with findings and remediation steps.
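The ordering above, with cleanup placed before report generation and (per the note below) run even when an earlier step fails, resembles a try/finally pipeline. The Python sketch below is purely illustrative: the runbook is executed by Systems Manager Automation, and `run_diagnostics`, the step callables, and the stop-on-first-failure behavior are assumptions made for this example, not the runbook's implementation.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a runbook step: returns a short status string
# or raises an exception when the check fails.
Step = Callable[[], str]


def run_diagnostics(checks: Dict[str, Step], cleanup: Step) -> List[str]:
    """Run checks in order, always run cleanup, and return report lines."""
    findings: List[str] = []
    try:
        for name, check in checks.items():
            findings.append(f"{name}: {check()}")
    except Exception as exc:
        # In this sketch, the first failing check stops the remaining checks.
        findings.append(f"FAILED: {exc}")
    finally:
        # Cleanup runs whether or not a check failed, so the temporary
        # proxy resources are not left behind.
        findings.append(f"CleanupK8sAuthenticationClient: {cleanup()}")
    # A GenerateReport-style step would summarize these lines.
    return findings
```

Keeping cleanup in `finally` rather than as the last loop iteration is what guarantees it runs after a failure.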
Note
-
The Amazon EKS cluster must have an access entry configured for the IAM entity that runs this automation, and the cluster's authentication mode must be set to either API or API_AND_CONFIG_MAP. Without a proper access entry configuration, the automation terminates during initial validation.
-
The LambdaRoleArn parameter is required. The role must have the AWS managed policies AWSLambdaBasicExecutionRole and AWSLambdaVPCAccessExecutionRole attached so that the proxy function can communicate with the Kubernetes API.
-
The AWS Load Balancer Controller must be version v2.1.1 or later.
-
The automation includes a cleanup step that removes the temporary authentication infrastructure resources. This step runs even when previous steps fail, ensuring that no orphaned resources remain in your AWS account.
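Two of the prerequisites above can be checked from data you may already have: the `accessConfig.authenticationMode` field of an EKS `DescribeCluster` response, and the image tag of the controller deployment. The sketch below assumes the image tag carries a version such as `v2.7.1` (a digest-pinned image would not parse) and works on plain dictionaries, so it does not call AWS itself; the function names are hypothetical.

```python
import re
from typing import Any, Dict, Tuple

SUPPORTED_AUTH_MODES = {"API", "API_AND_CONFIG_MAP"}
MIN_CONTROLLER_VERSION = (2, 1, 1)


def auth_mode_supported(describe_cluster_response: Dict[str, Any]) -> bool:
    """Check cluster.accessConfig.authenticationMode in a DescribeCluster response."""
    mode = (
        describe_cluster_response.get("cluster", {})
        .get("accessConfig", {})
        .get("authenticationMode")
    )
    return mode in SUPPORTED_AUTH_MODES


def parse_version(tag: str) -> Tuple[int, int, int]:
    """Parse a tag such as 'v2.7.1' into (2, 7, 1); raise ValueError otherwise."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", tag)
    if match is None:
        raise ValueError(f"unrecognized version tag: {tag!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def controller_version_ok(image: str) -> bool:
    """True if the image tag (the text after the last ':') is v2.1.1 or later."""
    return parse_version(image.rsplit(":", 1)[-1]) >= MIN_CONTROLLER_VERSION
```

With boto3, `boto3.client("eks").describe_cluster(name=...)` returns a dictionary of the shape `auth_mode_supported` reads. Comparing tuples keeps the version check numeric, so `v2.10.0` correctly sorts after `v2.9.0`.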
Document type
Automation
Owner
Amazon
Platforms
/
Required IAM permissions
The AutomationAssumeRole parameter requires the following actions to use the runbook successfully.
cloudformation:CreateStack
cloudformation:DeleteStack
cloudformation:DescribeStacks
cloudformation:UpdateStack
ec2:CreateNetworkInterface
ec2:DeleteNetworkInterface
ec2:DescribeInstances
ec2:DescribeNetworkInterfaces
ec2:DescribeRouteTables
ec2:DescribeSecurityGroups
ec2:DescribeSubnets
ec2:DescribeVpcs
eks:DescribeCluster
eks:ListAssociatedAccessPolicies
elasticloadbalancing:DescribeAccountLimits
elasticloadbalancing:DescribeLoadBalancers
iam:GetRole
iam:ListOpenIDConnectProviders
iam:PassRole
lambda:CreateFunction
lambda:DeleteFunction
lambda:GetFunction
lambda:InvokeFunction
lambda:ListTags
lambda:TagResource
lambda:UntagResource
lambda:UpdateFunctionCode
logs:CreateLogGroup
logs:CreateLogStream
logs:DescribeLogGroups
logs:DescribeLogStreams
logs:ListTagsForResource
logs:PutLogEvents
logs:PutRetentionPolicy
logs:TagResource
logs:UntagResource
ssm:DescribeAutomationExecutions
ssm:GetAutomationExecution
ssm:StartAutomationExecution
tag:GetResources
tag:TagResources
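Before starting the runbook, you might compare an existing policy document with the action list above. The sketch below is a rough, hypothetical helper: it reads only `Allow` statements, treats IAM action names case-insensitively, and expands `*` and `?` wildcards with `fnmatch`. It ignores `Deny` statements, `NotAction`, resources, conditions, permissions boundaries, and SCPs, so an empty result does not prove that the role is authorized.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, Iterable, List, Set


def allowed_actions(policy: Dict[str, Any]) -> Set[str]:
    """Collect lowercase action patterns from the Allow statements of a policy."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a policy may contain a single statement object
        statements = [statements]
    actions: Set[str] = set()
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        action = statement.get("Action", [])
        for name in [action] if isinstance(action, str) else action:
            actions.add(name.lower())  # IAM action names are case-insensitive
    return actions


def missing_actions(policy: Dict[str, Any], required: Iterable[str]) -> List[str]:
    """Return the required actions that no Allow pattern in the policy matches."""
    patterns = allowed_actions(policy)
    return sorted(
        action
        for action in required
        if not any(fnmatchcase(action.lower(), pattern) for pattern in patterns)
    )
```

You could feed it the policy JSON from the Instructions section (loaded with `json.loads`) together with the action list above.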
Instructions
Follow these steps to configure and run the automation:
Note
Before running the automation, follow these steps to configure the required IAM roles: one for Systems Manager Automation to execute the runbook, and another for Lambda to communicate with the Kubernetes API:
-
Create an SSM Automation role named TroubleshootEKSALBController-SSM-Role in your account. Verify that the trust relationship contains the following policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "ssm.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
-
Attach the following IAM policy to grant the required permissions:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "TroubleshootEKSALBControllerIssuesActions",
      "Effect": "Allow",
      "Action": [
        "eks:DescribeCluster",
        "eks:ListAssociatedAccessPolicies",
        "iam:GetRole",
        "iam:ListOpenIDConnectProviders",
        "ssm:StartAutomationExecution",
        "ssm:GetAutomationExecution",
        "ssm:DescribeAutomationExecutions",
        "ec2:DescribeSubnets",
        "ec2:DescribeRouteTables",
        "elasticloadbalancing:DescribeLoadBalancers",
        "elasticloadbalancing:DescribeAccountLimits",
        "ec2:DescribeInstances",
        "ec2:DescribeNetworkInterfaces",
        "ec2:DescribeSecurityGroups"
      ],
      "Resource": "*"
    },
    {
      "Sid": "SetupK8sApiProxyForEKSActions",
      "Effect": "Allow",
      "Action": [
        "cloudformation:CreateStack",
        "cloudformation:DeleteStack",
        "cloudformation:DescribeStacks",
        "cloudformation:UpdateStack",
        "ec2:CreateNetworkInterface",
        "ec2:DeleteNetworkInterface",
        "ec2:DescribeNetworkInterfaces",
        "ec2:DescribeRouteTables",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeSubnets",
        "ec2:DescribeVpcs",
        "eks:DescribeCluster",
        "iam:GetRole",
        "lambda:CreateFunction",
        "lambda:DeleteFunction",
        "lambda:GetFunction",
        "lambda:InvokeFunction",
        "lambda:ListTags",
        "lambda:TagResource",
        "lambda:UntagResource",
        "lambda:UpdateFunctionCode",
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:DescribeLogGroups",
        "logs:DescribeLogStreams",
        "logs:ListTagsForResource",
        "logs:PutLogEvents",
        "logs:PutRetentionPolicy",
        "logs:TagResource",
        "logs:UntagResource",
        "ssm:DescribeAutomationExecutions",
        "tag:GetResources",
        "tag:TagResources"
      ],
      "Resource": "*"
    },
    {
      "Sid": "PassRoleToAutomation",
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": "*",
      "Condition": {
        "StringLikeIfExists": {
          "iam:PassedToService": [
            "lambda.amazonaws.com",
            "ssm.amazonaws.com"
          ]
        }
      }
    }
  ]
}
-
Configure an access entry for your Amazon EKS cluster. This is a mandatory requirement for the automation. For steps to configure the authentication mode for access entries, see Setting up access entries.
In the Amazon EKS console, navigate to your cluster and follow these steps:
-
In the Access section, verify that the authentication mode is set to either API or API_AND_CONFIG_MAP.
-
Choose Create access entry and configure the following:
For IAM principal ARN, select the IAM role you created (TroubleshootEKSALBController-SSM-Role).
For Type, select Standard.
-
Add an access policy:
For Policy name, select AmazonEKSAdminViewPolicy.
For Access scope, select Cluster.
Choose Add policy.
-
Verify the details and choose Create.
-
Create an IAM role for the Lambda function (referenced as LambdaRoleArn in the input parameters):
-
Create a new IAM role with the following trust policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
-
Attach the following AWS managed policies to this role:
AWSLambdaBasicExecutionRole
AWSLambdaVPCAccessExecutionRole
-
Note the ARN of this role, because you need it for the LambdaRoleArn input parameter.
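If you prefer to script the access entry and Lambda role setup instead of using the console, the following sketch shows one possible sequence using the boto3 EKS operations `create_access_entry` and `associate_access_policy` and the IAM operations `create_role` and `attach_role_policy`. The clients are passed in so the functions can be exercised without AWS. The ARNs assume the standard `aws` partition (China and AWS GovCloud (US) Regions use a different partition prefix), and the function names are hypothetical.

```python
import json
from typing import Any

# Assumes the standard "aws" partition.
ADMIN_VIEW_POLICY_ARN = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSAdminViewPolicy"
LAMBDA_MANAGED_POLICY_ARNS = [
    "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole",
    "arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole",
]
LAMBDA_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}


def configure_access_entry(eks: Any, cluster_name: str, principal_arn: str) -> None:
    """Create a Standard access entry with AmazonEKSAdminViewPolicy at cluster scope."""
    eks.create_access_entry(clusterName=cluster_name, principalArn=principal_arn, type="STANDARD")
    eks.associate_access_policy(
        clusterName=cluster_name,
        principalArn=principal_arn,
        policyArn=ADMIN_VIEW_POLICY_ARN,
        accessScope={"type": "cluster"},
    )


def create_lambda_role(iam: Any, role_name: str) -> str:
    """Create the Lambda execution role, attach both managed policies, return its ARN."""
    response = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(LAMBDA_TRUST_POLICY),
    )
    for policy_arn in LAMBDA_MANAGED_POLICY_ARNS:
        iam.attach_role_policy(RoleName=role_name, PolicyArn=policy_arn)
    return response["Role"]["Arn"]
```

For example, with boto3 installed you might call `configure_access_entry(boto3.client("eks"), "my-cluster", ssm_role_arn)` and `create_lambda_role(boto3.client("iam"), "TroubleshootEKSALBController-Lambda-Role")`, where the cluster name and the Lambda role name are placeholders.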
-
Navigate to AWSSupport-TroubleshootEKSALBControllerIssues in the AWS Systems Manager console.
-
Choose Execute automation.
-
For the input parameters, enter the following:
-
AutomationAssumeRole (Optional):
Type: AWS::IAM::Role::Arn
Description: (Optional) The Amazon Resource Name (ARN) of the AWS Identity and Access Management (IAM) role that allows Systems Manager Automation to perform actions on your behalf. If no role is specified, Systems Manager Automation uses the permissions of the user that starts this runbook.
Allowed Pattern: ^arn:(?:aws|aws-cn|aws-us-gov):iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+$
-
EksClusterName (Required):
Type: String
Description: (Required) Name of the Amazon Elastic Kubernetes Service (Amazon EKS) cluster.
Allowed Pattern: ^[0-9A-Za-z][A-Za-z0-9-_]{0,99}$
-
ALBControllerDeploymentName (Optional):
Type: String
Description: (Optional) The name of the AWS Load Balancer Controller deployment in your Amazon EKS cluster. This is typically 'aws-load-balancer-controller' unless you've customized it during installation.
Allowed Pattern: ^[a-z0-9]([-.a-z0-9]{0,251}[a-z0-9])?$
Default: aws-load-balancer-controller
-
ALBControllerNamespace (Optional):
Type: String
Description: (Optional) The Kubernetes namespace where the AWS Load Balancer Controller is deployed. By default, this is 'kube-system', but it may be different if you've installed the controller in a custom namespace.
Allowed Pattern: ^[a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?$
Default: kube-system
-
ServiceAccountName (Optional):
Type: String
Description: (Optional) The name of the Kubernetes Service Account associated with the AWS Load Balancer Controller. This is typically 'aws-load-balancer-controller' unless customized during installation.
Allowed Pattern: ^[a-z0-9]([-.a-z0-9]{0,251}[a-z0-9])?$
Default: aws-load-balancer-controller
-
ServiceAccountNamespace (Optional):
Type: String
Description: (Optional) The Kubernetes namespace where the Service Account for the AWS Load Balancer Controller is located. This is typically 'kube-system', but may differ if you've used a custom namespace.
Allowed Pattern: ^[a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?$
Default: kube-system
-
IngressName (Optional):
Type: String
Description: (Optional) Name of the Ingress resource to validate (Application Load Balancer). If not specified, Ingress validation will be skipped.
Allowed Pattern: ^$|^[a-z0-9][a-z0-9.-]{0,251}[a-z0-9]$
Default: "" (empty string)
-
IngressNamespace (Optional):
Type: String
Description: (Optional) Namespace of the Ingress resource. Required if IngressName is specified.
Allowed Pattern: ^$|^[a-z0-9][a-z0-9-]{0,61}[a-z0-9]$
Default: "" (empty string)
-
ServiceName (Optional):
Type: String
Description: (Optional) Name of the Service resource whose Network Load Balancer (NLB) annotations you want to validate. If not specified, Service resource validation is skipped.
Allowed Pattern: ^$|^[a-z0-9][a-z0-9.-]{0,251}[a-z0-9]$
Default: "" (empty string)
-
ServiceNamespace (Optional):
Type: String
Description: (Optional) Namespace of the Service resource. Required if ServiceName is specified.
Allowed Pattern: ^$|^[a-z0-9][a-z0-9-]{0,61}[a-z0-9]$
Default: "" (empty string)
-
LambdaRoleArn (Required):
Type: AWS::IAM::Role::Arn
Description: (Required) The ARN of the IAM role that allows the AWS Lambda (Lambda) function to access the required AWS services and resources. Attach the AWS managed policies AWSLambdaBasicExecutionRole and AWSLambdaVPCAccessExecutionRole to this Lambda execution role.
Allowed Pattern: ^arn:(?:aws|aws-cn|aws-us-gov):iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+$
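The allowed patterns above can be checked locally before you start an execution. The sketch below copies the documented patterns, enforces the two required parameters and the documented rule that a namespace is required when the matching Ingress or Service name is set, and returns values in the `{name: [value]}` shape that the Systems Manager `StartAutomationExecution` API expects for `Parameters`. The function name `build_parameters` is hypothetical, and the runbook's own validation remains authoritative.

```python
import re
from typing import Dict, List

ROLE_ARN = r"^arn:(?:aws|aws-cn|aws-us-gov):iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+$"
K8S_NAME = r"^[a-z0-9]([-.a-z0-9]{0,251}[a-z0-9])?$"
K8S_NAMESPACE = r"^[a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?$"
OPTIONAL_NAME = r"^$|^[a-z0-9][a-z0-9.-]{0,251}[a-z0-9]$"
OPTIONAL_NAMESPACE = r"^$|^[a-z0-9][a-z0-9-]{0,61}[a-z0-9]$"

# Allowed patterns copied from the parameter list above.
PATTERNS = {
    "AutomationAssumeRole": ROLE_ARN,
    "EksClusterName": r"^[0-9A-Za-z][A-Za-z0-9-_]{0,99}$",
    "ALBControllerDeploymentName": K8S_NAME,
    "ALBControllerNamespace": K8S_NAMESPACE,
    "ServiceAccountName": K8S_NAME,
    "ServiceAccountNamespace": K8S_NAMESPACE,
    "IngressName": OPTIONAL_NAME,
    "IngressNamespace": OPTIONAL_NAMESPACE,
    "ServiceName": OPTIONAL_NAME,
    "ServiceNamespace": OPTIONAL_NAMESPACE,
    "LambdaRoleArn": ROLE_ARN,
}
REQUIRED = ("EksClusterName", "LambdaRoleArn")


def build_parameters(values: Dict[str, str]) -> Dict[str, List[str]]:
    """Validate inputs and return them in the name -> [value] shape."""
    for name in REQUIRED:
        if not values.get(name):
            raise ValueError(f"{name} is required")
    for name, value in values.items():
        pattern = PATTERNS.get(name)
        if pattern is None:
            raise ValueError(f"unknown parameter {name}")
        if re.fullmatch(pattern, value) is None:
            raise ValueError(f"{name}={value!r} does not match {pattern}")
    # A namespace is required when the matching resource name is given.
    for name_key, ns_key in (("IngressName", "IngressNamespace"), ("ServiceName", "ServiceNamespace")):
        if values.get(name_key) and not values.get(ns_key):
            raise ValueError(f"{ns_key} is required when {name_key} is set")
    return {name: [value] for name, value in values.items()}
```

The result could then be passed as `Parameters` to `boto3.client("ssm").start_automation_execution(DocumentName="AWSSupport-TroubleshootEKSALBControllerIssues", Parameters=...)`.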
-
Choose Execute.
-
The automation initiates.
-
The document performs the following steps:
-
ValidateAccessEntryAndOIDCProvider:
Validates Amazon EKS cluster IAM setup by checking access entry permissions and OIDC provider configuration.
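The OIDC part of this check amounts to confirming that the cluster's issuer URL has a matching IAM OIDC identity provider. A minimal sketch, assuming you already have the `cluster` object from EKS `DescribeCluster` and the provider ARNs from IAM `ListOpenIDConnectProviders` (whose ARNs end in the issuer host and path without `https://`). The helper name is hypothetical, and it does not cover the access entry half of this step.

```python
from typing import Any, Dict, List


def oidc_provider_registered(cluster: Dict[str, Any], provider_arns: List[str]) -> bool:
    """True if the cluster's OIDC issuer has a matching IAM OIDC identity provider.

    cluster: the "cluster" object from an EKS DescribeCluster response.
    provider_arns: the Arn values from IAM ListOpenIDConnectProviders.
    """
    issuer = cluster.get("identity", {}).get("oidc", {}).get("issuer", "")
    if not issuer:
        return False  # treat a missing issuer as "not registered"
    prefix = "https://"
    host_path = issuer[len(prefix):] if issuer.startswith(prefix) else issuer
    # Provider ARNs end in ":oidc-provider/<issuer host and path>".
    return any(arn.endswith(f":oidc-provider/{host_path}") for arn in provider_arns)
```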
-
SetupK8sAuthenticationClient:
Runs the SAW document AWSSupport-SetupK8sApiProxyForEKS to set up a Lambda function that makes Kubernetes API calls to the cluster.
-
VerifyALBControllerAndIRSASetup:
Checks whether the specified service account and AWS Load Balancer Controller deployment exist in their respective namespaces. Also checks the IAM role annotation on the controller's service account and that role's trust policy.
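IRSA works by annotating the service account with `eks.amazonaws.com/role-arn` and by trusting the cluster's OIDC provider for `sts:AssumeRoleWithWebIdentity`, scoped by a `<issuer>:sub` condition of the form `system:serviceaccount:<namespace>:<name>`. The sketch below checks both pieces on already-parsed documents. It is an approximation with hypothetical helper names: it handles only `StringEquals` and `StringLike` on the `sub` key, treats a missing `sub` condition as not matching, and does not check the `aud` condition.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, List

ROLE_ANNOTATION = "eks.amazonaws.com/role-arn"


def sa_role_arn(service_account: Dict[str, Any]) -> str:
    """Return the IAM role ARN annotated on a ServiceAccount, or '' if absent."""
    annotations = service_account.get("metadata", {}).get("annotations") or {}
    return annotations.get(ROLE_ANNOTATION, "")


def _as_list(value: Any) -> List[str]:
    return [value] if isinstance(value, str) else list(value or [])


def trust_allows_sa(trust_policy: Dict[str, Any], issuer: str, namespace: str, name: str) -> bool:
    """True if an Allow statement lets this service account assume the role.

    issuer: the OIDC issuer without the https:// prefix.
    """
    expected_sub = f"system:serviceaccount:{namespace}:{name}"
    statements = trust_policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        if "sts:AssumeRoleWithWebIdentity" not in _as_list(statement.get("Action")):
            continue
        principal = statement.get("Principal") or {}
        if not isinstance(principal, dict):
            continue
        federated = _as_list(principal.get("Federated"))
        if not any(arn.endswith(f":oidc-provider/{issuer}") for arn in federated):
            continue
        conditions = statement.get("Condition", {})
        equals = _as_list(conditions.get("StringEquals", {}).get(f"{issuer}:sub"))
        like = _as_list(conditions.get("StringLike", {}).get(f"{issuer}:sub"))
        if expected_sub in equals or any(fnmatchcase(expected_sub, p) for p in like):
            return True
    return False
```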
-
VerifyPodIdentityWebhookAndEnv:
Checks whether the pod-identity-webhook is running, and whether the IRSA environment variables are injected into the controller pods.
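When the webhook injects IRSA credentials, the controller container receives the `AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE` environment variables. A minimal sketch over a pod manifest (as returned by the Kubernetes API and parsed into a dictionary); the helper name is hypothetical, it checks only these two variables, and it optionally compares the injected role with the one annotated on the service account.

```python
from typing import Any, Dict, List

IRSA_ENV_VARS = ("AWS_ROLE_ARN", "AWS_WEB_IDENTITY_TOKEN_FILE")


def missing_irsa_env(pod: Dict[str, Any], expected_role_arn: str = "") -> Dict[str, List[str]]:
    """Map each container name to the IRSA variables it lacks; {} means none missing."""
    problems: Dict[str, List[str]] = {}
    for container in pod.get("spec", {}).get("containers", []):
        env = {item.get("name"): item.get("value", "") for item in container.get("env") or []}
        missing = [name for name in IRSA_ENV_VARS if name not in env]
        # Optionally compare the injected role with the service account annotation.
        if expected_role_arn and "AWS_ROLE_ARN" in env and env["AWS_ROLE_ARN"] != expected_role_arn:
            missing.append("AWS_ROLE_ARN (unexpected value)")
        if missing:
            problems[container.get("name", "<unnamed>")] = missing
    return problems
```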
-
ValidateSubnetRequirements:
Checks that at least two subnets in two different Availability Zones each have at least 8 available IP addresses, and that the subnets are properly tagged for public and private load balancers.
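The subnet rule can be expressed as a filter over an EC2 `DescribeSubnets` response. The sketch below assumes you have already decided which subnets are public (for example, from route tables with an internet gateway route, which it does not compute) and uses the controller's role tags: `kubernetes.io/role/elb` for internet-facing and `kubernetes.io/role/internal-elb` for internal load balancers, accepting a tag value of `1` or an empty value. The threshold of 8 free IP addresses comes from the step description; the helper names are hypothetical.

```python
from typing import Any, Dict, List, Set

MIN_FREE_IPS = 8  # from the ValidateSubnetRequirements description
ROLE_TAG_BY_SCHEME = {
    "internet-facing": "kubernetes.io/role/elb",
    "internal": "kubernetes.io/role/internal-elb",
}


def _tags(subnet: Dict[str, Any]) -> Dict[str, str]:
    return {tag["Key"]: tag.get("Value", "") for tag in subnet.get("Tags", [])}


def eligible_subnets(subnets: List[Dict[str, Any]], scheme: str, public_ids: Set[str]) -> List[Dict[str, Any]]:
    """Subnets of the right kind, with enough free IPs and the matching role tag."""
    tag_key = ROLE_TAG_BY_SCHEME[scheme]
    want_public = scheme == "internet-facing"
    return [
        subnet
        for subnet in subnets
        if (subnet["SubnetId"] in public_ids) == want_public
        and subnet.get("AvailableIpAddressCount", 0) >= MIN_FREE_IPS
        and _tags(subnet).get(tag_key) in ("1", "")
    ]


def subnets_ok(subnets: List[Dict[str, Any]], scheme: str, public_ids: Set[str]) -> bool:
    """True if the eligible subnets span at least two Availability Zones."""
    zones = {subnet["AvailabilityZone"] for subnet in eligible_subnets(subnets, scheme, public_ids)}
    return len(zones) >= 2
```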
-
CheckLoadBalancerLimitsAndUsage:
Compares the account quotas for Application Load Balancers and Network Load Balancers against the number of load balancers of each type currently in use.
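The comparison can be sketched with the Elastic Load Balancing `DescribeAccountLimits` response (limit names such as `application-load-balancers` and `network-load-balancers`, with `Max` returned as a string) and the `Type` field of each entry from `DescribeLoadBalancers`. `DescribeLoadBalancers` is paginated, so this sketch assumes you pass the load balancers from all pages; the helper name is hypothetical.

```python
from collections import Counter
from typing import Any, Dict, List

# Limit names reported by Elastic Load Balancing DescribeAccountLimits.
LIMIT_NAMES = {"application": "application-load-balancers", "network": "network-load-balancers"}


def quota_headroom(limits_response: Dict[str, Any], load_balancers: List[Dict[str, Any]]) -> Dict[str, int]:
    """Remaining load balancer capacity per type; 0 or less means the quota is exhausted."""
    limits = {item["Name"]: int(item["Max"]) for item in limits_response.get("Limits", [])}
    in_use = Counter(lb.get("Type") for lb in load_balancers)
    return {
        lb_type: limits[limit_name] - in_use[lb_type]
        for lb_type, limit_name in LIMIT_NAMES.items()
        if limit_name in limits
    }
```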
-
CheckIngressOrServiceAnnotations:
Checks for correct annotations and specifications in Ingress and Service resources to ensure they are properly configured for Application Load Balancer and Network Load Balancer usage.
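As an illustration of the kind of checks involved, the sketch below inspects parsed Ingress and Service manifests for a few commonly used AWS Load Balancer Controller settings: the `alb` ingress class, the `alb.ingress.kubernetes.io/scheme` and `alb.ingress.kubernetes.io/target-type` annotations and, for Services, `type: LoadBalancer` with either `spec.loadBalancerClass: service.k8s.aws/nlb` or the `service.beta.kubernetes.io/aws-load-balancer-type: external` annotation. The exact rules the runbook applies are not documented here, so the helper names, the expected class name `alb`, and the set of checks are assumptions; other values the controller may accept are not recognized by this sketch.

```python
from typing import Any, Dict, List

SCHEMES = {"internet-facing", "internal"}
TARGET_TYPES = {"instance", "ip"}


def _annotations(resource: Dict[str, Any]) -> Dict[str, str]:
    return resource.get("metadata", {}).get("annotations") or {}


def check_ingress(ingress: Dict[str, Any], expected_class: str = "alb") -> List[str]:
    """Return findings for an Ingress intended for an Application Load Balancer."""
    annotations = _annotations(ingress)
    findings: List[str] = []
    ingress_class = ingress.get("spec", {}).get("ingressClassName") or annotations.get(
        "kubernetes.io/ingress.class"
    )
    if ingress_class != expected_class:
        findings.append(f"ingress class is {ingress_class!r}, expected {expected_class!r}")
    scheme = annotations.get("alb.ingress.kubernetes.io/scheme")
    if scheme is not None and scheme not in SCHEMES:
        findings.append(f"invalid scheme annotation {scheme!r}")
    target_type = annotations.get("alb.ingress.kubernetes.io/target-type")
    if target_type is not None and target_type not in TARGET_TYPES:
        findings.append(f"invalid target-type annotation {target_type!r}")
    return findings


def check_service(service: Dict[str, Any]) -> List[str]:
    """Return findings for a Service intended for a Network Load Balancer."""
    annotations = _annotations(service)
    spec = service.get("spec", {})
    findings: List[str] = []
    if spec.get("type") != "LoadBalancer":
        findings.append(f"service type is {spec.get('type')!r}, expected 'LoadBalancer'")
    handled_by_controller = (
        spec.get("loadBalancerClass") == "service.k8s.aws/nlb"
        or annotations.get("service.beta.kubernetes.io/aws-load-balancer-type") == "external"
    )
    if not handled_by_controller:
        findings.append("neither loadBalancerClass service.k8s.aws/nlb nor aws-load-balancer-type external is set")
    scheme = annotations.get("service.beta.kubernetes.io/aws-load-balancer-scheme")
    if scheme is not None and scheme not in SCHEMES:
        findings.append(f"invalid scheme annotation {scheme!r}")
    target_type = annotations.get("service.beta.kubernetes.io/aws-load-balancer-nlb-target-type")
    if target_type is not None and target_type not in TARGET_TYPES:
        findings.append(f"invalid nlb-target-type annotation {target_type!r}")
    return findings
```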
-
CheckWorkerNodeSecurityGroupTags:
Verifies that exactly one security group attached to the worker nodes has the required cluster tag.
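The cluster tag in question uses the key `kubernetes.io/cluster/<cluster-name>`. A minimal sketch of the "exactly one" rule, assuming you have already grouped each node's attached security groups (as returned by EC2 `DescribeSecurityGroups`) by instance ID; the helper names are hypothetical, and the check only looks at tag keys, not at tag values such as `owned` or `shared`.

```python
from typing import Any, Dict, List


def tagged_group_ids(security_groups: List[Dict[str, Any]], cluster_name: str) -> List[str]:
    """IDs of security groups carrying the kubernetes.io/cluster/<name> tag key."""
    key = f"kubernetes.io/cluster/{cluster_name}"
    return [
        group["GroupId"]
        for group in security_groups
        if any(tag.get("Key") == key for tag in group.get("Tags", []))
    ]


def sg_tag_findings(groups_by_node: Dict[str, List[Dict[str, Any]]], cluster_name: str) -> Dict[str, str]:
    """Map each node that does not have exactly one tagged group to a finding."""
    findings: Dict[str, str] = {}
    for node, groups in groups_by_node.items():
        tagged = tagged_group_ids(groups, cluster_name)
        if not tagged:
            findings[node] = "no attached security group has the cluster tag"
        elif len(tagged) > 1:
            findings[node] = "multiple security groups have the cluster tag: " + ", ".join(tagged)
    return findings
```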
-
CaptureALBControllerLogs:
Retrieves the latest diagnostic logs from the AWS Load Balancer Controller pods running in the Amazon EKS cluster.
-
CleanupK8sAuthenticationClient:
Runs the SAW document AWSSupport-SetupK8sApiProxyForEKS with the Cleanup operation to remove the resources created as part of the automation.
-
GenerateReport:
Generates the automation report.
-
After the execution completes, review the Outputs section for the detailed results of the execution:
-
Report:
Provides a comprehensive summary of all checks performed, including the status of the Amazon EKS cluster, the AWS Load Balancer Controller setup, IRSA configuration, subnet requirements, load balancer limits, ingress and service annotations, worker node security group tags, and AWS Load Balancer Controller logs. It also includes any identified issues and recommended remediation steps.
References
Systems Manager Automation
Documentation related to AWS Load Balancer Controller