Implement AI-powered Kubernetes diagnostics and troubleshooting with K8sGPT and Amazon Bedrock integration
Ishwar Chauthaiwale, Muskan ., and Prafful Gupta, Amazon Web Services
Summary
This pattern demonstrates how to implement AI-powered Kubernetes diagnostics and troubleshooting by integrating K8sGPT with the Anthropic Claude 2 model available on Amazon Bedrock. The solution provides natural language analysis and remediation steps for Kubernetes cluster issues through a secure bastion host architecture. By combining the Kubernetes expertise of K8sGPT with the advanced language capabilities of Amazon Bedrock, DevOps teams can quickly identify and resolve cluster problems. These capabilities can help reduce mean time to resolution (MTTR) by up to 50 percent.
This cloud-native pattern uses Amazon Elastic Kubernetes Service (Amazon EKS) for Kubernetes management. The pattern implements security best practices through AWS Identity and Access Management (IAM) roles and network isolation. This solution is particularly valuable for organizations that want to streamline their Kubernetes operations and enhance their troubleshooting capabilities with AI assistance.
Prerequisites and limitations
Prerequisites
An active AWS account with appropriate permissions
AWS Command Line Interface (AWS CLI) installed and configured
An Amazon EKS cluster
Access to Anthropic Claude 2 model on Amazon Bedrock
A bastion host with required security group settings
K8sGPT installed on the bastion host
Limitations
K8sGPT analysis is limited by the context window size of the Claude 2 model.
Amazon Bedrock API rate limits apply based on your account quotas.
Some AWS services aren’t available in all AWS Regions. For Region availability, see AWS Services by Region. For specific endpoints, see Service endpoints and quotas, and choose the link for the service.
Product versions
Amazon EKS version 1.31 or later
Claude 2 model on Amazon Bedrock
K8sGPT v0.4.2 or later
Architecture
The following diagram shows the architecture for AI-powered Kubernetes diagnostics using K8sGPT integrated with Amazon Bedrock in the AWS Cloud.

The architecture shows the following workflow:
Developers access the environment through a secure connection to the bastion host. This Amazon EC2 instance serves as the secure entry point and contains the K8sGPT command line interface (CLI) installation and required configurations.
The bastion host, configured with specific IAM roles, establishes secure connections to both the Amazon EKS cluster and the Amazon Bedrock endpoints. K8sGPT is installed and configured on the bastion host to perform Kubernetes cluster analysis.
Amazon EKS manages the Kubernetes control plane and worker nodes, providing the target environment for K8sGPT analysis. The service runs across multiple Availability Zones within a virtual private cloud (VPC), which helps to provide high availability and resilience. Amazon EKS supplies operational data through the Kubernetes API, enabling comprehensive cluster analysis.
K8sGPT sends analysis data to Amazon Bedrock, which provides the Claude 2 foundation model (FM) for natural language processing. Amazon Bedrock processes the K8sGPT analysis to generate human-readable explanations and offers detailed remediation suggestions based on the identified issues. Amazon Bedrock operates as a serverless AI service with high availability and scalability.
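The last step of this workflow can be made more concrete with a small example. The following Python sketch shows the shape of a Claude 2 text-completion request on Amazon Bedrock, which is the kind of call a backend makes when it asks the model to explain a finding. It is not K8sGPT's own implementation. The prompt wording, the `max_tokens` default, and the `us-east-1` Region are illustrative assumptions. `build_claude_v2_body` runs offline. `explain_with_bedrock` needs `boto3`, AWS credentials, and model access in the chosen Region.

```python
import json

# Amazon Bedrock model ID for Anthropic Claude 2.
CLAUDE_V2_MODEL_ID = "anthropic.claude-v2"


def build_claude_v2_body(analysis_text: str, max_tokens: int = 1024) -> str:
    """Wrap one analysis finding in a Claude 2 text-completion request body.

    Claude 2 on Amazon Bedrock expects a prompt that opens with a Human turn
    and ends with an Assistant turn, plus a max_tokens_to_sample value.
    """
    # Hypothetical prompt wording, for illustration only.
    prompt = (
        "\n\nHuman: Simplify the following Kubernetes error and suggest "
        f"remediation steps:\n{analysis_text}"
        "\n\nAssistant:"
    )
    return json.dumps({"prompt": prompt, "max_tokens_to_sample": max_tokens})


def explain_with_bedrock(analysis_text: str, region: str = "us-east-1") -> str:
    """Send the request to Amazon Bedrock and return the model's completion."""
    import boto3  # imported here so the body builder works without boto3

    client = boto3.client("bedrock-runtime", region_name=region)
    response = client.invoke_model(
        modelId=CLAUDE_V2_MODEL_ID,
        body=build_claude_v2_body(analysis_text),
        contentType="application/json",
        accept="application/json",
    )
    # Claude 2 text-completion responses return the generated text in "completion".
    return json.loads(response["body"].read())["completion"]
```

Because the prompt and the response both count against the model's context window, very large analysis results may need to be split across several requests.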
Note
Throughout this workflow, IAM controls access between components through roles and policies, managing authentication for the bastion host, Amazon EKS, and Amazon Bedrock interactions. IAM implements the principle of least privilege and enables secure cross-service communication throughout the architecture.
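The following Python sketch builds one possible least-privilege IAM policy for the bastion host role. It allows invoking only the Claude 2 foundation model and describing only one Amazon EKS cluster, which `aws eks update-kubeconfig` needs in order to write a kubeconfig. The Region, account ID, and cluster name are placeholders. The exact actions K8sGPT needs may vary by version, so treat this as a starting point rather than a definitive policy.

```python
import json


def bastion_policy(region: str, account_id: str, cluster_name: str) -> dict:
    """Sketch of a least-privilege IAM policy for the bastion host's role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Allow calls to the Claude 2 foundation model only.
                "Sid": "InvokeClaude2",
                "Effect": "Allow",
                "Action": "bedrock:InvokeModel",
                "Resource": f"arn:aws:bedrock:{region}::foundation-model/anthropic.claude-v2",
            },
            {
                # Allow reading one cluster's endpoint and certificate data,
                # which `aws eks update-kubeconfig` needs.
                "Sid": "DescribeCluster",
                "Effect": "Allow",
                "Action": "eks:DescribeCluster",
                "Resource": f"arn:aws:eks:{region}:{account_id}:cluster/{cluster_name}",
            },
        ],
    }


# Placeholder values; 111122223333 is the AWS documentation example account ID.
print(json.dumps(bastion_policy("us-east-1", "111122223333", "my-cluster"), indent=2))
```

IAM permissions alone don't grant access to Kubernetes objects. The role also needs an Amazon EKS access entry, or an `aws-auth` ConfigMap mapping, bound to Kubernetes RBAC permissions that let K8sGPT read the resources it analyzes.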
Automation and scale
You can automate K8sGPT operations and scale them across multiple Amazon EKS clusters by using AWS services and tools. This solution also supports integration with continuous integration and continuous deployment (CI/CD) pipelines, such as Jenkins pipelines.
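The following Python sketch illustrates the CI/CD integration. It builds a K8sGPT scan command for each kubeconfig context and flags clusters that report problems, so that a Jenkins stage could fail the build. It assumes the `--kubecontext` and `--output json` flags and a `problems` field in the JSON report, as in recent K8sGPT releases. Confirm both against `k8sgpt analyze --help` and a sample report from your version. The context names are hypothetical.

```python
import json
import subprocess
from typing import List


def analyze_cmd(kubecontext: str, backend: str = "amazonbedrock", explain: bool = True) -> List[str]:
    """Build a K8sGPT analyze command for one cluster context with JSON output."""
    cmd = ["k8sgpt", "analyze", "--backend", backend, "--output", "json",
           "--kubecontext", kubecontext]
    if explain:
        # Each explained problem is sent to Amazon Bedrock and counts toward quotas.
        cmd.append("--explain")
    return cmd


def has_problems(report_json: str) -> bool:
    """Return True if a K8sGPT JSON report lists problems.

    Assumption: the report carries an integer "problems" field.
    """
    return json.loads(report_json).get("problems", 0) > 0


def failing_contexts(contexts: List[str], explain: bool = True) -> List[str]:
    """Scan each context and return the ones that report problems."""
    failing = []
    for ctx in contexts:
        result = subprocess.run(analyze_cmd(ctx, explain=explain),
                                capture_output=True, text=True, check=True)
        if has_problems(result.stdout):
            failing.append(ctx)
    return failing
```

For example, a Jenkins stage could run a script that calls `failing_contexts(["prod-east", "prod-west"])` and exits with a nonzero status when the returned list isn't empty. Passing `explain=False` keeps the gate fast and avoids Amazon Bedrock calls. You can then rerun with explanations for the failing clusters only.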
Tools
AWS services
AWS Command Line Interface (AWS CLI) is an open source tool that helps you interact with AWS services through commands in your command line shell.
Amazon Elastic Kubernetes Service (Amazon EKS) helps you run Kubernetes on AWS without needing to install or maintain your own Kubernetes control plane or nodes.
AWS Identity and Access Management (IAM) helps you securely manage access to your AWS resources by controlling who is authenticated and authorized to use them.
Other tools
K8sGPT is an open source, AI-powered tool for Kubernetes troubleshooting. It acts as a virtual site reliability engineering (SRE) expert that automatically scans Kubernetes clusters, diagnoses issues, and helps troubleshoot them. Administrators can interact with K8sGPT using natural language and get clear, actionable insights about cluster state, pod crashes, and service failures. The tool's built-in analyzers detect a wide range of issues, from misconfigured components to resource constraints, and provide easy-to-understand explanations and solutions.
Best practices
Implement secure access controls by using AWS Systems Manager Session Manager for bastion host access.
Make sure that K8sGPT authentication uses dedicated IAM roles with least-privilege permissions for Amazon Bedrock and Amazon EKS interactions. For more information, see Grant least privilege and Security best practices in the IAM documentation.
Configure resource tagging, enable Amazon CloudWatch logging for audit trails, and implement data anonymization for sensitive information. Back up K8sGPT configurations regularly, and schedule automated scans during off-peak hours to minimize operational impact.
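The following Python sketch applies two of these practices. First, it adds the K8sGPT `--anonymize` flag, which masks sensitive object names before data is sent to the AI backend. Second, it builds a crontab entry that runs the scan during an off-peak hour. The 02:00 default hour and the log path are hypothetical. Anonymization coverage can differ between analyzers, so review the K8sGPT documentation for your version.

```python
import shlex
from typing import List


def anonymized_analyze_cmd(backend: str = "amazonbedrock") -> List[str]:
    """K8sGPT scan that masks sensitive names before sending data to the backend."""
    return ["k8sgpt", "analyze", "--explain", "--anonymize", "--backend", backend]


def off_peak_cron_entry(hour: int = 2, log_path: str = "/var/log/k8sgpt/scan.log") -> str:
    """Crontab line that runs the anonymized scan daily at `hour` (server local time)."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    command = shlex.join(anonymized_analyze_cmd())
    # Append stdout and stderr to the log file so each run leaves an audit record.
    return f"0 {hour} * * * {command} >> {log_path} 2>&1"


print(off_peak_cron_entry())
```

Because cron runs with a minimal environment, you might need to use the full path to the `k8sgpt` binary and make sure that the log directory exists.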
Epics
Task | Description | Skills required |
---|---|---|
Set Amazon Bedrock as the AI backend provider for K8sGPT. | To set Amazon Bedrock as the AI backend provider for K8sGPT, use the following command:
The example command uses To check that
Following is an example of the expected output of this command:
| AWS DevOps |
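The following Python sketch builds the K8sGPT commands that typically perform this setup. `auth add` registers Amazon Bedrock with the Claude 2 model ID, `auth default` makes it the default provider, and `auth list` confirms the configuration. The flag names match recent K8sGPT releases, and the Region is a placeholder. Verify the flags with `k8sgpt auth add --help` before you run the printed commands on the bastion host. The K8sGPT Amazon Bedrock backend is expected to pick up AWS credentials through the standard AWS SDK credential chain, for example the bastion host's instance profile.

```python
import shlex
from typing import List

BEDROCK_BACKEND = "amazonbedrock"           # K8sGPT's name for the Amazon Bedrock provider
CLAUDE_V2_MODEL_ID = "anthropic.claude-v2"  # Amazon Bedrock model ID for Claude 2


def bedrock_backend_setup_cmds(region: str) -> List[List[str]]:
    """Return K8sGPT commands that register Amazon Bedrock and make it the default."""
    return [
        # Register Amazon Bedrock with the Claude 2 model in the given Region.
        ["k8sgpt", "auth", "add", "--backend", BEDROCK_BACKEND,
         "--model", CLAUDE_V2_MODEL_ID, "--providerRegion", region],
        # Use Amazon Bedrock whenever `k8sgpt analyze` runs without --backend.
        ["k8sgpt", "auth", "default", "--provider", BEDROCK_BACKEND],
        # Show configured providers to confirm that Amazon Bedrock is the default.
        ["k8sgpt", "auth", "list"],
    ]


# Print shell-ready commands; "us-east-1" is a placeholder Region.
for cmd in bedrock_backend_setup_cmds("us-east-1"):
    print(shlex.join(cmd))
```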
Task | Description | Skills required |
---|---|---|
View a list of available filters. | To see the list of all available filters, use the following K8sGPT command:
Following is an example of the expected output of this command:
| AWS DevOps |
Scan a pod in a specific namespace by using a filter. | This command is useful for targeted debugging of specific pod issues within a Kubernetes cluster, using Amazon Bedrock AI capabilities to analyze and explain the problems it finds. To scan a pod in a specific namespace by using a filter, use the following K8sGPT command:
Following is an example of the expected output of this command:
| AWS DevOps |
Scan a deployment in a specific namespace by using a filter. | This command is useful for identifying and troubleshooting deployment-specific issues, particularly when the actual state doesn't match the desired state. To scan a deployment in a specific namespace by using a filter, use the following K8sGPT command:
Following is an example of the expected output of this command:
| AWS DevOps |
Scan nodes by using a filter. | Nodes are cluster-scoped Kubernetes resources, so this scan doesn't take a namespace. To scan the cluster's nodes by using a filter, use the following K8sGPT command:
Following is an example of the expected output of this command:
| AWS DevOps |
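The scans in this table share one command shape. The following Python sketch builds those commands. `filters list` shows the available analyzers, and `analyze --explain --filter <kind>` runs a single analyzer, optionally limited to a namespace. Because nodes are cluster-scoped, the sketch rejects a namespace for the `Node` filter. The `default` namespace in the loop is a placeholder, and the flag names follow recent K8sGPT releases.

```python
import shlex
from typing import List, Optional

# Lists the analyzers (filters) that K8sGPT can run.
FILTERS_LIST_CMD = ["k8sgpt", "filters", "list"]

# Nodes are cluster-scoped, so a namespace never applies to them.
CLUSTER_SCOPED_FILTERS = {"Node"}


def scan_cmd(kind: str, namespace: Optional[str] = None, backend: str = "amazonbedrock") -> List[str]:
    """Build a K8sGPT command that analyzes one resource kind and explains findings."""
    if namespace is not None and kind in CLUSTER_SCOPED_FILTERS:
        raise ValueError(f"{kind} is cluster-scoped; omit the namespace")
    cmd = ["k8sgpt", "analyze", "--explain", "--backend", backend, "--filter", kind]
    if namespace is not None:
        cmd += ["--namespace", namespace]
    return cmd


print(shlex.join(FILTERS_LIST_CMD))
for kind, ns in (("Pod", "default"), ("Deployment", "default"), ("Node", None)):
    print(shlex.join(scan_cmd(kind, ns)))
```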
Task | Description | Skills required |
---|---|---|
Get detailed outputs. | To get detailed outputs, use the following K8sGPT command:
Following is an example of the expected output of this command:
| AWS DevOps |
Check problematic pods. | To check for specific problematic pods, use the following command:
Following is an example of the expected output of this command:
| AWS DevOps |
Get application-specific insights. | This command is particularly useful when:
To get application-specific insights, use the following command:
Following is an example of the expected output of this command:
| AWS DevOps |
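For the tasks in this table, the K8sGPT `--output json` flag is one way to get detailed, machine-readable results. The following Python sketch builds that command and then narrows a JSON report to one application by matching each result's `name` field. The result field names (`results`, `name`, `kind`) are assumptions based on recent K8sGPT JSON output. `shop` and `cart` are hypothetical namespace and application names.

```python
import json
from typing import List, Optional


def detailed_cmd(namespace: Optional[str] = None, backend: str = "amazonbedrock") -> List[str]:
    """Build a K8sGPT command that explains findings and emits JSON."""
    cmd = ["k8sgpt", "analyze", "--explain", "--backend", backend, "--output", "json"]
    if namespace is not None:
        cmd += ["--namespace", namespace]
    return cmd


def results_for_app(report_json: str, app_name: str) -> List[dict]:
    """Keep only results whose object name mentions app_name.

    Assumption: each result has a "name" such as "<namespace>/<object>",
    and "results" may be null when no problems are found.
    """
    report = json.loads(report_json)
    return [r for r in report.get("results") or [] if app_name in r.get("name", "")]
```

For example, you could pass the standard output of `subprocess.run(detailed_cmd("shop"), capture_output=True, text=True)` to `results_for_app(..., "cart")` to see only the findings and explanations for the cart application.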
Related resources
AWS documentation
AWS CLI commands: create-cluster and describe-cluster
Get started with Amazon EKS (Amazon EKS documentation)
Security best practices in IAM (IAM documentation)