Application Availability - Netcracker Active Resource Inventory on AWS

Application Availability

For high availability, the following application layer components, discussed in this section, are relevant: Amazon Route 53, Elastic Load Balancing, Kubernetes worker nodes, Kubernetes Deployments, and Amazon MSK.

The relevant database layer component is Amazon Aurora, discussed in the next Database Availability section.

As shown above, Amazon Route 53 simplifies private DNS management by supporting Fully Qualified Domain Names (FQDNs) for communicating with Netcracker active resource inventory applications. Dynamic notifications for network inventory are routed toward each Availability Zone using the configured DNS Routing Policies to a Network Load Balancer (NLB). The NLB then balances traffic to the relevant worker node.

Similarly, HTTPS traffic is routed toward each Availability Zone using the configured DNS Routing Policies to an Application Load Balancer (ALB). The ALB then balances traffic to the relevant worker node.
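The zone-level routing described above can be sketched as a weighted endpoint selection. The following is a minimal, self-contained Python illustration of how a Route 53 weighted routing policy behaves, not the Route 53 API itself; the endpoint names and weights are hypothetical.

```python
import random

# Hypothetical per-AZ load balancer endpoints with routing weights,
# mimicking a Route 53 weighted routing policy.
RECORDS = {
    "nlb-az1.example.internal": 50,
    "nlb-az2.example.internal": 30,
    "nlb-az3.example.internal": 20,
}

def resolve(records):
    """Return one endpoint, chosen with probability proportional to its weight."""
    names = list(records)
    weights = [records[name] for name in names]
    return random.choices(names, weights=weights, k=1)[0]

# Each DNS resolution steers the client to exactly one AZ endpoint.
endpoint = resolve(RECORDS)
```

Other routing policies (latency-based, failover) follow the same pattern: the policy decides which Availability Zone endpoint a given resolution returns.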

Elastic Load Balancing and Amazon Route 53

Elastic Load Balancing (ELB), here comprising an Application Load Balancer (ALB) and a Network Load Balancer (NLB), creates a load balancer node in every Availability Zone where Kubernetes worker nodes are deployed. Every load balancer node is registered with the Amazon DNS service. When a client resolves the DNS name of the load balancer, it receives one or more of the load balancer nodes' IP addresses. Therefore, if one Availability Zone is unavailable or has no healthy targets, the load balancer can continue to route traffic to the healthy targets in another Availability Zone.
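This failover behavior can be illustrated with a small simulation: resolving the load balancer's DNS name returns only the nodes in Availability Zones that still have healthy targets. The zone names, IP addresses, and health states below are hypothetical.

```python
# Hypothetical load balancer nodes, one per Availability Zone,
# each fronting a set of registered worker-node targets.
lb_nodes = {
    "eu-west-1a": {"ip": "10.0.1.10", "healthy_targets": 2},
    "eu-west-1b": {"ip": "10.0.2.10", "healthy_targets": 0},  # impaired AZ
    "eu-west-1c": {"ip": "10.0.3.10", "healthy_targets": 3},
}

def resolve_lb(nodes):
    """Return IPs only for load balancer nodes with healthy targets."""
    return [n["ip"] for n in nodes.values() if n["healthy_targets"] > 0]

# The impaired zone's IP is withheld, so clients only reach healthy zones.
ips = resolve_lb(lb_nodes)
```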

Kubernetes worker nodes

Kubernetes worker nodes are distributed across all Availability Zones by the respective Auto Scaling group. If an Availability Zone fails, the Auto Scaling group detects the failure and launches replacement worker nodes in the remaining Availability Zones.
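The Auto Scaling group's behavior can be sketched as capacity maintenance: nodes in the failed zone are dropped and replacements are launched in the surviving zones until the desired capacity is restored. This is a simplified model, not the Auto Scaling API; zone names are hypothetical.

```python
def rebalance(nodes_by_az, desired, failed_az):
    """Drop nodes in the failed AZ and launch replacements in the
    remaining AZs (round-robin) until desired capacity is restored."""
    nodes = {az: count for az, count in nodes_by_az.items() if az != failed_az}
    azs = sorted(nodes)
    i = 0
    while sum(nodes.values()) < desired:
        nodes[azs[i % len(azs)]] += 1
        i += 1
    return nodes

# Three worker nodes spread over three AZs; AZ "b" fails.
after = rebalance({"a": 1, "b": 1, "c": 1}, desired=3, failed_az="b")
```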

Kubernetes Deployments

Since Kubernetes automatically spreads the pods in a Deployment across nodes and Availability Zones, the impact of an Availability Zone failure is mitigated. Because pod placement is best-effort, pods might not spread evenly, especially if the Availability Zones in the cluster are heterogeneous (that is, different numbers of nodes, different types of nodes, or different pod resource requirements). Using homogeneous Availability Zones (the same number and types of nodes) reduces the probability of unequal spreading. For more information, see Running in multiple zones in the Kubernetes best practices documentation.
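The effect of heterogeneous zones on best-effort spreading can be shown with a small greedy-placement sketch (a simplification of the scheduler, with one pod per node and hypothetical zone capacities):

```python
from collections import Counter

def spread_pods(num_pods, nodes_by_zone):
    """Greedy best-effort spread: place each pod in the zone that
    currently has the fewest pods and still has node capacity."""
    placed = Counter()
    for _ in range(num_pods):
        candidates = [z for z, cap in nodes_by_zone.items() if placed[z] < cap]
        zone = min(candidates, key=lambda z: placed[z])
        placed[zone] += 1
    return dict(placed)

# Homogeneous zones: 6 pods spread evenly across 3 zones of 4 nodes each.
even = spread_pods(6, {"az1": 4, "az2": 4, "az3": 4})
# Heterogeneous zones: the small zone caps out and the spread is uneven.
uneven = spread_pods(6, {"az1": 1, "az2": 4, "az3": 4})
```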

Amazon Managed Streaming for Apache Kafka (Amazon MSK)

Amazon MSK continuously monitors the health of your clusters and replaces unhealthy brokers without downtime to your applications. Amazon MSK manages the availability of your Apache ZooKeeper nodes, so you do not need to start, stop, or directly access the nodes yourself. Amazon MSK uses multi-AZ replication for high availability.


Amazon MSK Cluster

The diagram above demonstrates the interaction between the following components:

  • Broker nodes – When creating an Amazon MSK cluster, you specify how many broker nodes you want Amazon MSK to create in each Availability Zone. Each Availability Zone has its own Amazon MSK virtual private cloud (VPC) subnet.

  • ZooKeeper nodes – Amazon MSK creates the Apache ZooKeeper nodes for you. Apache ZooKeeper is an open-source server that enables highly reliable distributed coordination.

  • Producers, consumers, and topic creators – Amazon MSK lets you use Apache Kafka data-plane operations to create topics and to produce and consume data.

  • AWS CLI – You can use the AWS Command Line Interface (AWS CLI) or the APIs in the SDK to perform control-plane operations.
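The multi-AZ replication underpinning the cluster above can be sketched as a replica placement rule: each partition's replicas are assigned to brokers in different Availability Zones, with the starting zone rotated per partition so leaders are spread out. The broker IDs and zone names below are hypothetical, and this is a simplified model of Kafka's placement, not the MSK API.

```python
def assign_replicas(partition, brokers_by_az, replication_factor=3):
    """Place one replica per AZ, rotating the starting AZ per partition
    so partition leaders are spread across zones."""
    azs = sorted(brokers_by_az)
    return [
        brokers_by_az[azs[(partition + i) % len(azs)]]
        for i in range(replication_factor)
    ]  # first entry is the preferred leader

# One broker per AZ (hypothetical IDs).
brokers = {"az1": 1, "az2": 2, "az3": 3}
replicas_p0 = assign_replicas(0, brokers)  # leader in az1
replicas_p1 = assign_replicas(1, brokers)  # leader in az2
```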

Amazon MSK detects and automatically recovers from the most common failure scenarios for the cluster so that Netcracker active resource inventory on AWS producer and consumer applications can continue their write and read operations with minimal impact. When Amazon MSK detects a broker failure, it mitigates the failure or replaces the unhealthy or unreachable broker with a new one.
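The recovery path can be sketched as replica failover: when the broker leading a partition fails, a surviving replica in another Availability Zone takes over as leader, so producers and consumers keep working. The broker IDs and health states below are hypothetical, and this is a simplified model of Kafka leader election.

```python
def elect_leader(replicas, healthy_brokers):
    """Return the current leader, or the first surviving replica in
    preference order after the leader's broker fails; None if no
    replica survives."""
    for broker in replicas:
        if broker in healthy_brokers:
            return broker
    return None

replicas = [1, 2, 3]     # replica list for one partition, leader first
healthy = {1, 2, 3}
leader = elect_leader(replicas, healthy)   # broker 1 leads

healthy.discard(1)                         # leader's broker fails
new_leader = elect_leader(replicas, healthy)  # failover to broker 2
```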