HealthMonitor

class aws_rfdk.HealthMonitor(scope, id, *, vpc, deletion_protection=None, elb_account_limits=None, encryption_key=None, security_group=None, vpc_subnets=None)

Bases: constructs.Construct

architecture diagram

This construct is responsible for the deep health checks of compute instances.

It also replaces unhealthy instances and suspends unhealthy fleets. Although using this construct adds monitoring costs, it is highly recommended because it helps avoid or minimize runaway costs for compute instances.

An instance is considered to be unhealthy when:

  1. Deadline client is not installed on it;

  2. Deadline client is installed but not running on it;

  3. RCS is not configured correctly for Deadline client;

  4. it is unable to connect to RCS due to any infrastructure issues;

  5. the health monitor is unable to reach it because of some infrastructure issues.

A fleet is considered to be unhealthy when:

  1. at least 1 instance is unhealthy for the configured grace period;

  2. the percentage of unhealthy instances in the fleet is above a threshold at any given point in time.
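
The two fleet-level rules above can be sketched in plain Python. This is only an illustrative model and not how the construct is implemented (the construct relies on CloudWatch alarms, as described below). The Sample type, the fleet_is_unhealthy function, and the interpretation of rule 1 as "at least one unhealthy instance in every consecutive sample for the grace period" are assumptions made for the example. The 35 percent value is just the complement of the 65 percent healthy default documented for register_fleet.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Sample:
    """One observation of a fleet (hypothetical format for this sketch)."""
    timestamp: float  # seconds since an arbitrary start
    unhealthy: int    # number of unhealthy instances at that time
    total: int        # total number of instances in the fleet


def fleet_is_unhealthy(
    samples: Sequence[Sample],
    grace_period: float,
    max_unhealthy_percent: float,
) -> bool:
    """Apply both fleet rules to time-ordered samples."""
    streak_start: Optional[float] = None
    for s in samples:
        # Rule 2: the unhealthy percentage exceeds the threshold at any time.
        if s.total > 0 and 100.0 * s.unhealthy / s.total > max_unhealthy_percent:
            return True
        # Rule 1: at least one instance stays unhealthy for the grace period.
        if s.unhealthy >= 1:
            if streak_start is None:
                streak_start = s.timestamp
            if s.timestamp - streak_start >= grace_period:
                return True
        else:
            streak_start = None  # the fleet recovered, so restart the clock
    return False


# One unhealthy instance out of ten, observed every five minutes.
history = [Sample(0, 1, 10), Sample(300, 1, 10), Sample(600, 1, 10)]
tripped_by_grace_period = fleet_is_unhealthy(history, 600, 35)
tripped_by_percentage = fleet_is_unhealthy([Sample(0, 4, 10)], 600, 35)
```

Both calls at the end return True: the first because one instance stayed unhealthy for the whole grace period, the second because 40 percent of the fleet was unhealthy in a single sample.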

This construct internally creates an array of application load balancers and attaches the worker fleet (which is internally implemented as an Auto Scaling Group) to their listeners. No load-balancing traffic passes through the load balancers; they are used only for health checks. The intention is to use the default properties of load balancer health checks, which send HTTP pings at frequent intervals to all the instances in the fleet and determine their health. If any instance is found unhealthy, it is replaced. The target group also publishes the unhealthy target count metric, which is used to identify an unhealthy fleet.

In addition to the default instance-level protection, it also creates a Lambda function that is responsible for setting the fleet size to 0 when a fleet is sufficiently unhealthy to warrant termination. This Lambda is triggered by CloudWatch alarms via SNS (Simple Notification Service).

Resources Deployed

  • Application Load Balancer(s) doing frequent pings to the workers.

  • An Amazon Simple Notification Service (SNS) topic for all unhealthy fleet notifications.

  • An AWS Key Management Service (KMS) key to encrypt SNS messages, if no encryption key is provided.

  • An Amazon CloudWatch Alarm that triggers if a worker fleet is unhealthy for a long period.

  • Another CloudWatch Alarm that triggers if the healthy host percentage of a worker fleet is lower than allowed.

  • A single AWS Lambda function that sets fleet size to 0 when triggered in response to messages on the SNS Topic.

  • Execution logs of the AWS Lambda function are published to a log group in Amazon CloudWatch.

Security Considerations

  • The AWS Lambda that is deployed through this construct will be created from a deployment package that is uploaded to your CDK bootstrap bucket during deployment. You must limit write access to your CDK bootstrap bucket to prevent an attacker from modifying the actions performed by this Lambda. We strongly recommend that you either enable Amazon S3 server access logging on your CDK bootstrap bucket, or enable AWS CloudTrail on your account to assist in post-incident analysis of compromised production environments.

  • The AWS Lambda that is created by this construct to terminate unhealthy worker fleets has permission to UpdateAutoScalingGroup ( https://docs.aws.amazon.com/autoscaling/ec2/APIReference/API_UpdateAutoScalingGroup.html ) on all of the fleets that this construct is monitoring. You should not grant any additional actors/principals the ability to modify or execute this Lambda.

  • Execution of the AWS Lambda for terminating unhealthy workers is triggered by messages to the Amazon Simple Notification Service (SNS) Topic that is created by this construct. Any principal that is able to publish notifications to this SNS Topic can cause the Lambda to execute and reduce one of your worker fleets to zero instances. You should not grant any additional principals permissions to publish to this SNS Topic.

Parameters
  • scope (Construct) –

  • id (str) –

  • vpc (IVpc) – VPC to launch the Health Monitor in.

  • deletion_protection (Optional[bool]) – Indicates whether deletion protection is enabled for the load balancer. Default: true. Note: Because deletion protection is enabled by default, you need to disable it using the AWS Console or CLI before deleting the stack.

  • elb_account_limits (Optional[Sequence[Union[Limit, Dict[str, Any]]]]) – Describes the current Elastic Load Balancing resource limits for your AWS account. This object should be the output of the ‘describeAccountLimits’ API. Default: the default account limits for ALB are used

  • encryption_key (Optional[IKey]) – A KMS Key, either managed by this CDK app, or imported. Default: A new Key will be created and used.

  • security_group (Optional[ISecurityGroup]) – Security group for the health monitor. This security group is associated with the health monitor’s load balancer. Default: A security group is created

  • vpc_subnets (Union[SubnetSelection, Dict[str, Any], None]) – Any load balancers that get created by calls to register_fleet() will be created in these subnets. Default: The VPC default strategy
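
As a rough illustration of the elb_account_limits parameter, the sketch below converts a DescribeAccountLimits-style response into a list of plain dictionaries. It assumes that the Limit structure has the fields name and max, and that each response entry carries Name and Max, with Max given as a string. The helper name to_elb_account_limits and the sample values are hypothetical; check the Limit reference before relying on these field names.

```python
from typing import Any, Dict, List


def to_elb_account_limits(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Convert a DescribeAccountLimits-style response into Limit-shaped dicts.

    Assumption: each Limit is expressed as {"name": str, "max": int}.
    """
    limits = []
    for entry in response.get("Limits", []):
        # The API reports Max as a string, so convert it to a number.
        limits.append({"name": entry["Name"], "max": int(entry["Max"])})
    return limits


# Sample response with made-up values, for illustration only.
sample_response = {
    "Limits": [
        {"Name": "application-load-balancers", "Max": "50"},
        {"Name": "targets-per-application-load-balancer", "Max": "1000"},
    ]
}

elb_account_limits = to_elb_account_limits(sample_response)
```

The resulting list could then be passed as the elb_account_limits keyword argument when constructing the HealthMonitor.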

Methods

register_fleet(monitorable_fleet, *, healthy_fleet_threshold_percent=None, instance_healthy_threshold_count=None, instance_unhealthy_threshold_count=None, interval=None, port=None)

Attaches the load-balancing target to the ELB for instance-level monitoring.

The ELB does frequent pings to the workers and determines if a worker node is unhealthy. If so, it replaces the instance.

It also creates an alarm for the healthy host percentage and suspends the fleet if that alarm is breaching, by setting the maxCapacity property of the Auto Scaling group to 0. This must be reset manually after fixing the issue.

Parameters
  • monitorable_fleet (IMonitorableFleet) –

  • healthy_fleet_threshold_percent (Union[int, float, None]) – The percentage of healthy hosts required to consider the fleet healthy and functioning. Default: 65%

  • instance_healthy_threshold_count (Union[int, float, None]) – The number of consecutive health check successes required before considering an unhealthy target healthy. Default: 2

  • instance_unhealthy_threshold_count (Union[int, float, None]) – The number of consecutive health check failures required before considering a target unhealthy. Default: 3

  • interval (Optional[Duration]) – The approximate time between health checks for an individual target. Default: Duration.minutes(5)

  • port (Union[int, float, None]) – The port that the health monitor uses when performing health checks on the targets. Default: 8081

Return type

None
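
To make the two threshold counts concrete, here is a minimal model of how consecutive health check results could flip a target between healthy and unhealthy. It is a simplified sketch and not the load balancer's actual algorithm; the function name track_health and the assumption that a target starts out healthy are illustrative only. The defaults mirror the documented values of 2 and 3.

```python
from typing import Iterable, List


def track_health(
    results: Iterable[bool],
    healthy_threshold: int = 2,
    unhealthy_threshold: int = 3,
    initially_healthy: bool = True,
) -> List[bool]:
    """Return the target state after each health check result."""
    healthy = initially_healthy
    streak = 0  # consecutive results that contradict the current state
    states = []
    for ok in results:
        if ok == healthy:
            streak = 0  # the result agrees with the current state
        else:
            streak += 1
            needed = healthy_threshold if ok else unhealthy_threshold
            if streak >= needed:
                healthy = ok  # enough consecutive results to change state
                streak = 0
        states.append(healthy)
    return states


# Two failures are tolerated, three in a row mark the target unhealthy,
# and two successes in a row bring it back.
checks = [False, False, True, False, False, False, True, True]
states = track_health(checks)
# states == [True, True, True, True, True, False, False, True]
```

Under this model, a target has to fail about three consecutive checks before it is marked unhealthy, which is on the order of 15 minutes with the documented five-minute default interval.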

to_string()

Returns a string representation of this construct.

Return type

str

Attributes

DEFAULT_HEALTHY_HOST_THRESHOLD = 2
DEFAULT_HEALTH_CHECK_INTERVAL = <aws_cdk.Duration object>
DEFAULT_HEALTH_CHECK_PORT = 63415
DEFAULT_UNHEALTHY_HOST_THRESHOLD = 3
LOAD_BALANCER_LISTENING_PORT = 8081

node

The tree node.

Return type

Node

unhealthy_fleet_action_topic

SNS topic for all unhealthy fleet notifications.

This is triggered by the grace period and hard termination alarms for the registered fleets.

This topic can be subscribed to get all fleet termination notifications.

Return type

ITopic

Static Methods

classmethod is_construct(x)

Checks if x is a construct.

Use this method instead of instanceof to properly detect Construct instances, even when the construct library is symlinked.

Explanation: in JavaScript, multiple copies of the constructs library on disk are seen as independent, completely different libraries. As a consequence, the class Construct in each copy of the constructs library is seen as a different class, and an instance of one class will not test as instanceof the other class. npm install will not create installations like this, but users may manually symlink construct libraries together or use a monorepo tool: in those cases, multiple copies of the constructs library can be accidentally installed, and instanceof will behave unpredictably. It is safest to avoid instanceof and to use this type-testing method instead.

Parameters

x (Any) – Any object.

Return type

bool

Returns

true if x is an object created from a class which extends Construct.
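
The problem described in the explanation can be reproduced with a small Python analogy. This is not how is_construct is implemented; it only shows why an identity-based class check fails across library copies, while a check based on a shared marker does not. The load_library_copy helper and the is_construct_marker attribute are hypothetical names invented for this sketch.

```python
def load_library_copy():
    """Simulate importing one independent copy of a library."""
    class Construct:
        # Hypothetical marker attribute present in every copy of the class.
        is_construct_marker = True
    return Construct


# Two copies of the "same" class are distinct class objects.
ConstructA = load_library_copy()
ConstructB = load_library_copy()


def is_construct(x) -> bool:
    """Marker-based check that works regardless of which copy created x."""
    return getattr(x, "is_construct_marker", False) is True


obj = ConstructA()
same_copy = isinstance(obj, ConstructA)   # True
other_copy = isinstance(obj, ConstructB)  # False: different class object
marker_check = is_construct(obj)          # True
```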