Create fleet infrastructure with an Amazon EC2 Auto Scaling group - AWS Deadline Cloud

Create fleet infrastructure with an Amazon EC2 Auto Scaling group

This section explains how to create an Amazon EC2 Auto Scaling fleet.

Use the AWS CloudFormation YAML template below to create an Amazon EC2 Auto Scaling (Auto Scaling) group, an Amazon Virtual Private Cloud (Amazon VPC) with two subnets, an instance profile, and an instance access role. These are required to launch instance using Auto Scaling in the subnets.

You should review and update the list of instance types to fit your rendering needs.

To create an Amazon EC2 Auto Scaling fleet

  1. Open the AWS CloudFormation console at https://console.aws.amazon.com/cloudformation.

  2. Create a CloudFormation template with parametersFarm ID, Fleet ID, and AMI ID.

    AWSTemplateFormatVersion: 2010-09-09 Description: Amazon Deadline Cloud customer-managed fleet Parameters: FarmId: Type: String Description: Farm ID FleetId: Type: String Description: Fleet ID AMIId: Type: String Description: AMI ID for launcing Workers Resources: deadlineVPC: Type: 'AWS::EC2::VPC' Properties: CidrBlock: 100.100.0.0/16 deadlineWorkerSecurityGroup: Type: 'AWS::EC2::SecurityGroup' Properties: GroupDescription: !Join - ' ' - - Security Group created for deadline workers in fleet - !Ref FleetId GroupName: !Join - '' - - deadlineWorkerSecurityGroup- - !Ref FleetId SecurityGroupEgress: - CidrIp: 0.0.0.0/0 IpProtocol: '-1' SecurityGroupIngress: [] VpcId: !Ref deadlineVPC deadlineIGW: Type: 'AWS::EC2::InternetGateway' Properties: {} deadlineVPCGatewayAttachment: Type: 'AWS::EC2::VPCGatewayAttachment' Properties: VpcId: !Ref deadlineVPC InternetGatewayId: !Ref deadlineIGW deadlinePublicRouteTable: Type: 'AWS::EC2::RouteTable' Properties: VpcId: !Ref deadlineVPC deadlinePublicRoute: Type: 'AWS::EC2::Route' Properties: RouteTableId: !Ref deadlinePublicRouteTable DestinationCidrBlock: 0.0.0.0/0 GatewayId: !Ref deadlineIGW DependsOn: - deadlineIGW - deadlineVPCGatewayAttachment deadlinePublicSubnet0: Type: 'AWS::EC2::Subnet' Properties: VpcId: !Ref deadlineVPC CidrBlock: 100.100.16.0/22 AvailabilityZone: !Join - '' - - !Ref 'AWS::Region' - a deadlineSubnetRouteTableAssociation0: Type: 'AWS::EC2::SubnetRouteTableAssociation' Properties: RouteTableId: !Ref deadlinePublicRouteTable SubnetId: !Ref deadlinePublicSubnet0 deadlinePublicSubnet1: Type: 'AWS::EC2::Subnet' Properties: VpcId: !Ref deadlineVPC CidrBlock: 100.100.20.0/22 AvailabilityZone: !Join - '' - - !Ref 'AWS::Region' - c deadlineSubnetRouteTableAssociation1: Type: 'AWS::EC2::SubnetRouteTableAssociation' Properties: RouteTableId: !Ref deadlinePublicRouteTable SubnetId: !Ref deadlinePublicSubnet1 deadlineInstanceAccessAccessRole: Type: 'AWS::IAM::Role' Properties: RoleName: !Join - '-' - - deadline - InstanceAccess - !Ref FleetId AssumeRolePolicyDocument: Statement: - Effect: Allow Principal: Service: ec2.amazonaws.com Action: - 'sts:AssumeRole' Path: / ManagedPolicyArns: - 'arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy' - 'arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore' - 'arn:aws:iam::aws:policy/AWSDeadlineCloud-WorkerHost' deadlineInstanceProfile: Type: 'AWS::IAM::InstanceProfile' Properties: Path: / Roles: - !Ref deadlineInstanceAccessAccessRole deadlineLaunchTemplate: Type: 'AWS::EC2::LaunchTemplate' Properties: LaunchTemplateName: !Join - '' - - deadline-LT- - !Ref FleetId LaunchTemplateData: NetworkInterfaces: - DeviceIndex: 0 AssociatePublicIpAddress: true Groups: - !Ref deadlineWorkerSecurityGroup DeleteOnTermination: true ImageId: !Ref AMIId InstanceInitiatedShutdownBehavior: terminate IamInstanceProfile: Arn: !GetAtt - deadlineInstanceProfile - Arn MetadataOptions: HttpTokens: required HttpEndpoint: enabled deadlineAutoScalingGroup: Type: 'AWS::AutoScaling::AutoScalingGroup' Properties: AutoScalingGroupName: !Join - '' - - deadline-ASG-autoscalable- - !Ref FleetId MinSize: 0 MaxSize: 10 VPCZoneIdentifier: - !Ref deadlinePublicSubnet0 - !Ref deadlinePublicSubnet1 NewInstancesProtectedFromScaleIn: true MixedInstancesPolicy: InstancesDistribution: OnDemandBaseCapacity: 0 OnDemandPercentageAboveBaseCapacity: 0 SpotAllocationStrategy: capacity-optimized OnDemandAllocationStrategy: lowest-price LaunchTemplate: LaunchTemplateSpecification: LaunchTemplateId: !Ref deadlineLaunchTemplate Version: !GetAtt - deadlineLaunchTemplate - LatestVersionNumber Overrides: - InstanceType: m5.large - InstanceType: m5d.large - InstanceType: m5a.large - InstanceType: m5ad.large - InstanceType: m5n.large - InstanceType: m5dn.large - InstanceType: m4.large - InstanceType: m3.large - InstanceType: r5.large - InstanceType: r5d.large - InstanceType: r5a.large - InstanceType: r5ad.large - InstanceType: r5n.large - InstanceType: r5dn.large - InstanceType: r4.large MetricsCollection: - Granularity: 1Minute Metrics: - GroupMinSize - GroupMaxSize - GroupDesiredCapacity - GroupInServiceInstances - GroupTotalInstances - GroupInServiceCapacity - GroupTotalCapacity
  3. After you create the IAM roles, you need to acknowledge the following:

    • Credentials from the IAM role that are attached to your worker's Amazon EC2 instance are available to all processes running on that worker, which includes jobs. The worker should have the least privileges to operate: deadline:CreateWorker and deadline:AssumeFleetRoleForWorker.

    • The worker agent obtains credentials for the queue role and configures them for use by running jobs. The Amazon EC2 instance profile role shouldn't include permissions that are needed by your jobs.

Auto scale your Amazon EC2 fleet with Deadline Cloud scale recommendation feature

Deadline Cloud leverages an Amazon EC2 Auto Scaling (Auto Scaling) group to scale the Amazon EC2 customer-managed fleet (CMF) automatically. You need to configure the fleet mode as well as deploy the required infrastructure in your account in order to make your fleet auto scale. The infrastructure you deployed will work for all fleets, so you only need to set it up once.

The basic workflow is: you configure your fleet mode to auto scale, and then Deadline Cloud will send out an EventBridge event for that fleet whenever recommended fleet size changes (one event contains fleet id, recommended fleet size, and other metadata). You will have an EventBridge rule to filter the relevant events and have a Lambda to consume them. The Lambda will integrate with Amazon EC2 Auto Scaling AutoScalingGroup to scale the Amazon EC2 fleet automatically.

Set fleet mode to EVENT_BASED_AUTO_SCALING

Configure your fleet mode to EVENT_BASED_AUTO_SCALING. You can use the console to do this, or use the AWS CLI to directly call the CreateFleet or UpdateFleet API. After the mode is configured, Deadline Cloud starts sending EventBridge events whenever the recommended fleet size changes.

  • Example UpdateFleet command:

    aws deadline update-fleet \ --farm-id FARM_ID \ --fleet-id FLEET_ID \ --configuration file://configuration.json
  • Example CreateFleet command:

    aws deadline create-fleet \ --farm-id FARM_ID \ --display-name "Fleet name" \ --max-worker-count 10 \ --configuration file://configuration.json

The following is an example of configuration.json used in the CLI commands above (--configuration file://configuration.json).

  • To enable Auto Scaling on your fleet, you should set the mode to EVENT_BASED_AUTO_SCALING.

  • The workerCapabilities are the default values assigned to the CMF when you created it. You can change these values if you need to increase resources available to your CMF.

After you configure the fleet mode, Deadline Cloud starts emitting fleet size recommendation events for that fleet.

{ "customerManaged": { "mode": "EVENT_BASED_AUTO_SCALING", "workerCapabilities": { "vCpuCount": { "min": 1, "max": 4 }, "memoryMiB": { "min": 1024, "max": 4096 }, "osFamily": "linux", "cpuArchitectureType": "x86_64", } } }

Deploy Auto Scaling stack using the AWS CloudFormation template

You can set up an EventBridge rule to filter events, a Lambda to consume the events and control Auto Scaling, and an SQS queue to store unprocessed events. Use the following AWS CloudFormation template to deploy everything in a stack. After you deploy the resources successfully, you can submit a job and the fleet will automatically scale up.

Resources: AutoScalingLambda: Type: 'AWS::Lambda::Function' Properties: Code: ZipFile: |- """ This lambda is configured to handle "Fleet Size Recommendation Change" messages. It will handle all such events, and requires that the ASG is named based on the fleet id. It will scale up/down the fleet based on the recommended fleet size in the message. Example EventBridge message: { "version": "0", "id": "6a7e8feb-b491-4cf7-a9f1-bf3703467718", "detail-type": "Fleet Size Recommendation Change", "source": "aws.deadline", "account": "111122223333", "time": "2017-12-22T18:43:48Z", "region": "us-west-1", "resources": [], "detail": { "farmId": "farm-12345678900000000000000000000000", "fleetId": "fleet-12345678900000000000000000000000", "oldFleetSize": 1, "newFleetSize": 5, } } """ import json import boto3 import logging logger = logging.getLogger() logger.setLevel(logging.INFO) auto_scaling_client = boto3.client("autoscaling") def lambda_handler(event, context): logger.info(event) event_detail = event["detail"] fleet_id = event_detail["fleetId"] desired_capacity = event_detail["newFleetSize"] asg_name = f"deadline-ASG-autoscalable-{fleet_id}" auto_scaling_client.set_desired_capacity( AutoScalingGroupName=asg_name, DesiredCapacity=desired_capacity, HonorCooldown=False, ) return { 'statusCode': 200, 'body': json.dumps(f'Successfully set desired_capacity for {asg_name} to {desired_capacity}') } Handler: index.lambda_handler Role: !GetAtt - AutoScalingLambdaServiceRole - Arn Runtime: python3.11 DependsOn: - AutoScalingLambdaServiceRoleDefaultPolicy - AutoScalingLambdaServiceRole AutoScalingEventRule: Type: 'AWS::Events::Rule' Properties: EventPattern: source: - aws.deadline detail-type: - Fleet Size Recommendation Change State: ENABLED Targets: - Arn: !GetAtt - AutoScalingLambda - Arn DeadLetterConfig: Arn: !GetAtt - UnprocessedAutoScalingEventQueue - Arn Id: Target0 RetryPolicy: MaximumRetryAttempts: 15 AutoScalingEventRuleTargetPermission: Type: 'AWS::Lambda::Permission' Properties: Action: 'lambda:InvokeFunction' FunctionName: !GetAtt - AutoScalingLambda - Arn Principal: events.amazonaws.com SourceArn: !GetAtt - AutoScalingEventRule - Arn AutoScalingLambdaServiceRole: Type: 'AWS::IAM::Role' Properties: AssumeRolePolicyDocument: Statement: - Action: 'sts:AssumeRole' Effect: Allow Principal: Service: lambda.amazonaws.com Version: 2012-10-17 ManagedPolicyArns: - !Join - '' - - 'arn:' - !Ref 'AWS::Partition' - ':iam::aws:policy/service-role/AWSLambdaBasicExecutionRole' AutoScalingLambdaServiceRoleDefaultPolicy: Type: 'AWS::IAM::Policy' Properties: PolicyDocument: Statement: - Action: 'autoscaling:SetDesiredCapacity' Effect: Allow Resource: '*' Version: 2012-10-17 PolicyName: AutoScalingLambdaServiceRoleDefaultPolicy Roles: - !Ref AutoScalingLambdaServiceRole UnprocessedAutoScalingEventQueue: Type: 'AWS::SQS::Queue' Properties: QueueName: deadline-unprocessed-autoscaling-events UpdateReplacePolicy: Delete DeletionPolicy: Delete UnprocessedAutoScalingEventQueuePolicy: Type: 'AWS::SQS::QueuePolicy' Properties: PolicyDocument: Statement: - Action: 'sqs:SendMessage' Condition: ArnEquals: 'aws:SourceArn': !GetAtt - AutoScalingEventRule - Arn Effect: Allow Principal: Service: events.amazonaws.com Resource: !GetAtt - UnprocessedAutoScalingEventQueue - Arn Version: 2012-10-17 Queues: - !Ref UnprocessedAutoScalingEventQueue