InvocationsScalingProps
- class aws_cdk.aws_sagemaker_alpha.InvocationsScalingProps(*, disable_scale_in=None, policy_name=None, scale_in_cooldown=None, scale_out_cooldown=None, max_requests_per_second, safety_factor=None)
Bases: BaseTargetTrackingProps
(experimental) Properties for enabling SageMaker Endpoint utilization tracking.
- Parameters:
disable_scale_in (Optional[bool]) – Indicates whether scale in by the target tracking policy is disabled. If the value is true, scale in is disabled and the target tracking policy won’t remove capacity from the scalable resource. Otherwise, scale in is enabled and the target tracking policy can remove capacity from the scalable resource. Default: false
policy_name (Optional[str]) – A name for the scaling policy. Default: automatically generated name.
scale_in_cooldown (Optional[Duration]) – Period after a scale in activity completes before another scale in activity can start. Default: Duration.seconds(300) for the following scalable targets: ECS services, Spot Fleet requests, EMR clusters, AppStream 2.0 fleets, Aurora DB clusters, Amazon SageMaker endpoint variants, and custom resources. Duration.seconds(0) for all other scalable targets: DynamoDB tables, DynamoDB global secondary indexes, Amazon Comprehend document classification endpoints, and Lambda provisioned concurrency.
scale_out_cooldown (Optional[Duration]) – Period after a scale out activity completes before another scale out activity can start. Default: Duration.seconds(300) for the following scalable targets: ECS services, Spot Fleet requests, EMR clusters, AppStream 2.0 fleets, Aurora DB clusters, Amazon SageMaker endpoint variants, and custom resources. Duration.seconds(0) for all other scalable targets: DynamoDB tables, DynamoDB global secondary indexes, Amazon Comprehend document classification endpoints, and Lambda provisioned concurrency.
max_requests_per_second (Union[int, float]) – (experimental) Max RPS per instance used for calculating the target SageMaker variant invocation per instance. More documentation available here: https://docs.aws.amazon.com/sagemaker/latest/dg/endpoint-scaling-loadtest.html
safety_factor (Union[int, float, None]) – (experimental) Safety factor for calculating the target SageMaker variant invocation per instance. More documentation available here: https://docs.aws.amazon.com/sagemaker/latest/dg/endpoint-scaling-loadtest.html Default: 0.5
- Stability:
experimental
Example:
    import aws_cdk.aws_sagemaker_alpha as sagemaker

    # model: sagemaker.Model

    variant_name = "my-variant"
    endpoint_config = sagemaker.EndpointConfig(self, "EndpointConfig",
        instance_production_variants=[
            sagemaker.InstanceProductionVariantProps(
                model=model,
                variant_name=variant_name
            )
        ]
    )

    endpoint = sagemaker.Endpoint(self, "Endpoint", endpoint_config=endpoint_config)
    production_variant = endpoint.find_instance_production_variant(variant_name)
    instance_count = production_variant.auto_scale_instance_count(
        max_capacity=3
    )
    instance_count.scale_on_invocations("LimitRPS",
        max_requests_per_second=30
    )
Attributes
- disable_scale_in
Indicates whether scale in by the target tracking policy is disabled.
If the value is true, scale in is disabled and the target tracking policy won’t remove capacity from the scalable resource. Otherwise, scale in is enabled and the target tracking policy can remove capacity from the scalable resource.
- Default:
false
- max_requests_per_second
(experimental) Max RPS per instance used for calculating the target SageMaker variant invocation per instance.
More documentation available here: https://docs.aws.amazon.com/sagemaker/latest/dg/endpoint-scaling-loadtest.html
- Stability:
experimental
- policy_name
A name for the scaling policy.
- Default:
Automatically generated name.
- safety_factor
(experimental) Safety factor for calculating the target SageMaker variant invocation per instance.
More documentation available here: https://docs.aws.amazon.com/sagemaker/latest/dg/endpoint-scaling-loadtest.html
- Default:
0.5
- Stability:
experimental
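How max_requests_per_second and safety_factor combine into a scaling target can be sketched in plain Python. This is a minimal illustration, assuming the formula from the load-testing guide linked above (target invocations per instance per minute = max RPS × safety factor × 60). The function name invocations_per_instance_target is hypothetical and not part of the CDK API, and the range checks on both inputs are assumptions rather than documented behavior.

```python
def invocations_per_instance_target(max_requests_per_second, safety_factor=0.5):
    """Estimate the per-minute invocations-per-instance target.

    Hypothetical helper: assumes target = max RPS * safety factor * 60,
    as described in the SageMaker load-testing guide.
    """
    if max_requests_per_second <= 0:
        raise ValueError("max_requests_per_second must be positive")
    # Assumption: a safety factor is a fraction of peak capacity in (0, 1].
    if not 0.0 < safety_factor <= 1.0:
        raise ValueError("safety_factor must be in the range (0, 1]")
    # The metric is reported per minute, hence the factor of 60.
    return max_requests_per_second * safety_factor * 60


# With the values from the example above (30 RPS, default safety factor 0.5):
target = invocations_per_instance_target(30)
print(target)  # 900.0
```

Under these assumptions, a lower safety factor yields a lower target, so the policy adds instances earlier relative to the load-tested peak.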
- scale_in_cooldown
Period after a scale in activity completes before another scale in activity can start.
- Default:
Duration.seconds(300) for the following scalable targets: ECS services, Spot Fleet requests, EMR clusters, AppStream 2.0 fleets, Aurora DB clusters, Amazon SageMaker endpoint variants, and custom resources. Duration.seconds(0) for all other scalable targets: DynamoDB tables, DynamoDB global secondary indexes, Amazon Comprehend document classification endpoints, and Lambda provisioned concurrency.
- scale_out_cooldown
Period after a scale out activity completes before another scale out activity can start.
- Default:
Duration.seconds(300) for the following scalable targets: ECS services, Spot Fleet requests, EMR clusters, AppStream 2.0 fleets, Aurora DB clusters, Amazon SageMaker endpoint variants, and custom resources. Duration.seconds(0) for all other scalable targets: DynamoDB tables, DynamoDB global secondary indexes, Amazon Comprehend document classification endpoints, and Lambda provisioned concurrency.