EmrCreateClusterProps

class aws_cdk.aws_stepfunctions_tasks.EmrCreateClusterProps(*, comment=None, heartbeat=None, input_path=None, integration_pattern=None, output_path=None, result_path=None, result_selector=None, timeout=None, additional_info=None, applications=None, auto_scaling_role=None, bootstrap_actions=None, cluster_role=None, configurations=None, custom_ami_id=None, ebs_root_volume_size=None, instances, kerberos_attributes=None, log_uri=None, name, release_label=None, scale_down_behavior=None, security_configuration=None, service_role=None, step_concurrency_level=None, tags=None, visible_to_all_users=None)

Bases: aws_cdk.aws_stepfunctions.TaskStateBaseProps

Properties for EmrCreateCluster.

See the RunJobFlow API for complete documentation on input parameters.

Parameters
  • comment (Optional[str]) – An optional description for this state. Default: - No comment

  • heartbeat (Optional[Duration]) – Timeout for the heartbeat. Default: - None

  • input_path (Optional[str]) – JSONPath expression to select part of the state to be the input to this state. May also be the special value JsonPath.DISCARD, which will cause the effective input to be the empty object {}. Default: - The entire task input (JSON path ‘$’)

  • integration_pattern (Optional[IntegrationPattern]) – AWS Step Functions integrates with services directly in the Amazon States Language. You can control these AWS services using service integration patterns. Default: IntegrationPattern.REQUEST_RESPONSE

  • output_path (Optional[str]) – JSONPath expression to select a portion of the state output to pass to the next state. May also be the special value JsonPath.DISCARD, which will cause the effective output to be the empty object {}. Default: - The entire JSON node determined by the state input, the task result, and resultPath is passed to the next state (JSON path ‘$’)

  • result_path (Optional[str]) – JSONPath expression to indicate where to inject the state’s output. May also be the special value JsonPath.DISCARD, which will cause the state’s input to become its output. Default: - Replaces the entire input with the result (JSON path ‘$’)

  • result_selector (Optional[Mapping[str, Any]]) – The JSON that will replace the state’s raw result and become the effective result before ResultPath is applied. You can use ResultSelector to create a payload with values that are static or selected from the state’s raw result. Default: - None

  • timeout (Optional[Duration]) – Timeout for the state machine. Default: - None

  • additional_info (Optional[str]) – A JSON string for selecting additional features. Default: - None

  • applications (Optional[Sequence[ApplicationConfigProperty]]) – A case-insensitive list of applications for Amazon EMR to install and configure when launching the cluster. Default: - EMR selected default

  • auto_scaling_role (Optional[IRole]) – An IAM role for automatic scaling policies. Default: - A role will be created.

  • bootstrap_actions (Optional[Sequence[BootstrapActionConfigProperty]]) – A list of bootstrap actions to run before Hadoop starts on the cluster nodes. Default: - None

  • cluster_role (Optional[IRole]) – Also called instance profile and EC2 role. An IAM role for an EMR cluster. The EC2 instances of the cluster assume this role. This attribute has been renamed from jobFlowRole to clusterRole to align with other EMR/StepFunction integration parameters. Default: - A Role will be created

  • configurations (Optional[Sequence[ConfigurationProperty]]) – The list of configurations supplied for the EMR cluster you are creating. Default: - None

  • custom_ami_id (Optional[str]) – The ID of a custom Amazon EBS-backed Linux AMI. Default: - None

  • ebs_root_volume_size (Optional[Size]) – The size of the EBS root device volume of the Linux AMI that is used for each EC2 instance. Default: - EMR selected default

  • instances (InstancesConfigProperty) – A specification of the number and type of Amazon EC2 instances.

  • kerberos_attributes (Optional[KerberosAttributesProperty]) – Attributes for Kerberos configuration when Kerberos authentication is enabled using a security configuration. Default: - None

  • log_uri (Optional[str]) – The location in Amazon S3 to write the log files of the job flow. Default: - None

  • name (str) – The name of the cluster.

  • release_label (Optional[str]) – The Amazon EMR release label, which determines the version of open-source application packages installed on the cluster. Default: - EMR selected default

  • scale_down_behavior (Optional[EmrClusterScaleDownBehavior]) – Specifies the way that individual Amazon EC2 instances terminate when an automatic scale-in activity occurs or an instance group is resized. Default: - EMR selected default

  • security_configuration (Optional[str]) – The name of a security configuration to apply to the cluster. Default: - None

  • service_role (Optional[IRole]) – The IAM role that will be assumed by the Amazon EMR service to access AWS resources on your behalf. Default: - A role will be created that Amazon EMR service can assume.

  • step_concurrency_level (Union[int, float, None]) – Specifies the step concurrency level to allow multiple steps to run in parallel. Requires EMR release label 5.28.0 or above. Must be in range [1, 256]. Default: 1 - no step concurrency allowed

  • tags (Optional[Mapping[str, str]]) – A list of tags to associate with a cluster and propagate to Amazon EC2 instances. Default: - None

  • visible_to_all_users (Optional[bool]) – A value of true indicates that all IAM users in the AWS account can perform cluster actions if they have the proper IAM policy permissions. Default: true

See

https://docs.aws.amazon.com/emr/latest/APIReference/API_RunJobFlow.html

Example:

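from aws_cdk import (
    aws_iam as iam,
    aws_stepfunctions as sfn,
    aws_stepfunctions_tasks as tasks,
)

# The statements below run inside a construct or stack, so `self` is the scope.

# Role assumed by the cluster's EC2 instances (the instance profile role).
cluster_role = iam.Role(self, "ClusterRole",
    assumed_by=iam.ServicePrincipal("ec2.amazonaws.com")
)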

service_role = iam.Role(self, "ServiceRole",
    assumed_by=iam.ServicePrincipal("elasticmapreduce.amazonaws.com")
)

auto_scaling_role = iam.Role(self, "AutoScalingRole",
    assumed_by=iam.ServicePrincipal("elasticmapreduce.amazonaws.com")
)

# Also allow Application Auto Scaling to assume the auto scaling role.
auto_scaling_role.assume_role_policy.add_statements(
    iam.PolicyStatement(
        effect=iam.Effect.ALLOW,
        principals=[
            iam.ServicePrincipal("application-autoscaling.amazonaws.com")
        ],
        actions=["sts:AssumeRole"]
    )
)

tasks.EmrCreateCluster(self, "Create Cluster",
    instances=tasks.EmrCreateCluster.InstancesConfigProperty(),
    cluster_role=cluster_role,
    name=sfn.TaskInput.from_json_path_at("$.ClusterName").value,
    service_role=service_role,
    auto_scaling_role=auto_scaling_role
)

Attributes

additional_info

A JSON string for selecting additional features.

Default
  • None

Return type

Optional[str]

applications

A case-insensitive list of applications for Amazon EMR to install and configure when launching the cluster.

Default
  • EMR selected default

Return type

Optional[List[ApplicationConfigProperty]]

auto_scaling_role

An IAM role for automatic scaling policies.

Default
  • A role will be created.

Return type

Optional[IRole]

bootstrap_actions

A list of bootstrap actions to run before Hadoop starts on the cluster nodes.

Default
  • None

Return type

Optional[List[BootstrapActionConfigProperty]]

cluster_role

Also called instance profile and EC2 role.

An IAM role for an EMR cluster. The EC2 instances of the cluster assume this role.

This attribute has been renamed from jobFlowRole to clusterRole to align with other EMR/StepFunction integration parameters.

Default

  • A Role will be created

Return type

Optional[IRole]

comment

An optional description for this state.

Default
  • No comment

Return type

Optional[str]

configurations

The list of configurations supplied for the EMR cluster you are creating.

Default
  • None

Return type

Optional[List[ConfigurationProperty]]

custom_ami_id

The ID of a custom Amazon EBS-backed Linux AMI.

Default
  • None

Return type

Optional[str]

ebs_root_volume_size

The size of the EBS root device volume of the Linux AMI that is used for each EC2 instance.

Default
  • EMR selected default

Return type

Optional[Size]

heartbeat

Timeout for the heartbeat.

Default
  • None

Return type

Optional[Duration]

input_path

JSONPath expression to select part of the state to be the input to this state.

May also be the special value JsonPath.DISCARD, which will cause the effective input to be the empty object {}.

Default
  • The entire task input (JSON path ‘$’)

Return type

Optional[str]

instances

A specification of the number and type of Amazon EC2 instances.

Return type

InstancesConfigProperty

integration_pattern

AWS Step Functions integrates with services directly in the Amazon States Language.

You can control these AWS services using service integration patterns.

Default

IntegrationPattern.REQUEST_RESPONSE

See

https://docs.aws.amazon.com/step-functions/latest/dg/connect-to-resource.html#connect-wait-token

Return type

Optional[IntegrationPattern]

kerberos_attributes

Attributes for Kerberos configuration when Kerberos authentication is enabled using a security configuration.

Default
  • None

Return type

Optional[KerberosAttributesProperty]

log_uri

The location in Amazon S3 to write the log files of the job flow.

Default
  • None

Return type

Optional[str]

name

The name of the cluster.

Return type

str

output_path

JSONPath expression to select a portion of the state output to pass to the next state.

May also be the special value JsonPath.DISCARD, which will cause the effective output to be the empty object {}.

Default

  • The entire JSON node determined by the state input, the task result, and resultPath is passed to the next state (JSON path ‘$’)

Return type

Optional[str]

release_label

The Amazon EMR release label, which determines the version of open-source application packages installed on the cluster.

Default
  • EMR selected default

Return type

Optional[str]

result_path

JSONPath expression to indicate where to inject the state’s output.

May also be the special value JsonPath.DISCARD, which will cause the state’s input to become its output.

Default
  • Replaces the entire input with the result (JSON path ‘$’)

Return type

Optional[str]
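
How input_path, result_path, and output_path combine is easier to see in a small simulation. The sketch below is not CDK or Step Functions code: it is a standard-library model that assumes paths are either '$' or simple dotted paths such as '$.a.b', and it uses None as a stand-in for JsonPath.DISCARD. The helper names (select, inject, run_state, fake_create_cluster) and the shape of the fake task result are hypothetical.

```python
import copy
from typing import Any, Callable, List, Optional

DISCARD = None  # stand-in for JsonPath.DISCARD in this sketch (assumption)


def _keys(path: str) -> List[str]:
    """Split '$' or '$.a.b' into keys; other JSONPath syntax is not supported here."""
    if path == "$":
        return []
    if not path.startswith("$."):
        raise ValueError(f"unsupported path: {path!r}")
    return path[2:].split(".")


def select(document: Any, path: Optional[str]) -> Any:
    """Model input_path / output_path: pick a sub-node, or {} when discarded."""
    if path is DISCARD:
        return {}
    node = document
    for key in _keys(path):
        node = node[key]
    return node


def inject(state_input: Any, result: Any, path: Optional[str]) -> Any:
    """Model result_path: place the result inside a copy of the state input."""
    if path is DISCARD:
        return state_input  # the input becomes the output unchanged
    keys = _keys(path)
    if not keys:
        return result  # '$' replaces the entire input with the result
    output = copy.deepcopy(state_input)
    node = output
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = result
    return output


def run_state(
    state_input: Any,
    task: Callable[[Any], Any],
    input_path: Optional[str] = "$",
    result_path: Optional[str] = "$",
    output_path: Optional[str] = "$",
) -> Any:
    effective_input = select(state_input, input_path)
    result = task(effective_input)
    combined = inject(state_input, result, result_path)
    return select(combined, output_path)


def fake_create_cluster(task_input: Any) -> dict:
    # Hypothetical task result; the real result shape is not modeled here.
    return {"ClusterId": "j-EXAMPLE"}


state_input = {"ClusterName": "demo", "meta": {"owner": "data-team"}}

defaults = run_state(state_input, fake_create_cluster)
kept = run_state(state_input, fake_create_cluster, result_path="$.cluster")
discarded = run_state(state_input, fake_create_cluster, result_path=DISCARD)
```

With the defaults, the result replaces the whole input. Setting result_path to '$.cluster' keeps the original input and adds the result under the cluster key, and discarding the result passes the input through unchanged.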

result_selector

The JSON that will replace the state’s raw result and become the effective result before ResultPath is applied.

You can use ResultSelector to create a payload with values that are static or selected from the state’s raw result.

Default
  • None

See

https://docs.aws.amazon.com/step-functions/latest/dg/input-output-inputpath-params.html#input-output-resultselector

Return type

Optional[Mapping[str, Any]]
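
As an illustration of static versus selected values, the sketch below models how a selector could be applied to a raw result. It relies on the Amazon States Language convention that a key ending in '.$' holds a path into the raw result, while any other key holds a static value. It is a simplified standard-library model, not the Step Functions implementation: only '$' and dotted paths are handled, and the helper names and the raw result shape are hypothetical.

```python
from typing import Any, Dict, Mapping


def lookup(document: Any, path: str) -> Any:
    """Resolve '$' or a dotted path such as '$.a.b' (no other JSONPath syntax)."""
    if path == "$":
        return document
    if not path.startswith("$."):
        raise ValueError(f"unsupported path: {path!r}")
    node = document
    for key in path[2:].split("."):
        node = node[key]
    return node


def apply_result_selector(selector: Mapping[str, Any], raw_result: Any) -> Dict[str, Any]:
    """Build the effective result from static values and values selected by path."""
    effective: Dict[str, Any] = {}
    for key, value in selector.items():
        if key.endswith(".$"):
            # Selected value: strip the '.$' suffix and resolve the path.
            effective[key[:-2]] = lookup(raw_result, value)
        elif isinstance(value, Mapping):
            # Nested object: apply the same rules recursively.
            effective[key] = apply_result_selector(value, raw_result)
        else:
            # Static value: copied as-is.
            effective[key] = value
    return effective


# Hypothetical raw result, used only to exercise the sketch.
raw_result = {"ClusterId": "j-EXAMPLE", "Metadata": {"StatusCode": 200}}
selector = {
    "Source": "emr",
    "Id.$": "$.ClusterId",
    "Http": {"Status.$": "$.Metadata.StatusCode"},
}
effective_result = apply_result_selector(selector, raw_result)
```

The effective result produced this way is what result_path would then place into the state output.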

scale_down_behavior

Specifies the way that individual Amazon EC2 instances terminate when an automatic scale-in activity occurs or an instance group is resized.

Default
  • EMR selected default

Return type

Optional[EmrClusterScaleDownBehavior]

security_configuration

The name of a security configuration to apply to the cluster.

Default
  • None

Return type

Optional[str]

service_role

The IAM role that will be assumed by the Amazon EMR service to access AWS resources on your behalf.

Default
  • A role will be created that Amazon EMR service can assume.

Return type

Optional[IRole]

step_concurrency_level

Specifies the step concurrency level to allow multiple steps to run in parallel.

Requires EMR release label 5.28.0 or above. Must be in range [1, 256].

Default

1 - no step concurrency allowed

Return type

Union[int, float, None]
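
The two constraints above can be checked before the value is passed to the construct. The following is a minimal sketch of such a pre-check, not part of the CDK API: check_step_concurrency is a hypothetical helper, it assumes release labels have the form 'emr-x.y.z', and it assumes the release requirement only matters when the level is greater than 1.

```python
import re
from typing import Optional

# Assumed release label format, for example "emr-5.28.0".
_RELEASE_LABEL = re.compile(r"emr-(\d+)\.(\d+)\.(\d+)")


def check_step_concurrency(level: int, release_label: Optional[str] = None) -> None:
    """Raise ValueError if the documented constraints are violated."""
    if not 1 <= level <= 256:
        raise ValueError(f"step_concurrency_level must be in [1, 256], got {level}")
    if level == 1 or release_label is None:
        # Level 1 means no concurrency; without a label there is nothing to compare.
        return
    match = _RELEASE_LABEL.fullmatch(release_label)
    if match is None:
        raise ValueError(f"unrecognized release label: {release_label!r}")
    version = tuple(int(part) for part in match.groups())
    if version < (5, 28, 0):
        raise ValueError(
            f"step concurrency requires release 5.28.0 or above, got {release_label}"
        )


# Passes silently: level within range on a sufficiently recent release.
check_step_concurrency(4, "emr-5.28.0")
```

When release_label is left at its default, EMR selects the release, so this sketch cannot verify the version requirement in that case.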

tags

A list of tags to associate with a cluster and propagate to Amazon EC2 instances.

Default
  • None

Return type

Optional[Mapping[str, str]]
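
This property is a mapping, whereas the RunJobFlow API takes tags as a list of objects with Key and Value fields. The sketch below shows one way to express that conversion in plain Python. It is an illustration only, it does not claim to be how CDK performs the conversion internally, and to_run_job_flow_tags is a hypothetical name.

```python
from typing import Dict, List, Mapping


def to_run_job_flow_tags(tags: Mapping[str, str]) -> List[Dict[str, str]]:
    """Turn {"name": "value"} pairs into a list of Key/Value objects."""
    return [{"Key": key, "Value": value} for key, value in tags.items()]


example_tags = to_run_job_flow_tags({"team": "data", "env": "dev"})
```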

timeout

Timeout for the state machine.

Default
  • None

Return type

Optional[Duration]

visible_to_all_users

A value of true indicates that all IAM users in the AWS account can perform cluster actions if they have the proper IAM policy permissions.

Default

true

Return type

Optional[bool]