EmrCreateCluster

class aws_cdk.aws_stepfunctions_tasks.EmrCreateCluster(scope, id, *, instances, name, additional_info=None, applications=None, auto_scaling_role=None, bootstrap_actions=None, cluster_role=None, configurations=None, custom_ami_id=None, ebs_root_volume_size=None, kerberos_attributes=None, log_uri=None, release_label=None, scale_down_behavior=None, security_configuration=None, service_role=None, tags=None, visible_to_all_users=None, comment=None, heartbeat=None, input_path=None, integration_pattern=None, output_path=None, result_path=None, timeout=None)

Bases: aws_cdk.aws_stepfunctions.TaskStateBase

A Step Functions Task to create an EMR Cluster.

The ClusterConfiguration is defined as Parameters in the state machine definition.

OUTPUT: the ClusterId.

Stability: experimental

__init__(scope, id, *, instances, name, additional_info=None, applications=None, auto_scaling_role=None, bootstrap_actions=None, cluster_role=None, configurations=None, custom_ami_id=None, ebs_root_volume_size=None, kerberos_attributes=None, log_uri=None, release_label=None, scale_down_behavior=None, security_configuration=None, service_role=None, tags=None, visible_to_all_users=None, comment=None, heartbeat=None, input_path=None, integration_pattern=None, output_path=None, result_path=None, timeout=None)
Parameters
  • scope (Construct) –

  • id (str) –

  • instances (InstancesConfigProperty) – A specification of the number and type of Amazon EC2 instances.

  • name (str) – The Name of the Cluster.

  • additional_info (Optional[str]) – A JSON string for selecting additional features. Default: - None

  • applications (Optional[List[Forwardref]]) – A case-insensitive list of applications for Amazon EMR to install and configure when launching the cluster. Default: - EMR selected default

  • auto_scaling_role (Optional[IRole]) – An IAM role for automatic scaling policies. Default: - A role will be created.

  • bootstrap_actions (Optional[List[Forwardref]]) – A list of bootstrap actions to run before Hadoop starts on the cluster nodes. Default: - None

  • cluster_role (Optional[IRole]) – Also called instance profile and EC2 role. An IAM role for an EMR cluster. The EC2 instances of the cluster assume this role. This attribute has been renamed from jobFlowRole to clusterRole to align with other EMR/Step Functions integration parameters. Default: - A role will be created.

  • configurations (Optional[List[Forwardref]]) – The list of configurations supplied for the EMR cluster you are creating. Default: - None

  • custom_ami_id (Optional[str]) – The ID of a custom Amazon EBS-backed Linux AMI. Default: - None

  • ebs_root_volume_size (Optional[Size]) – The size of the EBS root device volume of the Linux AMI that is used for each EC2 instance. Default: - EMR selected default

  • kerberos_attributes (Optional[Forwardref]) – Attributes for Kerberos configuration when Kerberos authentication is enabled using a security configuration. Default: - None

  • log_uri (Optional[str]) – The location in Amazon S3 to write the log files of the job flow. Default: - None

  • release_label (Optional[str]) – The Amazon EMR release label, which determines the version of open-source application packages installed on the cluster. Default: - EMR selected default

  • scale_down_behavior (Optional[Forwardref]) – Specifies the way that individual Amazon EC2 instances terminate when an automatic scale-in activity occurs or an instance group is resized. Default: - EMR selected default

  • security_configuration (Optional[str]) – The name of a security configuration to apply to the cluster. Default: - None

  • service_role (Optional[IRole]) – The IAM role that will be assumed by the Amazon EMR service to access AWS resources on your behalf. Default: - A role will be created that Amazon EMR service can assume.

  • tags (Optional[Mapping[str, str]]) – A list of tags to associate with a cluster and propagate to Amazon EC2 instances. Default: - None

  • visible_to_all_users (Optional[bool]) – A value of true indicates that all IAM users in the AWS account can perform cluster actions if they have the proper IAM policy permissions. Default: true

  • comment (Optional[str]) – An optional description for this state. Default: - No comment

  • heartbeat (Optional[Duration]) – Timeout for the heartbeat. Default: - None

  • input_path (Optional[str]) – JSONPath expression to select part of the state to be the input to this state. May also be the special value JsonPath.DISCARD, which will cause the effective input to be the empty object {}. Default: - The entire task input (JSON path ‘$’)

  • integration_pattern (Optional[IntegrationPattern]) – AWS Step Functions integrates with services directly in the Amazon States Language. You can control these AWS services using service integration patterns. Default: IntegrationPattern.REQUEST_RESPONSE

  • output_path (Optional[str]) – JSONPath expression to select a portion of the state output to pass to the next state. May also be the special value JsonPath.DISCARD, which will cause the effective output to be the empty object {}. Default: - The entire JSON node determined by the state input, the task result, and resultPath is passed to the next state (JSON path ‘$’)

  • result_path (Optional[str]) – JSONPath expression to indicate where to inject the state’s output. May also be the special value JsonPath.DISCARD, which will cause the state’s input to become its output. Default: - Replaces the entire input with the result (JSON path ‘$’)

  • timeout (Optional[Duration]) – Timeout for the state machine. Default: - None

Stability: experimental

Return type

None
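
The way input_path, result_path and output_path combine can be sketched in plain Python. This is a minimal illustration of the behavior described above, not CDK code: run_state, select, inject and the DISCARD sentinel are hypothetical names, only ‘$’ and dotted ‘$.a.b’ paths are handled, and the ClusterId value is made up.

```python
import copy

DISCARD = object()  # hypothetical stand-in for JsonPath.DISCARD


def select(doc, path):
    """Return the node at `path`; only '$' and '$.a.b' forms are supported."""
    if path is DISCARD:
        return {}
    node = doc
    for key in path.split(".")[1:]:
        node = node[key]
    return node


def inject(doc, path, value):
    """Return `doc` with `value` placed at `path`, leaving `doc` unmodified."""
    if path is DISCARD:
        return doc  # result is dropped, the input passes through
    if path == "$":
        return value  # result replaces the whole input
    out = copy.deepcopy(doc)
    node = out
    keys = path.split(".")[1:]
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return out


def run_state(raw_input, task, input_path="$", result_path="$", output_path="$"):
    """Apply input_path, run the task, then apply result_path and output_path."""
    effective_input = select(raw_input, input_path)
    result = task(effective_input)
    combined = inject(raw_input, result_path, result)
    return select(combined, output_path)


state_input = {"job": {"name": "nightly"}, "meta": 1}
out = run_state(
    state_input,
    lambda _: {"ClusterId": "j-EXAMPLE"},  # made-up task result
    result_path="$.cluster",
)
```

With result_path set to ‘$.cluster’, the task result is stored under that key and the rest of the input is preserved. With the default ‘$’, the result replaces the input entirely.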

Methods

add_catch(handler, *, errors=None, result_path=None)

Add a recovery handler for this state.

When a particular error occurs, execution will continue at the error handler instead of failing the state machine execution.

Parameters
  • handler (IChainable) –

  • errors (Optional[List[str]]) – Errors to recover from by going to the given state. A list of error strings to catch, which can be either predefined errors (for example Errors.NoChoiceMatched) or a self-defined error. Default: All errors

  • result_path (Optional[str]) – JSONPath expression to indicate where to inject the error data. May also be the special value DISCARD, which will cause the error data to be discarded. Default: $

Return type

TaskStateBase
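
How a list of catchers selects a recovery handler can be sketched as follows. This is a simplified model rather than the service's implementation: find_handler and the handler names are hypothetical, catchers are checked in the order they were added, and the wildcard ‘States.ALL’ is treated as matching every error, ignoring any service-side exceptions to that rule.

```python
def find_handler(catchers, error_name):
    """Return the handler of the first catcher that matches `error_name`.

    `catchers` is a list of (errors, handler) pairs in the order they were added.
    """
    for errors, handler in catchers:
        if "States.ALL" in errors or error_name in errors:
            return handler
    return None  # nothing matched: the state machine execution fails


catchers = [
    (["States.Timeout"], "NotifyTimeout"),  # hypothetical handler names
    (["States.ALL"], "Cleanup"),
]
timeout_handler = find_handler(catchers, "States.Timeout")
fallback_handler = find_handler(catchers, "SomeCustomError")
```

Because the first match wins, a specific catcher should be added before a catch-all one.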

add_prefix(x)

Add a prefix to the stateId of this state.

Parameters

x (str) –

Return type

None

add_retry(*, backoff_rate=None, errors=None, interval=None, max_attempts=None)

Add retry configuration for this state.

This controls if and how the execution will be retried if a particular error occurs.

Parameters
  • backoff_rate (Union[int, float, None]) – Multiplication for how much longer the wait interval gets on every retry. Default: 2

  • errors (Optional[List[str]]) – Errors to retry. A list of error strings to retry, which can be either predefined errors (for example Errors.NoChoiceMatched) or a self-defined error. Default: All errors

  • interval (Optional[Duration]) – How many seconds to wait initially before retrying. Default: Duration.seconds(1)

  • max_attempts (Union[int, float, None]) – How many times to retry this particular error. May be 0 to disable retry for specific errors (in case you have a catch-all retry policy). Default: 3

Return type

TaskStateBase
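
The effect of interval, backoff_rate and max_attempts on the wait before each retry can be worked out with a few lines of Python. This is an illustrative sketch only; retry_waits is a hypothetical helper, and its defaults mirror the documented ones (1 second, 2, and 3).

```python
def retry_waits(interval_seconds=1.0, backoff_rate=2.0, max_attempts=3):
    """Return the wait, in seconds, before each retry attempt.

    The first retry waits `interval_seconds`; each later wait is the
    previous one multiplied by `backoff_rate`.
    """
    return [interval_seconds * backoff_rate ** attempt for attempt in range(max_attempts)]


default_waits = retry_waits()            # [1.0, 2.0, 4.0]
no_retry = retry_waits(max_attempts=0)   # []
```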

bind_to_graph(graph)

Register this state as part of the given graph.

Don’t call this. It will be called automatically when you work with states normally.

Parameters

graph (StateGraph) –

Return type

None

metric(metric_name, *, account=None, color=None, dimensions=None, label=None, period=None, region=None, statistic=None, unit=None)

Return the given named metric for this Task.

Parameters
  • metric_name (str) –

  • account (Optional[str]) – Account which this metric comes from. Default: - Deployment account.

  • color (Optional[str]) – The hex color code, prefixed with ‘#’ (e.g. ‘#00ff00’), to use when this metric is rendered on a graph. The Color class has a set of standard colors that can be used here. Default: - Automatic color

  • dimensions (Optional[Mapping[str, Any]]) – Dimensions of the metric. Default: - No dimensions.

  • label (Optional[str]) – Label for this metric when added to a Graph in a Dashboard. Default: - No label

  • period (Optional[Duration]) – The period over which the specified statistic is applied. Default: Duration.minutes(5)

  • region (Optional[str]) – Region which this metric comes from. Default: - Deployment region.

  • statistic (Optional[str]) – What function to use for aggregating. Can be one of the following: “Minimum” | “min”, “Maximum” | “max”, “Average” | “avg”, “Sum” | “sum”, “SampleCount” | “n”, “pNN.NN”. Default: Average

  • unit (Optional[Unit]) – Unit used to filter the metric stream. Only refer to datums emitted to the metric stream with the given unit and ignore all others. Only useful when datums are being emitted to the same metric stream under different units. The default is to use all metric datums in the stream, regardless of unit, which is recommended in nearly all cases. CloudWatch does not honor this property for graphs. Default: - All metric datums in the given metric stream

Default: sum over 5 minutes

Return type

Metric
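
The accepted statistic strings can be checked before they are passed to a metric method. The helper below is a hypothetical sketch, not how the CDK parses the value: normalize_statistic is an invented name, and the percentile pattern (one or two digits, optionally followed by a decimal part of one or two digits) is an assumption based on the “pNN.NN” form listed above.

```python
import re

# Short aliases and the long names they stand for, as listed above.
_ALIASES = {
    "min": "Minimum",
    "max": "Maximum",
    "avg": "Average",
    "sum": "Sum",
    "n": "SampleCount",
}
# Assumed shape of a percentile statistic such as "p99" or "p99.9".
_PERCENTILE = re.compile(r"p\d{1,2}(\.\d{1,2})?")


def normalize_statistic(value="Average"):
    """Return the long statistic name, or the percentile string unchanged."""
    if value in _ALIASES.values():
        return value
    if value in _ALIASES:
        return _ALIASES[value]
    if _PERCENTILE.fullmatch(value):
        return value
    raise ValueError(f"unsupported statistic: {value!r}")


long_name = normalize_statistic("avg")
percentile = normalize_statistic("p99.9")
```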

metric_failed(*, account=None, color=None, dimensions=None, label=None, period=None, region=None, statistic=None, unit=None)

Metric for the number of times this activity fails.

Parameters
  • account (Optional[str]) – Account which this metric comes from. Default: - Deployment account.

  • color (Optional[str]) – The hex color code, prefixed with ‘#’ (e.g. ‘#00ff00’), to use when this metric is rendered on a graph. The Color class has a set of standard colors that can be used here. Default: - Automatic color

  • dimensions (Optional[Mapping[str, Any]]) – Dimensions of the metric. Default: - No dimensions.

  • label (Optional[str]) – Label for this metric when added to a Graph in a Dashboard. Default: - No label

  • period (Optional[Duration]) – The period over which the specified statistic is applied. Default: Duration.minutes(5)

  • region (Optional[str]) – Region which this metric comes from. Default: - Deployment region.

  • statistic (Optional[str]) – What function to use for aggregating. Can be one of the following: “Minimum” | “min”, “Maximum” | “max”, “Average” | “avg”, “Sum” | “sum”, “SampleCount” | “n”, “pNN.NN”. Default: Average

  • unit (Optional[Unit]) – Unit used to filter the metric stream. Only refer to datums emitted to the metric stream with the given unit and ignore all others. Only useful when datums are being emitted to the same metric stream under different units. The default is to use all metric datums in the stream, regardless of unit, which is recommended in nearly all cases. CloudWatch does not honor this property for graphs. Default: - All metric datums in the given metric stream

Default: sum over 5 minutes

Return type

Metric

metric_heartbeat_timed_out(*, account=None, color=None, dimensions=None, label=None, period=None, region=None, statistic=None, unit=None)

Metric for the number of times the heartbeat times out for this activity.

Parameters
  • account (Optional[str]) – Account which this metric comes from. Default: - Deployment account.

  • color (Optional[str]) – The hex color code, prefixed with ‘#’ (e.g. ‘#00ff00’), to use when this metric is rendered on a graph. The Color class has a set of standard colors that can be used here. Default: - Automatic color

  • dimensions (Optional[Mapping[str, Any]]) – Dimensions of the metric. Default: - No dimensions.

  • label (Optional[str]) – Label for this metric when added to a Graph in a Dashboard. Default: - No label

  • period (Optional[Duration]) – The period over which the specified statistic is applied. Default: Duration.minutes(5)

  • region (Optional[str]) – Region which this metric comes from. Default: - Deployment region.

  • statistic (Optional[str]) – What function to use for aggregating. Can be one of the following: “Minimum” | “min”, “Maximum” | “max”, “Average” | “avg”, “Sum” | “sum”, “SampleCount” | “n”, “pNN.NN”. Default: Average

  • unit (Optional[Unit]) – Unit used to filter the metric stream. Only refer to datums emitted to the metric stream with the given unit and ignore all others. Only useful when datums are being emitted to the same metric stream under different units. The default is to use all metric datums in the stream, regardless of unit, which is recommended in nearly all cases. CloudWatch does not honor this property for graphs. Default: - All metric datums in the given metric stream

Default: sum over 5 minutes

Return type

Metric

metric_run_time(*, account=None, color=None, dimensions=None, label=None, period=None, region=None, statistic=None, unit=None)

The interval, in milliseconds, between the time the Task starts and the time it closes.

Parameters
  • account (Optional[str]) – Account which this metric comes from. Default: - Deployment account.

  • color (Optional[str]) – The hex color code, prefixed with ‘#’ (e.g. ‘#00ff00’), to use when this metric is rendered on a graph. The Color class has a set of standard colors that can be used here. Default: - Automatic color

  • dimensions (Optional[Mapping[str, Any]]) – Dimensions of the metric. Default: - No dimensions.

  • label (Optional[str]) – Label for this metric when added to a Graph in a Dashboard. Default: - No label

  • period (Optional[Duration]) – The period over which the specified statistic is applied. Default: Duration.minutes(5)

  • region (Optional[str]) – Region which this metric comes from. Default: - Deployment region.

  • statistic (Optional[str]) – What function to use for aggregating. Can be one of the following: “Minimum” | “min”, “Maximum” | “max”, “Average” | “avg”, “Sum” | “sum”, “SampleCount” | “n”, “pNN.NN”. Default: Average

  • unit (Optional[Unit]) – Unit used to filter the metric stream. Only refer to datums emitted to the metric stream with the given unit and ignore all others. Only useful when datums are being emitted to the same metric stream under different units. The default is to use all metric datums in the stream, regardless of unit, which is recommended in nearly all cases. CloudWatch does not honor this property for graphs. Default: - All metric datums in the given metric stream

Default: average over 5 minutes

Return type

Metric

metric_schedule_time(*, account=None, color=None, dimensions=None, label=None, period=None, region=None, statistic=None, unit=None)

The interval, in milliseconds, for which the activity stays in the schedule state.

Parameters
  • account (Optional[str]) – Account which this metric comes from. Default: - Deployment account.

  • color (Optional[str]) – The hex color code, prefixed with ‘#’ (e.g. ‘#00ff00’), to use when this metric is rendered on a graph. The Color class has a set of standard colors that can be used here. Default: - Automatic color

  • dimensions (Optional[Mapping[str, Any]]) – Dimensions of the metric. Default: - No dimensions.

  • label (Optional[str]) – Label for this metric when added to a Graph in a Dashboard. Default: - No label

  • period (Optional[Duration]) – The period over which the specified statistic is applied. Default: Duration.minutes(5)

  • region (Optional[str]) – Region which this metric comes from. Default: - Deployment region.

  • statistic (Optional[str]) – What function to use for aggregating. Can be one of the following: “Minimum” | “min”, “Maximum” | “max”, “Average” | “avg”, “Sum” | “sum”, “SampleCount” | “n”, “pNN.NN”. Default: Average

  • unit (Optional[Unit]) – Unit used to filter the metric stream. Only refer to datums emitted to the metric stream with the given unit and ignore all others. Only useful when datums are being emitted to the same metric stream under different units. The default is to use all metric datums in the stream, regardless of unit, which is recommended in nearly all cases. CloudWatch does not honor this property for graphs. Default: - All metric datums in the given metric stream

Default: average over 5 minutes

Return type

Metric

metric_scheduled(*, account=None, color=None, dimensions=None, label=None, period=None, region=None, statistic=None, unit=None)

Metric for the number of times this activity is scheduled.

Parameters
  • account (Optional[str]) – Account which this metric comes from. Default: - Deployment account.

  • color (Optional[str]) – The hex color code, prefixed with ‘#’ (e.g. ‘#00ff00’), to use when this metric is rendered on a graph. The Color class has a set of standard colors that can be used here. Default: - Automatic color

  • dimensions (Optional[Mapping[str, Any]]) – Dimensions of the metric. Default: - No dimensions.

  • label (Optional[str]) – Label for this metric when added to a Graph in a Dashboard. Default: - No label

  • period (Optional[Duration]) – The period over which the specified statistic is applied. Default: Duration.minutes(5)

  • region (Optional[str]) – Region which this metric comes from. Default: - Deployment region.

  • statistic (Optional[str]) – What function to use for aggregating. Can be one of the following: “Minimum” | “min”, “Maximum” | “max”, “Average” | “avg”, “Sum” | “sum”, “SampleCount” | “n”, “pNN.NN”. Default: Average

  • unit (Optional[Unit]) – Unit used to filter the metric stream. Only refer to datums emitted to the metric stream with the given unit and ignore all others. Only useful when datums are being emitted to the same metric stream under different units. The default is to use all metric datums in the stream, regardless of unit, which is recommended in nearly all cases. CloudWatch does not honor this property for graphs. Default: - All metric datums in the given metric stream

Default: sum over 5 minutes

Return type

Metric

metric_started(*, account=None, color=None, dimensions=None, label=None, period=None, region=None, statistic=None, unit=None)

Metric for the number of times this activity is started.

Parameters
  • account (Optional[str]) – Account which this metric comes from. Default: - Deployment account.

  • color (Optional[str]) – The hex color code, prefixed with ‘#’ (e.g. ‘#00ff00’), to use when this metric is rendered on a graph. The Color class has a set of standard colors that can be used here. Default: - Automatic color

  • dimensions (Optional[Mapping[str, Any]]) – Dimensions of the metric. Default: - No dimensions.

  • label (Optional[str]) – Label for this metric when added to a Graph in a Dashboard. Default: - No label

  • period (Optional[Duration]) – The period over which the specified statistic is applied. Default: Duration.minutes(5)

  • region (Optional[str]) – Region which this metric comes from. Default: - Deployment region.

  • statistic (Optional[str]) – What function to use for aggregating. Can be one of the following: “Minimum” | “min”, “Maximum” | “max”, “Average” | “avg”, “Sum” | “sum”, “SampleCount” | “n”, “pNN.NN”. Default: Average

  • unit (Optional[Unit]) – Unit used to filter the metric stream. Only refer to datums emitted to the metric stream with the given unit and ignore all others. Only useful when datums are being emitted to the same metric stream under different units. The default is to use all metric datums in the stream, regardless of unit, which is recommended in nearly all cases. CloudWatch does not honor this property for graphs. Default: - All metric datums in the given metric stream

Default: sum over 5 minutes

Return type

Metric

metric_succeeded(*, account=None, color=None, dimensions=None, label=None, period=None, region=None, statistic=None, unit=None)

Metric for the number of times this activity succeeds.

Parameters
  • account (Optional[str]) – Account which this metric comes from. Default: - Deployment account.

  • color (Optional[str]) – The hex color code, prefixed with ‘#’ (e.g. ‘#00ff00’), to use when this metric is rendered on a graph. The Color class has a set of standard colors that can be used here. Default: - Automatic color

  • dimensions (Optional[Mapping[str, Any]]) – Dimensions of the metric. Default: - No dimensions.

  • label (Optional[str]) – Label for this metric when added to a Graph in a Dashboard. Default: - No label

  • period (Optional[Duration]) – The period over which the specified statistic is applied. Default: Duration.minutes(5)

  • region (Optional[str]) – Region which this metric comes from. Default: - Deployment region.

  • statistic (Optional[str]) – What function to use for aggregating. Can be one of the following: “Minimum” | “min”, “Maximum” | “max”, “Average” | “avg”, “Sum” | “sum”, “SampleCount” | “n”, “pNN.NN”. Default: Average

  • unit (Optional[Unit]) – Unit used to filter the metric stream. Only refer to datums emitted to the metric stream with the given unit and ignore all others. Only useful when datums are being emitted to the same metric stream under different units. The default is to use all metric datums in the stream, regardless of unit, which is recommended in nearly all cases. CloudWatch does not honor this property for graphs. Default: - All metric datums in the given metric stream

Default: sum over 5 minutes

Return type

Metric

metric_time(*, account=None, color=None, dimensions=None, label=None, period=None, region=None, statistic=None, unit=None)

The interval, in milliseconds, between the time the activity is scheduled and the time it closes.

Parameters
  • account (Optional[str]) – Account which this metric comes from. Default: - Deployment account.

  • color (Optional[str]) – The hex color code, prefixed with ‘#’ (e.g. ‘#00ff00’), to use when this metric is rendered on a graph. The Color class has a set of standard colors that can be used here. Default: - Automatic color

  • dimensions (Optional[Mapping[str, Any]]) – Dimensions of the metric. Default: - No dimensions.

  • label (Optional[str]) – Label for this metric when added to a Graph in a Dashboard. Default: - No label

  • period (Optional[Duration]) – The period over which the specified statistic is applied. Default: Duration.minutes(5)

  • region (Optional[str]) – Region which this metric comes from. Default: - Deployment region.

  • statistic (Optional[str]) – What function to use for aggregating. Can be one of the following: “Minimum” | “min”, “Maximum” | “max”, “Average” | “avg”, “Sum” | “sum”, “SampleCount” | “n”, “pNN.NN”. Default: Average

  • unit (Optional[Unit]) – Unit used to filter the metric stream. Only refer to datums emitted to the metric stream with the given unit and ignore all others. Only useful when datums are being emitted to the same metric stream under different units. The default is to use all metric datums in the stream, regardless of unit, which is recommended in nearly all cases. CloudWatch does not honor this property for graphs. Default: - All metric datums in the given metric stream

Default: average over 5 minutes

Return type

Metric

metric_timed_out(*, account=None, color=None, dimensions=None, label=None, period=None, region=None, statistic=None, unit=None)

Metric for the number of times this activity times out.

Parameters
  • account (Optional[str]) – Account which this metric comes from. Default: - Deployment account.

  • color (Optional[str]) – The hex color code, prefixed with ‘#’ (e.g. ‘#00ff00’), to use when this metric is rendered on a graph. The Color class has a set of standard colors that can be used here. Default: - Automatic color

  • dimensions (Optional[Mapping[str, Any]]) – Dimensions of the metric. Default: - No dimensions.

  • label (Optional[str]) – Label for this metric when added to a Graph in a Dashboard. Default: - No label

  • period (Optional[Duration]) – The period over which the specified statistic is applied. Default: Duration.minutes(5)

  • region (Optional[str]) – Region which this metric comes from. Default: - Deployment region.

  • statistic (Optional[str]) – What function to use for aggregating. Can be one of the following: “Minimum” | “min”, “Maximum” | “max”, “Average” | “avg”, “Sum” | “sum”, “SampleCount” | “n”, “pNN.NN”. Default: Average

  • unit (Optional[Unit]) – Unit used to filter the metric stream. Only refer to datums emitted to the metric stream with the given unit and ignore all others. Only useful when datums are being emitted to the same metric stream under different units. The default is to use all metric datums in the stream, regardless of unit, which is recommended in nearly all cases. CloudWatch does not honor this property for graphs. Default: - All metric datums in the given metric stream

Default: sum over 5 minutes

Return type

Metric

next(next)

Continue normal execution with the given state.

Parameters

next (IChainable) –

Return type

Chain

to_state_json()

Return the Amazon States Language object for this state.

Return type

Mapping[Any, Any]

to_string()

Returns a string representation of this construct.

Return type

str

Attributes

auto_scaling_role

The autoscaling role for the EMR Cluster.

Only available after task has been added to a state machine.

Stability: experimental

Return type

IRole

cluster_role

The instance role for the EMR Cluster.

Only available after task has been added to a state machine.

Stability: experimental

Return type

IRole

end_states

Continuable states of this Chainable.

Return type

List[INextable]

id

Descriptive identifier for this chainable.

Return type

str

node

The construct tree node associated with this construct.

Return type

ConstructNode

service_role

The service role for the EMR Cluster.

Only available after task has been added to a state machine.

Stability: experimental

Return type

IRole

start_state

First state of this Chainable.

Return type

State

state_id

Tokenized string that evaluates to the state’s ID.

Return type

str

Static Methods

classmethod filter_nextables(states)

Return only the states that allow chaining from an array of states.

Parameters

states (List[State]) –

Return type

List[INextable]

classmethod find_reachable_end_states(start, *, include_error_handlers=None)

Find the set of end states reachable through transitions from the given start state.

Parameters
  • start (State) –

  • include_error_handlers (Optional[bool]) – Whether or not to follow error-handling transitions. Default: false

Return type

List[State]

classmethod find_reachable_states(start, *, include_error_handlers=None)

Find the set of states reachable through transitions from the given start state.

This does not retrieve states from within sub-graphs, such as states within a Parallel state’s branch.

Parameters
  • start (State) –

  • include_error_handlers (Optional[bool]) – Whether or not to follow error-handling transitions. Default: false

Return type

List[State]
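
The traversal behind find_reachable_states and find_reachable_end_states can be pictured as a breadth-first walk over a state graph. The sketch below uses a plain dictionary in place of CDK State objects; find_reachable, end_states_of, the ‘next’ and ‘catch’ keys and the state names are all hypothetical, and treating an end state as one with no outgoing ‘next’ transition is an assumption.

```python
from collections import deque


def find_reachable(graph, start, include_error_handlers=False):
    """Return the states reachable from `start`, in breadth-first order."""
    seen = [start]
    queue = deque([start])
    while queue:
        edges = graph[queue.popleft()]
        targets = list(edges.get("next", []))
        if include_error_handlers:
            targets += edges.get("catch", [])  # also follow error-handling transitions
        for target in targets:
            if target not in seen:
                seen.append(target)
                queue.append(target)
    return seen


def end_states_of(graph, states):
    """Keep only the states that have no outgoing 'next' transition."""
    return [state for state in states if not graph[state].get("next")]


graph = {
    "CreateCluster": {"next": ["RunStep"], "catch": ["Cleanup"]},
    "RunStep": {"next": ["Done"]},
    "Cleanup": {},
    "Done": {},
}
normal_only = find_reachable(graph, "CreateCluster")
with_handlers = find_reachable(graph, "CreateCluster", include_error_handlers=True)
```

With include_error_handlers left at its default, the Cleanup state is not visited because it is only reachable through a catch transition.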

classmethod is_construct(x)

Return whether the given object is a Construct.

Parameters

x (Any) –

Return type

bool

classmethod prefix_states(root, prefix)

Add a prefix to the stateId of all States found in a construct tree.

Parameters
  • root (IConstruct) –

  • prefix (str) –

Return type

None