Job

class aws_cdk.aws_glue_alpha.Job(scope, id, *, executable, connections=None, continuous_logging=None, default_arguments=None, description=None, enable_profiling_metrics=None, execution_class=None, job_name=None, max_capacity=None, max_concurrent_runs=None, max_retries=None, notify_delay_after=None, role=None, security_configuration=None, spark_ui=None, tags=None, timeout=None, worker_count=None, worker_type=None)

Bases: Resource

(experimental) A Glue Job.

Stability:

experimental

Example:

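import os

import aws_cdk.aws_glue_alpha as glue

# Define a Python ETL job with Spark UI monitoring enabled.
glue.Job(self, "EnableSparkUI",
    job_name="EtlJobWithSparkUIPrefix",
    spark_ui=glue.SparkUIProps(
        enabled=True
    ),
    executable=glue.JobExecutable.python_etl(
        glue_version=glue.GlueVersion.V3_0,
        python_version=glue.PythonVersion.THREE,
        # Resolve the script path relative to this source file.
        script=glue.Code.from_asset(os.path.join(os.path.dirname(__file__), "job-script", "hello_world.py"))
    )
)

Parameters: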
  • scope (Construct) –

  • id (str) –

  • executable (JobExecutable) – (experimental) The job’s executable properties.

  • connections (Optional[Sequence[IConnection]]) – (experimental) The Connections used for this job. Connections are used to connect to other AWS services or resources within a VPC. Default: [] - no connections are added to the job

  • continuous_logging (Union[ContinuousLoggingProps, Dict[str, Any], None]) – (experimental) Enables continuous logging with the specified props. Default: - continuous logging is disabled.

  • default_arguments (Optional[Mapping[str, str]]) – (experimental) The default arguments for this job, specified as name-value pairs. Default: - no arguments

  • description (Optional[str]) – (experimental) The description of the job. Default: - no value

  • enable_profiling_metrics (Optional[bool]) – (experimental) Enables the collection of metrics for job profiling. Equivalent to the job parameter --enable-metrics. Default: - no profiling metrics emitted.

  • execution_class (Optional[ExecutionClass]) – (experimental) The ExecutionClass that determines whether the job runs with a standard or flexible execution class. Default: - STANDARD

  • job_name (Optional[str]) – (experimental) The name of the job. Default: - a name is automatically generated

  • max_capacity (Union[int, float, None]) – (experimental) The number of AWS Glue data processing units (DPUs) that can be allocated when this job runs. Cannot be used for Glue version 2.0 and later; use worker_type and worker_count instead. Default: - 10 when job type is Apache Spark ETL or streaming, 0.0625 when job type is Python shell

  • max_concurrent_runs (Union[int, float, None]) – (experimental) The maximum number of concurrent runs allowed for the job. An error is returned when this threshold is reached. The maximum value you can specify is controlled by a service limit. Default: 1

  • max_retries (Union[int, float, None]) – (experimental) The maximum number of times to retry this job after a job run fails. Default: 0

  • notify_delay_after (Optional[Duration]) – (experimental) The number of minutes to wait after a job run starts, before sending a job run delay notification. Default: - no delay notifications

  • role (Optional[IRole]) – (experimental) The IAM role assumed by Glue to run this job. If providing a custom role, it needs to trust the Glue service principal (glue.amazonaws.com) and be granted sufficient permissions. Default: - a role is automatically generated

  • security_configuration (Optional[ISecurityConfiguration]) – (experimental) The SecurityConfiguration to use for this job. Default: - no security configuration.

  • spark_ui (Union[SparkUIProps, Dict[str, Any], None]) – (experimental) Enables the Spark UI debugging and monitoring with the specified props. Default: - Spark UI debugging and monitoring is disabled.

  • tags (Optional[Mapping[str, str]]) – (experimental) The tags to add to the resources on which the job runs. Default: {} - no tags

  • timeout (Optional[Duration]) – (experimental) The maximum time that a job run can consume resources before it is terminated and enters TIMEOUT status. Default: cdk.Duration.hours(48)

  • worker_count (Union[int, float, None]) – (experimental) The number of workers of a defined WorkerType that are allocated when a job runs. Default: - differs based on specific Glue version/worker type

  • worker_type (Optional[WorkerType]) – (experimental) The type of predefined worker that is allocated when a job runs. Default: - differs based on specific Glue version

Stability:

experimental
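
The role parameter above notes that a custom role must trust the Glue service principal (glue.amazonaws.com). As a rough sketch of what that trust relationship looks like as an IAM policy document, the snippet below builds it as a plain dictionary. The helper name glue_trust_policy is hypothetical and not part of aws_glue_alpha; in CDK code the same trust is usually expressed by creating the role with iam.ServicePrincipal("glue.amazonaws.com") as its assumed_by principal.

```python
import json

# Service principal named in the description of the role parameter.
GLUE_SERVICE_PRINCIPAL = "glue.amazonaws.com"


def glue_trust_policy() -> dict:
    """Build an IAM trust policy that lets the Glue service assume a role.

    Hypothetical helper for illustration only; not part of the CDK.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": GLUE_SERVICE_PRINCIPAL},
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    # Print the document in the JSON form IAM expects.
    print(json.dumps(glue_trust_policy(), indent=2))
```

The trust policy only controls who may assume the role. The permissions the job needs (for example, access to its script and data locations) still have to be granted separately.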

Methods

apply_removal_policy(policy)

Apply the given removal policy to this resource.

The Removal Policy controls what happens to this resource when it stops being managed by CloudFormation, either because you’ve removed it from the CDK application or because you’ve made a change that requires the resource to be replaced.

The resource can be deleted (RemovalPolicy.DESTROY), or left in your AWS account for data recovery and cleanup later (RemovalPolicy.RETAIN).

Parameters:

policy (RemovalPolicy) –

Return type:

None

metric(metric_name, type, *, account=None, color=None, dimensions_map=None, label=None, period=None, region=None, statistic=None, unit=None)

(experimental) Create a CloudWatch metric.

Parameters:
  • metric_name (str) – name of the metric, typically prefixed with “glue.driver.”, “glue.<executorId>.”, or “glue.ALL.”.

  • type (MetricType) – the metric type.

  • account (Optional[str]) – Account which this metric comes from. Default: - Deployment account.

  • color (Optional[str]) – The hex color code, prefixed with ‘#’ (e.g. ‘#00ff00’), to use when this metric is rendered on a graph. The Color class has a set of standard colors that can be used here. Default: - Automatic color

  • dimensions_map (Optional[Mapping[str, str]]) – Dimensions of the metric. Default: - No dimensions.

  • label (Optional[str]) – Label for this metric when added to a Graph in a Dashboard. You can use dynamic labels to show summary information about the entire displayed time series in the legend. For example, if you use “[max: ${MAX}] MyMetric” as the metric label, the maximum value in the visible range will be shown next to the time series name in the graph’s legend. Default: - No label

  • period (Optional[Duration]) – The period over which the specified statistic is applied. Default: Duration.minutes(5)

  • region (Optional[str]) – Region which this metric comes from. Default: - Deployment region.

  • statistic (Optional[str]) – What function to use for aggregating. Use the aws_cloudwatch.Stats helper class to construct valid input strings. Can be one of the following: “Minimum” | “min”; “Maximum” | “max”; “Average” | “avg”; “Sum” | “sum”; “SampleCount” | “n”; “pNN.NN”; “tmNN.NN” | “tm(NN.NN%:NN.NN%)”; “iqm”; “wmNN.NN” | “wm(NN.NN%:NN.NN%)”; “tcNN.NN” | “tc(NN.NN%:NN.NN%)”; “tsNN.NN” | “ts(NN.NN%:NN.NN%)”. Default: Average

  • unit (Optional[Unit]) – Unit used to filter the metric stream. Only refer to datums emitted to the metric stream with the given unit and ignore all others. Only useful when datums are being emitted to the same metric stream under different units. The default is to use all metric datums in the stream, regardless of unit, which is recommended in nearly all cases. CloudWatch does not honor this property for graphs. Default: - All metric datums in the given metric stream

See:

https://docs.aws.amazon.com/glue/latest/dg/monitoring-awsglue-with-cloudwatch-metrics.html

Stability:

experimental

Return type:

Metric
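
The statistic parameter accepts a small family of string shapes. The sketch below checks only whether a string has one of the shapes listed in the parameter description. It is an illustration based on that list alone, not a reimplementation of CloudWatch or CDK validation, and the function name looks_like_statistic is hypothetical. It does not check numeric ranges, and the service may accept forms this sketch rejects.

```python
import re

# Named statistics and their short aliases, as listed for the statistic parameter.
_NAMED = {
    "Minimum", "min", "Maximum", "max", "Average", "avg",
    "Sum", "sum", "SampleCount", "n", "iqm",
}

# A non-negative decimal number such as 99 or 99.9 (shape only, no range check).
_NUM = r"\d+(?:\.\d+)?"

_PATTERNS = (
    re.compile(rf"^p{_NUM}$"),                           # pNN.NN
    re.compile(rf"^(?:tm|wm|tc|ts){_NUM}$"),             # tmNN.NN, wmNN.NN, ...
    re.compile(rf"^(?:tm|wm|tc|ts)\({_NUM}%:{_NUM}%\)$"),  # tm(NN.NN%:NN.NN%), ...
)


def looks_like_statistic(value: str) -> bool:
    """Return True if value has one of the documented statistic shapes."""
    return value in _NAMED or any(p.match(value) for p in _PATTERNS)


if __name__ == "__main__":
    for candidate in ("Average", "p99.9", "tm(10%:90%)", "median"):
        print(candidate, looks_like_statistic(candidate))
```

In CDK code, the aws_cloudwatch.Stats helper mentioned above is the safer way to produce these strings, since it avoids hand-written formats altogether.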

metric_failure(*, account=None, color=None, dimensions_map=None, label=None, period=None, region=None, statistic=None, unit=None)

(experimental) Return a CloudWatch Metric indicating job failure.

This metric is based on the Rule returned by on_failure() called without additional options.

Parameters:
  • account (Optional[str]) – Account which this metric comes from. Default: - Deployment account.

  • color (Optional[str]) – The hex color code, prefixed with ‘#’ (e.g. ‘#00ff00’), to use when this metric is rendered on a graph. The Color class has a set of standard colors that can be used here. Default: - Automatic color

  • dimensions_map (Optional[Mapping[str, str]]) – Dimensions of the metric. Default: - No dimensions.

  • label (Optional[str]) –

    Label for this metric when added to a Graph in a Dashboard. You can use dynamic labels to show summary information about the entire displayed time series in the legend. For example, if you use “[max: ${MAX}] MyMetric” as the metric label, the maximum value in the visible range will be shown next to the time series name in the graph’s legend. Default: - No label

  • period (Optional[Duration]) – The period over which the specified statistic is applied. Default: Duration.minutes(5)

  • region (Optional[str]) – Region which this metric comes from. Default: - Deployment region.

  • statistic (Optional[str]) – What function to use for aggregating. Use the aws_cloudwatch.Stats helper class to construct valid input strings. Can be one of the following: “Minimum” | “min”; “Maximum” | “max”; “Average” | “avg”; “Sum” | “sum”; “SampleCount” | “n”; “pNN.NN”; “tmNN.NN” | “tm(NN.NN%:NN.NN%)”; “iqm”; “wmNN.NN” | “wm(NN.NN%:NN.NN%)”; “tcNN.NN” | “tc(NN.NN%:NN.NN%)”; “tsNN.NN” | “ts(NN.NN%:NN.NN%)”. Default: Average

  • unit (Optional[Unit]) – Unit used to filter the metric stream. Only refer to datums emitted to the metric stream with the given unit and ignore all others. Only useful when datums are being emitted to the same metric stream under different units. The default is to use all metric datums in the stream, regardless of unit, which is recommended in nearly all cases. CloudWatch does not honor this property for graphs. Default: - All metric datums in the given metric stream

Stability:

experimental

Return type:

Metric
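
The label parameter supports dynamic labels such as [max: ${MAX}] MyMetric, where ${MAX} is a placeholder that CloudWatch fills in, not a Python expression. In a plain string the placeholder needs no escaping, but inside an f-string the braces must be doubled so Python does not try to interpolate them. The sketch below shows one way to build such a label; the helper name dynamic_label is hypothetical and not part of the CDK.

```python
def dynamic_label(summary: str, name: str) -> str:
    """Build a CloudWatch dynamic label such as '[max: ${MAX}] MyMetric'.

    summary is the summary token (for example 'max'); it is emitted in
    lower case as the visible prefix and in upper case inside the ${...}
    placeholder. Hypothetical helper for illustration only.
    """
    # '{{' and '}}' produce literal braces in an f-string, so the result
    # contains the literal text ${MAX} for CloudWatch to substitute.
    return f"[{summary.lower()}: ${{{summary.upper()}}}] {name}"


if __name__ == "__main__":
    # A plain (non-f) string needs no escaping at all.
    plain = "[max: ${MAX}] MyMetric"
    print(plain == dynamic_label("max", "MyMetric"))
```

The resulting string can be passed as the label argument of any of the metric methods on this page.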

metric_success(*, account=None, color=None, dimensions_map=None, label=None, period=None, region=None, statistic=None, unit=None)

(experimental) Return a CloudWatch Metric indicating job success.

This metric is based on the Rule returned by on_success() called without additional options.

Parameters:
  • account (Optional[str]) – Account which this metric comes from. Default: - Deployment account.

  • color (Optional[str]) – The hex color code, prefixed with ‘#’ (e.g. ‘#00ff00’), to use when this metric is rendered on a graph. The Color class has a set of standard colors that can be used here. Default: - Automatic color

  • dimensions_map (Optional[Mapping[str, str]]) – Dimensions of the metric. Default: - No dimensions.

  • label (Optional[str]) –

    Label for this metric when added to a Graph in a Dashboard. You can use dynamic labels to show summary information about the entire displayed time series in the legend. For example, if you use “[max: ${MAX}] MyMetric” as the metric label, the maximum value in the visible range will be shown next to the time series name in the graph’s legend. Default: - No label

  • period (Optional[Duration]) – The period over which the specified statistic is applied. Default: Duration.minutes(5)

  • region (Optional[str]) – Region which this metric comes from. Default: - Deployment region.

  • statistic (Optional[str]) – What function to use for aggregating. Use the aws_cloudwatch.Stats helper class to construct valid input strings. Can be one of the following: “Minimum” | “min”; “Maximum” | “max”; “Average” | “avg”; “Sum” | “sum”; “SampleCount” | “n”; “pNN.NN”; “tmNN.NN” | “tm(NN.NN%:NN.NN%)”; “iqm”; “wmNN.NN” | “wm(NN.NN%:NN.NN%)”; “tcNN.NN” | “tc(NN.NN%:NN.NN%)”; “tsNN.NN” | “ts(NN.NN%:NN.NN%)”. Default: Average

  • unit (Optional[Unit]) – Unit used to filter the metric stream. Only refer to datums emitted to the metric stream with the given unit and ignore all others. Only useful when datums are being emitted to the same metric stream under different units. The default is to use all metric datums in the stream, regardless of unit, which is recommended in nearly all cases. CloudWatch does not honor this property for graphs. Default: - All metric datums in the given metric stream

Stability:

experimental

Return type:

Metric

metric_timeout(*, account=None, color=None, dimensions_map=None, label=None, period=None, region=None, statistic=None, unit=None)

(experimental) Return a CloudWatch Metric indicating job timeout.

This metric is based on the Rule returned by on_timeout() called without additional options.

Parameters:
  • account (Optional[str]) – Account which this metric comes from. Default: - Deployment account.

  • color (Optional[str]) – The hex color code, prefixed with ‘#’ (e.g. ‘#00ff00’), to use when this metric is rendered on a graph. The Color class has a set of standard colors that can be used here. Default: - Automatic color

  • dimensions_map (Optional[Mapping[str, str]]) – Dimensions of the metric. Default: - No dimensions.

  • label (Optional[str]) –

    Label for this metric when added to a Graph in a Dashboard. You can use dynamic labels to show summary information about the entire displayed time series in the legend. For example, if you use “[max: ${MAX}] MyMetric” as the metric label, the maximum value in the visible range will be shown next to the time series name in the graph’s legend. Default: - No label

  • period (Optional[Duration]) – The period over which the specified statistic is applied. Default: Duration.minutes(5)

  • region (Optional[str]) – Region which this metric comes from. Default: - Deployment region.

  • statistic (Optional[str]) – What function to use for aggregating. Use the aws_cloudwatch.Stats helper class to construct valid input strings. Can be one of the following: “Minimum” | “min”; “Maximum” | “max”; “Average” | “avg”; “Sum” | “sum”; “SampleCount” | “n”; “pNN.NN”; “tmNN.NN” | “tm(NN.NN%:NN.NN%)”; “iqm”; “wmNN.NN” | “wm(NN.NN%:NN.NN%)”; “tcNN.NN” | “tc(NN.NN%:NN.NN%)”; “tsNN.NN” | “ts(NN.NN%:NN.NN%)”. Default: Average

  • unit (Optional[Unit]) – Unit used to filter the metric stream. Only refer to datums emitted to the metric stream with the given unit and ignore all others. Only useful when datums are being emitted to the same metric stream under different units. The default is to use all metric datums in the stream, regardless of unit, which is recommended in nearly all cases. CloudWatch does not honor this property for graphs. Default: - All metric datums in the given metric stream

Stability:

experimental

Return type:

Metric

on_event(id, *, target=None, cross_stack_scope=None, description=None, event_pattern=None, rule_name=None)

(experimental) Create a CloudWatch Event Rule for this Glue Job when it’s in a given state.

Parameters:
  • id (str) – construct id.

  • target (Optional[IRuleTarget]) – The target to register for the event. Default: - No target is added to the rule. Use add_target() to add a target.

  • cross_stack_scope (Optional[Construct]) – The scope to use if the source of the rule and its target are in different Stacks (but in the same account & region). This helps to deal with cycles that often arise in these situations. Default: - none (the main scope will be used, even for cross-stack Events)

  • description (Optional[str]) – A description of the rule’s purpose. Default: - No description

  • event_pattern (Union[EventPattern, Dict[str, Any], None]) – Additional restrictions for the event to route to the specified target. The method that generates the rule probably imposes some type of event filtering. The filtering implied by what you pass here is added on top of that filtering. Default: - No additional filtering based on an event pattern.

  • rule_name (Optional[str]) – A name for the rule. Default: AWS CloudFormation generates a unique physical ID.

See:

https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/EventTypes.html#glue-event-types

Stability:

experimental

Return type:

Rule
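
For orientation, the sketch below builds an EventBridge event pattern of the kind that matches Glue job state-change events, following the event format described on the page linked above (source aws.glue, detail type Glue Job State Change, with jobName and state in the detail). The helper name glue_job_state_pattern is hypothetical, and the exact pattern this construct generates internally may differ; treat this as an illustration of the pattern structure only.

```python
import json
from typing import Optional


def glue_job_state_pattern(job_name: str, state: Optional[str] = None) -> dict:
    """Build an EventBridge pattern for state-change events of one Glue job.

    Hypothetical helper for illustration only. Every field holds a list
    because EventBridge patterns match when the event value equals any
    element of the list.
    """
    detail = {"jobName": [job_name]}
    if state is not None:
        # Narrow the match to a single job state, for example "FAILED".
        detail["state"] = [state]
    return {
        "source": ["aws.glue"],
        "detail-type": ["Glue Job State Change"],
        "detail": detail,
    }


if __name__ == "__main__":
    print(json.dumps(glue_job_state_pattern("EtlJob", "FAILED"), indent=2))
```

Anything passed as event_pattern to the on_* methods is applied in addition to the filtering the method already imposes, so it can only narrow the set of matched events.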

on_failure(id, *, target=None, cross_stack_scope=None, description=None, event_pattern=None, rule_name=None)

(experimental) Return a CloudWatch Event Rule matching the FAILED state.

Parameters:
  • id (str) – construct id.

  • target (Optional[IRuleTarget]) – The target to register for the event. Default: - No target is added to the rule. Use add_target() to add a target.

  • cross_stack_scope (Optional[Construct]) – The scope to use if the source of the rule and its target are in different Stacks (but in the same account & region). This helps to deal with cycles that often arise in these situations. Default: - none (the main scope will be used, even for cross-stack Events)

  • description (Optional[str]) – A description of the rule’s purpose. Default: - No description

  • event_pattern (Union[EventPattern, Dict[str, Any], None]) – Additional restrictions for the event to route to the specified target. The method that generates the rule probably imposes some type of event filtering. The filtering implied by what you pass here is added on top of that filtering. Default: - No additional filtering based on an event pattern.

  • rule_name (Optional[str]) – A name for the rule. Default: AWS CloudFormation generates a unique physical ID.

Stability:

experimental

Return type:

Rule

on_state_change(id, job_state, *, target=None, cross_stack_scope=None, description=None, event_pattern=None, rule_name=None)

(experimental) Create a CloudWatch Event Rule for the transition into the given job_state.

Parameters:
  • id (str) – construct id.

  • job_state (JobState) – the job state.

  • target (Optional[IRuleTarget]) – The target to register for the event. Default: - No target is added to the rule. Use add_target() to add a target.

  • cross_stack_scope (Optional[Construct]) – The scope to use if the source of the rule and its target are in different Stacks (but in the same account & region). This helps to deal with cycles that often arise in these situations. Default: - none (the main scope will be used, even for cross-stack Events)

  • description (Optional[str]) – A description of the rule’s purpose. Default: - No description

  • event_pattern (Union[EventPattern, Dict[str, Any], None]) – Additional restrictions for the event to route to the specified target. The method that generates the rule probably imposes some type of event filtering. The filtering implied by what you pass here is added on top of that filtering. Default: - No additional filtering based on an event pattern.

  • rule_name (Optional[str]) – A name for the rule. Default: AWS CloudFormation generates a unique physical ID.

Stability:

experimental

Return type:

Rule

on_success(id, *, target=None, cross_stack_scope=None, description=None, event_pattern=None, rule_name=None)

(experimental) Create a CloudWatch Event Rule matching JobState.SUCCEEDED.

Parameters:
  • id (str) – construct id.

  • target (Optional[IRuleTarget]) – The target to register for the event. Default: - No target is added to the rule. Use add_target() to add a target.

  • cross_stack_scope (Optional[Construct]) – The scope to use if the source of the rule and its target are in different Stacks (but in the same account & region). This helps to deal with cycles that often arise in these situations. Default: - none (the main scope will be used, even for cross-stack Events)

  • description (Optional[str]) – A description of the rule’s purpose. Default: - No description

  • event_pattern (Union[EventPattern, Dict[str, Any], None]) – Additional restrictions for the event to route to the specified target. The method that generates the rule probably imposes some type of event filtering. The filtering implied by what you pass here is added on top of that filtering. Default: - No additional filtering based on an event pattern.

  • rule_name (Optional[str]) – A name for the rule. Default: AWS CloudFormation generates a unique physical ID.

Stability:

experimental

Return type:

Rule

on_timeout(id, *, target=None, cross_stack_scope=None, description=None, event_pattern=None, rule_name=None)

(experimental) Return a CloudWatch Event Rule matching the TIMEOUT state.

Parameters:
  • id (str) – construct id.

  • target (Optional[IRuleTarget]) – The target to register for the event. Default: - No target is added to the rule. Use add_target() to add a target.

  • cross_stack_scope (Optional[Construct]) – The scope to use if the source of the rule and its target are in different Stacks (but in the same account & region). This helps to deal with cycles that often arise in these situations. Default: - none (the main scope will be used, even for cross-stack Events)

  • description (Optional[str]) – A description of the rule’s purpose. Default: - No description

  • event_pattern (Union[EventPattern, Dict[str, Any], None]) – Additional restrictions for the event to route to the specified target. The method that generates the rule probably imposes some type of event filtering. The filtering implied by what you pass here is added on top of that filtering. Default: - No additional filtering based on an event pattern.

  • rule_name (Optional[str]) – A name for the rule. Default: AWS CloudFormation generates a unique physical ID.

Stability:

experimental

Return type:

Rule

to_string()

Returns a string representation of this construct.

Return type:

str

Attributes

env

The environment this resource belongs to.

For resources that are created and managed by the CDK (generally, those created by instantiating classes like Role, Bucket, etc.), this is always the same as the environment of the stack they belong to. For imported resources (those obtained from static methods like from_role_arn, from_bucket_name, etc.), it might differ from the environment of the stack they were imported into.

grant_principal

(experimental) The principal this Glue Job is running as.

Stability:

experimental

job_arn

(experimental) The ARN of the job.

Stability:

experimental
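
As a point of reference, Glue job ARNs follow the form arn:partition:glue:region:account-id:job/job-name. The sketch below assembles such an ARN from its parts; the helper name glue_job_arn is hypothetical and not part of the CDK. Inside a CDK app, job_arn is typically an unresolved token at synthesis time, so code like this is mainly useful outside the app, for example in scripts or tests that work with known values.

```python
def glue_job_arn(region: str, account: str, job_name: str, partition: str = "aws") -> str:
    """Assemble the ARN of a Glue job from its components.

    Hypothetical helper for illustration only. partition defaults to the
    standard 'aws' partition; other partitions must be passed explicitly.
    """
    return f"arn:{partition}:glue:{region}:{account}:job/{job_name}"


if __name__ == "__main__":
    # The account ID and job name here are placeholder values.
    print(glue_job_arn("us-east-1", "123456789012", "EtlJobWithSparkUIPrefix"))
```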

job_name

(experimental) The name of the job.

Stability:

experimental

node

The tree node.

role

(experimental) The IAM role Glue assumes to run this job.

Stability:

experimental

spark_ui_logging_location

(experimental) The Spark UI logs location if Spark UI monitoring and debugging is enabled.

See:

https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html

Stability:

experimental

stack

The stack in which this resource is defined.

Static Methods

classmethod from_job_attributes(scope, id, *, job_name, role=None)

(experimental) Imports an existing Glue Job using its attributes.

Parameters:
  • scope (Construct) – The scope creating construct (usually this).

  • id (str) – The construct’s id.

  • job_name (str) – (experimental) The name of the job.

  • role (Optional[IRole]) – (experimental) The IAM role assumed by Glue to run this job. Default: - undefined

Stability:

experimental

Return type:

IJob

classmethod is_construct(x)

Checks if x is a construct.

Use this method instead of instanceof to properly detect Construct instances, even when the construct library is symlinked.

Explanation: in JavaScript, multiple copies of the constructs library on disk are seen as independent, completely different libraries. As a consequence, the class Construct in each copy of the constructs library is seen as a different class, and an instance of one class will not test as instanceof the other class. npm install will not create installations like this, but users may manually symlink construct libraries together or use a monorepo tool: in those cases, multiple copies of the constructs library can be accidentally installed, and instanceof will behave unpredictably. It is safest to avoid using instanceof, and using this type-testing method instead.

Parameters:

x (Any) – Any object.

Return type:

bool

Returns:

true if x is an object created from a class which extends Construct.

classmethod is_owned_resource(construct)

Returns true if the construct was created by CDK, and false otherwise.

Parameters:

construct (IConstruct) –

Return type:

bool

classmethod is_resource(construct)

Check whether the given construct is a Resource.

Parameters:

construct (IConstruct) –

Return type:

bool