PySparkStreamingJob
- class aws_cdk.aws_glue_alpha.PySparkStreamingJob(scope, id, *, extra_files=None, extra_jars=None, extra_python_files=None, job_run_queuing_enabled=None, spark_ui=None, role, script, connections=None, continuous_logging=None, default_arguments=None, description=None, enable_profiling_metrics=None, glue_version=None, job_name=None, max_concurrent_runs=None, max_retries=None, number_of_workers=None, security_configuration=None, tags=None, timeout=None, worker_type=None)
Bases:
Job
(experimental) Python Spark Streaming Jobs class.
A Streaming job is similar to an ETL job, except that it performs ETL on data streams using the Apache Spark Structured Streaming framework. These jobs will default to use Python 3.9.
Similar to ETL jobs, streaming job supports Scala and Python languages. Similar to ETL, it supports G1 and G2 worker type and 2.0, 3.0 and 4.0 version. We’ll default to G2 worker and 4.0 version for streaming jobs which developers can override. We will enable —enable-metrics, —enable-spark-ui, —enable-continuous-cloudwatch-log.
- Stability:
experimental
- ExampleMetadata:
infused
Example:
import aws_cdk as cdk import aws_cdk.aws_iam as iam # stack: cdk.Stack # role: iam.IRole # script: glue.Code glue.PySparkStreamingJob(stack, "ImportedJob", role=role, script=script)
(experimental) PySparkStreamingJob constructor.
- Parameters:
scope (
Construct
) –id (
str
) –extra_files (
Optional
[Sequence
[Code
]]) – (experimental) Additional files, such as configuration files that AWS Glue copies to the working directory of your script before executing it. Default: - no extra files specified.extra_jars (
Optional
[Sequence
[Code
]]) – (experimental) Extra Jars S3 URL (optional) S3 URL where additional jar dependencies are located. Default: - no extra jar filesextra_python_files (
Optional
[Sequence
[Code
]]) – (experimental) Extra Python Files S3 URL (optional) S3 URL where additional python dependencies are located. Default: - no extra filesjob_run_queuing_enabled (
Optional
[bool
]) – (experimental) Specifies whether job run queuing is enabled for the job runs for this job. A value of true means job run queuing is enabled for the job runs. If false or not populated, the job runs will not be considered for queueing. If this field does not match the value set in the job run, then the value from the job run field will be used. This property must be set to false for flex jobs. If this property is enabled, maxRetries must be set to zero. Default: - no job run queuingspark_ui (
Union
[SparkUIProps
,Dict
[str
,Any
],None
]) – (experimental) Enables the Spark UI debugging and monitoring with the specified props. Default: - Spark UI debugging and monitoring is disabled.role (
IRole
) – (experimental) IAM Role (required) IAM Role to use for Glue job execution Must be specified by the developer because the L2 doesn’t have visibility into the actions the script(s) takes during the job execution The role must trust the Glue service principal (glue.amazonaws.com) and be granted sufficient permissions.script (
Code
) – (experimental) Script Code Location (required) Script to run when the Glue job executes. Can be uploaded from the local directory structure using fromAsset or referenced via S3 location using fromBucketconnections (
Optional
[Sequence
[IConnection
]]) – (experimental) Connections (optional) List of connections to use for this Glue job Connections are used to connect to other AWS Service or resources within a VPC. Default: [] - no connections are added to the jobcontinuous_logging (
Union
[ContinuousLoggingProps
,Dict
[str
,Any
],None
]) – (experimental) Enables continuous logging with the specified props. Default: - continuous logging is enabled.default_arguments (
Optional
[Mapping
[str
,str
]]) – (experimental) Default Arguments (optional) The default arguments for every run of this Glue job, specified as name-value pairs. Default: - no argumentsdescription (
Optional
[str
]) – (experimental) Description (optional) Developer-specified description of the Glue job. Default: - no valueenable_profiling_metrics (
Optional
[bool
]) – (experimental) Enables the collection of metrics for job profiling. Default: - no profiling metrics emitted.glue_version (
Optional
[GlueVersion
]) – (experimental) Glue Version The version of Glue to use to execute this job. Default: 3.0 for ETLjob_name (
Optional
[str
]) – (experimental) Name of the Glue job (optional) Developer-specified name of the Glue job. Default: - a name is automatically generatedmax_concurrent_runs (
Union
[int
,float
,None
]) – (experimental) Max Concurrent Runs (optional) The maximum number of runs this Glue job can concurrently run. An error is returned when this threshold is reached. The maximum value you can specify is controlled by a service limit. Default: 1max_retries (
Union
[int
,float
,None
]) – (experimental) Max Retries (optional) Maximum number of retry attempts Glue performs if the job fails. Default: 0number_of_workers (
Union
[int
,float
,None
]) – (experimental) Number of Workers (optional) Number of workers for Glue to use during job execution. Default: 10security_configuration (
Optional
[ISecurityConfiguration
]) – (experimental) Security Configuration (optional) Defines the encryption options for the Glue job. Default: - no security configuration.tags (
Optional
[Mapping
[str
,str
]]) – (experimental) Tags (optional) A list of key:value pairs of tags to apply to this Glue job resources. Default: {} - no tagstimeout (
Optional
[Duration
]) – (experimental) Timeout (optional) The maximum time that a job run can consume resources before it is terminated and enters TIMEOUT status. Specified in minutes. Default: 2880 (2 days for non-streaming)worker_type (
Optional
[WorkerType
]) – (experimental) Worker Type (optional) Type of Worker for Glue to use during job execution Enum options: Standard, G_1X, G_2X, G_025X. G_4X, G_8X, Z_2X Default: WorkerType.G_1X
- Stability:
experimental
Methods
- apply_removal_policy(policy)
Apply the given removal policy to this resource.
The Removal Policy controls what happens to this resource when it stops being managed by CloudFormation, either because you’ve removed it from the CDK application or because you’ve made a change that requires the resource to be replaced.
The resource can be deleted (
RemovalPolicy.DESTROY
), or left in your AWS account for data recovery and cleanup later (RemovalPolicy.RETAIN
).- Parameters:
policy (
RemovalPolicy
) –- Return type:
None
- metric(metric_name, type, *, account=None, color=None, dimensions_map=None, label=None, period=None, region=None, stack_account=None, stack_region=None, statistic=None, unit=None)
(experimental) Create a CloudWatch metric.
- Parameters:
metric_name (
str
) – name of the metric typically prefixed withglue.driver.
,glue.<executorId>.
orglue.ALL.
.type (
MetricType
) – the metric type.account (
Optional
[str
]) – Account which this metric comes from. Default: - Deployment account.color (
Optional
[str
]) – The hex color code, prefixed with ‘#’ (e.g. ‘#00ff00’), to use when this metric is rendered on a graph. TheColor
class has a set of standard colors that can be used here. Default: - Automatic colordimensions_map (
Optional
[Mapping
[str
,str
]]) – Dimensions of the metric. Default: - No dimensions.label (
Optional
[str
]) – Label for this metric when added to a Graph in a Dashboard. You can use dynamic labels to show summary information about the entire displayed time series in the legend. For example, if you use:: [max: ${MAX}] MyMetric As the metric label, the maximum value in the visible range will be shown next to the time series name in the graph’s legend. Default: - No labelperiod (
Optional
[Duration
]) – The period over which the specified statistic is applied. Default: Duration.minutes(5)region (
Optional
[str
]) – Region which this metric comes from. Default: - Deployment region.stack_account (
Optional
[str
]) – Account of the stack this metric is attached to. Default: - Deployment account.stack_region (
Optional
[str
]) – Region of the stack this metric is attached to. Default: - Deployment region.statistic (
Optional
[str
]) – What function to use for aggregating. Use theaws_cloudwatch.Stats
helper class to construct valid input strings. Can be one of the following: - “Minimum” | “min” - “Maximum” | “max” - “Average” | “avg” - “Sum” | “sum” - “SampleCount | “n” - “pNN.NN” - “tmNN.NN” | “tm(NN.NN%:NN.NN%)” - “iqm” - “wmNN.NN” | “wm(NN.NN%:NN.NN%)” - “tcNN.NN” | “tc(NN.NN%:NN.NN%)” - “tsNN.NN” | “ts(NN.NN%:NN.NN%)” Default: Averageunit (
Optional
[Unit
]) – Unit used to filter the metric stream. Only refer to datums emitted to the metric stream with the given unit and ignore all others. Only useful when datums are being emitted to the same metric stream under different units. The default is to use all matric datums in the stream, regardless of unit, which is recommended in nearly all cases. CloudWatch does not honor this property for graphs. Default: - All metric datums in the given metric stream
- See:
https://docs.aws.amazon.com/glue/latest/dg/monitoring-awsglue-with-cloudwatch-metrics.html
- Stability:
experimental
- Return type:
- metric_failure(*, account=None, color=None, dimensions_map=None, label=None, period=None, region=None, stack_account=None, stack_region=None, statistic=None, unit=None)
(experimental) Return a CloudWatch Metric indicating job failure.
This metric is based on the Rule returned by no-args onFailure() call.
- Parameters:
account (
Optional
[str
]) – Account which this metric comes from. Default: - Deployment account.color (
Optional
[str
]) – The hex color code, prefixed with ‘#’ (e.g. ‘#00ff00’), to use when this metric is rendered on a graph. TheColor
class has a set of standard colors that can be used here. Default: - Automatic colordimensions_map (
Optional
[Mapping
[str
,str
]]) – Dimensions of the metric. Default: - No dimensions.label (
Optional
[str
]) –Label for this metric when added to a Graph in a Dashboard. You can use dynamic labels to show summary information about the entire displayed time series in the legend. For example, if you use:: [max: ${MAX}] MyMetric As the metric label, the maximum value in the visible range will be shown next to the time series name in the graph’s legend. Default: - No label
period (
Optional
[Duration
]) – The period over which the specified statistic is applied. Default: Duration.minutes(5)region (
Optional
[str
]) – Region which this metric comes from. Default: - Deployment region.stack_account (
Optional
[str
]) – Account of the stack this metric is attached to. Default: - Deployment account.stack_region (
Optional
[str
]) – Region of the stack this metric is attached to. Default: - Deployment region.statistic (
Optional
[str
]) – What function to use for aggregating. Use theaws_cloudwatch.Stats
helper class to construct valid input strings. Can be one of the following: - “Minimum” | “min” - “Maximum” | “max” - “Average” | “avg” - “Sum” | “sum” - “SampleCount | “n” - “pNN.NN” - “tmNN.NN” | “tm(NN.NN%:NN.NN%)” - “iqm” - “wmNN.NN” | “wm(NN.NN%:NN.NN%)” - “tcNN.NN” | “tc(NN.NN%:NN.NN%)” - “tsNN.NN” | “ts(NN.NN%:NN.NN%)” Default: Averageunit (
Optional
[Unit
]) – Unit used to filter the metric stream. Only refer to datums emitted to the metric stream with the given unit and ignore all others. Only useful when datums are being emitted to the same metric stream under different units. The default is to use all matric datums in the stream, regardless of unit, which is recommended in nearly all cases. CloudWatch does not honor this property for graphs. Default: - All metric datums in the given metric stream
- Stability:
experimental
- Return type:
- metric_success(*, account=None, color=None, dimensions_map=None, label=None, period=None, region=None, stack_account=None, stack_region=None, statistic=None, unit=None)
(experimental) Return a CloudWatch Metric indicating job success.
This metric is based on the Rule returned by no-args onSuccess() call.
- Parameters:
account (
Optional
[str
]) – Account which this metric comes from. Default: - Deployment account.color (
Optional
[str
]) – The hex color code, prefixed with ‘#’ (e.g. ‘#00ff00’), to use when this metric is rendered on a graph. TheColor
class has a set of standard colors that can be used here. Default: - Automatic colordimensions_map (
Optional
[Mapping
[str
,str
]]) – Dimensions of the metric. Default: - No dimensions.label (
Optional
[str
]) –Label for this metric when added to a Graph in a Dashboard. You can use dynamic labels to show summary information about the entire displayed time series in the legend. For example, if you use:: [max: ${MAX}] MyMetric As the metric label, the maximum value in the visible range will be shown next to the time series name in the graph’s legend. Default: - No label
period (
Optional
[Duration
]) – The period over which the specified statistic is applied. Default: Duration.minutes(5)region (
Optional
[str
]) – Region which this metric comes from. Default: - Deployment region.stack_account (
Optional
[str
]) – Account of the stack this metric is attached to. Default: - Deployment account.stack_region (
Optional
[str
]) – Region of the stack this metric is attached to. Default: - Deployment region.statistic (
Optional
[str
]) – What function to use for aggregating. Use theaws_cloudwatch.Stats
helper class to construct valid input strings. Can be one of the following: - “Minimum” | “min” - “Maximum” | “max” - “Average” | “avg” - “Sum” | “sum” - “SampleCount | “n” - “pNN.NN” - “tmNN.NN” | “tm(NN.NN%:NN.NN%)” - “iqm” - “wmNN.NN” | “wm(NN.NN%:NN.NN%)” - “tcNN.NN” | “tc(NN.NN%:NN.NN%)” - “tsNN.NN” | “ts(NN.NN%:NN.NN%)” Default: Averageunit (
Optional
[Unit
]) – Unit used to filter the metric stream. Only refer to datums emitted to the metric stream with the given unit and ignore all others. Only useful when datums are being emitted to the same metric stream under different units. The default is to use all matric datums in the stream, regardless of unit, which is recommended in nearly all cases. CloudWatch does not honor this property for graphs. Default: - All metric datums in the given metric stream
- Stability:
experimental
- Return type:
- metric_timeout(*, account=None, color=None, dimensions_map=None, label=None, period=None, region=None, stack_account=None, stack_region=None, statistic=None, unit=None)
(experimental) Return a CloudWatch Metric indicating job timeout.
This metric is based on the Rule returned by no-args onTimeout() call.
- Parameters:
account (
Optional
[str
]) – Account which this metric comes from. Default: - Deployment account.color (
Optional
[str
]) – The hex color code, prefixed with ‘#’ (e.g. ‘#00ff00’), to use when this metric is rendered on a graph. TheColor
class has a set of standard colors that can be used here. Default: - Automatic colordimensions_map (
Optional
[Mapping
[str
,str
]]) – Dimensions of the metric. Default: - No dimensions.label (
Optional
[str
]) –Label for this metric when added to a Graph in a Dashboard. You can use dynamic labels to show summary information about the entire displayed time series in the legend. For example, if you use:: [max: ${MAX}] MyMetric As the metric label, the maximum value in the visible range will be shown next to the time series name in the graph’s legend. Default: - No label
period (
Optional
[Duration
]) – The period over which the specified statistic is applied. Default: Duration.minutes(5)region (
Optional
[str
]) – Region which this metric comes from. Default: - Deployment region.stack_account (
Optional
[str
]) – Account of the stack this metric is attached to. Default: - Deployment account.stack_region (
Optional
[str
]) – Region of the stack this metric is attached to. Default: - Deployment region.statistic (
Optional
[str
]) – What function to use for aggregating. Use theaws_cloudwatch.Stats
helper class to construct valid input strings. Can be one of the following: - “Minimum” | “min” - “Maximum” | “max” - “Average” | “avg” - “Sum” | “sum” - “SampleCount | “n” - “pNN.NN” - “tmNN.NN” | “tm(NN.NN%:NN.NN%)” - “iqm” - “wmNN.NN” | “wm(NN.NN%:NN.NN%)” - “tcNN.NN” | “tc(NN.NN%:NN.NN%)” - “tsNN.NN” | “ts(NN.NN%:NN.NN%)” Default: Averageunit (
Optional
[Unit
]) – Unit used to filter the metric stream. Only refer to datums emitted to the metric stream with the given unit and ignore all others. Only useful when datums are being emitted to the same metric stream under different units. The default is to use all matric datums in the stream, regardless of unit, which is recommended in nearly all cases. CloudWatch does not honor this property for graphs. Default: - All metric datums in the given metric stream
- Stability:
experimental
- Return type:
- on_event(id, *, target=None, cross_stack_scope=None, description=None, event_pattern=None, rule_name=None)
(experimental) Create a CloudWatch Event Rule for this Glue Job when it’s in a given state.
- Parameters:
id (
str
) – construct id.target (
Optional
[IRuleTarget
]) – The target to register for the event. Default: - No target is added to the rule. UseaddTarget()
to add a target.cross_stack_scope (
Optional
[Construct
]) – The scope to use if the source of the rule and its target are in different Stacks (but in the same account & region). This helps dealing with cycles that often arise in these situations. Default: - none (the main scope will be used, even for cross-stack Events)description (
Optional
[str
]) – A description of the rule’s purpose. Default: - No descriptionevent_pattern (
Union
[EventPattern
,Dict
[str
,Any
],None
]) – Additional restrictions for the event to route to the specified target. The method that generates the rule probably imposes some type of event filtering. The filtering implied by what you pass here is added on top of that filtering. Default: - No additional filtering based on an event pattern.rule_name (
Optional
[str
]) – A name for the rule. Default: AWS CloudFormation generates a unique physical ID.
- See:
https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/EventTypes.html#glue-event-types
- Stability:
experimental
- Return type:
- on_failure(id, *, target=None, cross_stack_scope=None, description=None, event_pattern=None, rule_name=None)
(experimental) Return a CloudWatch Event Rule matching FAILED state.
- Parameters:
id (
str
) – construct id.target (
Optional
[IRuleTarget
]) – The target to register for the event. Default: - No target is added to the rule. UseaddTarget()
to add a target.cross_stack_scope (
Optional
[Construct
]) – The scope to use if the source of the rule and its target are in different Stacks (but in the same account & region). This helps dealing with cycles that often arise in these situations. Default: - none (the main scope will be used, even for cross-stack Events)description (
Optional
[str
]) – A description of the rule’s purpose. Default: - No descriptionevent_pattern (
Union
[EventPattern
,Dict
[str
,Any
],None
]) – Additional restrictions for the event to route to the specified target. The method that generates the rule probably imposes some type of event filtering. The filtering implied by what you pass here is added on top of that filtering. Default: - No additional filtering based on an event pattern.rule_name (
Optional
[str
]) – A name for the rule. Default: AWS CloudFormation generates a unique physical ID.
- Stability:
experimental
- Return type:
- on_success(id, *, target=None, cross_stack_scope=None, description=None, event_pattern=None, rule_name=None)
(experimental) Create a CloudWatch Event Rule matching JobState.SUCCEEDED.
- Parameters:
id (
str
) – construct id.target (
Optional
[IRuleTarget
]) – The target to register for the event. Default: - No target is added to the rule. UseaddTarget()
to add a target.cross_stack_scope (
Optional
[Construct
]) – The scope to use if the source of the rule and its target are in different Stacks (but in the same account & region). This helps dealing with cycles that often arise in these situations. Default: - none (the main scope will be used, even for cross-stack Events)description (
Optional
[str
]) – A description of the rule’s purpose. Default: - No descriptionevent_pattern (
Union
[EventPattern
,Dict
[str
,Any
],None
]) – Additional restrictions for the event to route to the specified target. The method that generates the rule probably imposes some type of event filtering. The filtering implied by what you pass here is added on top of that filtering. Default: - No additional filtering based on an event pattern.rule_name (
Optional
[str
]) – A name for the rule. Default: AWS CloudFormation generates a unique physical ID.
- Stability:
experimental
- Return type:
- on_timeout(id, *, target=None, cross_stack_scope=None, description=None, event_pattern=None, rule_name=None)
(experimental) Return a CloudWatch Event Rule matching TIMEOUT state.
- Parameters:
id (
str
) – construct id.target (
Optional
[IRuleTarget
]) – The target to register for the event. Default: - No target is added to the rule. UseaddTarget()
to add a target.cross_stack_scope (
Optional
[Construct
]) – The scope to use if the source of the rule and its target are in different Stacks (but in the same account & region). This helps dealing with cycles that often arise in these situations. Default: - none (the main scope will be used, even for cross-stack Events)description (
Optional
[str
]) – A description of the rule’s purpose. Default: - No descriptionevent_pattern (
Union
[EventPattern
,Dict
[str
,Any
],None
]) – Additional restrictions for the event to route to the specified target. The method that generates the rule probably imposes some type of event filtering. The filtering implied by what you pass here is added on top of that filtering. Default: - No additional filtering based on an event pattern.rule_name (
Optional
[str
]) – A name for the rule. Default: AWS CloudFormation generates a unique physical ID.
- Stability:
experimental
- Return type:
- setup_continuous_logging(role, *, enabled, conversion_pattern=None, log_group=None, log_stream_prefix=None, quiet=None)
(experimental) Setup Continuous Loggiung Properties.
- Parameters:
role (
IRole
) – The IAM role to use for continuous logging.enabled (
bool
) – (experimental) Enable continouous logging.conversion_pattern (
Optional
[str
]) – (experimental) Apply the provided conversion pattern. This is a Log4j Conversion Pattern to customize driver and executor logs. Default:%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
log_group (
Optional
[ILogGroup
]) – (experimental) Specify a custom CloudWatch log group name. Default: - a log group is created with name/aws-glue/jobs/logs-v2/
.log_stream_prefix (
Optional
[str
]) – (experimental) Specify a custom CloudWatch log stream prefix. Default: - the job run ID.quiet (
Optional
[bool
]) – (experimental) Filter out non-useful Apache Spark driver/executor and Apache Hadoop YARN heartbeat log messages. Default: true
- Return type:
Any
- Returns:
String containing the args for the continuous logging command
- Stability:
experimental
- to_string()
Returns a string representation of this construct.
- Return type:
str
Attributes
- env
The environment this resource belongs to.
For resources that are created and managed by the CDK (generally, those created by creating new class instances like Role, Bucket, etc.), this is always the same as the environment of the stack they belong to; however, for imported resources (those obtained from static methods like fromRoleArn, fromBucketName, etc.), that might be different than the stack they were imported into.
- grant_principal
(experimental) The principal to grant permissions to.
- Stability:
experimental
- job_arn
(experimental) The ARN of the job.
- Stability:
experimental
- job_name
(experimental) The name of the job.
- Stability:
experimental
- node
The tree node.
- role
(experimental) The IAM role Glue assumes to run this job.
- Stability:
experimental
- spark_ui_logging_location
(experimental) The Spark UI logs location if Spark UI monitoring and debugging is enabled.
- See:
https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html
- Stability:
experimental
- stack
The stack in which this resource is defined.
Static Methods
- classmethod from_job_attributes(scope, id, *, job_name, role=None)
(experimental) Identifies an existing Glue Job from a subset of attributes that can be referenced from within another Stack or Construct.
- Parameters:
- Stability:
experimental
- Return type:
- classmethod is_construct(x)
Checks if
x
is a construct.Use this method instead of
instanceof
to properly detectConstruct
instances, even when the construct library is symlinked.Explanation: in JavaScript, multiple copies of the
constructs
library on disk are seen as independent, completely different libraries. As a consequence, the classConstruct
in each copy of theconstructs
library is seen as a different class, and an instance of one class will not test asinstanceof
the other class.npm install
will not create installations like this, but users may manually symlink construct libraries together or use a monorepo tool: in those cases, multiple copies of theconstructs
library can be accidentally installed, andinstanceof
will behave unpredictably. It is safest to avoid usinginstanceof
, and using this type-testing method instead.- Parameters:
x (
Any
) – Any object.- Return type:
bool
- Returns:
true if
x
is an object created from a class which extendsConstruct
.
- classmethod is_owned_resource(construct)
Returns true if the construct was created by CDK, and false otherwise.
- Parameters:
construct (
IConstruct
) –- Return type:
bool
- classmethod is_resource(construct)
Check whether the given construct is a Resource.
- Parameters:
construct (
IConstruct
) –- Return type:
bool