ScalaSparkFlexEtlJobProps
- class aws_cdk.aws_glue_alpha.ScalaSparkFlexEtlJobProps(*, role, script, connections=None, continuous_logging=None, default_arguments=None, description=None, enable_profiling_metrics=None, glue_version=None, job_name=None, max_concurrent_runs=None, max_retries=None, number_of_workers=None, security_configuration=None, tags=None, timeout=None, worker_type=None, spark_ui=None, class_name, extra_files=None, extra_jars=None, extra_jars_first=None, notify_delay_after=None)
Bases:
SparkJobProps
(experimental) Flex Jobs class.
Flex jobs support the Python and Scala languages. The flexible execution class is appropriate for non-urgent jobs such as pre-production jobs, testing, and one-time data loads. Flexible job runs are supported for jobs that use AWS Glue version 3.0 or later and the G.1X or G.2X worker types; if no version is specified, the job defaults to the latest version of Glue (currently Glue 3.0).
As with ETL jobs, these features are enabled: --enable-metrics, --enable-spark-ui, --enable-continuous-cloudwatch-log
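As a rough illustration of the constraints described above, the following sketch checks whether a Glue version and worker type combination qualifies for the flexible execution class. The helper names `parse_version` and `is_flex_eligible`, and the string forms used for the version and worker type, are assumptions made for this example; they are not part of `aws_glue_alpha`.

```python
# Hypothetical helpers (not part of aws_cdk.aws_glue_alpha): they encode the
# Flex constraints stated above as a plain-Python check.

FLEX_WORKER_TYPES = {"G.1X", "G.2X"}  # worker types named in the text above
MIN_FLEX_GLUE_VERSION = (3, 0)        # "AWS Glue version 3.0 or later"


def parse_version(version: str) -> tuple:
    """Turn a version string such as "3.0" into a comparable tuple (3, 0)."""
    return tuple(int(part) for part in version.split("."))


def is_flex_eligible(glue_version: str, worker_type: str) -> bool:
    """Return True when both Flex constraints are satisfied."""
    return (
        parse_version(glue_version) >= MIN_FLEX_GLUE_VERSION
        and worker_type in FLEX_WORKER_TYPES
    )


if __name__ == "__main__":
    print(is_flex_eligible("3.0", "G.1X"))      # True
    print(is_flex_eligible("0.9", "G.1X"))      # False: version too old
    print(is_flex_eligible("4.0", "Standard"))  # False: unsupported worker type
```

A check like this could run before synthesis to catch an ineligible combination early; whether the construct itself validates these values is not stated on this page.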
- Parameters:
role (IRole) – (experimental) IAM Role (required). IAM Role to use for Glue job execution. Must be specified by the developer because the L2 doesn’t have visibility into the actions the script(s) take during the job execution. The role must trust the Glue service principal (glue.amazonaws.com) and be granted sufficient permissions.
script (Code) – (experimental) Script Code Location (required). Script to run when the Glue job executes. Can be uploaded from the local directory structure using fromAsset or referenced via S3 location using fromBucket.
connections (Optional[Sequence[IConnection]]) – (experimental) Connections (optional). List of connections to use for this Glue job. Connections are used to connect to other AWS services or resources within a VPC. Default: [] - no connections are added to the job
continuous_logging (Union[ContinuousLoggingProps, Dict[str, Any], None]) – (experimental) Enables continuous logging with the specified props. Default: - continuous logging is enabled.
default_arguments (Optional[Mapping[str, str]]) – (experimental) Default Arguments (optional). The default arguments for every run of this Glue job, specified as name-value pairs. Default: - no arguments
description (Optional[str]) – (experimental) Description (optional). Developer-specified description of the Glue job. Default: - no value
enable_profiling_metrics (Optional[bool]) – (experimental) Enables the collection of metrics for job profiling. Default: - no profiling metrics emitted.
glue_version (Optional[GlueVersion]) – (experimental) Glue Version. The version of Glue to use to execute this job. Default: 3.0 for ETL
job_name (Optional[str]) – (experimental) Name of the Glue job (optional). Developer-specified name of the Glue job. Default: - a name is automatically generated
max_concurrent_runs (Union[int, float, None]) – (experimental) Max Concurrent Runs (optional). The maximum number of runs this Glue job can concurrently run. An error is returned when this threshold is reached. The maximum value you can specify is controlled by a service limit. Default: 1
max_retries (Union[int, float, None]) – (experimental) Max Retries (optional). Maximum number of retry attempts Glue performs if the job fails. Default: 0
number_of_workers (Union[int, float, None]) – (experimental) Number of Workers (optional). Number of workers for Glue to use during job execution. Default: 10
security_configuration (Optional[ISecurityConfiguration]) – (experimental) Security Configuration (optional). Defines the encryption options for the Glue job. Default: - no security configuration.
tags (Optional[Mapping[str, str]]) – (experimental) Tags (optional). A list of key:value pairs of tags to apply to this Glue job’s resources. Default: {} - no tags
timeout (Optional[Duration]) – (experimental) Timeout (optional). The maximum time that a job run can consume resources before it is terminated and enters TIMEOUT status. Specified in minutes. Default: 2880 (2 days for non-streaming)
worker_type (Optional[WorkerType]) – (experimental) Worker Type (optional). Type of Worker for Glue to use during job execution. Enum options: Standard, G_1X, G_2X, G_025X, G_4X, G_8X, Z_2X. Default: WorkerType.G_1X
spark_ui (Union[SparkUIProps, Dict[str, Any], None]) – (experimental) Enables the Spark UI debugging and monitoring with the specified props. Default: - Spark UI debugging and monitoring is disabled.
class_name (str) – (experimental) The fully qualified Scala class name that serves as the entry point for the job.
extra_files (Optional[Sequence[Code]]) – (experimental) Additional files, such as configuration files, that AWS Glue copies to the working directory of your script before executing it. Default: - no extra files specified.
extra_jars (Optional[Sequence[Code]]) – (experimental) Extra Jars S3 URL (optional). S3 URL where additional jar dependencies are located. Default: - no extra jar files
extra_jars_first (Optional[bool]) – (experimental) Setting this value to true prioritizes the customer’s extra JAR files in the classpath. Default: false - priority is not given to user-provided jars
notify_delay_after (Optional[Duration]) – (experimental) Specifies configuration properties of a notification (optional). After a job run starts, the number of minutes to wait before sending a job run delay notification. Default: - undefined
- Stability:
experimental
- ExampleMetadata:
fixture=_generated
Example:
# The code below shows an example of how to instantiate this type.
# The values are placeholders you should change.
import aws_cdk.aws_glue_alpha as glue_alpha
import aws_cdk as cdk
from aws_cdk import aws_iam as iam
from aws_cdk import aws_logs as logs
from aws_cdk import aws_s3 as s3

# bucket: s3.Bucket
# code: glue_alpha.Code
# connection: glue_alpha.Connection
# log_group: logs.LogGroup
# role: iam.Role
# security_configuration: glue_alpha.SecurityConfiguration

scala_spark_flex_etl_job_props = glue_alpha.ScalaSparkFlexEtlJobProps(
    class_name="className",
    role=role,
    script=code,

    # the properties below are optional
    connections=[connection],
    continuous_logging=glue_alpha.ContinuousLoggingProps(
        enabled=False,

        # the properties below are optional
        conversion_pattern="conversionPattern",
        log_group=log_group,
        log_stream_prefix="logStreamPrefix",
        quiet=False
    ),
    default_arguments={
        "default_arguments_key": "defaultArguments"
    },
    description="description",
    enable_profiling_metrics=False,
    extra_files=[code],
    extra_jars=[code],
    extra_jars_first=False,
    glue_version=glue_alpha.GlueVersion.V0_9,
    job_name="jobName",
    max_concurrent_runs=123,
    max_retries=123,
    notify_delay_after=cdk.Duration.minutes(30),
    number_of_workers=123,
    security_configuration=security_configuration,
    # keyword spelled spark_ui to match the class signature above
    spark_ui=glue_alpha.SparkUIProps(
        bucket=bucket,
        prefix="prefix"
    ),
    tags={
        "tags_key": "tags"
    },
    timeout=cdk.Duration.minutes(30),
    worker_type=glue_alpha.WorkerType.STANDARD
)
Attributes
- class_name
(experimental) The fully qualified Scala class name that serves as the entry point for the job.
- See:
--class
in https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html
- Stability:
experimental
- connections
(experimental) Connections (optional). List of connections to use for this Glue job. Connections are used to connect to other AWS services or resources within a VPC.
- Default:
[] - no connections are added to the job
- Stability:
experimental
- continuous_logging
(experimental) Enables continuous logging with the specified props.
- Default:
continuous logging is enabled.
- See:
https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html
- Stability:
experimental
- default_arguments
(experimental) Default Arguments (optional) The default arguments for every run of this Glue job, specified as name-value pairs.
- Default:
no arguments
- See:
https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html for a list of reserved parameters
- Stability:
experimental
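Since default arguments are plain name-value pairs, it can help to check them against flags that this page already ties to dedicated props or built-in features. The sketch below does that in plain Python. The `MANAGED_FLAGS` grouping, the `overlapping_flags` helper, and the `--input_path` argument are illustrative assumptions, not CDK API, and this page does not say how the construct treats such collisions.

```python
# Flags mentioned on this page as handled by dedicated props or enabled by
# default. The grouping and the helper below are illustrative assumptions.
MANAGED_FLAGS = {
    "--class",             # set through class_name
    "--user-jars-first",   # set through extra_jars_first
    "--enable-metrics",
    "--enable-spark-ui",
    "--enable-continuous-cloudwatch-log",
}


def overlapping_flags(default_arguments: dict) -> list:
    """Return the keys that collide with MANAGED_FLAGS, in sorted order."""
    return sorted(key for key in default_arguments if key in MANAGED_FLAGS)


default_arguments = {
    "--input_path": "s3://example-bucket/input/",  # hypothetical custom argument
    "--enable-metrics": "true",                    # collides with a managed flag
}

print(overlapping_flags(default_arguments))  # ['--enable-metrics']
```

The linked AWS documentation remains the authoritative list of reserved parameters; the set above only covers flags named on this page.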
- description
(experimental) Description (optional) Developer-specified description of the Glue job.
- Default:
no value
- Stability:
experimental
- enable_profiling_metrics
(experimental) Enables the collection of metrics for job profiling.
- Default:
no profiling metrics emitted.
- See:
https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html
- Stability:
experimental
- extra_files
(experimental) Additional files, such as configuration files that AWS Glue copies to the working directory of your script before executing it.
- Default:
no extra files specified.
- See:
https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html
- Stability:
experimental
- extra_jars
(experimental) Extra Jars S3 URL (optional) S3 URL where additional jar dependencies are located.
- Default:
no extra jar files
- Stability:
experimental
- extra_jars_first
(experimental) Setting this value to true prioritizes the customer’s extra JAR files in the classpath.
- Default:
false - priority is not given to user-provided jars
- See:
--user-jars-first
in https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html
- Stability:
experimental
- glue_version
(experimental) Glue Version The version of Glue to use to execute this job.
- Default:
3.0 for ETL
- Stability:
experimental
- job_name
(experimental) Name of the Glue job (optional) Developer-specified name of the Glue job.
- Default:
a name is automatically generated
- Stability:
experimental
- max_concurrent_runs
(experimental) Max Concurrent Runs (optional) The maximum number of runs this Glue job can concurrently run.
An error is returned when this threshold is reached. The maximum value you can specify is controlled by a service limit.
- Default:
1
- Stability:
experimental
- max_retries
(experimental) Max Retries (optional) Maximum number of retry attempts Glue performs if the job fails.
- Default:
0
- Stability:
experimental
- notify_delay_after
(experimental) Specifies configuration properties of a notification (optional).
After a job run starts, the number of minutes to wait before sending a job run delay notification.
- Default:
undefined
- Stability:
experimental
- number_of_workers
(experimental) Number of Workers (optional) Number of workers for Glue to use during job execution.
- Default:
10
- Stability:
experimental
- role
(experimental) IAM Role (required). IAM Role to use for Glue job execution. Must be specified by the developer because the L2 doesn’t have visibility into the actions the script(s) take during the job execution. The role must trust the Glue service principal (glue.amazonaws.com) and be granted sufficient permissions.
- See:
https://docs.aws.amazon.com/glue/latest/dg/getting-started-access.html
- Stability:
experimental
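To make the trust requirement concrete, the sketch below builds the IAM trust policy document that allows the Glue service principal to assume a role. It is plain Python that only prints JSON; the variable name `glue_trust_policy` is chosen for this example, and the permissions the role also needs are not shown because they depend on what the script does.

```python
import json

# Standard IAM trust policy document that lets the Glue service assume a role.
glue_trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "glue.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

print(json.dumps(glue_trust_policy, indent=2))
```

In a CDK app the same trust relationship is typically expressed by creating the role with `assumed_by=iam.ServicePrincipal("glue.amazonaws.com")`.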
- script
(experimental) Script Code Location (required). Script to run when the Glue job executes.
Can be uploaded from the local directory structure using fromAsset or referenced via S3 location using fromBucket.
- Stability:
experimental
- security_configuration
(experimental) Security Configuration (optional) Defines the encryption options for the Glue job.
- Default:
no security configuration.
- Stability:
experimental
- spark_ui
(experimental) Enables the Spark UI debugging and monitoring with the specified props.
- Default:
Spark UI debugging and monitoring is disabled.
- See:
https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html
- Stability:
experimental
- tags
(experimental) Tags (optional) A list of key:value pairs of tags to apply to this Glue job’s resources.
- Default:
{} - no tags
- Stability:
experimental
- timeout
(experimental) Timeout (optional) The maximum time that a job run can consume resources before it is terminated and enters TIMEOUT status.
Specified in minutes.
- Default:
2880 (2 days for non-streaming)
- Stability:
experimental
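The default of 2880 minutes can be checked against the "2 days" figure with a short calculation. This sketch uses only the standard library; the constant name `DEFAULT_TIMEOUT_MINUTES` is made up for the example.

```python
from datetime import timedelta

DEFAULT_TIMEOUT_MINUTES = 2880  # default stated above

default_timeout = timedelta(minutes=DEFAULT_TIMEOUT_MINUTES)

print(default_timeout == timedelta(days=2))     # True
print(default_timeout.total_seconds() / 3600)   # 48.0 (hours)
```

In CDK code the prop takes a `Duration`, so the equivalent value would be written as `cdk.Duration.minutes(2880)`, in the same way the example on this page passes `cdk.Duration.minutes(30)`.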