ScalaSparkEtlJobProps

class aws_cdk.aws_glue_alpha.ScalaSparkEtlJobProps(*, role, script, connections=None, continuous_logging=None, default_arguments=None, description=None, enable_profiling_metrics=None, glue_version=None, job_name=None, max_concurrent_runs=None, max_retries=None, number_of_workers=None, security_configuration=None, tags=None, timeout=None, worker_type=None, enable_metrics=None, enable_observability_metrics=None, spark_ui=None, class_name, extra_files=None, extra_jars=None, extra_jars_first=None, job_run_queuing_enabled=None)

Bases: SparkJobProps

(experimental) Properties for creating a Scala Spark ETL job.

Parameters:

role (IRole) – (experimental) IAM Role (required). IAM role to use for Glue job execution. Must be specified by the developer because the L2 construct doesn’t have visibility into the actions the script(s) take during job execution. The role must trust the Glue service principal (glue.amazonaws.com) and be granted sufficient permissions.
script (Code) – (experimental) Script Code Location (required). Script to run when the Glue job executes. Can be uploaded from the local directory structure using Code.from_asset or referenced via an S3 location using Code.from_bucket.
connections (Optional[Sequence[IConnection]]) – (experimental) Connections (optional). List of connections to use for this Glue job. Connections are used to connect to other AWS services or resources within a VPC. Default: [] - no connections are added to the job
continuous_logging (Union[ContinuousLoggingProps, Dict[str, Any], None]) – (experimental) Enables continuous logging with the specified props. Default: - continuous logging is enabled.
default_arguments (Optional[Mapping[str, str]]) – (experimental) Default Arguments (optional) The default arguments for every run of this Glue job, specified as name-value pairs. Default: - no arguments
description (Optional[str]) – (experimental) Description (optional) Developer-specified description of the Glue job. Default: - no value
enable_profiling_metrics (Optional[bool]) – (experimental) Enables the collection of metrics for job profiling. Default: - no profiling metrics emitted.
glue_version (Optional[GlueVersion]) – (experimental) Glue Version The version of Glue to use to execute this job. Default: 3.0 for ETL
job_name (Optional[str]) – (experimental) Name of the Glue job (optional) Developer-specified name of the Glue job. Default: - a name is automatically generated
max_concurrent_runs (Union[int, float, None]) – (experimental) Max Concurrent Runs (optional) The maximum number of runs this Glue job can concurrently run. An error is returned when this threshold is reached. The maximum value you can specify is controlled by a service limit. Default: 1
max_retries (Union[int, float, None]) – (experimental) Max Retries (optional) Maximum number of retry attempts Glue performs if the job fails. Default: 0
number_of_workers (Union[int, float, None]) – (experimental) Number of Workers (optional) Number of workers for Glue to use during job execution. Default: 10
security_configuration (Optional[ISecurityConfiguration]) – (experimental) Security Configuration (optional) Defines the encryption options for the Glue job. Default: - no security configuration.
tags (Optional[Mapping[str, str]]) – (experimental) Tags (optional) A list of key:value pairs of tags to apply to this Glue job’s resources. Default: {} - no tags
timeout (Optional[Duration]) – (experimental) Timeout (optional) The maximum time that a job run can consume resources before it is terminated and enters TIMEOUT status. Specified in minutes. Default: 2880 (2 days for non-streaming)
worker_type (Optional[WorkerType]) – (experimental) Worker Type (optional) Type of worker for Glue to use during job execution. Enum options: Standard, G_1X, G_2X, G_025X, G_4X, G_8X, Z_2X. Default: WorkerType.G_1X
enable_metrics (Optional[bool]) – (experimental) Enable profiling metrics for the Glue job. When enabled, adds ‘--enable-metrics’ to job arguments. Default: true
enable_observability_metrics (Optional[bool]) – (experimental) Enable observability metrics for the Glue job. When enabled, adds ‘--enable-observability-metrics’: ‘true’ to job arguments. Default: true
spark_ui (Union[SparkUIProps, Dict[str, Any], None]) – (experimental) Enables the Spark UI debugging and monitoring with the specified props. Default: - Spark UI debugging and monitoring is disabled.
class_name (str) – (experimental) Class name (required for Scala scripts) Package and class name for the entry point of Glue job execution for Scala scripts.
extra_files (Optional[Sequence[Code]]) – (experimental) Additional files, such as configuration files that AWS Glue copies to the working directory of your script before executing it. Default: - no extra files specified.
extra_jars (Optional[Sequence[Code]]) – (experimental) Extra Jars S3 URL (optional) S3 URL where additional jar dependencies are located. Default: - no extra jar files
extra_jars_first (Optional[bool]) – (experimental) Setting this value to true prioritizes the customer’s extra JAR files in the classpath. Default: false - priority is not given to user-provided jars
job_run_queuing_enabled (Optional[bool]) – (experimental) Specifies whether job run queuing is enabled for the job runs for this job. A value of true means job run queuing is enabled for the job runs. If false or not populated, the job runs will not be considered for queuing. If this field does not match the value set in the job run, then the value from the job run field will be used. This property must be set to false for flex jobs. If this property is enabled, max_retries must be set to zero. Default: - no job run queuing

Stability:

experimental

ExampleMetadata:

fixture=_generated

Example:

# The code below shows an example of how to instantiate this type.
# The values are placeholders you should change.
import aws_cdk.aws_glue_alpha as glue_alpha
import aws_cdk as cdk
from aws_cdk import aws_iam as iam
from aws_cdk import aws_logs as logs
from aws_cdk import aws_s3 as s3

# bucket: s3.Bucket
# code: glue_alpha.Code
# connection: glue_alpha.Connection
# log_group: logs.LogGroup
# role: iam.Role
# security_configuration: glue_alpha.SecurityConfiguration

scala_spark_etl_job_props = glue_alpha.ScalaSparkEtlJobProps(
    class_name="className",
    role=role,
    script=code,

    # the properties below are optional
    connections=[connection],
    continuous_logging=glue_alpha.ContinuousLoggingProps(
        enabled=False,

        # the properties below are optional
        conversion_pattern="conversionPattern",
        log_group=log_group,
        log_stream_prefix="logStreamPrefix",
        quiet=False
    ),
    default_arguments={
        "default_arguments_key": "defaultArguments"
    },
    description="description",
    enable_metrics=False,
    enable_observability_metrics=False,
    enable_profiling_metrics=False,
    extra_files=[code],
    extra_jars=[code],
    extra_jars_first=False,
    glue_version=glue_alpha.GlueVersion.V0_9,
    job_name="jobName",
    job_run_queuing_enabled=False,
    max_concurrent_runs=123,
    max_retries=123,
    number_of_workers=123,
    security_configuration=security_configuration,
    spark_ui=glue_alpha.SparkUIProps(
        bucket=bucket,
        prefix="prefix"
    ),
    tags={
        "tags_key": "tags"
    },
    timeout=cdk.Duration.minutes(30),
    worker_type=glue_alpha.WorkerType.STANDARD
)

Attributes

class_name

(experimental) Class name (required for Scala scripts) Package and class name for the entry point of Glue job execution for Scala scripts.

Stability: experimental

connections

(experimental) Connections (optional). List of connections to use for this Glue job. Connections are used to connect to other AWS services or resources within a VPC.

Default: [] - no connections are added to the job
Stability: experimental

continuous_logging

(experimental) Enables continuous logging with the specified props.

Default:

continuous logging is enabled.

See:

https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html

Stability:

experimental

default_arguments

(experimental) Default Arguments (optional) The default arguments for every run of this Glue job, specified as name-value pairs.

Default:

no arguments

See:

https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html for a list of reserved parameters

Stability:

experimental
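The name-value pairs described above can be pictured as a plain dictionary. The sketch below is only an illustration: the argument names and values are hypothetical, and it assumes (based on Glue's general behavior, not on anything stated in this page) that arguments supplied for an individual run take precedence over the job's defaults.

```python
# Hypothetical default arguments, as they might be passed to default_arguments.
default_arguments = {
    "--input_path": "s3://example-bucket/input/",  # hypothetical custom argument
    "--output_format": "parquet",                  # hypothetical custom argument
}

# Hypothetical arguments supplied when one particular run is started.
run_arguments = {"--output_format": "json"}

# Assumption: run-level values win over the job's default values.
effective_arguments = {**default_arguments, **run_arguments}
print(effective_arguments)
```

Reserved parameter names listed at the link above should not be reused for custom arguments.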

description

(experimental) Description (optional) Developer-specified description of the Glue job.

Default:

no value

Stability:

experimental

enable_metrics

(experimental) Enable profiling metrics for the Glue job.

When enabled, adds ‘--enable-metrics’ to job arguments.

Default: true
Stability: experimental

enable_observability_metrics

(experimental) Enable observability metrics for the Glue job.

When enabled, adds ‘--enable-observability-metrics’: ‘true’ to job arguments.

Default: true
Stability: experimental
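
As a rough illustration of the two metrics flags described above, the sketch below builds the job-argument entries each one contributes. The helper name `metrics_arguments` is hypothetical and is not part of the CDK. The empty-string value for `--enable-metrics` is an assumption (the description only says the key is added), while the `true` value for `--enable-observability-metrics` comes from the description.

```python
from typing import Dict


def metrics_arguments(enable_metrics: bool = True,
                      enable_observability_metrics: bool = True) -> Dict[str, str]:
    """Return the job arguments the two metrics flags would contribute.

    Hypothetical helper for illustration only; not the CDK implementation.
    """
    args: Dict[str, str] = {}
    if enable_metrics:
        # Assumption: the flag is passed as a key with an empty value.
        args["--enable-metrics"] = ""
    if enable_observability_metrics:
        args["--enable-observability-metrics"] = "true"
    return args


both_on = metrics_arguments()  # both flags default to true
both_off = metrics_arguments(enable_metrics=False,
                             enable_observability_metrics=False)
print(both_on)
print(both_off)
```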

enable_profiling_metrics

(experimental) Enables the collection of metrics for job profiling.

Default:

no profiling metrics emitted.

See:

https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html

Stability:

experimental

extra_files

(experimental) Additional files, such as configuration files that AWS Glue copies to the working directory of your script before executing it.

Default:

no extra files specified.

See:

https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html

Stability:

experimental

extra_jars

(experimental) Extra Jars S3 URL (optional) S3 URL where additional jar dependencies are located.

Default:

no extra jar files

Stability:

experimental

extra_jars_first

(experimental) Setting this value to true prioritizes the customer’s extra JAR files in the classpath.

Default: false - priority is not given to user-provided jars
See: --user-jars-first in https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html
Stability: experimental
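
To make the relationship between `extra_jars`, `extra_jars_first`, and the Glue job arguments more concrete, here is a minimal sketch. The helper `jar_arguments` and the S3 URLs are hypothetical, and the mapping to `--extra-jars` (as a comma-separated string) is an assumption about how these properties reach Glue; only `--user-jars-first` is named in this page.

```python
from typing import Dict, Sequence


def jar_arguments(extra_jar_urls: Sequence[str],
                  extra_jars_first: bool = False) -> Dict[str, str]:
    """Build the jar-related job arguments (illustrative, not the CDK code)."""
    args: Dict[str, str] = {}
    if extra_jar_urls:
        # Assumption: multiple jar locations are joined into one comma-separated value.
        args["--extra-jars"] = ",".join(extra_jar_urls)
    if extra_jars_first:
        # Prioritize the user-provided jars in the classpath.
        args["--user-jars-first"] = "true"
    return args


jar_args = jar_arguments(
    ["s3://example-bucket/libs/a.jar", "s3://example-bucket/libs/b.jar"],
    extra_jars_first=True,
)
print(jar_args)
```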

glue_version

(experimental) Glue Version The version of Glue to use to execute this job.

Default: 3.0 for ETL
Stability: experimental

job_name

(experimental) Name of the Glue job (optional) Developer-specified name of the Glue job.

Default:

a name is automatically generated

Stability:

experimental

job_run_queuing_enabled

(experimental) Specifies whether job run queuing is enabled for the job runs for this job.

A value of true means job run queuing is enabled for the job runs. If false or not populated, the job runs will not be considered for queuing. If this field does not match the value set in the job run, then the value from the job run field will be used. This property must be set to false for flex jobs. If this property is enabled, max_retries must be set to zero.

Default:

no job run queuing

Stability:

experimental
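
The interaction between `job_run_queuing_enabled` and `max_retries` described above can be checked before the props are built. The function below is a hypothetical pre-flight check, not something the library is documented to provide; whether the construct itself validates this combination is not stated here. The flex-job restriction is left out because this props class has no flex setting to inspect.

```python
from typing import Optional


def check_job_run_queuing(job_run_queuing_enabled: Optional[bool],
                          max_retries: Optional[int]) -> None:
    """Raise ValueError if queuing is enabled with a non-zero retry count.

    An unset max_retries is treated as the documented default of 0.
    """
    retries = 0 if max_retries is None else max_retries
    if job_run_queuing_enabled and retries != 0:
        raise ValueError(
            f"job_run_queuing_enabled requires max_retries to be 0, got {retries}"
        )


check_job_run_queuing(True, None)  # accepted: max_retries falls back to 0
check_job_run_queuing(False, 3)    # accepted: queuing is disabled

try:
    check_job_run_queuing(True, 3)
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```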

max_concurrent_runs

(experimental) Max Concurrent Runs (optional) The maximum number of runs this Glue job can concurrently run.

An error is returned when this threshold is reached. The maximum value you can specify is controlled by a service limit.

Default: 1
Stability: experimental

max_retries

(experimental) Max Retries (optional) Maximum number of retry attempts Glue performs if the job fails.

Default: 0
Stability: experimental

number_of_workers

(experimental) Number of Workers (optional) Number of workers for Glue to use during job execution.

Default: 10
Stability: experimental

role

(experimental) IAM Role (required). IAM role to use for Glue job execution. Must be specified by the developer because the L2 construct doesn’t have visibility into the actions the script(s) take during job execution. The role must trust the Glue service principal (glue.amazonaws.com) and be granted sufficient permissions.

See: https://docs.aws.amazon.com/glue/latest/dg/getting-started-access.html
Stability: experimental
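
For reference, the trust requirement mentioned above corresponds to a standard IAM trust policy naming the Glue service principal. The sketch below only prints that policy document as JSON; it does not create a role, and the permissions the job needs still have to be granted separately.

```python
import json

# Standard IAM trust policy that lets the Glue service assume the role.
glue_trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "glue.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

print(json.dumps(glue_trust_policy, indent=2))
```

In CDK Python, a role with this trust relationship is typically declared with `iam.Role(..., assumed_by=iam.ServicePrincipal("glue.amazonaws.com"))`.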

script

(experimental) Script Code Location (required) Script to run when the Glue job executes.

Can be uploaded from the local directory structure using Code.from_asset or referenced via an S3 location using Code.from_bucket.

Stability: experimental

security_configuration

(experimental) Security Configuration (optional) Defines the encryption options for the Glue job.

Default:

no security configuration.

Stability:

experimental

spark_ui

(experimental) Enables the Spark UI debugging and monitoring with the specified props.

Default:

Spark UI debugging and monitoring is disabled.

See:

https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html

Stability:

experimental

tags

(experimental) Tags (optional) A list of key:value pairs of tags to apply to this Glue job’s resources.

Default: {} - no tags
Stability: experimental

timeout

(experimental) Timeout (optional) The maximum time that a job run can consume resources before it is terminated and enters TIMEOUT status.

Specified in minutes.

Default: 2880 (2 days for non-streaming)
Stability: experimental
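
The default of 2880 minutes and the "2 days" in the description are the same quantity; the short check below confirms the arithmetic using only the standard library. The constant name is hypothetical.

```python
from datetime import timedelta

DEFAULT_TIMEOUT_MINUTES = 2880  # documented default for non-streaming jobs

default_timeout = timedelta(minutes=DEFAULT_TIMEOUT_MINUTES)
print(default_timeout.days)                  # 2
print(default_timeout == timedelta(days=2))  # True
```

When building the props, the value is passed as a `Duration`, for example `cdk.Duration.minutes(30)` as in the example earlier on this page.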

worker_type

(experimental) Worker Type (optional) Type of worker for Glue to use during job execution.

Enum options: Standard, G_1X, G_2X, G_025X, G_4X, G_8X, Z_2X

Default: WorkerType.G_1X
Stability: experimental