AWS Glue
Developer Guide

Jobs

Data Types

Job Structure

Specifies a job.

Fields

  • Name – String, matching the Single-line string pattern.

    The name you assign to this job.

  • Description – Description string, matching the URI address multi-line string pattern.

    Description of this job.

  • LogUri – String.

    This field is reserved for future use.

  • Role – String.

    The name of the IAM role associated with this job.

  • CreatedOn – Timestamp.

    The time and date that this job specification was created.

  • LastModifiedOn – Timestamp.

    The last point in time when this job specification was modified.

  • ExecutionProperty – An ExecutionProperty object.

    An ExecutionProperty specifying the maximum number of concurrent runs allowed for this job.

  • Command – A JobCommand object.

    The JobCommand that executes this job.

  • DefaultArguments – An array of UTF-8 string–to–UTF-8 string mappings.

    The default arguments for this job, specified as name-value pairs.

    You can specify arguments here that your own job-execution script consumes, as well as arguments that AWS Glue itself consumes.

    For information about how to specify and consume your own Job arguments, see the Developer Guide Python programming topic.

    AWS Glue consumes the following arguments to set up the Job script environment:

    • --scriptLocation  —  The S3 location where your ETL script is located (in a form like s3://path/to/my/script.py). This overrides a script location set in the JobCommand object.

    • --extra-py-files  —  S3 path(s) to additional Python modules that AWS Glue will add to the Python path before executing your script. Multiple values must be complete paths separated by a comma (,). Note that only pure Python modules will work currently. Extension modules written in C or other languages are not supported.

    • --extra-jars  —  S3 path(s) to additional Java .jar file(s) that AWS Glue will add to the Java classpath before executing your script. Multiple values must be complete paths separated by a comma (,).

    • --extra-files  —  S3 path(s) to additional files such as configuration files that AWS Glue will copy to the working directory of your script before executing it. Multiple values must be complete paths separated by a comma (,).

    • --job-bookmark-option  —  When this argument is present, bookmarking is enabled, so that a JobRun starts from where the last one left off.

    • --TempDir  —  Specifies an S3 path to a bucket that can be used as a temporary directory for the Job.

    There are several argument names used by AWS Glue internally that you should never set:

    • --conf  —  Internal to AWS Glue. Do not set!

    • --debug  —  Internal to AWS Glue. Do not set!

    • --mode  —  Internal to AWS Glue. Do not set!

    • --JOB_NAME  —  Internal to AWS Glue. Do not set!

  • Connections – A ConnectionsList object.

    The connections used for this job.

  • MaxRetries – Number (integer).

    The maximum number of times to retry this job if it fails.

  • AllocatedCapacity – Number (integer).

    The number of AWS Glue data processing units (DPUs) allocated to this Job. From 2 to 100 DPUs can be allocated; the default is 10. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory. For more information, see the AWS Glue pricing page.
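
The DefaultArguments rules above can be illustrated with a minimal sketch that builds the name-value map, joins multiple S3 paths with commas, and rejects the argument names that are internal to AWS Glue. The helper name build_default_arguments, the bucket paths, and the --target_table script argument are hypothetical. The value expected for --job-bookmark-option is not given in this section, so the sketch leaves that argument out.

```python
# Argument names that this section lists as internal to AWS Glue.
RESERVED_ARGUMENTS = {"--conf", "--debug", "--mode", "--JOB_NAME"}


def build_default_arguments(extra_py_files=(), extra_jars=(), temp_dir=None,
                            script_arguments=None):
    """Return a string-to-string DefaultArguments mapping.

    Hypothetical helper for illustration; it is not part of any AWS SDK.
    """
    arguments = {}
    if extra_py_files:
        # Multiple values are complete S3 paths separated by commas.
        arguments["--extra-py-files"] = ",".join(extra_py_files)
    if extra_jars:
        arguments["--extra-jars"] = ",".join(extra_jars)
    if temp_dir is not None:
        arguments["--TempDir"] = temp_dir
    # Arguments consumed by your own script are passed through unchanged.
    arguments.update(script_arguments or {})

    reserved = RESERVED_ARGUMENTS.intersection(arguments)
    if reserved:
        raise ValueError(f"reserved AWS Glue arguments must not be set: {sorted(reserved)}")
    return arguments


# Example values are invented for illustration.
example_arguments = build_default_arguments(
    extra_py_files=["s3://my-bucket/lib/a.py", "s3://my-bucket/lib/b.py"],
    temp_dir="s3://my-bucket/tmp/",
    script_arguments={"--target_table": "orders"},
)
```

The resulting mapping could then be supplied as the DefaultArguments field of a job definition.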

ExecutionProperty Structure

An execution property of a job.

Fields

  • MaxConcurrentRuns – Number (integer).

    The maximum number of concurrent runs allowed for a job. The default is 1. An error is returned when this threshold is reached. The maximum value you can specify is controlled by a service limit.

JobCommand Structure

Specifies code that executes a job.

Fields

  • Name – String.

    The name of the job command. This must be glueetl.

  • ScriptLocation – String.

    Specifies the S3 path to a script that executes a job (required).

ConnectionsList Structure

Specifies the connections used by a job.

Fields

  • Connections – An array of UTF-8 strings.

    A list of connections used by the job.

JobUpdate Structure

Specifies information used to update an existing job. The previous job definition is completely overwritten by this information.

Fields

  • Description – Description string, matching the URI address multi-line string pattern.

    Description of the job.

  • LogUri – String.

    This field is reserved for future use.

  • Role – String.

    The name of the IAM role associated with this job (required).

  • ExecutionProperty – An ExecutionProperty object.

    An ExecutionProperty specifying the maximum number of concurrent runs allowed for this job.

  • Command – A JobCommand object.

    The JobCommand that executes this job (required).

  • DefaultArguments – An array of UTF-8 string–to–UTF-8 string mappings.

    The default arguments for this job.

    You can specify arguments here that your own job-execution script consumes, as well as arguments that AWS Glue itself consumes.

    For information about how to specify and consume your own Job arguments, see the Developer Guide Python programming topic.

    AWS Glue consumes the following arguments to set up the Job script environment:

    • --scriptLocation  —  The S3 location where your ETL script is located (in a form like s3://path/to/my/script.py). This overrides a script location set in the JobCommand object.

    • --extra-py-files  —  S3 path(s) to additional Python modules that AWS Glue will add to the Python path before executing your script. Multiple values must be complete paths separated by a comma (,). Note that only pure Python modules will work currently. Extension modules written in C or other languages are not supported.

    • --extra-jars  —  S3 path(s) to additional Java .jar file(s) that AWS Glue will add to the Java classpath before executing your script. Multiple values must be complete paths separated by a comma (,).

    • --extra-files  —  S3 path(s) to additional files such as configuration files that AWS Glue will copy to the working directory of your script before executing it. Multiple values must be complete paths separated by a comma (,).

    • --job-bookmark-option  —  When this argument is present, bookmarking is enabled, so that a JobRun starts from where the last one left off.

    • --TempDir  —  Specifies an S3 path to a bucket that can be used as a temporary directory for the Job.

    There are several argument names used by AWS Glue internally that you should never set:

    • --conf  —  Internal to AWS Glue. Do not set!

    • --debug  —  Internal to AWS Glue. Do not set!

    • --mode  —  Internal to AWS Glue. Do not set!

    • --JOB_NAME  —  Internal to AWS Glue. Do not set!

  • Connections – A ConnectionsList object.

    The connections used for this job.

  • MaxRetries – Number (integer).

    The maximum number of times to retry this job if it fails.

  • AllocatedCapacity – Number (integer).

    The number of AWS Glue data processing units (DPUs) to allocate to this Job. From 2 to 100 DPUs can be allocated; the default is 10. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory. For more information, see the AWS Glue pricing page.
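
Because an update overwrites the whole definition, one possible pattern is to start from the existing Job structure and change only what differs. The sketch below copies the fields that JobUpdate accepts from a Job mapping and then applies overrides. The helper job_update_from_job is hypothetical, and the sample job values are invented for illustration.

```python
# Fields listed for JobUpdate in this section. Name, CreatedOn and
# LastModifiedOn appear only in the Job structure, so they are dropped.
JOB_UPDATE_FIELDS = (
    "Description", "LogUri", "Role", "ExecutionProperty", "Command",
    "DefaultArguments", "Connections", "MaxRetries", "AllocatedCapacity",
)


def job_update_from_job(job, **overrides):
    """Build a JobUpdate mapping from a Job mapping (hypothetical helper)."""
    unknown = set(overrides) - set(JOB_UPDATE_FIELDS)
    if unknown:
        raise ValueError(f"not JobUpdate fields: {sorted(unknown)}")

    job_update = {key: job[key] for key in JOB_UPDATE_FIELDS if key in job}
    job_update.update(overrides)

    # Role and Command are marked as required in JobUpdate.
    for required in ("Role", "Command"):
        if required not in job_update:
            raise ValueError(f"JobUpdate requires {required}")
    return job_update


# Invented sample of an existing job definition.
existing_job = {
    "Name": "nightly-etl",
    "Role": "MyGlueRole",
    "Command": {"Name": "glueetl", "ScriptLocation": "s3://my-bucket/scripts/etl.py"},
    "MaxRetries": 1,
    "AllocatedCapacity": 10,
}
job_update = job_update_from_job(existing_job, MaxRetries=3)
```

Carrying the unchanged fields forward this way keeps them from being lost when the definition is overwritten.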

Operations

CreateJob Action (Python: create_job)

Creates a new job.

Request

  • Name – String, matching the Single-line string pattern. Required.

    The name you assign to this job. It must be unique in your account.

  • Description – Description string, matching the URI address multi-line string pattern.

    Description of the job.

  • LogUri – String.

    This field is reserved for future use.

  • Role – String. Required.

    The name of the IAM role associated with this job.

  • ExecutionProperty – An ExecutionProperty object.

    An ExecutionProperty specifying the maximum number of concurrent runs allowed for this job.

  • Command – A JobCommand object. Required.

    The JobCommand that executes this job.

  • DefaultArguments – An array of UTF-8 string–to–UTF-8 string mappings.

    The default arguments for this job.

    You can specify arguments here that your own job-execution script consumes, as well as arguments that AWS Glue itself consumes.

    For information about how to specify and consume your own Job arguments, see the Developer Guide Python programming topic.

    AWS Glue consumes the following arguments to set up the Job script environment:

    • --scriptLocation  —  The S3 location where your ETL script is located (in a form like s3://path/to/my/script.py). This overrides a script location set in the JobCommand object.

    • --extra-py-files  —  S3 path(s) to additional Python modules that AWS Glue will add to the Python path before executing your script. Multiple values must be complete paths separated by a comma (,). Note that only pure Python modules will work currently. Extension modules written in C or other languages are not supported.

    • --extra-jars  —  S3 path(s) to additional Java .jar file(s) that AWS Glue will add to the Java classpath before executing your script. Multiple values must be complete paths separated by a comma (,).

    • --extra-files  —  S3 path(s) to additional files such as configuration files that AWS Glue will copy to the working directory of your script before executing it. Multiple values must be complete paths separated by a comma (,).

    • --job-bookmark-option  —  When this argument is present, bookmarking is enabled, so that a JobRun starts from where the last one left off.

    • --TempDir  —  Specifies an S3 path to a bucket that can be used as a temporary directory for the Job.

    There are several argument names used by AWS Glue internally that you should never set:

    • --conf  —  Internal to AWS Glue. Do not set!

    • --debug  —  Internal to AWS Glue. Do not set!

    • --mode  —  Internal to AWS Glue. Do not set!

    • --JOB_NAME  —  Internal to AWS Glue. Do not set!

  • Connections – A ConnectionsList object.

    The connections used for this job.

  • MaxRetries – Number (integer).

    The maximum number of times to retry this job if it fails.

  • AllocatedCapacity – Number (integer).

    The number of AWS Glue data processing units (DPUs) to allocate to this Job. From 2 to 100 DPUs can be allocated; the default is 10. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory. For more information, see the AWS Glue pricing page.

Response

Errors

  • InvalidInputException

  • IdempotentParameterMismatchException

  • AlreadyExistsException

  • InternalServiceException

  • OperationTimeoutException

  • ResourceNumberLimitExceededException

  • ConcurrentModificationException
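
A CreateJob request can be assembled and checked as a plain mapping before it is sent. The following sketch enforces the constraints stated above (required Name, Role, and Command, the glueetl command name, and 2 to 100 DPUs) and omits optional fields so that the service defaults apply. The helper build_create_job_request, the role name, and the S3 path are hypothetical.

```python
def build_create_job_request(name, role, script_location,
                             allocated_capacity=None, max_retries=None,
                             max_concurrent_runs=None, default_arguments=None):
    """Return keyword arguments for a CreateJob call (hypothetical helper)."""
    if not name:
        raise ValueError("Name is required")
    if not role:
        raise ValueError("Role is required")
    if not script_location:
        raise ValueError("Command.ScriptLocation is required")

    request = {
        "Name": name,
        "Role": role,
        # The job command name must be glueetl.
        "Command": {"Name": "glueetl", "ScriptLocation": script_location},
    }
    if allocated_capacity is not None:
        if not 2 <= allocated_capacity <= 100:
            raise ValueError("AllocatedCapacity must be from 2 to 100 DPUs")
        request["AllocatedCapacity"] = allocated_capacity
    if max_retries is not None:
        request["MaxRetries"] = max_retries
    if max_concurrent_runs is not None:
        request["ExecutionProperty"] = {"MaxConcurrentRuns": max_concurrent_runs}
    if default_arguments:
        request["DefaultArguments"] = dict(default_arguments)
    return request


# Invented example values.
create_request = build_create_job_request(
    "nightly-etl",
    "MyGlueRole",
    "s3://my-bucket/scripts/etl.py",
    allocated_capacity=10,
    max_concurrent_runs=2,
)
```

With the AWS SDK for Python, such a mapping would typically be passed as `boto3.client("glue").create_job(**create_request)`. That call needs valid credentials and an existing role, so it is not run here.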

UpdateJob Action (Python: update_job)

Updates an existing job definition.

Request

  • JobName – String, matching the Single-line string pattern. Required.

    Name of the job definition to update.

  • JobUpdate – A JobUpdate object. Required.

    Specifies the values with which to update the job.

Response

Errors

  • InvalidInputException

  • EntityNotFoundException

  • InternalServiceException

  • OperationTimeoutException

  • ConcurrentModificationException

GetJob Action (Python: get_job)

Retrieves an existing job definition.

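Request

  • JobName – String, matching the Single-line string pattern. Required.

    The name of the job definition to retrieve.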

Response

  • Job – A Job object.

    The requested job definition.

Errors

  • InvalidInputException

  • EntityNotFoundException

  • InternalServiceException

  • OperationTimeoutException

GetJobs Action (Python: get_jobs)

Retrieves all current job definitions.

Request

  • NextToken – String.

    A continuation token, if this is a continuation call.

  • MaxResults – Number (integer).

    The maximum number of jobs to return in a single response.

Response

  • Jobs – An array of Job objects.

    A list of jobs.

  • NextToken – String.

    A continuation token, if not all jobs have yet been returned.

Errors

  • InvalidInputException

  • EntityNotFoundException

  • InternalServiceException

  • OperationTimeoutException
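
The NextToken handshake described above can be sketched as a loop that repeats the call until the response no longer carries a token. The generator iter_jobs and the FakeGlueClient stand-in are hypothetical; the fake exists only so the loop can be exercised without AWS credentials, and its token format is invented.

```python
def iter_jobs(glue_client, page_size=None):
    """Yield every job by following NextToken until it is absent."""
    kwargs = {}
    if page_size is not None:
        kwargs["MaxResults"] = page_size
    while True:
        response = glue_client.get_jobs(**kwargs)
        for job in response.get("Jobs", []):
            yield job
        next_token = response.get("NextToken")
        if not next_token:
            break
        # Pass the token back to request the next page.
        kwargs["NextToken"] = next_token


class FakeGlueClient:
    """Offline stand-in that pages through a fixed list of job names."""

    def __init__(self, names):
        self._names = names

    def get_jobs(self, NextToken=None, MaxResults=2):
        start = int(NextToken or 0)
        end = start + MaxResults
        response = {"Jobs": [{"Name": name} for name in self._names[start:end]]}
        if end < len(self._names):
            response["NextToken"] = str(end)
        return response


job_names = [job["Name"] for job in iter_jobs(FakeGlueClient(["a", "b", "c"]), page_size=2)]
```

A real client created with `boto3.client("glue")` exposes a `get_jobs` method that takes the same two request fields, so it could be passed to iter_jobs in place of the fake.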

DeleteJob Action (Python: delete_job)

Deletes a specified job. If the job is not found, no exception is thrown.

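Request

  • JobName – String, matching the Single-line string pattern. Required.

    The name of the job definition to delete.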

Response

Errors

  • InvalidInputException

  • InternalServiceException

  • OperationTimeoutException