Job Runs

Data Types

JobRun Structure

Contains information about a job run.

Fields

  • Id – String, matching the Single-line string pattern.

    The ID of this job run.

  • Attempt – Number (integer).

    The number of the attempt to run this job.

  • PreviousRunId – String, matching the Single-line string pattern.

    The ID of the previous run of this job. For example, the JobRunId specified in the StartJobRun action.

  • TriggerName – String, matching the Single-line string pattern.

    The name of the trigger that started this job run.

  • JobName – String, matching the Single-line string pattern.

    The name of the job being run.

  • StartedOn – Timestamp.

    The date and time at which this job run was started.

  • LastModifiedOn – Timestamp.

    The last time this job run was modified.

  • CompletedOn – Timestamp.

    The date and time this job run completed.

  • JobRunState – String (valid values: STARTING | RUNNING | STOPPING | STOPPED | SUCCEEDED | FAILED).

    The current state of the job run.

  • Arguments – An array of UTF-8 string–to–UTF-8 string mappings.

    The job arguments associated with this run. These override equivalent default arguments set for the job.

    You can specify arguments here that your own job-execution script consumes, as well as arguments that AWS Glue itself consumes.

    For information about how to specify and consume your own Job arguments, see the Developer Guide Python programming topic.

    AWS Glue consumes the following arguments to set up the Job script environment:

    • --scriptLocation  —  The S3 location where your ETL script is located (in a form like s3://path/to/my/script.py). This overrides a script location set in the JobCommand object.

    • --extra-py-files  —  S3 path(s) to additional Python modules that AWS Glue will add to the Python path before executing your script. Multiple values must be complete paths separated by a comma (,). Note that only pure Python modules will work currently. Extension modules written in C or other languages are not supported.

    • --extra-jars  —  S3 path(s) to additional Java .jar file(s) that AWS Glue will add to the Java classpath before executing your script. Multiple values must be complete paths separated by a comma (,).

    • --extra-files  —  S3 path(s) to additional files such as configuration files that AWS Glue will copy to the working directory of your script before executing it. Multiple values must be complete paths separated by a comma (,).

    • --job-bookmark-option  —  When this argument is present, bookmarking is enabled, so that a JobRun starts from where the last one left off.

    • --TempDir  —  Specifies an S3 path to a bucket that can be used as a temporary directory for the Job.

    There are several argument names used by AWS Glue internally that you should never set:

    • --conf  —  Internal to AWS Glue. Do not set!

    • --debug  —  Internal to AWS Glue. Do not set!

    • --mode  —  Internal to AWS Glue. Do not set!

    • --JOB_NAME  —  Internal to AWS Glue. Do not set!

  • ErrorMessage – String.

    An error message associated with this job run.

  • PredecessorRuns – An array of Predecessors.

    A list of predecessors to this job run.

  • AllocatedCapacity – Number (integer).

    The number of AWS Glue data processing units (DPUs) allocated to this JobRun. From 2 to 100 DPUs can be allocated; the default is 10. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory. For more information, see the AWS Glue pricing page.
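
As a rough illustration of how the fields above might be read, the following sketch takes a JobRun represented as a plain Python dict and derives whether the run has finished and how long it took. The dict representation, the helper names is_finished and run_duration, and the sample values are assumptions made for illustration. Treating STOPPED, SUCCEEDED, and FAILED as the final states is an interpretation of the JobRunState values listed above, not something this page states.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumption: of the JobRunState values listed above, these three are final.
FINISHED_STATES = {"STOPPED", "SUCCEEDED", "FAILED"}


def is_finished(job_run: dict) -> bool:
    """Return True if JobRunState is one of the assumed final states."""
    return job_run.get("JobRunState") in FINISHED_STATES


def run_duration(job_run: dict) -> Optional[timedelta]:
    """Return CompletedOn minus StartedOn, or None if either is missing."""
    started = job_run.get("StartedOn")
    completed = job_run.get("CompletedOn")
    if started is None or completed is None:
        return None
    return completed - started


# Hypothetical JobRun value, for illustration only.
example_run = {
    "Id": "jr_example",
    "JobName": "my-job",
    "Attempt": 0,
    "JobRunState": "SUCCEEDED",
    "StartedOn": datetime(2017, 9, 1, 12, 0, 0),
    "CompletedOn": datetime(2017, 9, 1, 12, 7, 30),
    "AllocatedCapacity": 10,
}
```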

Predecessor Structure

A job run that was used in the predicate of a conditional trigger that triggered this job run.

Fields

JobBookmarkEntry Structure

Defines a point at which a job can resume processing.

Fields

  • JobName – String.

    Name of the job in question.

  • Version – Number (integer).

    Version of the job.

  • Run – Number (integer).

    The run ID number.

  • Attempt – Number (integer).

    The attempt ID number.

  • JobBookmark – String.

    The bookmark itself.

BatchStopJobRunSuccessfulSubmission Structure

Records a successful request to stop a specified JobRun.

Fields

BatchStopJobRunError Structure

Records an error that occurred when attempting to stop a specified JobRun.

Fields

  • JobName – String, matching the Single-line string pattern.

    The name of the Job in question.

  • JobRunId – String, matching the Single-line string pattern.

    The JobRunId of the JobRun in question.

  • ErrorDetail – An ErrorDetail object.

    Specifies details about the error that was encountered.

Operations

StartJobRun Action (Python: start_job_run)

Runs a job.

Request

  • JobName – String, matching the Single-line string pattern. Required.

    The name of the job to start.

  • JobRunId – String, matching the Single-line string pattern.

    The ID of a previous JobRun to retry.

  • Arguments – An array of UTF-8 string–to–UTF-8 string mappings.

    The job arguments specifically for this run. They override the equivalent default arguments set for the job itself.

    You can specify arguments here that your own job-execution script consumes, as well as arguments that AWS Glue itself consumes.

    For information about how to specify and consume your own Job arguments, see the Developer Guide Python programming topic.

    AWS Glue consumes the following arguments to set up the Job script environment:

    • --scriptLocation  —  The S3 location where your ETL script is located (in a form like s3://path/to/my/script.py). This overrides a script location set in the JobCommand object.

    • --extra-py-files  —  S3 path(s) to additional Python modules that AWS Glue will add to the Python path before executing your script. Multiple values must be complete paths separated by a comma (,). Note that only pure Python modules will work currently. Extension modules written in C or other languages are not supported.

    • --extra-jars  —  S3 path(s) to additional Java .jar file(s) that AWS Glue will add to the Java classpath before executing your script. Multiple values must be complete paths separated by a comma (,).

    • --extra-files  —  S3 path(s) to additional files such as configuration files that AWS Glue will copy to the working directory of your script before executing it. Multiple values must be complete paths separated by a comma (,).

    • --job-bookmark-option  —  When this argument is present, bookmarking is enabled, so that a JobRun starts from where the last one left off.

    • --TempDir  —  Specifies an S3 path to a bucket that can be used as a temporary directory for the Job.

    There are several argument names used by AWS Glue internally that you should never set:

    • --conf  —  Internal to AWS Glue. Do not set!

    • --debug  —  Internal to AWS Glue. Do not set!

    • --mode  —  Internal to AWS Glue. Do not set!

    • --JOB_NAME  —  Internal to AWS Glue. Do not set!

  • AllocatedCapacity – Number (integer).

    The number of AWS Glue data processing units (DPUs) to allocate to this JobRun. From 2 to 100 DPUs can be allocated; the default is 10. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory. For more information, see the AWS Glue pricing page.
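
The request fields above could be assembled along the following lines. This is a minimal sketch, not a definitive implementation: build_start_request, start_run, and join_s3_paths are hypothetical helper names, and client is assumed to be an AWS Glue client object from the Python SDK (for example one created with boto3.client("glue")) that exposes start_job_run. The checks mirror the text above: the four internal argument names are rejected, and AllocatedCapacity must fall between 2 and 100.

```python
from typing import Dict, Iterable, Optional

# Argument names that the text above says are internal to AWS Glue.
RESERVED_ARGUMENTS = {"--conf", "--debug", "--mode", "--JOB_NAME"}


def join_s3_paths(paths: Iterable[str]) -> str:
    """Join complete S3 paths with commas, as --extra-py-files and similar expect."""
    return ",".join(paths)


def build_start_request(
    job_name: str,
    arguments: Optional[Dict[str, str]] = None,
    allocated_capacity: Optional[int] = None,
    previous_run_id: Optional[str] = None,
) -> dict:
    """Build keyword arguments for start_job_run, omitting unset fields."""
    arguments = dict(arguments or {})
    reserved = sorted(RESERVED_ARGUMENTS.intersection(arguments))
    if reserved:
        raise ValueError(f"arguments reserved for AWS Glue: {reserved}")
    if allocated_capacity is not None and not 2 <= allocated_capacity <= 100:
        raise ValueError("AllocatedCapacity must be between 2 and 100 DPUs")

    request: dict = {"JobName": job_name}
    if previous_run_id is not None:
        request["JobRunId"] = previous_run_id  # ID of a previous run to retry
    if arguments:
        request["Arguments"] = arguments
    if allocated_capacity is not None:
        request["AllocatedCapacity"] = allocated_capacity
    return request


def start_run(client, job_name, arguments=None, allocated_capacity=None,
              previous_run_id=None):
    """Call start_job_run on the supplied client and return its response."""
    request = build_start_request(
        job_name, arguments, allocated_capacity, previous_run_id
    )
    return client.start_job_run(**request)
```

A call might look like start_run(client, "my-job", arguments={"--TempDir": "s3://my-bucket/tmp", "--extra-py-files": join_s3_paths(["s3://my-bucket/a.py", "s3://my-bucket/b.py"])}), where the job name and bucket paths are placeholders.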

Response

Errors

  • InvalidInputException

  • EntityNotFoundException

  • InternalServiceException

  • OperationTimeoutException

  • ResourceNumberLimitExceededException

  • ConcurrentRunsExceededException

BatchStopJobRun Action (Python: batch_stop_job_run)

Stops one or more job runs for a specified Job.

Request

  • JobName – String, matching the Single-line string pattern. Required.

    The name of the Job in question.

  • JobRunIds – An array of UTF-8 strings. Required.

    A list of the JobRunIds that should be stopped for that Job.

Response

  • SuccessfulSubmissions – An array of BatchStopJobRunSuccessfulSubmissions.

    A list of the JobRuns that were successfully submitted for stopping.

  • Errors – An array of BatchStopJobRunErrors.

    A list of the errors that were encountered in trying to stop JobRuns, including the JobRunId for which each error was encountered and details about the error.

Errors

  • InvalidInputException

  • InternalServiceException

  • OperationTimeoutException
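
Because BatchStopJobRun reports per-run outcomes in its response rather than failing as a whole, a caller will usually want to separate the two lists. The sketch below shows one way to do that. The helper name stop_runs is hypothetical, client is assumed to be an AWS Glue client from the Python SDK that exposes batch_stop_job_run, and the code assumes that each successful submission carries a JobRunId just as each error entry does.

```python
from typing import Dict, Iterable, List, Tuple


def stop_runs(client, job_name: str, run_ids: Iterable[str]) -> Tuple[List[str], Dict[str, object]]:
    """Request that the given runs stop; return (submitted IDs, errors by ID)."""
    response = client.batch_stop_job_run(
        JobName=job_name,
        JobRunIds=list(run_ids),
    )
    # Assumption: each successful submission includes the JobRunId it refers to.
    submitted = [s["JobRunId"] for s in response.get("SuccessfulSubmissions", [])]
    # Keep the ErrorDetail object as returned, keyed by the run it applies to.
    failed = {
        e["JobRunId"]: e.get("ErrorDetail")
        for e in response.get("Errors", [])
    }
    return submitted, failed
```

A successful submission only means the stop request was accepted; the run's JobRunState still has to be checked to confirm that it stopped.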

GetJobRun Action (Python: get_job_run)

Retrieves the metadata for a given job run.

Request

  • JobName – String, matching the Single-line string pattern. Required.

    Name of the job being run.

  • RunId – String, matching the Single-line string pattern. Required.

    The ID of the job run.

  • PredecessorsIncluded – Boolean.

    True if a list of predecessor runs should be returned.

Response

  • JobRun – A JobRun object.

    The requested job-run metadata.

Errors

  • InvalidInputException

  • EntityNotFoundException

  • InternalServiceException

  • OperationTimeoutException
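
A common use of GetJobRun is to poll a run until it finishes. The following is a minimal sketch under stated assumptions: wait_for_run is a hypothetical helper, client is assumed to be an AWS Glue client from the Python SDK that exposes get_job_run, and STOPPED, SUCCEEDED, and FAILED are assumed to be the final JobRunState values. The polling interval and poll limit are arbitrary illustrative defaults.

```python
import time

# Assumption: these JobRunState values mean the run will not change further.
FINISHED_STATES = {"STOPPED", "SUCCEEDED", "FAILED"}


def wait_for_run(client, job_name, run_id, poll_seconds=30.0, max_polls=120,
                 sleep=time.sleep):
    """Poll get_job_run until the run reaches a final state, then return it.

    Raises TimeoutError if the run is still active after max_polls polls.
    The sleep parameter exists so that callers (and tests) can replace it.
    """
    for _ in range(max_polls):
        response = client.get_job_run(JobName=job_name, RunId=run_id)
        job_run = response["JobRun"]
        if job_run["JobRunState"] in FINISHED_STATES:
            return job_run
        sleep(poll_seconds)
    raise TimeoutError(
        f"job run {run_id} of {job_name} not finished after {max_polls} polls"
    )
```

The caller would still need to inspect JobRunState and ErrorMessage on the returned JobRun to tell success from failure.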

GetJobRuns Action (Python: get_job_runs)

Retrieves metadata for all runs of a given job.

Request

  • JobName – String, matching the Single-line string pattern. Required.

    The name of the job for which to retrieve all job runs.

  • NextToken – String.

    A continuation token, if this is a continuation call.

  • MaxResults – Number (integer).

    The maximum size of the response.

Response

  • JobRuns – An array of JobRuns.

    A list of job-run metadata objects.

  • NextToken – String.

    A continuation token, if not all requested job runs have been returned.

Errors

  • InvalidInputException

  • EntityNotFoundException

  • InternalServiceException

  • OperationTimeoutException
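
The NextToken field implies a loop: call GetJobRuns, pass the returned token back in, and stop when no token comes back. The sketch below shows that pattern as a generator. The name iter_job_runs and the default page size are hypothetical, and client is assumed to be an AWS Glue client from the Python SDK that exposes get_job_runs.

```python
from typing import Iterator


def iter_job_runs(client, job_name: str, page_size: int = 50) -> Iterator[dict]:
    """Yield every JobRun for a job, following NextToken across pages."""
    request = {"JobName": job_name, "MaxResults": page_size}
    while True:
        response = client.get_job_runs(**request)
        yield from response.get("JobRuns", [])
        token = response.get("NextToken")
        if not token:
            return  # no continuation token: all runs have been returned
        request["NextToken"] = token
```

For example, [run["Id"] for run in iter_job_runs(client, "my-job")] would collect the IDs of all runs of a job named my-job (a placeholder name).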

ResetJobBookmark Action (Python: reset_job_bookmark)

Resets a bookmark entry.

Request

  • JobName – String. Required.

    The name of the job in question.

Response

  • JobBookmarkEntry – A JobBookmarkEntry object.

    The reset bookmark entry.

Errors

  • EntityNotFoundException

  • InvalidInputException

  • InternalServiceException

  • OperationTimeoutException
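
A short sketch of calling ResetJobBookmark and summarizing the returned entry follows. The helper name reset_bookmark is hypothetical, and client is assumed to be an AWS Glue client from the Python SDK that exposes reset_job_bookmark. The keys read from the entry are the JobBookmarkEntry fields listed earlier on this page.

```python
def reset_bookmark(client, job_name: str) -> dict:
    """Reset a job's bookmark and return selected fields of the reset entry."""
    response = client.reset_job_bookmark(JobName=job_name)
    entry = response.get("JobBookmarkEntry") or {}
    # Keep only the identifying fields; the JobBookmark string is left out.
    return {key: entry.get(key) for key in ("JobName", "Version", "Run", "Attempt")}
```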