When you submit a job to an AWS Batch job queue, the job enters the SUBMITTED state. It then passes through the following states until it succeeds (exits with code 0) or fails (exits with a non-zero code).
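The lifecycle summarized here can be sketched as a small transition table. This is an illustrative model only, not part of any AWS SDK: the names `JobState`, `TRANSITIONS`, `can_transition`, and `next_state_after_exit` are hypothetical, and the table covers only the transitions this section describes.

```python
from enum import Enum


class JobState(Enum):
    SUBMITTED = "SUBMITTED"
    PENDING = "PENDING"
    RUNNABLE = "RUNNABLE"
    STARTING = "STARTING"
    RUNNING = "RUNNING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"


# Hypothetical table: only the transitions described in this section.
TRANSITIONS = {
    JobState.SUBMITTED: {JobState.PENDING, JobState.RUNNABLE},
    JobState.PENDING: {JobState.RUNNABLE},
    JobState.RUNNABLE: {JobState.STARTING},
    JobState.STARTING: {JobState.RUNNING},
    # A failed attempt returns to RUNNABLE while retry attempts remain.
    JobState.RUNNING: {JobState.SUCCEEDED, JobState.RUNNABLE, JobState.FAILED},
    JobState.SUCCEEDED: set(),
    JobState.FAILED: set(),
}


def can_transition(current, target):
    """Return True if this section describes a move from current to target."""
    return target in TRANSITIONS[current]


def next_state_after_exit(exit_code, attempts_used, max_attempts):
    """Model what happens to a RUNNING job when its container exits.

    attempts_used counts the attempt that just finished; max_attempts is
    the total allowed by the (optional) retry strategy.
    """
    if exit_code == 0:
        return JobState.SUCCEEDED
    if attempts_used < max_attempts:
        return JobState.RUNNABLE  # attempts remain, so the job is retried
    return JobState.FAILED  # all available attempts have failed
```

For example, `next_state_after_exit(1, attempts_used=1, max_attempts=3)` models a first failed attempt that is sent back to RUNNABLE, while the same call with `attempts_used=3` models a job that ends in FAILED.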
AWS Batch jobs can have the following states:
SUBMITTED
-
A job that's submitted to the queue and has not yet been evaluated by the scheduler. The scheduler evaluates the job to determine if it has any outstanding dependencies on the successful completion of any other jobs. If there are dependencies, the job is moved to PENDING. If there are no dependencies, the job is moved to RUNNABLE.
PENDING
-
A job that resides in the queue and isn't yet able to run due to a dependency on another job or resource. After the dependencies are satisfied, the job is moved to RUNNABLE.
RUNNABLE
-
A job that resides in the queue, has no outstanding dependencies, and is therefore ready to be scheduled to a host. Jobs in this state are started as soon as sufficient resources are available in one of the compute environments that are mapped to the job's queue. However, jobs can remain in this state indefinitely when sufficient resources are unavailable.
Note
If your jobs don't progress to STARTING, see Jobs stuck in a RUNNABLE status in the troubleshooting section.
STARTING
-
These jobs have been scheduled to a host and the relevant container initiation operations are underway. After the container image is pulled and the container is up and running, the job transitions to RUNNING.
Image pull duration, Amazon EKS initContainer completion duration, and Amazon ECS containerDependency resolution duration occur in the STARTING state. The amount of time it takes to pull an image for your job is equivalent to the amount of time your job will be in the STARTING state.
For example, if it takes three minutes to pull the image for your job, your job will be in the STARTING state for three minutes. If initContainers take a total of ten minutes to complete, then your Amazon EKS job will be in STARTING for ten minutes. If you have Amazon ECS containerDependencies set in your Amazon ECS job, the job will be in STARTING until all container dependencies (their runtime) are resolved. STARTING is not included in timeouts; the timeout duration starts at RUNNING. For more information, see Job states.
RUNNING
-
The job is running as a container job on an Amazon ECS container instance within a compute environment. When the job's container exits, the process exit code determines whether the job succeeded or failed. An exit code of 0 indicates success, and any non-zero exit code indicates failure. If the job associated with a failed attempt has any remaining attempts left in its optional retry strategy configuration, the job is moved to RUNNABLE again. For more information, see Automated job retries.
Note
Logs for RUNNING jobs are available in CloudWatch Logs. The log group is /aws/batch/job, and the log stream name format is as follows: first200CharsOfJobDefinitionName/default/ecs_task_id. This format might change in the future.
After a job reaches the RUNNING status, you can programmatically retrieve its log stream name with the DescribeJobs API operation. For more information, see View Log Data Sent to CloudWatch Logs in the Amazon CloudWatch Logs User Guide. By default, these logs never expire. However, you can modify the retention period. For more information, see Change Log Data Retention in CloudWatch Logs in the Amazon CloudWatch Logs User Guide.
SUCCEEDED
-
The job has successfully completed with an exit code of 0. The job state for SUCCEEDED jobs is persisted in AWS Batch for at least 7 days.
Note
Logs for SUCCEEDED jobs are available in CloudWatch Logs. The log group is /aws/batch/job, and the log stream name format is as follows: first200CharsOfJobDefinitionName/default/ecs_task_id. This format might change in the future.
After a job reaches the RUNNING status, you can programmatically retrieve its log stream name with the DescribeJobs API operation. For more information, see View Log Data Sent to CloudWatch Logs in the Amazon CloudWatch Logs User Guide. By default, these logs never expire. However, you can modify the retention period. For more information, see Change Log Data Retention in CloudWatch Logs in the Amazon CloudWatch Logs User Guide.
FAILED
-
The job has failed all available attempts. The job state for FAILED jobs is persisted in AWS Batch for at least 7 days.
Note
Logs for FAILED jobs are available in CloudWatch Logs. The log group is /aws/batch/job, and the log stream name format is as follows: first200CharsOfJobDefinitionName/default/ecs_task_id. This format might change in the future.
After a job reaches the RUNNING status, you can programmatically retrieve its log stream name with the DescribeJobs API operation. For more information, see View Log Data Sent to CloudWatch Logs in the Amazon CloudWatch Logs User Guide. By default, these logs never expire. However, you can modify the retention period. For more information, see Change Log Data Retention in CloudWatch Logs in the Amazon CloudWatch Logs User Guide.
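As a rough illustration of retrieving log stream names from DescribeJobs output, the sketch below walks a response-shaped dictionary and collects the stream names per job. It assumes the response carries `jobs[].jobId`, `jobs[].container.logStreamName`, and `jobs[].attempts[].container.logStreamName`; verify these field names against the DescribeJobs API reference before relying on them. The helper name `log_streams` and all sample values are hypothetical.

```python
LOG_GROUP = "/aws/batch/job"  # default log group named in this section


def log_streams(describe_jobs_response):
    """Map each job ID to the log stream names found in the response.

    Stream names from earlier attempts come first, followed by the name on
    the job's own container entry if it is not already listed.
    """
    result = {}
    for job in describe_jobs_response.get("jobs", []):
        names = []
        for attempt in job.get("attempts", []):
            name = attempt.get("container", {}).get("logStreamName")
            if name and name not in names:
                names.append(name)
        current = job.get("container", {}).get("logStreamName")
        if current and current not in names:
            names.append(current)
        result[job["jobId"]] = names
    return result


# Hypothetical sample shaped like a DescribeJobs response; all values are made up.
sample = {
    "jobs": [
        {
            "jobId": "example-job-id",
            "status": "FAILED",
            "attempts": [
                {"container": {"logStreamName": "my-job-def/default/task-1"}},
                {"container": {"logStreamName": "my-job-def/default/task-2"}},
            ],
            "container": {"logStreamName": "my-job-def/default/task-2"},
        }
    ]
}

for job_id, streams in log_streams(sample).items():
    for stream in streams:
        print(f"{job_id}: {LOG_GROUP} -> {stream}")
```

With boto3 installed and credentials configured, a dictionary of this shape would typically come from `boto3.client("batch").describe_jobs(jobs=[job_id])`; the sample above stands in for that call so the sketch runs without AWS access.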