Amazon SageMaker
Developer Guide

DescribeTrainingJob

Returns information about a training job.

Request Syntax

{
   "TrainingJobName": "string"
}

Request Parameters

For information about the parameters that are common to all actions, see Common Parameters.

The request accepts the following data in JSON format.

TrainingJobName

The name of the training job.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 63.

Pattern: ^[a-zA-Z0-9](-*[a-zA-Z0-9])*

Required: Yes
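The constraints above can be checked on the client before sending a request. The following is a minimal sketch, not part of the API: the helper names is_valid_training_job_name and build_request are hypothetical, and it assumes the whole name must match the pattern even though the documented pattern has no trailing anchor.

```python
import re

# Pattern and length limits copied from the TrainingJobName constraints above.
_NAME_PATTERN = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9])*")
_MIN_LEN, _MAX_LEN = 1, 63


def is_valid_training_job_name(name: str) -> bool:
    """Return True if `name` satisfies the documented length and pattern.

    Assumption: the entire string must match, so fullmatch() is used even
    though the documented pattern carries no trailing `$` anchor.
    """
    if not _MIN_LEN <= len(name) <= _MAX_LEN:
        return False
    return _NAME_PATTERN.fullmatch(name) is not None


def build_request(name: str) -> dict:
    """Build the JSON request body shown under Request Syntax."""
    if not is_valid_training_job_name(name):
        raise ValueError(f"invalid TrainingJobName: {name!r}")
    return {"TrainingJobName": name}
```

Validating locally only avoids an obviously malformed request; the service remains the authority on whether a name is accepted.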

Response Syntax

{
   "AlgorithmSpecification": {
      "TrainingImage": "string",
      "TrainingInputMode": "string"
   },
   "CreationTime": number,
   "FailureReason": "string",
   "HyperParameters": {
      "string" : "string"
   },
   "InputDataConfig": [
      {
         "ChannelName": "string",
         "CompressionType": "string",
         "ContentType": "string",
         "DataSource": {
            "S3DataSource": {
               "S3DataDistributionType": "string",
               "S3DataType": "string",
               "S3Uri": "string"
            }
         },
         "RecordWrapperType": "string"
      }
   ],
   "LastModifiedTime": number,
   "ModelArtifacts": {
      "S3ModelArtifacts": "string"
   },
   "OutputDataConfig": {
      "KmsKeyId": "string",
      "S3OutputPath": "string"
   },
   "ResourceConfig": {
      "InstanceCount": number,
      "InstanceType": "string",
      "VolumeKmsKeyId": "string",
      "VolumeSizeInGB": number
   },
   "RoleArn": "string",
   "SecondaryStatus": "string",
   "SecondaryStatusTransitions": [
      {
         "EndTime": number,
         "StartTime": number,
         "Status": "string",
         "StatusMessage": "string"
      }
   ],
   "StoppingCondition": {
      "MaxRuntimeInSeconds": number
   },
   "TrainingEndTime": number,
   "TrainingJobArn": "string",
   "TrainingJobName": "string",
   "TrainingJobStatus": "string",
   "TrainingStartTime": number,
   "TuningJobArn": "string",
   "VpcConfig": {
      "SecurityGroupIds": [ "string" ],
      "Subnets": [ "string" ]
   }
}

Response Elements

If the action is successful, the service sends back an HTTP 200 response.

The following data is returned in JSON format by the service.

AlgorithmSpecification

Information about the algorithm used for training, and algorithm metadata.

Type: AlgorithmSpecification object

CreationTime

A timestamp that indicates when the training job was created.

Type: Timestamp

FailureReason

If the training job failed, the reason it failed.

Type: String

Length Constraints: Maximum length of 1024.

HyperParameters

Algorithm-specific parameters.

Type: String to string map

Key Length Constraints: Maximum length of 256.

Value Length Constraints: Maximum length of 256.

InputDataConfig

An array of Channel objects that describes each data input channel.

Type: Array of Channel objects

Array Members: Minimum number of 1 item. Maximum number of 8 items.

LastModifiedTime

A timestamp that indicates when the status of the training job was last modified.

Type: Timestamp

ModelArtifacts

Information about the Amazon S3 location that is configured for storing model artifacts.

Type: ModelArtifacts object

OutputDataConfig

The S3 path where model artifacts are stored, as configured when you created the job. Amazon SageMaker creates subfolders for model artifacts.

Type: OutputDataConfig object

ResourceConfig

Resources, including ML compute instances and ML storage volumes, that are configured for model training.

Type: ResourceConfig object

RoleArn

The AWS Identity and Access Management (IAM) role configured for the training job.

Type: String

Length Constraints: Minimum length of 20. Maximum length of 2048.

Pattern: ^arn:aws[a-z\-]*:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+$

SecondaryStatus

Provides granular information about the system state. For more information, see TrainingJobStatus. For detailed information and the immediate status of the training job, see StatusMessage under SecondaryStatusTransition.

  • Starting - starting the training job.

  • Downloading - downloading the input data.

  • Training - model training is in progress.

  • Uploading - uploading the trained model.

  • Stopping - stopping the training job.

  • Stopped - the training job has stopped.

  • MaxRuntimeExceeded - the training job exceeded the specified max run time and has been stopped.

  • Completed - the training job has completed.

  • Failed - the training job has failed. The failure reason is stored in the FailureReason field of DescribeTrainingJobResponse.

Important

The valid values for SecondaryStatus are subject to change. They primarily provide information on the progress of the training job.

Type: String

Valid Values: Starting | LaunchingMLInstances | PreparingTrainingStack | Downloading | DownloadingTrainingImage | Training | Uploading | Stopping | Stopped | MaxRuntimeExceeded | Completed | Failed
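Because the valid values for SecondaryStatus are subject to change, client code should tolerate values it does not recognize. The following is a minimal sketch of one way to do that; the names SECONDARY_STATUS_DESCRIPTIONS and describe_secondary_status are hypothetical, and the descriptions are copied from the list above.

```python
# Descriptions copied from the SecondaryStatus list above. Values that appear
# only in the Valid Values line have no description there, so they are
# deliberately absent from this table.
SECONDARY_STATUS_DESCRIPTIONS = {
    "Starting": "starting the training job",
    "Downloading": "downloading the input data",
    "Training": "model training is in progress",
    "Uploading": "uploading the trained model",
    "Stopping": "stopping the training job",
    "Stopped": "the training job has stopped",
    "MaxRuntimeExceeded": "the training job exceeded the specified max run time",
    "Completed": "the training job has completed",
    "Failed": "the training job has failed",
}


def describe_secondary_status(status: str) -> str:
    """Return a description, falling back gracefully for unknown values."""
    description = SECONDARY_STATUS_DESCRIPTIONS.get(status)
    if description is None:
        # Do not raise: the set of valid values is subject to change.
        return f"{status}: no description available"
    return f"{status}: {description}"
```

Treating an unknown value as informational rather than as an error keeps a client working if new secondary statuses are introduced.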

SecondaryStatusTransitions

A log of time-ordered secondary statuses that the training job has transitioned through. It gives an overview of the training job lifecycle.

Type: Array of SecondaryStatusTransition objects
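The transition log can be used to see how long a job spent in each secondary status. The following is a minimal sketch under two assumptions: StartTime and EndTime are numeric timestamps in seconds, as the Response Syntax suggests, and EndTime may be missing for the status the job is currently in. The function name seconds_per_status is hypothetical.

```python
from typing import List, Optional, Tuple


def seconds_per_status(
    transitions: List[dict], now: Optional[float] = None
) -> List[Tuple[str, float]]:
    """Return (Status, elapsed seconds) pairs in the order of the log.

    `now` is an optional reference time used for a transition that has no
    EndTime yet. Without it, such a transition is skipped.
    """
    result = []
    for transition in transitions:
        end = transition.get("EndTime", now)
        if end is None:
            # Status still in progress and no reference time was supplied.
            continue
        result.append((transition["Status"], end - transition["StartTime"]))
    return result
```

An SDK may deserialize timestamps into date-time objects rather than numbers, in which case the subtraction yields a duration object instead of a float.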

StoppingCondition

The condition under which to stop the training job.

Type: StoppingCondition object

TrainingEndTime

Indicates the time when the training job ends on training instances. You are billed for the time interval between the value of TrainingStartTime and this time. For successful jobs and stopped jobs, this is the time after model artifacts are uploaded. For failed jobs, this is the time when Amazon SageMaker detects a job failure.

Type: Timestamp

TrainingJobArn

The Amazon Resource Name (ARN) of the training job.

Type: String

Length Constraints: Maximum length of 256.

Pattern: arn:aws[a-z\-]*:sagemaker:[a-z0-9\-]*:[0-9]{12}:training-job/.*

TrainingJobName

Name of the model training job.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 63.

Pattern: ^[a-zA-Z0-9](-*[a-zA-Z0-9])*

TrainingJobStatus

The status of the training job. For more granular information, see SecondaryStatus.

For the InProgress status, Amazon SageMaker can return these secondary statuses:

  • Starting - Preparing for training.

  • Downloading - Optional stage for algorithms that support File training input mode. It indicates data is being downloaded to ML storage volumes.

  • Training - Training is in progress.

  • Uploading - Training is complete and model upload is in progress.

For the Stopped training status, Amazon SageMaker can return these secondary statuses:

  • MaxRuntimeExceeded - The job stopped because it exceeded the maximum allowed runtime.

Type: String

Valid Values: InProgress | Completed | Failed | Stopping | Stopped
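A common use of TrainingJobStatus is to poll until the job finishes. The following is a minimal sketch, assuming that Completed, Failed, and Stopped are the terminal values (InProgress and Stopping are treated as still running). The function wait_for_training_job and its parameters are hypothetical; describe is any callable that accepts a TrainingJobName keyword argument and returns the response as a dictionary, for example the describe_training_job method of a boto3 SageMaker client.

```python
import time
from typing import Callable

# Assumption: these three values mean the job will not change state again.
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def wait_for_training_job(
    describe: Callable[..., dict],
    name: str,
    poll_seconds: float = 30.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call `describe` until TrainingJobStatus is terminal, then return it.

    Raises TimeoutError if the job is still running after `max_polls` calls.
    """
    for attempt in range(max_polls):
        response = describe(TrainingJobName=name)
        if response["TrainingJobStatus"] in TERMINAL_STATUSES:
            return response
        if attempt + 1 < max_polls:
            # Wait before the next call; no sleep after the final attempt.
            sleep(poll_seconds)
    raise TimeoutError(f"training job {name!r} still running after {max_polls} polls")
```

Passing describe and sleep as arguments keeps the loop independent of any particular SDK and makes it easy to exercise with a stub. When the returned status is Failed, the FailureReason field holds the reason.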

TrainingStartTime

Indicates the time when the training job starts on training instances. You are billed for the time interval between this time and the value of TrainingEndTime. The start time in CloudWatch Logs might be later than this time. The difference depends on the time it takes to download the training data and on the size of the training container.

Type: Timestamp
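The interval between TrainingStartTime and TrainingEndTime can be computed directly from a response. The following is a minimal sketch; the function name billable_seconds is hypothetical, and it assumes the two fields are either numeric timestamps in seconds or date-time objects, depending on how the client deserializes them.

```python
from datetime import timedelta
from typing import Optional


def billable_seconds(response: dict) -> Optional[float]:
    """Return TrainingEndTime minus TrainingStartTime, in seconds.

    Returns None when either field is missing, which is assumed to mean
    the job has not started or has not finished yet.
    """
    start = response.get("TrainingStartTime")
    end = response.get("TrainingEndTime")
    if start is None or end is None:
        return None
    elapsed = end - start
    if isinstance(elapsed, timedelta):
        # Date-time inputs subtract to a timedelta.
        return elapsed.total_seconds()
    return float(elapsed)
```

This reports only the length of the interval described above; it does not attempt to estimate cost.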

TuningJobArn

The Amazon Resource Name (ARN) of the associated hyperparameter tuning job if the training job was launched by a hyperparameter tuning job.

Type: String

Length Constraints: Maximum length of 256.

Pattern: arn:aws[a-z\-]*:sagemaker:[a-z0-9\-]*:[0-9]{12}:hyper-parameter-tuning-job/.*

VpcConfig

A VpcConfig object that specifies the VPC that this training job has access to. For more information, see Protect Training Jobs by Using an Amazon Virtual Private Cloud.

Type: VpcConfig object

Errors

For information about the errors that are common to all actions, see Common Errors.

ResourceNotFound

The resource being accessed was not found.

HTTP Status Code: 400

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following: