Amazon SageMaker
Developer Guide

DescribeTrainingJob

Returns information about a training job.

Request Syntax

{
   "TrainingJobName": "string"
}

Request Parameters

For information about the parameters that are common to all actions, see Common Parameters.

The request accepts the following data in JSON format.

TrainingJobName

The name of the training job.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 63.

Pattern: ^[a-zA-Z0-9](-*[a-zA-Z0-9])*

Required: Yes
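The length and pattern constraints above can be checked on the client before a request is sent. The following is a minimal Python sketch, not an official validator: the helper names are hypothetical, and it assumes the pattern is meant to match the whole name (hence `re.fullmatch`).

```python
import json
import re

# Length limits and pattern copied from the TrainingJobName constraints.
# Assumption: the pattern must match the entire name, so re.fullmatch is used.
_NAME_PATTERN = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9])*")
_MIN_LEN, _MAX_LEN = 1, 63


def is_valid_training_job_name(name: str) -> bool:
    """Return True if name satisfies the documented length and pattern rules."""
    if not _MIN_LEN <= len(name) <= _MAX_LEN:
        return False
    return _NAME_PATTERN.fullmatch(name) is not None


def build_request_body(name: str) -> str:
    """Serialize a DescribeTrainingJob request body (hypothetical helper)."""
    if not is_valid_training_job_name(name):
        raise ValueError(f"invalid training job name: {name!r}")
    return json.dumps({"TrainingJobName": name})
```

For example, `build_request_body("job1")` returns `'{"TrainingJobName": "job1"}'`, while a name that starts or ends with a hyphen, or contains an underscore, raises `ValueError`.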

Response Syntax

{
   "AlgorithmSpecification": {
      "AlgorithmName": "string",
      "MetricDefinitions": [
         {
            "Name": "string",
            "Regex": "string"
         }
      ],
      "TrainingImage": "string",
      "TrainingInputMode": "string"
   },
   "CreationTime": number,
   "EnableNetworkIsolation": boolean,
   "FailureReason": "string",
   "FinalMetricDataList": [
      {
         "MetricName": "string",
         "Timestamp": number,
         "Value": number
      }
   ],
   "HyperParameters": {
      "string" : "string"
   },
   "InputDataConfig": [
      {
         "ChannelName": "string",
         "CompressionType": "string",
         "ContentType": "string",
         "DataSource": {
            "S3DataSource": {
               "S3DataDistributionType": "string",
               "S3DataType": "string",
               "S3Uri": "string"
            }
         },
         "InputMode": "string",
         "RecordWrapperType": "string"
      }
   ],
   "LabelingJobArn": "string",
   "LastModifiedTime": number,
   "ModelArtifacts": {
      "S3ModelArtifacts": "string"
   },
   "OutputDataConfig": {
      "KmsKeyId": "string",
      "S3OutputPath": "string"
   },
   "ResourceConfig": {
      "InstanceCount": number,
      "InstanceType": "string",
      "VolumeKmsKeyId": "string",
      "VolumeSizeInGB": number
   },
   "RoleArn": "string",
   "SecondaryStatus": "string",
   "SecondaryStatusTransitions": [
      {
         "EndTime": number,
         "StartTime": number,
         "Status": "string",
         "StatusMessage": "string"
      }
   ],
   "StoppingCondition": {
      "MaxRuntimeInSeconds": number
   },
   "TrainingEndTime": number,
   "TrainingJobArn": "string",
   "TrainingJobName": "string",
   "TrainingJobStatus": "string",
   "TrainingStartTime": number,
   "TuningJobArn": "string",
   "VpcConfig": {
      "SecurityGroupIds": [ "string" ],
      "Subnets": [ "string" ]
   }
}

Response Elements

If the action is successful, the service sends back an HTTP 200 response.

The following data is returned in JSON format by the service.
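In the raw JSON body, the Timestamp fields appear as numbers. The following Python sketch parses such a body and converts the top-level timestamps to UTC datetime objects. It assumes the numbers are (possibly fractional) seconds since the Unix epoch and that fields not yet set are simply absent; `parse_describe_response` is a hypothetical helper name.

```python
import json
from datetime import datetime, timezone

# Top-level Timestamp fields listed under Response Elements.
_TIMESTAMP_FIELDS = (
    "CreationTime",
    "LastModifiedTime",
    "TrainingStartTime",
    "TrainingEndTime",
)


def _to_datetime(value):
    # Assumption: timestamps are seconds since the Unix epoch.
    return datetime.fromtimestamp(value, tz=timezone.utc)


def parse_describe_response(body: str) -> dict:
    """Parse a raw DescribeTrainingJob body, converting top-level timestamps to UTC datetimes."""
    data = json.loads(body)
    for field in _TIMESTAMP_FIELDS:
        if field in data:  # skip fields that the response does not include
            data[field] = _to_datetime(data[field])
    return data
```

AWS SDKs such as boto3 typically perform this conversion themselves, so a helper like this is only needed when working with the raw HTTP response.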

AlgorithmSpecification

Information about the algorithm used for training, and algorithm metadata.

Type: AlgorithmSpecification object

CreationTime

A timestamp that indicates when the training job was created.

Type: Timestamp

EnableNetworkIsolation

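Indicates whether network isolation is enabled for the training job. If true, the training container is not allowed to make outbound network calls.

Type: Boolean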

FailureReason

If the training job failed, the reason it failed.

Type: String

Length Constraints: Maximum length of 1024.

FinalMetricDataList

A collection of MetricData objects that specify the names, values, and timestamps of the metrics that the training algorithm emitted to Amazon CloudWatch.

Type: Array of MetricData objects

Array Members: Minimum number of 0 items. Maximum number of 20 items.
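A common task is to read the final metric values by name. The following Python sketch builds a `{MetricName: Value}` dictionary from FinalMetricDataList. The helper name is hypothetical, and the rule that the latest Timestamp wins for a repeated name is an assumption added only to make the result well defined.

```python
from typing import Any, Dict, List


def final_metrics(response: Dict[str, Any]) -> Dict[str, float]:
    """Map each MetricName in FinalMetricDataList to its Value.

    If a name appears more than once, the entry with the latest Timestamp
    is kept (an assumption; the API reference does not define this case).
    """
    latest: Dict[str, Dict[str, Any]] = {}
    entries: List[Dict[str, Any]] = response.get("FinalMetricDataList", [])
    for entry in entries:
        name = entry["MetricName"]
        if name not in latest or entry["Timestamp"] >= latest[name]["Timestamp"]:
            latest[name] = entry
    return {name: entry["Value"] for name, entry in latest.items()}
```

Because the comparison only uses `>=`, it works whether Timestamp values are raw epoch numbers or datetime objects returned by an SDK.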

HyperParameters

Algorithm-specific parameters.

Type: String to string map

Key Length Constraints: Maximum length of 256.

Value Length Constraints: Maximum length of 256.
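Because HyperParameters is a string-to-string map, numeric and Boolean values come back as strings. The following Python sketch converts them to Python scalars on a best-effort basis. The conversion rules (Booleans first, then integers, then floats, otherwise unchanged) are an assumption, since the real type of each value is algorithm-specific.

```python
from typing import Dict, Union

Scalar = Union[int, float, bool, str]


def parse_hyperparameters(raw: Dict[str, str]) -> Dict[str, Scalar]:
    """Best-effort conversion of HyperParameters string values to Python scalars."""
    parsed: Dict[str, Scalar] = {}
    for key, value in raw.items():
        lowered = value.strip().lower()
        if lowered in ("true", "false"):
            parsed[key] = lowered == "true"
            continue
        try:
            parsed[key] = int(value)
        except ValueError:
            try:
                parsed[key] = float(value)
            except ValueError:
                parsed[key] = value  # leave non-numeric strings unchanged
    return parsed
```

When the algorithm's expected types are known, converting each key explicitly is more reliable than this guessing heuristic.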

InputDataConfig

An array of Channel objects that describe each data input channel.

Type: Array of Channel objects

Array Members: Minimum number of 1 item. Maximum number of 8 items.
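The following Python sketch lists the Amazon S3 location of each input channel. It follows the nesting shown in the response syntax (`DataSource` then `S3DataSource` then `S3Uri`); skipping channels without an S3DataSource is a defensive assumption, and the helper name is hypothetical.

```python
from typing import Any, Dict


def channel_s3_uris(response: Dict[str, Any]) -> Dict[str, str]:
    """Return {ChannelName: S3Uri} for every channel that uses an S3 data source."""
    uris: Dict[str, str] = {}
    for channel in response.get("InputDataConfig", []):
        s3_source = channel.get("DataSource", {}).get("S3DataSource")
        if s3_source is not None:
            uris[channel["ChannelName"]] = s3_source["S3Uri"]
    return uris
```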

LabelingJobArn

The Amazon Resource Name (ARN) of the labeling job associated with the training job.

Type: String

Length Constraints: Maximum length of 2048.

Pattern: arn:aws[a-z\-]*:sagemaker:[a-z0-9\-]*:[0-9]{12}:labeling-job/.*

LastModifiedTime

A timestamp that indicates when the status of the training job was last modified.

Type: Timestamp

ModelArtifacts

Information about the Amazon S3 location that is configured for storing model artifacts.

Type: ModelArtifacts object
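To download the trained model, the S3ModelArtifacts value is usually split into a bucket and an object key. The following Python sketch does this with the standard library. It assumes the value is an `s3://bucket/key` URI and treats a missing ModelArtifacts element as "no artifacts yet"; the helper name and example path are hypothetical.

```python
from typing import Any, Dict, Optional, Tuple
from urllib.parse import urlparse


def model_artifact_location(response: Dict[str, Any]) -> Optional[Tuple[str, str]]:
    """Split ModelArtifacts.S3ModelArtifacts into (bucket, key), or return None if absent."""
    uri = response.get("ModelArtifacts", {}).get("S3ModelArtifacts")
    if uri is None:
        return None  # assumption: the job has not produced artifacts yet
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")
```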

OutputDataConfig

The S3 path, configured when the training job was created, where model artifacts are stored. Amazon SageMaker creates subfolders for the model artifacts.

Type: OutputDataConfig object

ResourceConfig

Resources, including ML compute instances and ML storage volumes, that are configured for model training.

Type: ResourceConfig object

RoleArn

The AWS Identity and Access Management (IAM) role configured for the training job.

Type: String

Length Constraints: Minimum length of 20. Maximum length of 2048.

Pattern: ^arn:aws[a-z\-]*:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+$

SecondaryStatus

Provides detailed information about the state of the training job. For detailed information on the secondary status of the training job, see StatusMessage under SecondaryStatusTransition.

Amazon SageMaker provides the following primary statuses, each with the secondary statuses that apply to it:

InProgress
  • Starting - Starting the training job.

  • Downloading - An optional stage for algorithms that support File training input mode. It indicates that data is being downloaded to the ML storage volumes.

  • Training - Training is in progress.

  • Uploading - Training is complete and the model artifacts are being uploaded to the S3 location.

Completed
  • Completed - The training job has completed.

Failed
  • Failed - The training job has failed. The reason for the failure is returned in the FailureReason field of the DescribeTrainingJob response.

Stopped
  • MaxRuntimeExceeded - The job stopped because it exceeded the maximum allowed runtime.

  • Stopped - The training job has stopped.

Stopping
  • Stopping - Stopping the training job.

Important

Valid values for SecondaryStatus are subject to change.

We no longer support the following secondary statuses:

  • LaunchingMLInstances

  • PreparingTrainingStack

  • DownloadingTrainingImage

Type: String

Valid Values: Starting | LaunchingMLInstances | PreparingTrainingStack | Downloading | DownloadingTrainingImage | Training | Uploading | Stopping | Stopped | MaxRuntimeExceeded | Completed | Failed
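The primary/secondary relationship described above can be expressed as a lookup table. The following Python sketch maps each supported SecondaryStatus value to its primary status and returns None for deprecated or unrecognized values, because the valid values are subject to change. The names are hypothetical.

```python
from typing import Optional

# Secondary-to-primary mapping taken from the SecondaryStatus list above.
_PRIMARY_FOR_SECONDARY = {
    "Starting": "InProgress",
    "Downloading": "InProgress",
    "Training": "InProgress",
    "Uploading": "InProgress",
    "Completed": "Completed",
    "Failed": "Failed",
    "MaxRuntimeExceeded": "Stopped",
    "Stopped": "Stopped",
    "Stopping": "Stopping",
}

# Values still listed under Valid Values but no longer supported.
DEPRECATED_SECONDARY_STATUSES = frozenset(
    {"LaunchingMLInstances", "PreparingTrainingStack", "DownloadingTrainingImage"}
)


def primary_status_for(secondary: str) -> Optional[str]:
    """Return the primary status for a SecondaryStatus, or None if deprecated/unknown."""
    return _PRIMARY_FOR_SECONDARY.get(secondary)
```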

SecondaryStatusTransitions

A history of all of the secondary statuses that the training job has transitioned through.

Type: Array of SecondaryStatusTransition objects
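The transition history can be used to see how long a job spent in each stage. The following Python sketch computes a duration per transition. It assumes StartTime and EndTime are epoch-second numbers (as in the raw JSON) and that the current, unfinished status has no EndTime; in that case the duration is measured up to an optional `now`, or reported as None. The helper name is hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple


def transition_durations(
    response: Dict[str, Any], now: Optional[float] = None
) -> List[Tuple[str, Optional[float]]]:
    """Return (Status, seconds spent) for each SecondaryStatusTransitions entry."""
    result: List[Tuple[str, Optional[float]]] = []
    for transition in response.get("SecondaryStatusTransitions", []):
        # Assumption: an entry without EndTime is the status the job is still in.
        end = transition.get("EndTime", now)
        duration = None if end is None else end - transition["StartTime"]
        result.append((transition["Status"], duration))
    return result
```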

StoppingCondition

The condition under which to stop the training job.

Type: StoppingCondition object

TrainingEndTime

Indicates the time when the training job ends on training instances. You are billed for the time interval between the value of TrainingStartTime and this time. For successful jobs and stopped jobs, this is the time after model artifacts are uploaded. For failed jobs, this is the time when Amazon SageMaker detects a job failure.

Type: Timestamp
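The billed interval described above is the difference between TrainingEndTime and TrainingStartTime. The following Python sketch computes it in seconds; it does not compute cost. It accepts either epoch-second numbers (raw JSON) or datetime objects (as returned by SDKs such as boto3), and the helper name is hypothetical.

```python
from typing import Any, Dict, Optional


def billable_seconds(response: Dict[str, Any]) -> Optional[float]:
    """Seconds between TrainingStartTime and TrainingEndTime, or None if either is missing."""
    start = response.get("TrainingStartTime")
    end = response.get("TrainingEndTime")
    if start is None or end is None:
        return None
    delta = end - start
    # datetime subtraction yields a timedelta; number subtraction yields a number.
    return delta.total_seconds() if hasattr(delta, "total_seconds") else float(delta)
```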

TrainingJobArn

The Amazon Resource Name (ARN) of the training job.

Type: String

Length Constraints: Maximum length of 256.

Pattern: arn:aws[a-z\-]*:sagemaker:[a-z0-9\-]*:[0-9]{12}:training-job/.*
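The TrainingJobArn pattern above can be reused, with named capture groups added, to extract the partition, Region, account ID, and job name from an ARN. The following Python sketch is illustrative only; the group names and helper name are hypothetical, and the example ARN uses a placeholder account ID.

```python
import re
from typing import Dict, Optional

# The TrainingJobArn pattern above, with named capture groups added.
_TRAINING_JOB_ARN = re.compile(
    r"arn:(?P<partition>aws[a-z\-]*):sagemaker:(?P<region>[a-z0-9\-]*):"
    r"(?P<account>[0-9]{12}):training-job/(?P<name>.*)"
)


def parse_training_job_arn(arn: str) -> Optional[Dict[str, str]]:
    """Split a training job ARN into partition, region, account, and name."""
    match = _TRAINING_JOB_ARN.fullmatch(arn)
    return match.groupdict() if match else None
```

For example, `parse_training_job_arn("arn:aws:sagemaker:us-west-2:123456789012:training-job/my-job")` yields the Region `us-west-2` and the name `my-job`; an ARN for a different resource type returns None.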

TrainingJobName

Name of the model training job.

Type: String

Length Constraints: Minimum length of 1. Maximum length of 63.

Pattern: ^[a-zA-Z0-9](-*[a-zA-Z0-9])*

TrainingJobStatus

The status of the training job.

Amazon SageMaker provides the following training job statuses:

  • InProgress - The training is in progress.

  • Completed - The training job has completed.

  • Failed - The training job has failed. To see the reason for the failure, check the FailureReason field in the response to a DescribeTrainingJob call.

  • Stopping - The training job is stopping.

  • Stopped - The training job has stopped.

For more detailed information, see SecondaryStatus.

Type: String

Valid Values: InProgress | Completed | Failed | Stopping | Stopped
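Clients often call DescribeTrainingJob repeatedly until the job reaches a final state. The following Python sketch polls until TrainingJobStatus is Completed, Failed, or Stopped. The `describe` callable is an assumption that keeps the sketch independent of any SDK; with boto3 it could be `lambda n: client.describe_training_job(TrainingJobName=n)`. The poll interval and limit are arbitrary defaults, not service recommendations.

```python
import time
from typing import Any, Callable, Dict

# Primary statuses after which the job no longer changes state.
TERMINAL_STATUSES = frozenset({"Completed", "Failed", "Stopped"})


def wait_for_training_job(
    describe: Callable[[str], Dict[str, Any]],
    name: str,
    poll_seconds: float = 30.0,
    max_polls: int = 240,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll describe(name) until TrainingJobStatus is terminal; return the final response."""
    for _ in range(max_polls):
        response = describe(name)
        if response["TrainingJobStatus"] in TERMINAL_STATUSES:
            return response
        sleep(poll_seconds)
    raise TimeoutError(f"training job {name!r} not finished after {max_polls} polls")
```

The `sleep` parameter is injectable so the loop can be exercised without real delays; after it returns, FailureReason can be inspected if the status is Failed.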

TrainingStartTime

Indicates the time when the training job starts on training instances. You are billed for the time interval between this time and the value of TrainingEndTime. The start time in CloudWatch Logs might be later than this time. The difference is due to the time it takes to download the training data and to the size of the training container.

Type: Timestamp

TuningJobArn

The Amazon Resource Name (ARN) of the associated hyperparameter tuning job if the training job was launched by a hyperparameter tuning job.

Type: String

Length Constraints: Maximum length of 256.

Pattern: arn:aws[a-z\-]*:sagemaker:[a-z0-9\-]*:[0-9]{12}:hyper-parameter-tuning-job/.*

VpcConfig

A VpcConfig object that specifies the VPC that this training job has access to. For more information, see Protect Training Jobs by Using an Amazon Virtual Private Cloud.

Type: VpcConfig object

Errors

For information about the errors that are common to all actions, see Common Errors.

ResourceNotFound

The resource being accessed was not found.

HTTP Status Code: 400

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following: