Schedule monitoring jobs

Amazon SageMaker Model Monitor can monitor the data collected from your real-time endpoints. You can monitor the data on a recurring schedule, or you can monitor it once, immediately. To create a monitoring schedule, use the CreateMonitoringSchedule API.
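As a rough sketch of what a CreateMonitoringSchedule request can look like, the following helper builds the request body as a plain dictionary without calling AWS. The schedule and job definition names are hypothetical, and the sketch assumes that a data quality job definition with that name already exists in your account.

```python
def build_schedule_request(schedule_name, job_definition_name, schedule_expression,
                           start_offset=None, end_offset=None):
    """Build a CreateMonitoringSchedule request body as a plain dict (no AWS call)."""
    schedule_config = {"ScheduleExpression": schedule_expression}
    # The analysis window is only added when both offsets are supplied.
    if start_offset is not None and end_offset is not None:
        schedule_config["DataAnalysisStartTime"] = start_offset
        schedule_config["DataAnalysisEndTime"] = end_offset
    return {
        "MonitoringScheduleName": schedule_name,
        "MonitoringScheduleConfig": {
            "ScheduleConfig": schedule_config,
            # Assumption: this job definition was created beforehand.
            "MonitoringJobDefinitionName": job_definition_name,
            "MonitoringType": "DataQuality",
        },
    }


# Hypothetical names, for illustration only.
hourly_request = build_schedule_request(
    "my-hourly-schedule", "my-data-quality-job-definition", "cron(0 * ? * * *)"
)
one_time_request = build_schedule_request(
    "my-one-time-schedule", "my-data-quality-job-definition", "NOW", "-PT1H", "-PT0H"
)
print(hourly_request)
print(one_time_request)
```

To send a request, you could pass the dictionary to the boto3 SageMaker client, for example boto3.client("sagemaker").create_monitoring_schedule(**hourly_request). The SageMaker Python SDK examples later on this page wrap the same API.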

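With a monitoring schedule, SageMaker AI can start processing jobs that analyze the data collected during a given period. In each processing job, SageMaker AI compares the dataset for the current analysis with the baseline statistics and constraints that you provide. SageMaker AI then generates a violations report. In addition, CloudWatch metrics are emitted for each feature under analysis.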
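To make the comparison step concrete, here is a deliberately simplified sketch of checking current statistics against baseline constraints. It is not the logic of the prebuilt Model Monitor container; the feature names, constraint fields, and thresholds are all hypothetical.

```python
# Simplified illustration only. All field names and thresholds are hypothetical.

def find_violations(baseline_constraints, current_stats):
    """Return one record per baseline constraint that the current data breaks."""
    violations = []
    for feature, constraint in baseline_constraints.items():
        stats = current_stats.get(feature)
        if stats is None:
            # The feature exists in the baseline but not in the current data.
            violations.append({"feature": feature, "check": "missing_column"})
            continue
        if stats["completeness"] < constraint["min_completeness"]:
            violations.append({"feature": feature, "check": "completeness"})
        drift = abs(stats["mean"] - constraint["baseline_mean"])
        if drift > constraint["max_mean_drift"]:
            violations.append({"feature": feature, "check": "mean_drift"})
    return violations


baseline_constraints = {
    "age": {"min_completeness": 0.95, "baseline_mean": 40.0, "max_mean_drift": 5.0},
    "income": {"min_completeness": 0.90, "baseline_mean": 52000.0, "max_mean_drift": 8000.0},
}
current_stats = {
    "age": {"completeness": 0.99, "mean": 47.5},
    "income": {"completeness": 0.85, "mean": 53000.0},
}

violations = find_violations(baseline_constraints, current_stats)
for violation in violations:
    print(violation)
```

In this toy data, the mean of age has moved by 7.5 against an allowed drift of 5.0, and income falls below its completeness threshold, so each feature produces one violation record.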

SageMaker AI provides a prebuilt container for running analysis on tabular datasets. Alternatively, you can bring your own container, as outlined in the topic "Support for Your Own Containers With Amazon SageMaker Model Monitor".

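You can create a model monitoring schedule for a real-time endpoint or for a batch transform job. Use the baseline resources (constraints and statistics) to compare against the real-time traffic or the batch job data.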

Example baseline assignments

In the following example, the training dataset that was used to train the model is uploaded to Amazon S3. If the dataset already exists in Amazon S3, you can point to it directly.


import boto3

# Copy the training dataset to Amazon S3 (if it is already in Amazon S3, you can reuse it).
# `bucket` and `prefix` are assumed to be defined earlier in your session.
baseline_prefix = prefix + '/baselining'
baseline_data_prefix = baseline_prefix + '/data'
baseline_results_prefix = baseline_prefix + '/results'

baseline_data_uri = 's3://{}/{}'.format(bucket, baseline_data_prefix)
baseline_results_uri = 's3://{}/{}'.format(bucket, baseline_results_prefix)
print('Baseline data uri: {}'.format(baseline_data_uri))
print('Baseline results uri: {}'.format(baseline_results_uri))

# Build the S3 key with '/' separators (os.path.join would use '\' on Windows).
s3_key = baseline_data_prefix + '/training-dataset-with-header.csv'
# The with block closes the local file after the upload finishes.
with open("test_data/training-dataset-with-header.csv", 'rb') as training_data_file:
    boto3.Session().resource('s3').Bucket(bucket).Object(s3_key).upload_fileobj(training_data_file)

Example schedule for recurring analysis

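When you schedule model monitoring for a real-time endpoint, use the baseline constraints and statistics to compare against real-time traffic. The following code snippet shows the general format that you use to schedule a model monitor for a real-time endpoint. This example schedules the model monitor to run hourly.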


from sagemaker.model_monitor import CronExpressionGenerator, EndpointInput
from time import gmtime, strftime

mon_schedule_name = 'my-model-monitor-schedule-' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
my_default_monitor.create_monitoring_schedule(
    monitor_schedule_name=mon_schedule_name,
    endpoint_input=EndpointInput(
        endpoint_name=endpoint_name,
        destination="/opt/ml/processing/input/endpoint"
    ),
    post_analytics_processor_script=s3_code_postprocessor_uri,
    output_s3_uri=s3_report_path,
    statistics=my_default_monitor.baseline_statistics(),
    constraints=my_default_monitor.suggested_constraints(),
    schedule_cron_expression=CronExpressionGenerator.hourly(),
    enable_cloudwatch_metrics=True,
)

Example schedule for a one-time analysis

You can also schedule an analysis to run once, without recurring, by passing arguments like the following to the create_monitoring_schedule method:


    schedule_cron_expression=CronExpressionGenerator.now(),
    data_analysis_start_time="-PT1H",
    data_analysis_end_time="-PT0H",

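In these arguments, the schedule_cron_expression parameter uses the value CronExpressionGenerator.now() to schedule the analysis to run once, immediately. For any schedule with this setting, the data_analysis_start_time and data_analysis_end_time parameters are required. These parameters set the start time and end time of the analysis window. Define these times as offsets relative to the current time, using the ISO 8601 duration format. In this example, the times -PT1H and -PT0H define a window from one hour in the past to the current time. With this schedule, the analysis evaluates only the data that was collected during the specified window.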
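To illustrate how such offsets translate into a concrete window, the following sketch resolves two ISO 8601 duration offsets against a reference time. It is an illustration only and not part of the SageMaker SDK: the function names are hypothetical, the parser handles only days, hours, minutes, and seconds, and the reference time is a fixed value chosen for the example.

```python
import re
from datetime import datetime, timedelta, timezone

# Signed offsets made of days, hours, minutes, and seconds, e.g. "-PT1H" or "-P1DT12H".
_OFFSET = re.compile(
    r"^(?P<sign>[+-]?)P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def offset_to_timedelta(offset):
    """Convert an offset such as "-PT1H" into a (possibly negative) timedelta."""
    match = _OFFSET.match(offset)
    if match is None:
        raise ValueError("Unsupported offset: {!r}".format(offset))
    # Components that are absent from the string default to zero.
    parts = {
        name: int(value)
        for name, value in match.groupdict(default="0").items()
        if name != "sign"
    }
    delta = timedelta(**parts)
    return -delta if match.group("sign") == "-" else delta


def analysis_window(start_offset, end_offset, now):
    """Resolve both offsets against `now` and return (start, end)."""
    start = now + offset_to_timedelta(start_offset)
    end = now + offset_to_timedelta(end_offset)
    if start >= end:
        raise ValueError("The start of the window must be earlier than its end")
    return start, end


# Fixed reference time, used only to make the example reproducible.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
start, end = analysis_window("-PT1H", "-PT0H", now)
print(start.isoformat(), end.isoformat())
```

With the offsets from the example above, the window starts one hour before the reference time and ends at the reference time itself.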

Example schedule for a batch transform job

The following code snippet shows the general format that you use to schedule a model monitor for a batch transform job.


from sagemaker.model_monitor import (
    CronExpressionGenerator,
    BatchTransformInput, 
    MonitoringDatasetFormat, 
)
from time import gmtime, strftime

mon_schedule_name = 'my-model-monitor-schedule-' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
my_default_monitor.create_monitoring_schedule(
    monitor_schedule_name=mon_schedule_name,
    batch_transform_input=BatchTransformInput(
        destination="/opt/ml/processing/input",
        data_captured_destination_s3_uri=s3_capture_upload_path,
        dataset_format=MonitoringDatasetFormat.csv(header=False),
    ),
    post_analytics_processor_script=s3_code_postprocessor_uri,
    output_s3_uri=s3_report_path,
    statistics=my_default_monitor.baseline_statistics(),
    constraints=my_default_monitor.suggested_constraints(),
    schedule_cron_expression=CronExpressionGenerator.hourly(),
    enable_cloudwatch_metrics=True,
)


desc_schedule_result = my_default_monitor.describe_schedule()
print('Schedule status: {}'.format(desc_schedule_result['MonitoringScheduleStatus']))
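A newly created schedule may not be active right away, so it can be useful to poll the status before relying on it. The helper below is a minimal sketch: wait_for_schedule is a hypothetical name, not an SDK function, and it accepts any callable that returns a dictionary containing MonitoringScheduleStatus, such as my_default_monitor.describe_schedule. A stand-in callable is used here so that the example runs without AWS access.

```python
import time


def wait_for_schedule(describe, target="Scheduled", timeout_s=300, poll_s=10,
                      sleep=time.sleep):
    """Poll describe() until the schedule reaches `target`, fails, or times out."""
    waited = 0
    while True:
        status = describe()["MonitoringScheduleStatus"]
        if status == target:
            return status
        if status == "Failed":
            raise RuntimeError("The monitoring schedule is in the Failed state")
        if waited >= timeout_s:
            raise TimeoutError("Timed out while the schedule status was " + status)
        sleep(poll_s)
        waited += poll_s


# Stand-in for my_default_monitor.describe_schedule: two Pending responses, then Scheduled.
_responses = iter([
    {"MonitoringScheduleStatus": "Pending"},
    {"MonitoringScheduleStatus": "Pending"},
    {"MonitoringScheduleStatus": "Scheduled"},
])
final_status = wait_for_schedule(lambda: next(_responses), sleep=lambda seconds: None)
print('Schedule status: {}'.format(final_status))
```

Against a real schedule, the call would look like wait_for_schedule(my_default_monitor.describe_schedule), using the default time.sleep between polls.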
