Query Lineage Entities

Amazon SageMaker AI automatically generates a graph of lineage entities as you use it. You can query this data to answer a variety of questions. The following sections explain how to query this data with the SDK for Python.

For information about how to view the lineage of a registered model in Amazon SageMaker Studio, see "View model lineage details in Studio".

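You can query lineage entities to do the following:

- Retrieve all datasets that went into the creation of a model.
- Retrieve all jobs that went into the creation of an endpoint.
- Retrieve all models that use a dataset.
- Retrieve all endpoints that use a model.
- Retrieve the endpoints that are derived from a specific dataset.
- Retrieve the pipeline execution that created a training job.
- Retrieve the relationships between entities for investigation, governance, and reproducibility.
- Retrieve all downstream trials that use an artifact.
- Retrieve all upstream trials that use an artifact.
- Retrieve a list of artifacts that use a given S3 URI.
- Retrieve upstream artifacts that use a dataset artifact.
- Retrieve downstream artifacts that use a dataset artifact.
- Retrieve datasets that use an image artifact.
- Retrieve actions that use a context.
- Retrieve processing jobs that use an endpoint.
- Retrieve transform jobs that use an endpoint.
- Retrieve trial components that use an endpoint.
- Retrieve the ARN of the pipeline execution associated with a model package group.
- Retrieve all artifacts that use an action.
- Retrieve all upstream datasets that use a model package approval action.
- Retrieve the model package from a model package approval action.
- Retrieve downstream endpoint contexts that use an endpoint.
- Retrieve the ARN of the pipeline execution associated with a trial component.
- Retrieve datasets that use a trial component.
- Retrieve models that use a trial component.
- Explore lineage for visualization.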

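Limitations

Lineage querying is not available in the following Regions:
- Africa (Cape Town) - af-south
- Asia Pacific (Jakarta) - ap-southeast-3
- Asia Pacific (Osaka) - ap-northeast-3
- Europe (Milan) - eu-south-1
- Europe (Spain) - eu-south-2
- Israel (Tel Aviv) - il-central-1

The maximum depth of relationships that a query discovers is currently limited to 10.
Filtering is limited to the following properties: last modified date, created date, type, and lineage entity type.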
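The depth and filtering limits above correspond to the `MaxDepth` and `Filters` parameters of the low-level boto3 `query_lineage` operation. The following is a minimal sketch, assuming you call that operation directly rather than through `LineageQuery`; the helper `build_query_lineage_kwargs` and the ARN are hypothetical. It only builds the request dictionary, so it runs without AWS credentials.

```python
from datetime import datetime, timezone

# Documented limit on how deep a lineage query can traverse.
MAX_LINEAGE_DEPTH = 10


def build_query_lineage_kwargs(
    start_arns,
    direction="Ascendants",
    types=None,
    lineage_types=None,
    created_after=None,
    modified_before=None,
    max_depth=MAX_LINEAGE_DEPTH,
):
    """Build keyword arguments for boto3's SageMaker `query_lineage` (hypothetical helper)."""
    if direction not in ("Ascendants", "Descendants", "Both"):
        raise ValueError(f"unsupported direction: {direction!r}")
    filters = {}
    if types:
        filters["Types"] = list(types)  # entity types such as "DataSet" or "Model"
    if lineage_types:
        # One or more of "Artifact", "Context", "Action", "TrialComponent"
        filters["LineageTypes"] = list(lineage_types)
    if created_after is not None:
        filters["CreatedAfter"] = created_after
    if modified_before is not None:
        filters["ModifiedBefore"] = modified_before
    kwargs = {
        "StartArns": list(start_arns),
        "Direction": direction,
        "IncludeEdges": False,
        # Requests deeper than the documented limit are clamped to it.
        "MaxDepth": min(max_depth, MAX_LINEAGE_DEPTH),
    }
    if filters:
        kwargs["Filters"] = filters
    return kwargs


# Placeholder ARN; substitute the ARN of your own endpoint context.
kwargs = build_query_lineage_kwargs(
    ["arn:aws:sagemaker:us-east-1:111122223333:context/example-endpoint-context"],
    types=["DataSet"],
    lineage_types=["Artifact"],
    created_after=datetime(2024, 1, 1, tzinfo=timezone.utc),
    max_depth=25,
)
print(kwargs["MaxDepth"])  # 10
# With credentials: response = sagemaker_session.sagemaker_client.query_lineage(**kwargs)
```

Clamping on the client side simply avoids sending a depth the service does not support; the type and date filters mirror the filterable properties listed above.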

Getting started with querying lineage entities

The easiest ways to get started are:

- The Amazon SageMaker AI SDK for Python defines many common use cases.
- For an example of using the SageMaker AI Lineage APIs, see the sagemaker-lineage-multihop-queries.ipynb notebook.

The following examples show how to use the LineageQuery and LineageFilter APIs to build queries that answer questions about the lineage graph and extract entity relationships for several use cases.

Example: Use the `LineageQuery` API to find entity associations


import pprint

import sagemaker
from sagemaker.lineage.context import Context, EndpointContext
from sagemaker.lineage.action import Action
from sagemaker.lineage.association import Association
from sagemaker.lineage.artifact import Artifact, ModelArtifact, DatasetArtifact

from sagemaker.lineage.query import (
    LineageQuery,
    LineageFilter,
    LineageSourceEnum,
    LineageEntityEnum,
    LineageQueryDirectionEnum,
)

# Session and pretty-printer used by all of the following examples.
sagemaker_session = sagemaker.Session()
pp = pprint.PrettyPrinter(indent=2)

# Find the endpoint context that the following lineage queries start from.
# Assumes `endpoint_arn` holds the ARN of an existing SageMaker endpoint.
contexts = Context.list(source_uri=endpoint_arn)
context_name = list(contexts)[0].context_name
endpoint_context = EndpointContext.load(context_name=context_name)

Example: Find all datasets associated with an endpoint


# Define the LineageFilter to look for entities of type `ARTIFACT` and the source of type `DATASET`.

query_filter = LineageFilter(
    entities=[LineageEntityEnum.ARTIFACT], sources=[LineageSourceEnum.DATASET]
)

# Providing this `LineageFilter` to the `LineageQuery` constructs a query that traverses through the given context `endpoint_context`
# and finds all datasets.

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[endpoint_context.context_arn],
    query_filter=query_filter,
    direction=LineageQueryDirectionEnum.ASCENDANTS,
    include_edges=False,
)

# Parse through the query results to get the lineage objects corresponding to the datasets
dataset_artifacts = []
for vertex in query_result.vertices:
    dataset_artifacts.append(vertex.to_lineage_object().source.source_uri)

pp.pprint(dataset_artifacts)
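`LineageQuery` wraps the boto3 `query_lineage` operation. In the raw boto3 response, each vertex carries only `Arn`, `Type`, and `LineageType`, so resolving a source URI takes a separate describe call. The following minimal sketch, assuming you work with that raw response, selects vertices by lineage type and entity type; the function `vertex_arns` and the sample response (including its placeholder ARNs) are hypothetical.

```python
def vertex_arns(response, lineage_type, entity_type=None):
    """Return the ARNs of vertices in a `query_lineage` response that match the given types."""
    arns = []
    for vertex in response.get("Vertices", []):
        if vertex.get("LineageType") != lineage_type:
            continue
        if entity_type is not None and vertex.get("Type") != entity_type:
            continue
        arns.append(vertex["Arn"])
    return arns


# Hypothetical response shaped like a `query_lineage` result.
sample_response = {
    "Vertices": [
        {
            "Arn": "arn:aws:sagemaker:us-east-1:111122223333:artifact/dataset-1",
            "Type": "DataSet",
            "LineageType": "Artifact",
        },
        {
            "Arn": "arn:aws:sagemaker:us-east-1:111122223333:artifact/model-1",
            "Type": "Model",
            "LineageType": "Artifact",
        },
        {
            "Arn": "arn:aws:sagemaker:us-east-1:111122223333:experiment-trial-component/job-1",
            "LineageType": "TrialComponent",
        },
    ],
    "Edges": [],
}

dataset_arns = vertex_arns(sample_response, "Artifact", "DataSet")
print(dataset_arns)
# To resolve the S3 URI of each dataset (requires AWS credentials):
# sm_client.describe_artifact(ArtifactArn=arn)["Source"]["SourceUri"]
```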

Example: Find the models associated with an endpoint


# Define the LineageFilter to look for entities of type `ARTIFACT` and the source of type `MODEL`.

query_filter = LineageFilter(
    entities=[LineageEntityEnum.ARTIFACT], sources=[LineageSourceEnum.MODEL]
)

# Providing this `LineageFilter` to the `LineageQuery` constructs a query that traverses through the given context `endpoint_context`
# and finds the models.

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[endpoint_context.context_arn],
    query_filter=query_filter,
    direction=LineageQueryDirectionEnum.ASCENDANTS,
    include_edges=False,
)

# Parse through the query results to get the lineage objects corresponding to the model
model_artifacts = []
for vertex in query_result.vertices:
    model_artifacts.append(vertex.to_lineage_object().source.source_uri)

# The results of the `LineageQuery` API call return the ARN of the model deployed to the endpoint along with
# the S3 URI to the model.tar.gz file associated with the model
pp.pprint(model_artifacts)
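As the comment above notes, `model_artifacts` mixes the ARN of the deployed model with the S3 URI of its model.tar.gz file. If you need them separately, a minimal sketch is to split by prefix; the function `split_model_sources` and the sample values are hypothetical stand-ins for the list printed above.

```python
def split_model_sources(source_uris):
    """Split lineage source URIs into (S3 URIs, ARNs); other values are ignored."""
    s3_uris = [uri for uri in source_uris if uri.startswith("s3://")]
    arns = [uri for uri in source_uris if uri.startswith("arn:")]
    return s3_uris, arns


# Placeholder values standing in for `model_artifacts`.
sample_sources = [
    "arn:aws:sagemaker:us-east-1:111122223333:model-package/example-group/1",
    "s3://amzn-s3-demo-bucket/output/model.tar.gz",
]
s3_uris, arns = split_model_sources(sample_sources)
print(s3_uris)
print(arns)
```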

Example: Find the trial components associated with an endpoint


# Define the LineageFilter to look for entities of type `TRIAL_COMPONENT` and the source of type `TRAINING_JOB`.

query_filter = LineageFilter(
    entities=[LineageEntityEnum.TRIAL_COMPONENT],
    sources=[LineageSourceEnum.TRAINING_JOB],
)

# Providing this `LineageFilter` to the `LineageQuery` constructs a query that traverses through the given context `endpoint_context`
# and finds the training jobs (returned as trial components).

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[endpoint_context.context_arn],
    query_filter=query_filter,
    direction=LineageQueryDirectionEnum.ASCENDANTS,
    include_edges=False,
)

# Parse through the query results to get the ARNs of the training jobs associated with this Endpoint
trial_components = []
for vertex in query_result.vertices:
    trial_components.append(vertex.arn)

pp.pprint(trial_components)
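An endpoint can trace back to many trial components. If you call the boto3 `query_lineage` operation directly instead of `LineageQuery`, a response can include a `NextToken` when more results are available. The following is a minimal sketch of a pagination loop; `collect_all_vertices` is hypothetical, and it accepts any callable with the same keyword interface so that it can be exercised here with a stub (`fake_query_lineage`, also hypothetical) instead of a real client.

```python
def collect_all_vertices(query_lineage, **kwargs):
    """Call `query_lineage` repeatedly, following NextToken, and return all vertices."""
    vertices = []
    next_token = None
    while True:
        call_kwargs = dict(kwargs)
        if next_token:
            call_kwargs["NextToken"] = next_token
        response = query_lineage(**call_kwargs)
        vertices.extend(response.get("Vertices", []))
        next_token = response.get("NextToken")
        if not next_token:
            return vertices


# Stub that returns two pages of trial-component vertices (placeholder ARNs).
_pages = {
    None: {"Vertices": [{"Arn": "tc-1", "LineageType": "TrialComponent"}], "NextToken": "page-2"},
    "page-2": {"Vertices": [{"Arn": "tc-2", "LineageType": "TrialComponent"}]},
}


def fake_query_lineage(**kwargs):
    return _pages[kwargs.get("NextToken")]


all_vertices = collect_all_vertices(
    fake_query_lineage, StartArns=["arn:placeholder"], Direction="Ascendants"
)
print([v["Arn"] for v in all_vertices])
# With a real client (requires AWS credentials):
# collect_all_vertices(sm_client.query_lineage, StartArns=[...], Direction="Ascendants")
```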

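Example: Change the focus of the lineage

You can change the focus of the lineage by passing different start_arns to LineageQuery. You can also broaden the scope of a query by passing multiple sources and entities to LineageFilter.

The following example uses a model as the focus of the lineage and finds the endpoints and datasets associated with it.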


# Get the ModelArtifact.
# Assumes `model_package_arn` holds the ARN of a registered model package.

model_artifact_summary = list(Artifact.list(source_uri=model_package_arn))[0]
model_artifact = ModelArtifact.load(artifact_arn=model_artifact_summary.artifact_arn)
query_filter = LineageFilter(
    entities=[LineageEntityEnum.ARTIFACT],
    sources=[LineageSourceEnum.ENDPOINT, LineageSourceEnum.DATASET],
)

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[model_artifact.artifact_arn],  # Model is the starting artifact
    query_filter=query_filter,
    # Find all the entities that descend from the model, i.e. the endpoint
    direction=LineageQueryDirectionEnum.DESCENDANTS,
    include_edges=False,
)

associations = []
for vertex in query_result.vertices:
    associations.append(vertex.to_lineage_object().source.source_uri)

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[model_artifact.artifact_arn],  # Model is the starting artifact
    query_filter=query_filter,
    # Find all the entities that ascend from the model, i.e. the datasets
    direction=LineageQueryDirectionEnum.ASCENDANTS,
    include_edges=False,
)

for vertex in query_result.vertices:
    associations.append(vertex.to_lineage_object().source.source_uri)

pp.pprint(associations)

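Example: Use `LineageQueryDirectionEnum.BOTH` to find ascendant and descendant relationships

When the direction is set to BOTH, the query traverses the graph to find both ascendant and descendant relationships. This traversal happens not only from the starting node but from every node the query visits. For example, if a training job runs twice and the models it produces are both deployed to endpoints, a query with the direction set to BOTH returns both endpoints. This happens because the same image is used to train and deploy both models: the traversal reaches the shared image from the starting model and continues from there to the other model and its endpoint, so the start_arn and both endpoints appear in the query result.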


query_filter = LineageFilter(
    entities=[LineageEntityEnum.ARTIFACT],
    sources=[LineageSourceEnum.ENDPOINT, LineageSourceEnum.DATASET],
)

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[model_artifact.artifact_arn],  # Model is the starting artifact
    query_filter=query_filter,
    # This specifies that the query should look for associations both ascending and descending for the start
    direction=LineageQueryDirectionEnum.BOTH,
    include_edges=False,
)

associations = []
for vertex in query_result.vertices:
    associations.append(vertex.to_lineage_object().source.source_uri)

pp.pprint(associations)
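The BOTH behavior described above can be modeled with a small in-memory graph. This is a conceptual sketch, not the service's implementation: the node names are hypothetical, each edge points from an input to what it produced, and BOTH is modeled as following edges in either direction from every visited node, up to the depth limit of 10.

```python
from collections import deque

# Hypothetical lineage: one image is used by two training jobs,
# and each resulting model is deployed to its own endpoint.
EDGES = [
    ("image", "training-job-1"), ("training-job-1", "model-1"), ("model-1", "endpoint-1"),
    ("image", "training-job-2"), ("training-job-2", "model-2"), ("model-2", "endpoint-2"),
]


def traverse_both(start, edges, max_depth=10):
    """Breadth-first search that follows edges in both directions from every visited node."""
    neighbors = {}
    for src, dst in edges:
        neighbors.setdefault(src, set()).add(dst)
        neighbors.setdefault(dst, set()).add(src)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # nodes at the depth limit are reported but not expanded
        for nxt in sorted(neighbors.get(node, ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return seen - {start}


reached = traverse_both("model-1", EDGES)
endpoints = sorted(node for node in reached if node.startswith("endpoint"))
print(endpoints)  # ['endpoint-1', 'endpoint-2']
```

A walk that only ascends or only descends from `model-1` never reaches `endpoint-2`; it appears here only because the traversal turns around at the shared image.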

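Example: Directions in `LineageQuery` (`ASCENDANTS` and `DESCENDANTS`)

To understand directions in the lineage graph, consider the following entity relationship graph: Dataset -> Training Job -> Model -> Endpoint.

The endpoint is a descendant of the model, and the model is a descendant of the dataset. Conversely, the model is an ascendant of the endpoint. The direction parameter specifies whether the query returns the descendants or the ascendants of the entities in start_arns. If start_arns contains the model and the direction is DESCENDANTS, the query returns the endpoint. If the direction is ASCENDANTS, the query returns the entities upstream of the model, such as the training job and the dataset.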


# In this example, we'll look at the impact of specifying the direction as ASCENDANTS or DESCENDANTS in a `LineageQuery`.

query_filter = LineageFilter(
    entities=[LineageEntityEnum.ARTIFACT],
    sources=[
        LineageSourceEnum.ENDPOINT,
        LineageSourceEnum.MODEL,
        LineageSourceEnum.DATASET,
        LineageSourceEnum.TRAINING_JOB,
    ],
)

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[model_artifact.artifact_arn],
    query_filter=query_filter,
    direction=LineageQueryDirectionEnum.ASCENDANTS,
    include_edges=False,
)

ascendant_artifacts = []

# The lineage entity returned for the Training Job is a TrialComponent which can't be converted to a
# lineage object using the method `to_lineage_object()` so we extract the TrialComponent ARN.
for vertex in query_result.vertices:
    try:
        ascendant_artifacts.append(vertex.to_lineage_object().source.source_uri)
    except Exception:
        ascendant_artifacts.append(vertex.arn)

print("Ascendant artifacts : ")
pp.pprint(ascendant_artifacts)

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[model_artifact.artifact_arn],
    query_filter=query_filter,
    direction=LineageQueryDirectionEnum.DESCENDANTS,
    include_edges=False,
)

descendant_artifacts = []
for vertex in query_result.vertices:
    try:
        descendant_artifacts.append(vertex.to_lineage_object().source.source_uri)
    except Exception:
        # Handling TrialComponents.
        descendant_artifacts.append(vertex.arn)

print("Descendant artifacts : ")
pp.pprint(descendant_artifacts)
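The direction semantics described above can be checked on a toy version of the Dataset -> Training Job -> Model -> Endpoint chain. This is a conceptual sketch with hypothetical node names, not the service's implementation: DESCENDANTS follows edges from inputs to outputs, and ASCENDANTS follows them backward.

```python
# Hypothetical chain from the text; each edge points from an input to its output.
EDGES = [("dataset", "training-job"), ("training-job", "model"), ("model", "endpoint")]


def traverse(start, edges, direction, max_depth=10):
    """Return entities reachable from `start` in one direction, nearest first."""
    if direction not in ("ASCENDANTS", "DESCENDANTS"):
        raise ValueError(f"unsupported direction: {direction!r}")
    step = {}
    for src, dst in edges:
        if direction == "DESCENDANTS":
            step.setdefault(src, []).append(dst)  # follow outputs
        else:
            step.setdefault(dst, []).append(src)  # follow inputs
    found = []
    frontier = [start]
    for _ in range(max_depth):
        next_frontier = []
        for node in frontier:
            for neighbor in step.get(node, []):
                if neighbor != start and neighbor not in found:
                    found.append(neighbor)
                    next_frontier.append(neighbor)
        if not next_frontier:
            break
        frontier = next_frontier
    return found


print(traverse("model", EDGES, "DESCENDANTS"))  # ['endpoint']
print(traverse("model", EDGES, "ASCENDANTS"))   # ['training-job', 'dataset']
```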

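Example: SDK helper functions that simplify lineage queries

The EndpointContext, ModelArtifact, and DatasetArtifact classes provide helper functions that wrap the LineageQuery API, making some common lineage queries easier to run. The following example shows how to use these helper functions.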


# Find all the datasets associated with this endpoint

datasets = []
dataset_artifacts = endpoint_context.dataset_artifacts()
for dataset in dataset_artifacts:
    datasets.append(dataset.source.source_uri)
print("Datasets : ", datasets)

# Find the training jobs associated with the endpoint
training_job_artifacts = endpoint_context.training_job_arns()
training_jobs = []
for training_job in training_job_artifacts:
    training_jobs.append(training_job)
print("Training Jobs : ", training_jobs)

# Get the ARN for the pipeline execution associated with this endpoint (if any).
# pipeline_execution_arn() returns a single ARN string, or None if there is none.
pipeline_execution_arn = endpoint_context.pipeline_execution_arn()
if pipeline_execution_arn:
    print(pipeline_execution_arn)

# Here we use the `ModelArtifact` class to find all the datasets and endpoints associated with the model

dataset_artifacts = model_artifact.dataset_artifacts()
endpoint_contexts = model_artifact.endpoint_contexts()

datasets = [dataset.source.source_uri for dataset in dataset_artifacts]
endpoints = [endpoint.source.source_uri for endpoint in endpoint_contexts]

print("Datasets associated with this model : ")
pp.pprint(datasets)

print("Endpoints associated with this model : ")
pp.pprint(endpoints)

# Here we use the `DatasetArtifact` class to find all the endpoints hosting models that were trained with a particular dataset
# Find the artifact associated with the dataset.
# Assumes `training_data` holds the S3 URI of the training dataset.

dataset_artifact_arn = list(Artifact.list(source_uri=training_data))[0].artifact_arn
dataset_artifact = DatasetArtifact.load(artifact_arn=dataset_artifact_arn)

# Find the endpoints that used this training dataset
endpoint_contexts = dataset_artifact.endpoint_contexts()
endpoints = [endpoint.source.source_uri for endpoint in endpoint_contexts]

print("Endpoints associated with the training dataset {}".format(training_data))
pp.pprint(endpoints)

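Example: Get a visualization of the lineage graph

The Visualizer helper class, provided in visualizer.py with the sample notebook, plots the lineage graph. When a query response is rendered, a graph of the lineage relationships of the StartArns is displayed. Starting from the StartArns, the visualization shows their relationships to the other lineage entities returned by the query_lineage API action.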


# Graph APIs
# Here we use the boto3 `query_lineage` API to generate the query response to plot.

from visualizer import Visualizer  # visualizer.py ships with the sample notebook

# Low-level boto3 SageMaker client from the session created earlier.
sm_client = sagemaker_session.sagemaker_client

query_response = sm_client.query_lineage(
    StartArns=[endpoint_context.context_arn], Direction="Ascendants", IncludeEdges=True
)

viz = Visualizer()
viz.render(query_response, "Endpoint")

query_response = sm_client.query_lineage(
    StartArns=[model_artifact.artifact_arn], Direction="Ascendants", IncludeEdges=True
)
viz.render(query_response, "Model")
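If visualizer.py is not available, the `Vertices` and `Edges` of a `query_lineage` response requested with `IncludeEdges=True` can be converted into Graphviz DOT text and rendered with any Graphviz tool. The following is a minimal sketch; the function `to_dot` and the sample response (with placeholder ARNs) are hypothetical.

```python
def to_dot(response, name="lineage"):
    """Render a `query_lineage` response (with IncludeEdges=True) as Graphviz DOT text."""
    lines = [f'digraph "{name}" {{']
    for vertex in response.get("Vertices", []):
        # Label each node with its entity type, falling back to its lineage type.
        label = vertex.get("Type") or vertex.get("LineageType", "")
        lines.append(f'  "{vertex["Arn"]}" [label="{label}"];')
    for edge in response.get("Edges", []):
        association = edge.get("AssociationType", "")
        lines.append(
            f'  "{edge["SourceArn"]}" -> "{edge["DestinationArn"]}" [label="{association}"];'
        )
    lines.append("}")
    return "\n".join(lines)


DATASET_ARN = "arn:aws:sagemaker:us-east-1:111122223333:artifact/dataset-1"
MODEL_ARN = "arn:aws:sagemaker:us-east-1:111122223333:artifact/model-1"
sample_response = {
    "Vertices": [
        {"Arn": DATASET_ARN, "Type": "DataSet", "LineageType": "Artifact"},
        {"Arn": MODEL_ARN, "Type": "Model", "LineageType": "Artifact"},
    ],
    "Edges": [
        {"SourceArn": DATASET_ARN, "DestinationArn": MODEL_ARN, "AssociationType": "ContributedTo"},
    ],
}

dot_text = to_dot(sample_response, "Model")
print(dot_text)
# With a real response: to_dot(sm_client.query_lineage(StartArns=[...], IncludeEdges=True))
```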
