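Querying lineage entities

Amazon SageMaker AI automatically generates graphs of lineage entities as you use them. You can query this data to answer a variety of questions. The following sections describe how to query this data with the SDK for Python.

For information on how to view the lineage of a registered model in Amazon SageMaker Studio, see View model lineage details in Studio.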

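You can query your lineage entities to do the following:

- Retrieve all datasets that went into the creation of a model.
- Retrieve all jobs that went into the creation of an endpoint.
- Retrieve all models that use a dataset.
- Retrieve all endpoints that use a model.
- Retrieve which endpoints are derived from a certain dataset.
- Retrieve the pipeline execution that created a training job.
- Retrieve the relationships between entities for investigation, governance, and reproducibility.
- Retrieve all downstream trials that use the artifact.
- Retrieve all upstream trials that use the artifact.
- Retrieve a list of artifacts that use the provided S3 URI.
- Retrieve upstream artifacts that use the dataset artifact.
- Retrieve downstream artifacts that use the dataset artifact.
- Retrieve datasets that use the image artifact.
- Retrieve actions that use the context.
- Retrieve processing jobs that use the endpoint.
- Retrieve transform jobs that use the endpoint.
- Retrieve trial components that use the endpoint.
- Retrieve the ARN for the pipeline execution associated with the model package group.
- Retrieve all artifacts that use the action.
- Retrieve all upstream datasets that use the model package approval action.
- Retrieve the model package from the model package approval action.
- Retrieve downstream endpoint contexts that use the endpoint.
- Retrieve the ARN for the pipeline execution associated with the trial component.
- Retrieve datasets that use the trial component.
- Retrieve models that use the trial component.
- Explore your lineage for visualization.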

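Limitations

Lineage querying is not available in the following Regions:
- Africa (Cape Town) - af-south-1
- Asia Pacific (Jakarta) - ap-southeast-3
- Asia Pacific (Osaka) - ap-northeast-3
- Europe (Milan) - eu-south-1
- Europe (Spain) - eu-south-2
- Israel (Tel Aviv) - il-central-1

The maximum depth of relationships to discover is currently limited to 10.

Filtering is limited to the following properties: last modified date, created date, type, and lineage entity type.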
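The limitations above can be checked on the client side before a query is sent. The following is a minimal sketch, not part of the SDK: `build_query_lineage_request` is a hypothetical helper that rejects unsupported Regions and depths above 10, and then assembles keyword arguments in the shape that the boto3 `query_lineage` operation accepts (`StartArns`, `Direction`, `MaxDepth`, `Filters`). The lower bound of 0 on the depth is this helper's own choice.

```python
# Regions listed above as not supporting lineage queries.
UNSUPPORTED_REGIONS = {
    "af-south-1",
    "ap-southeast-3",
    "ap-northeast-3",
    "eu-south-1",
    "eu-south-2",
    "il-central-1",
}

# Current maximum depth of relationships to discover.
MAX_DEPTH_LIMIT = 10


def build_query_lineage_request(
    region,
    start_arns,
    direction="Both",
    max_depth=MAX_DEPTH_LIMIT,
    lineage_types=None,
    types=None,
    created_after=None,
    created_before=None,
    modified_after=None,
    modified_before=None,
):
    """Hypothetical helper: validate inputs and return kwargs for `query_lineage`."""
    if region in UNSUPPORTED_REGIONS:
        raise ValueError(f"Lineage querying is not available in {region}")
    if not 0 <= max_depth <= MAX_DEPTH_LIMIT:
        raise ValueError(f"max_depth must be between 0 and {MAX_DEPTH_LIMIT}")

    # Only the filter attributes named in the limitations are exposed here:
    # lineage entity type, type, created date, and last modified date.
    candidate_filters = {
        "LineageTypes": lineage_types,
        "Types": types,
        "CreatedAfter": created_after,
        "CreatedBefore": created_before,
        "ModifiedAfter": modified_after,
        "ModifiedBefore": modified_before,
    }
    filters = {key: value for key, value in candidate_filters.items() if value is not None}

    request = {
        "StartArns": list(start_arns),
        "Direction": direction,
        "MaxDepth": max_depth,
    }
    if filters:
        request["Filters"] = filters
    return request
```

Assuming a boto3 SageMaker client named `sm_client`, the result could be passed as `sm_client.query_lineage(**request)`.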

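Topics

Getting started with querying lineage entities

Getting started with querying lineage entities

The easiest way to get started is through either of the following:

- The Amazon SageMaker AI SDK for Python, which defines many common use cases.
- For a notebook that demonstrates how to use the SageMaker AI Lineage APIs to query relationships across the lineage graph, see sagemaker-lineage-multihop-queries.ipynb.

The following examples show how to use the LineageQuery and LineageFilter APIs to construct queries that answer questions about the lineage graph and extract entity relationships for a few use cases.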

Example: Using the `LineageQuery` API to find entity associations


import pprint

import sagemaker
from sagemaker.lineage.context import Context, EndpointContext
from sagemaker.lineage.action import Action
from sagemaker.lineage.association import Association
from sagemaker.lineage.artifact import Artifact, ModelArtifact, DatasetArtifact

from sagemaker.lineage.query import (
    LineageQuery,
    LineageFilter,
    LineageSourceEnum,
    LineageEntityEnum,
    LineageQueryDirectionEnum,
)

# Pretty-printer and session used by the examples that follow.
pp = pprint.PrettyPrinter()
sagemaker_session = sagemaker.Session()

# Find the endpoint context and model artifact that should be used for the lineage queries.
# `endpoint_arn` is assumed to hold the ARN of an existing endpoint.

contexts = Context.list(source_uri=endpoint_arn)
context_name = list(contexts)[0].context_name
endpoint_context = EndpointContext.load(context_name=context_name)

Example: Finding all the datasets associated with an endpoint


# Define the LineageFilter to look for entities of type `ARTIFACT` and the source of type `DATASET`.

query_filter = LineageFilter(
    entities=[LineageEntityEnum.ARTIFACT], sources=[LineageSourceEnum.DATASET]
)

# Providing this `LineageFilter` to the `LineageQuery` constructs a query that traverses through the given context `endpoint_context`
# and finds all datasets.

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[endpoint_context.context_arn],
    query_filter=query_filter,
    direction=LineageQueryDirectionEnum.ASCENDANTS,
    include_edges=False,
)

# Parse through the query results to get the lineage objects corresponding to the datasets
dataset_artifacts = []
for vertex in query_result.vertices:
    dataset_artifacts.append(vertex.to_lineage_object().source.source_uri)

pp.pprint(dataset_artifacts)

Example: Finding the models associated with an endpoint


# Define the LineageFilter to look for entities of type `ARTIFACT` and the source of type `MODEL`.

query_filter = LineageFilter(
    entities=[LineageEntityEnum.ARTIFACT], sources=[LineageSourceEnum.MODEL]
)

# Providing this `LineageFilter` to the `LineageQuery` constructs a query that traverses through the given context `endpoint_context`
# and finds all models.

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[endpoint_context.context_arn],
    query_filter=query_filter,
    direction=LineageQueryDirectionEnum.ASCENDANTS,
    include_edges=False,
)

# Parse through the query results to get the lineage objects corresponding to the model
model_artifacts = []
for vertex in query_result.vertices:
    model_artifacts.append(vertex.to_lineage_object().source.source_uri)

# The results of the `LineageQuery` API call return the ARN of the model deployed to the endpoint along with
# the S3 URI to the model.tar.gz file associated with the model
pp.pprint(model_artifacts)

Example: Finding the trial components associated with an endpoint


# Define the LineageFilter to look for entities of type `TRIAL_COMPONENT` and the source of type `TRAINING_JOB`.

query_filter = LineageFilter(
    entities=[LineageEntityEnum.TRIAL_COMPONENT],
    sources=[LineageSourceEnum.TRAINING_JOB],
)

# Providing this `LineageFilter` to the `LineageQuery` constructs a query that traverses through the given context `endpoint_context`
# and finds all trial components that come from training jobs.

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[endpoint_context.context_arn],
    query_filter=query_filter,
    direction=LineageQueryDirectionEnum.ASCENDANTS,
    include_edges=False,
)

# Parse through the query results to get the ARNs of the training jobs associated with this Endpoint
trial_components = []
for vertex in query_result.vertices:
    trial_components.append(vertex.arn)

pp.pprint(trial_components)

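Example: Changing the focal point of lineage

You can modify the LineageQuery to use different start_arns, which changes the focal point of lineage. In addition, the LineageFilter can take multiple sources and entities to expand the scope of the query.

In the following example, we use the model as the lineage focal point and find the endpoints and datasets associated with it.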


# Get the ModelArtifact

model_artifact_summary = list(Artifact.list(source_uri=model_package_arn))[0]
model_artifact = ModelArtifact.load(artifact_arn=model_artifact_summary.artifact_arn)
query_filter = LineageFilter(
    entities=[LineageEntityEnum.ARTIFACT],
    sources=[LineageSourceEnum.ENDPOINT, LineageSourceEnum.DATASET],
)

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[model_artifact.artifact_arn],  # Model is the starting artifact
    query_filter=query_filter,
    # Find all the entities that descend from the model, i.e. the endpoint
    direction=LineageQueryDirectionEnum.DESCENDANTS,
    include_edges=False,
)

associations = []
for vertex in query_result.vertices:
    associations.append(vertex.to_lineage_object().source.source_uri)

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[model_artifact.artifact_arn],  # Model is the starting artifact
    query_filter=query_filter,
    # Find all the entities that ascend from the model, i.e. the datasets
    direction=LineageQueryDirectionEnum.ASCENDANTS,
    include_edges=False,
)

for vertex in query_result.vertices:
    associations.append(vertex.to_lineage_object().source.source_uri)

pp.pprint(associations)

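Example: Using `LineageQueryDirectionEnum.BOTH` to find ascendant and descendant relationships

When the direction is set to BOTH, the query traverses the graph to find both ascendant and descendant relationships. This traversal takes place not only from the starting node, but from each node that is visited. For example, if a training job is run twice and both models generated by the training job are deployed to endpoints, the result of a query with the direction set to BOTH shows both endpoints. This is because the same image is used for training and deploying the model. Because the image is common to the models, the start_arn and both endpoints appear in the query result.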
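To make the effect of BOTH concrete, here is a toy sketch in plain Python that does not call SageMaker. The node names and edges are hypothetical stand-ins for lineage entities, and `traverse_both` is an illustrative function, not an SDK API. It follows edges in either direction from every visited node, which is why the second endpoint becomes reachable through the shared image. Unlike the real service, the sketch applies no depth limit.

```python
from collections import deque

# Hypothetical lineage edges, written as (source, destination).
# One image and one dataset feed two runs of a training job.
EDGES = [
    ("image", "training-job-1"),
    ("image", "training-job-2"),
    ("dataset", "training-job-1"),
    ("dataset", "training-job-2"),
    ("training-job-1", "model-1"),
    ("training-job-2", "model-2"),
    ("model-1", "endpoint-1"),
    ("model-2", "endpoint-2"),
]


def traverse_both(start, edges):
    """Return every node reachable from `start` when edges are followed in both directions."""
    neighbors = {}
    for source, destination in edges:
        neighbors.setdefault(source, set()).add(destination)
        neighbors.setdefault(destination, set()).add(source)

    visited = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        # Expand from every visited node, not only from the start node.
        for neighbor in neighbors.get(node, ()):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return visited


reachable = traverse_both("model-1", EDGES)
endpoints = sorted(node for node in reachable if node.startswith("endpoint"))
print(endpoints)
```

Starting from `model-1`, the walk goes up to `training-job-1` and the shared `image`, then back down through `training-job-2` and `model-2` to `endpoint-2`, so both endpoints are reported.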


query_filter = LineageFilter(
    entities=[LineageEntityEnum.ARTIFACT],
    sources=[LineageSourceEnum.ENDPOINT, LineageSourceEnum.DATASET],
)

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[model_artifact.artifact_arn],  # Model is the starting artifact
    query_filter=query_filter,
    # This specifies that the query should look for associations both ascending and descending for the start
    direction=LineageQueryDirectionEnum.BOTH,
    include_edges=False,
)

associations = []
for vertex in query_result.vertices:
    associations.append(vertex.to_lineage_object().source.source_uri)

pp.pprint(associations)

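Example: Directions in `LineageQuery` - `ASCENDANTS` vs. `DESCENDANTS`

To understand the direction in the lineage graph, consider the following entity relationship graph: Dataset -> Training Job -> Model -> Endpoint

The endpoint is a descendant of the model, and the model is a descendant of the dataset. Similarly, the model is an ascendant of the endpoint. The direction parameter specifies whether the query returns entities that are descendants or ascendants of the entities in start_arns. If start_arns contains a model and the direction is DESCENDANTS, the query returns the endpoint. If the direction is ASCENDANTS, the query returns the dataset.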
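The same rule can be illustrated with a small stand-alone sketch that models the chain above as directed edges. The `related` function and the node names are hypothetical and only mimic the direction semantics; they are not part of the SageMaker SDK.

```python
from collections import deque

# The entity relationship chain described above, as (source, destination) edges.
CHAIN = [
    ("dataset", "training-job"),
    ("training-job", "model"),
    ("model", "endpoint"),
]


def related(start, edges, direction):
    """Return entities upstream (ASCENDANTS) or downstream (DESCENDANTS) of `start`."""
    if direction not in ("ASCENDANTS", "DESCENDANTS"):
        raise ValueError("direction must be ASCENDANTS or DESCENDANTS")

    # Build a one-way adjacency map that matches the requested direction.
    step = {}
    for source, destination in edges:
        if direction == "DESCENDANTS":
            step.setdefault(source, []).append(destination)
        else:
            step.setdefault(destination, []).append(source)

    found = []
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in step.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                found.append(neighbor)
                queue.append(neighbor)
    return found


print("Ascendants of model:", related("model", CHAIN, "ASCENDANTS"))
print("Descendants of model:", related("model", CHAIN, "DESCENDANTS"))
```

With the model as the start node, the ascendant walk reaches the training job and the dataset, while the descendant walk reaches only the endpoint.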


# In this example, we'll look at the impact of specifying the direction as ASCENDANTS or DESCENDANTS in a `LineageQuery`.

query_filter = LineageFilter(
    entities=[LineageEntityEnum.ARTIFACT],
    sources=[
        LineageSourceEnum.ENDPOINT,
        LineageSourceEnum.MODEL,
        LineageSourceEnum.DATASET,
        LineageSourceEnum.TRAINING_JOB,
    ],
)

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[model_artifact.artifact_arn],
    query_filter=query_filter,
    direction=LineageQueryDirectionEnum.ASCENDANTS,
    include_edges=False,
)

ascendant_artifacts = []

# The lineage entity returned for the Training Job is a TrialComponent which can't be converted to a
# lineage object using the method `to_lineage_object()` so we extract the TrialComponent ARN.
for vertex in query_result.vertices:
    try:
        ascendant_artifacts.append(vertex.to_lineage_object().source.source_uri)
    except Exception:
        # Fall back to the ARN for TrialComponents.
        ascendant_artifacts.append(vertex.arn)

print("Ascendant artifacts : ")
pp.pprint(ascendant_artifacts)

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[model_artifact.artifact_arn],
    query_filter=query_filter,
    direction=LineageQueryDirectionEnum.DESCENDANTS,
    include_edges=False,
)

descendant_artifacts = []
for vertex in query_result.vertices:
    try:
        descendant_artifacts.append(vertex.to_lineage_object().source.source_uri)
    except Exception:
        # Handling TrialComponents.
        descendant_artifacts.append(vertex.arn)

print("Descendant artifacts : ")
pp.pprint(descendant_artifacts)

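Example: SDK helper functions that simplify lineage queries

The classes EndpointContext, ModelArtifact, and DatasetArtifact have helper functions that are wrappers over the LineageQuery API and make certain lineage queries easier to use. The following example shows how to use these helper functions.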


# Find all the datasets associated with this endpoint

datasets = []
dataset_artifacts = endpoint_context.dataset_artifacts()
for dataset in dataset_artifacts:
    datasets.append(dataset.source.source_uri)
print("Datasets : ", datasets)

# Find the training jobs associated with the endpoint
training_job_artifacts = endpoint_context.training_job_arns()
training_jobs = []
for training_job in training_job_artifacts:
    training_jobs.append(training_job)
print("Training Jobs : ", training_jobs)

# Get the ARN for the pipeline execution associated with this endpoint (if any).
# `pipeline_execution_arn()` returns a single ARN string rather than a list.
pipeline_execution_arn = endpoint_context.pipeline_execution_arn()
if pipeline_execution_arn:
    print(pipeline_execution_arn)

# Here we use the `ModelArtifact` class to find all the datasets and endpoints associated with the model

dataset_artifacts = model_artifact.dataset_artifacts()
endpoint_contexts = model_artifact.endpoint_contexts()

datasets = [dataset.source.source_uri for dataset in dataset_artifacts]
endpoints = [endpoint.source.source_uri for endpoint in endpoint_contexts]

print("Datasets associated with this model : ")
pp.pprint(datasets)

print("Endpoints associated with this model : ")
pp.pprint(endpoints)

# Here we use the `DatasetArtifact` class to find all the endpoints hosting models that were trained with a particular dataset
# Find the artifact associated with the dataset

dataset_artifact_arn = list(Artifact.list(source_uri=training_data))[0].artifact_arn
dataset_artifact = DatasetArtifact.load(artifact_arn=dataset_artifact_arn)

# Find the endpoints that used this training dataset
endpoint_contexts = dataset_artifact.endpoint_contexts()
endpoints = [endpoint.source.source_uri for endpoint in endpoint_contexts]

print("Endpoints associated with the training dataset {}".format(training_data))
pp.pprint(endpoints)

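Example: Getting a lineage graph visualization

The sample notebook's visualizer.py provides a helper class, Visualizer, to help plot the lineage graph. When the query response is rendered, a graph with the lineage relationships from the StartArns is displayed. Starting from the StartArns, the visualization shows the relationships with the other lineage entities returned by the query_lineage API action.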


# Graph APIs
# Here we use the boto3 `query_lineage` API to generate the query response to plot.

import boto3
from visualizer import Visualizer

sm_client = boto3.client("sagemaker")

query_response = sm_client.query_lineage(
    StartArns=[endpoint_context.context_arn], Direction="Ascendants", IncludeEdges=True
)

viz = Visualizer()
viz.render(query_response, "Endpoint")

query_response = sm_client.query_lineage(
    StartArns=[model_artifact.artifact_arn], Direction="Ascendants", IncludeEdges=True
)
viz.render(query_response, "Model")
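If visualizer.py from the sample notebook is not at hand, a query response can still be turned into something plottable. The following is a minimal sketch, assuming the response contains `Vertices` (each with `Arn`, `Type`, and `LineageType`) and `Edges` (each with `SourceArn`, `DestinationArn`, and `AssociationType`), which is the shape of a `query_lineage` response requested with IncludeEdges=True. The function `lineage_response_to_dot` and the sample response are hypothetical; the function emits Graphviz DOT text that any DOT renderer can draw.

```python
def lineage_response_to_dot(query_response, graph_name="lineage"):
    """Hypothetical helper: render a `query_lineage`-style response as Graphviz DOT text."""
    lines = ["digraph " + graph_name + " {"]

    for vertex in query_response.get("Vertices", []):
        arn = vertex["Arn"]
        # Use the last ARN segment as a short, readable label.
        short_name = arn.split("/")[-1]
        lineage_type = vertex.get("LineageType", "")
        lines.append('  "{}" [label="{}: {}"];'.format(arn, lineage_type, short_name))

    for edge in query_response.get("Edges", []):
        association = edge.get("AssociationType", "")
        lines.append(
            '  "{}" -> "{}" [label="{}"];'.format(
                edge["SourceArn"], edge["DestinationArn"], association
            )
        )

    lines.append("}")
    return "\n".join(lines)


# Hypothetical response for illustration only; the ARNs are placeholders.
sample_response = {
    "Vertices": [
        {"Arn": "arn:aws:sagemaker:us-east-1:111122223333:artifact/dataset-a",
         "Type": "DataSet", "LineageType": "Artifact"},
        {"Arn": "arn:aws:sagemaker:us-east-1:111122223333:artifact/model-a",
         "Type": "Model", "LineageType": "Artifact"},
    ],
    "Edges": [
        {"SourceArn": "arn:aws:sagemaker:us-east-1:111122223333:artifact/dataset-a",
         "DestinationArn": "arn:aws:sagemaker:us-east-1:111122223333:artifact/model-a",
         "AssociationType": "ContributedTo"},
    ],
}

dot_text = lineage_response_to_dot(sample_response)
print(dot_text)
```

The printed text can be saved to a `.dot` file and rendered with the Graphviz command line tools or another DOT viewer.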
