Querying Lineage Entities - Amazon SageMaker

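Querying Lineage Entities

Amazon SageMaker automatically generates a graph of lineage entities as you use them. You can query this data to answer a variety of questions. You can query your lineage entities to do the following: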

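  • Retrieve all datasets that went into creating a model.

  • Retrieve all jobs that went into creating an endpoint.

  • Retrieve all models that use a dataset.

  • Retrieve all endpoints that use a model.

  • Retrieve which endpoints are derived from a certain dataset.

  • Retrieve the pipeline execution that created a training job.

  • Retrieve the relationships between entities for investigation, governance, and reproducibility.

  • Retrieve all downstream trials that use the artifact.

  • Retrieve all upstream trials that use the artifact.

  • Retrieve a list of artifacts that use the provided S3 URI.

  • Retrieve upstream artifacts that use the dataset artifact.

  • Retrieve downstream artifacts that use the dataset artifact.

  • Retrieve datasets that use the image artifact.

  • Retrieve actions that use the context.

  • Retrieve processing jobs that use the endpoint.

  • Retrieve transform jobs that use the endpoint.

  • Retrieve trial components that use the endpoint.

  • Retrieve the ARN for the pipeline execution associated with the model package group.

  • Retrieve all artifacts that use the action.

  • Retrieve all upstream datasets that use the model package approval action.

  • Retrieve the model package from the model package approval action.

  • Retrieve downstream endpoint contexts that use the endpoint.

  • Retrieve the ARN for the pipeline execution associated with the trial component.

  • Retrieve datasets that use the trial component.

  • Retrieve models that use the trial component.

  • Explore your lineage for visualization.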

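Limitations
  • Lineage querying is not available in the following Regions:

    • Africa (Cape Town) - af-south-1

    • Asia Pacific (Jakarta) - ap-southeast-3

    • Asia Pacific (Osaka) - ap-northeast-3

    • Europe (Milan) - eu-south-1

    • Europe (Spain) - eu-south-2

    • Israel (Tel Aviv) - il-central-1

  • The maximum depth of relationships to discover is currently limited to 10.

  • Filtering is limited to the following properties: last modified date, created date, type, and lineage entity type.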
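
The limits above can be checked on the client side before a query is sent. The following is a minimal sketch, not part of any AWS SDK: `build_query_lineage_request`, `UNSUPPORTED_REGIONS`, and `MAX_DEPTH` are hypothetical names, the ARN is a placeholder, and the dictionary keys are assumed to mirror the parameters of the boto3 `query_lineage` operation. The code only builds a dictionary and does not call AWS.

```python
from datetime import datetime, timezone

# Regions listed above as not offering lineage queries (the list may change over time).
UNSUPPORTED_REGIONS = {
    "af-south-1", "ap-southeast-3", "ap-northeast-3",
    "eu-south-1", "eu-south-2", "il-central-1",
}
MAX_DEPTH = 10  # documented upper bound on relationship depth


def build_query_lineage_request(region, start_arns, direction="Ascendants",
                                max_depth=MAX_DEPTH, types=None,
                                lineage_types=None, created_after=None,
                                modified_after=None):
    """Return keyword arguments for a lineage query, or raise ValueError."""
    if region in UNSUPPORTED_REGIONS:
        raise ValueError(f"Lineage queries are not available in {region}")
    if max_depth < 0 or max_depth > MAX_DEPTH:
        raise ValueError(f"max_depth must be between 0 and {MAX_DEPTH}")

    # Only the documented filter properties are supported here:
    # type, lineage entity type, created date, and last modified date.
    filters = {}
    if types:
        filters["Types"] = list(types)
    if lineage_types:
        filters["LineageTypes"] = list(lineage_types)
    if created_after:
        filters["CreatedAfter"] = created_after
    if modified_after:
        filters["ModifiedAfter"] = modified_after

    request = {
        "StartArns": list(start_arns),
        "Direction": direction,
        "IncludeEdges": False,
        "MaxDepth": max_depth,
    }
    if filters:
        request["Filters"] = filters
    return request


request = build_query_lineage_request(
    "us-east-1",
    ["arn:aws:sagemaker:us-east-1:111122223333:context/example-endpoint"],
    types=["DataSet"],
    lineage_types=["Artifact"],
    created_after=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
print(request)
```

If the assumptions hold, the resulting dictionary could be passed as `sm_client.query_lineage(**request)`; verify the parameter names against the boto3 reference before relying on them.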

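Getting Started with Querying Lineage Entities

The easiest way to get started is with the SageMaker Python SDK (the `sagemaker.lineage` modules). The following examples show how to use the LineageQuery and LineageFilter APIs to construct queries that answer questions about the lineage graph and extract entity relationships for a few use cases.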

Example: Using the LineageQuery API to find entity associations

    import pprint

    import sagemaker
    from sagemaker.lineage.context import Context, EndpointContext
    from sagemaker.lineage.action import Action
    from sagemaker.lineage.association import Association
    from sagemaker.lineage.artifact import Artifact, ModelArtifact, DatasetArtifact
    from sagemaker.lineage.query import (
        LineageQuery,
        LineageFilter,
        LineageSourceEnum,
        LineageEntityEnum,
        LineageQueryDirectionEnum,
    )

    # Session and pretty-printer used by the examples that follow.
    sagemaker_session = sagemaker.Session()
    pp = pprint.PrettyPrinter()

    # Find the endpoint context and model artifact that should be used for the lineage queries.
    # `endpoint_arn` must already hold the ARN of an existing endpoint.
    contexts = Context.list(source_uri=endpoint_arn)
    context_name = list(contexts)[0].context_name
    endpoint_context = EndpointContext.load(context_name=context_name)

Example: Finding all the datasets associated with an endpoint

    # Define the LineageFilter to look for entities of type `ARTIFACT` and the source of type `DATASET`.
    query_filter = LineageFilter(
        entities=[LineageEntityEnum.ARTIFACT], sources=[LineageSourceEnum.DATASET]
    )

    # Providing this `LineageFilter` to the `LineageQuery` constructs a query that traverses
    # through the given context `endpoint_context` and finds all datasets.
    query_result = LineageQuery(sagemaker_session).query(
        start_arns=[endpoint_context.context_arn],
        query_filter=query_filter,
        direction=LineageQueryDirectionEnum.ASCENDANTS,
        include_edges=False,
    )

    # Parse through the query results to get the source URIs of the dataset artifacts.
    dataset_artifacts = []
    for vertex in query_result.vertices:
        dataset_artifacts.append(vertex.to_lineage_object().source.source_uri)

    pp.pprint(dataset_artifacts)

Example: Finding the models associated with an endpoint

    # Define the LineageFilter to look for entities of type `ARTIFACT` and the source of type `MODEL`.
    query_filter = LineageFilter(
        entities=[LineageEntityEnum.ARTIFACT], sources=[LineageSourceEnum.MODEL]
    )

    # Providing this `LineageFilter` to the `LineageQuery` constructs a query that traverses
    # through the given context `endpoint_context` and finds all models.
    query_result = LineageQuery(sagemaker_session).query(
        start_arns=[endpoint_context.context_arn],
        query_filter=query_filter,
        direction=LineageQueryDirectionEnum.ASCENDANTS,
        include_edges=False,
    )

    # Parse through the query results to get the source URIs of the model artifacts.
    model_artifacts = []
    for vertex in query_result.vertices:
        model_artifacts.append(vertex.to_lineage_object().source.source_uri)

    # The results of the `LineageQuery` API call return the ARN of the model deployed to the endpoint
    # along with the S3 URI to the model.tar.gz file associated with the model.
    pp.pprint(model_artifacts)

Example: Finding the trial components associated with an endpoint

    # Define the LineageFilter to look for entities of type `TRIAL_COMPONENT` and the source of type `TRAINING_JOB`.
    query_filter = LineageFilter(
        entities=[LineageEntityEnum.TRIAL_COMPONENT],
        sources=[LineageSourceEnum.TRAINING_JOB],
    )

    # Providing this `LineageFilter` to the `LineageQuery` constructs a query that traverses
    # through the given context `endpoint_context` and finds all training-job trial components.
    query_result = LineageQuery(sagemaker_session).query(
        start_arns=[endpoint_context.context_arn],
        query_filter=query_filter,
        direction=LineageQueryDirectionEnum.ASCENDANTS,
        include_edges=False,
    )

    # Parse through the query results to get the ARNs of the trial components (training jobs)
    # associated with this endpoint.
    trial_components = []
    for vertex in query_result.vertices:
        trial_components.append(vertex.arn)

    pp.pprint(trial_components)

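Example: Changing the focal point of lineage

The LineageQuery can be given different start_arns, which changes the focal point of the lineage. In addition, the LineageFilter can take multiple sources and entities to widen the scope of the query.

In the following, we use the model as the lineage focal point and find the endpoints and datasets associated with it.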

    # Get the ModelArtifact. `model_package_arn` must already hold the ARN of the model package.
    model_artifact_summary = list(Artifact.list(source_uri=model_package_arn))[0]
    model_artifact = ModelArtifact.load(artifact_arn=model_artifact_summary.artifact_arn)

    query_filter = LineageFilter(
        entities=[LineageEntityEnum.ARTIFACT],
        sources=[LineageSourceEnum.ENDPOINT, LineageSourceEnum.DATASET],
    )

    query_result = LineageQuery(sagemaker_session).query(
        start_arns=[model_artifact.artifact_arn],  # Model is the starting artifact
        query_filter=query_filter,
        # Find all the entities that descend from the model, i.e. the endpoint
        direction=LineageQueryDirectionEnum.DESCENDANTS,
        include_edges=False,
    )

    associations = []
    for vertex in query_result.vertices:
        associations.append(vertex.to_lineage_object().source.source_uri)

    query_result = LineageQuery(sagemaker_session).query(
        start_arns=[model_artifact.artifact_arn],  # Model is the starting artifact
        query_filter=query_filter,
        # Find all the entities that ascend from the model, i.e. the datasets
        direction=LineageQueryDirectionEnum.ASCENDANTS,
        include_edges=False,
    )

    for vertex in query_result.vertices:
        associations.append(vertex.to_lineage_object().source.source_uri)

    pp.pprint(associations)

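
Because the example above appends source URIs from two separate queries into one list, the combined list could contain the same URI more than once. The following is a small, hedged sketch of an order-preserving merge; `merge_unique` is a hypothetical helper and the URIs are made-up placeholders, not real query output.

```python
def merge_unique(*uri_lists):
    """Concatenate lists of URIs, dropping repeats but keeping first-seen order."""
    merged = {}
    for uris in uri_lists:
        for uri in uris:
            merged.setdefault(uri, None)  # dicts preserve insertion order
    return list(merged)


# Placeholder values standing in for the results of the two queries.
descendant_uris = ["arn:aws:sagemaker:us-east-1:111122223333:endpoint/example"]
ascendant_uris = [
    "s3://amzn-s3-demo-bucket/train.csv",
    "s3://amzn-s3-demo-bucket/train.csv",
    "s3://amzn-s3-demo-bucket/validation.csv",
]

associations = merge_unique(descendant_uris, ascendant_uris)
print(associations)
```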
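Example: Using LineageQueryDirectionEnum.BOTH to find ascendant and descendant relationships

When the direction is set to BOTH, the query traverses the graph to find both ascendant and descendant relationships. This traversal starts not only from the starting node but also from every node that is visited. For example, if a training job is run twice and both models it produces are deployed to endpoints, a query with the direction set to BOTH returns both endpoints. This is because the same image is used to train and deploy the models. Since the image is common to both models, the start_arn and both endpoints appear in the query result.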

    query_filter = LineageFilter(
        entities=[LineageEntityEnum.ARTIFACT],
        sources=[LineageSourceEnum.ENDPOINT, LineageSourceEnum.DATASET],
    )

    query_result = LineageQuery(sagemaker_session).query(
        start_arns=[model_artifact.artifact_arn],  # Model is the starting artifact
        query_filter=query_filter,
        # This specifies that the query should look for associations both ascending and descending for the start
        direction=LineageQueryDirectionEnum.BOTH,
        include_edges=False,
    )

    associations = []
    for vertex in query_result.vertices:
        associations.append(vertex.to_lineage_object().source.source_uri)

    pp.pprint(associations)

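
To see why BOTH can surface a second endpoint, the shared-image scenario described above can be modeled as a toy graph in plain Python. This sketch does not call SageMaker and only illustrates the traversal idea: the node names, the edge list, and the `reachable` function are hypothetical, and the real service additionally applies filters and a depth limit.

```python
from collections import deque

# Hypothetical lineage: one image shared by two models, each deployed to its own endpoint.
EDGES = [
    ("image", "model_a"),
    ("image", "model_b"),
    ("model_a", "endpoint_a"),
    ("model_b", "endpoint_b"),
]


def reachable(start, direction, edges=EDGES):
    """Return the start node plus every node reached by walking in `direction`."""
    forward, backward = {}, {}
    for source, destination in edges:
        forward.setdefault(source, set()).add(destination)
        backward.setdefault(destination, set()).add(source)

    def neighbors(node):
        found = set()
        if direction in ("DESCENDANTS", "BOTH"):
            found |= forward.get(node, set())
        if direction in ("ASCENDANTS", "BOTH"):
            found |= backward.get(node, set())
        return found

    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        # With BOTH, every visited node is expanded in both directions.
        for nxt in neighbors(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(seen)


print(reachable("model_a", "DESCENDANTS"))  # ['endpoint_a', 'model_a']
print(reachable("model_a", "BOTH"))
# ['endpoint_a', 'endpoint_b', 'image', 'model_a', 'model_b']
```

Starting from `model_a`, the walk goes up to the shared image and then back down through `model_b` to `endpoint_b`, which is how an endpoint that never hosted the starting model can still appear.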
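Directions in LineageQuery - ASCENDANTS vs. DESCENDANTS

To understand direction in the lineage graph, consider the following entity relationship chain: Dataset -> Training Job -> Model -> Endpoint

The endpoint is a descendant of the model, and the model is a descendant of the dataset. Conversely, the model is an ascendant of the endpoint. The direction parameter specifies whether the query should return the descendants or the ascendants of the entities in start_arns. If start_arns contains a model and the direction is DESCENDANTS, the query returns the endpoint. If the direction is ASCENDANTS, the query returns the dataset.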

    # In this example, we'll look at the impact of specifying the direction as ASCENDANTS or DESCENDANTS in a `LineageQuery`.
    query_filter = LineageFilter(
        entities=[LineageEntityEnum.ARTIFACT],
        sources=[
            LineageSourceEnum.ENDPOINT,
            LineageSourceEnum.MODEL,
            LineageSourceEnum.DATASET,
            LineageSourceEnum.TRAINING_JOB,
        ],
    )

    query_result = LineageQuery(sagemaker_session).query(
        start_arns=[model_artifact.artifact_arn],
        query_filter=query_filter,
        direction=LineageQueryDirectionEnum.ASCENDANTS,
        include_edges=False,
    )

    ascendant_artifacts = []

    # The lineage entity returned for the Training Job is a TrialComponent which can't be converted to a
    # lineage object using the method `to_lineage_object()` so we extract the TrialComponent ARN.
    for vertex in query_result.vertices:
        try:
            ascendant_artifacts.append(vertex.to_lineage_object().source.source_uri)
        except Exception:
            ascendant_artifacts.append(vertex.arn)

    print("Ascendant artifacts : ")
    pp.pprint(ascendant_artifacts)

    query_result = LineageQuery(sagemaker_session).query(
        start_arns=[model_artifact.artifact_arn],
        query_filter=query_filter,
        direction=LineageQueryDirectionEnum.DESCENDANTS,
        include_edges=False,
    )

    descendant_artifacts = []
    for vertex in query_result.vertices:
        try:
            descendant_artifacts.append(vertex.to_lineage_object().source.source_uri)
        except Exception:
            # Handling TrialComponents.
            descendant_artifacts.append(vertex.arn)

    print("Descendant artifacts : ")
    pp.pprint(descendant_artifacts)

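
The direction semantics can also be pictured without calling AWS. The following sketch models the Dataset -> Training Job -> Model -> Endpoint chain as a plain edge list and walks it forward or backward with a depth bound. The `traverse` function and node names are hypothetical, and unlike a real query no LineageFilter is applied, so intermediate nodes such as the training job are included.

```python
from collections import deque

# Toy lineage chain from the text: Dataset -> Training Job -> Model -> Endpoint.
EDGES = [
    ("dataset", "training_job"),
    ("training_job", "model"),
    ("model", "endpoint"),
]


def traverse(start, direction, edges=EDGES, max_depth=10):
    """Breadth-first walk: forward for DESCENDANTS, backward for ASCENDANTS."""
    if direction not in ("ASCENDANTS", "DESCENDANTS"):
        raise ValueError(f"unsupported direction: {direction}")

    neighbors = {}
    for source, destination in edges:
        if direction == "DESCENDANTS":
            neighbors.setdefault(source, []).append(destination)
        else:
            neighbors.setdefault(destination, []).append(source)

    found, seen, queue = [], {start}, deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand past the depth bound
        for nxt in neighbors.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                found.append(nxt)
                queue.append((nxt, depth + 1))
    return found


print(traverse("model", "DESCENDANTS"))              # ['endpoint']
print(traverse("model", "ASCENDANTS"))               # ['training_job', 'dataset']
print(traverse("model", "ASCENDANTS", max_depth=1))  # ['training_job']
```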
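Example: SDK helper functions that simplify lineage queries

The EndpointContext, ModelArtifact, and DatasetArtifact classes have helper functions that wrap the LineageQuery API to make certain lineage queries easier to use. The following example shows how to use these helper functions.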

    # Find all the datasets associated with this endpoint
    datasets = []
    dataset_artifacts = endpoint_context.dataset_artifacts()
    for dataset in dataset_artifacts:
        datasets.append(dataset.source.source_uri)
    print("Datasets : ", datasets)

    # Find the training jobs associated with the endpoint
    training_job_artifacts = endpoint_context.training_job_arns()
    training_jobs = []
    for training_job in training_job_artifacts:
        training_jobs.append(training_job)
    print("Training Jobs : ", training_jobs)

    # Get the ARN for the pipeline execution associated with this endpoint (if any).
    # The helper is expected to return a single ARN string, or None when there is no pipeline execution.
    pipeline_execution_arn = endpoint_context.pipeline_execution_arn()
    if pipeline_execution_arn:
        print(pipeline_execution_arn)

    # Here we use the `ModelArtifact` class to find all the datasets and endpoints associated with the model
    dataset_artifacts = model_artifact.dataset_artifacts()
    endpoint_contexts = model_artifact.endpoint_contexts()

    datasets = [dataset.source.source_uri for dataset in dataset_artifacts]
    endpoints = [endpoint.source.source_uri for endpoint in endpoint_contexts]

    print("Datasets associated with this model : ")
    pp.pprint(datasets)

    print("Endpoints associated with this model : ")
    pp.pprint(endpoints)

    # Here we use the `DatasetArtifact` class to find all the endpoints hosting models that were trained with a particular dataset
    # Find the artifact associated with the dataset. `training_data` must already hold the dataset's S3 URI.
    dataset_artifact_arn = list(Artifact.list(source_uri=training_data))[0].artifact_arn
    dataset_artifact = DatasetArtifact.load(artifact_arn=dataset_artifact_arn)

    # Find the endpoints that used this training dataset
    endpoint_contexts = dataset_artifact.endpoint_contexts()
    endpoints = [endpoint.source.source_uri for endpoint in endpoint_contexts]

    print("Endpoints associated with the training dataset {}".format(training_data))
    pp.pprint(endpoints)

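Example: Getting a lineage graph visualization

A helper class, Visualizer, is provided in the sample notebook's visualizer.py to help plot the lineage graph. When the query response is rendered, a graph of the lineage relationships from the StartArns is displayed. Starting from the StartArns, the visualization shows the relationships to the other lineage entities returned by the query_lineage API operation.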

    # Graph APIs
    # Here we use the boto3 `query_lineage` API to generate the query response to plot.
    import boto3

    from visualizer import Visualizer

    sm_client = boto3.client("sagemaker")

    query_response = sm_client.query_lineage(
        StartArns=[endpoint_context.context_arn], Direction="Ascendants", IncludeEdges=True
    )

    viz = Visualizer()
    viz.render(query_response, "Endpoint")

    query_response = sm_client.query_lineage(
        StartArns=[model_artifact.artifact_arn], Direction="Ascendants", IncludeEdges=True
    )
    viz.render(query_response, "Model")
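
When the notebook's Visualizer is not at hand, a query response can still be inspected as text. The following is a hedged sketch that turns a response-shaped dictionary into Graphviz DOT. `to_dot` is a hypothetical helper, the sample ARNs are placeholders, and the `Vertices`/`Edges` keys and their fields are assumed to match the shape returned by `query_lineage` when `IncludeEdges=True`.

```python
def to_dot(query_response, name="lineage"):
    """Render the Vertices and Edges of a response-shaped dict as Graphviz DOT text."""
    lines = [f"digraph {name} {{"]
    for vertex in query_response.get("Vertices", []):
        label = f'{vertex.get("LineageType", "?")}: {vertex.get("Type", "?")}'
        lines.append(f'  "{vertex["Arn"]}" [label="{label}"];')
    for edge in query_response.get("Edges", []):
        lines.append(
            f'  "{edge["SourceArn"]}" -> "{edge["DestinationArn"]}" '
            f'[label="{edge.get("AssociationType", "")}"];'
        )
    lines.append("}")
    return "\n".join(lines)


# Made-up response used only to exercise the helper; not real query output.
sample_response = {
    "Vertices": [
        {"Arn": "arn:example:dataset", "Type": "DataSet", "LineageType": "Artifact"},
        {"Arn": "arn:example:model", "Type": "Model", "LineageType": "Artifact"},
    ],
    "Edges": [
        {
            "SourceArn": "arn:example:dataset",
            "DestinationArn": "arn:example:model",
            "AssociationType": "ContributedTo",
        },
    ],
}

dot = to_dot(sample_response)
print(dot)
```

The resulting text can be pasted into any Graphviz renderer; edges point from source to destination, so ascendants appear upstream of the entity they contributed to.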