Querying Lineage Entities - Amazon SageMaker

Querying Lineage Entities

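Amazon SageMaker automatically generates graphs of lineage entities as you use them. You can query this data to answer a variety of questions. You can query your lineage entities to: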

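  • Retrieve all datasets that went into the creation of a model.

  • Retrieve all jobs that went into the creation of an endpoint.

  • Retrieve all models that use a dataset.

  • Retrieve all endpoints that use a model.

  • Retrieve the endpoints derived from a particular dataset.

  • Retrieve the pipeline execution that created a training job.

  • Retrieve the relationships between entities for investigation, governance, and reproducibility.

  • Retrieve all downstream trials that use an artifact.

  • Retrieve all upstream trials that use an artifact.

  • Retrieve a list of artifacts that use a provided S3 URI.

  • Retrieve upstream artifacts that use a dataset artifact.

  • Retrieve downstream artifacts that use a dataset artifact.

  • Retrieve datasets that use an image artifact.

  • Retrieve actions that use a context.

  • Retrieve processing jobs that use an endpoint.

  • Retrieve transform jobs that use an endpoint.

  • Retrieve trial components that use an endpoint.

  • Retrieve the ARN of the pipeline execution associated with a model package group.

  • Retrieve all artifacts that use an action.

  • Retrieve all upstream datasets that use a model package approval action.

  • Retrieve the model package from a model package approval action.

  • Retrieve downstream endpoint contexts that use an endpoint.

  • Retrieve the ARN of the pipeline execution associated with a trial component.

  • Retrieve datasets that use a trial component.

  • Retrieve models that use a trial component.

  • Explore your lineage for visualization.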

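Limitations
  • Lineage querying is not available in the following Regions:

    • Africa (Cape Town) – af-south-1

    • Asia Pacific (Jakarta) – ap-southeast-3

    • Asia Pacific (Osaka) – ap-northeast-3

    • Europe (Milan) – eu-south-1

    • Europe (Spain) – eu-south-2

    • Israel (Tel Aviv) – il-central-1

  • The maximum depth of relationships to discover is currently limited to 10.

  • Filtering is limited to the following properties: last modified date, created date, type, and lineage entity type.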
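
These limits can be checked on the client before a query is sent. The following is a minimal sketch, not part of the SageMaker SDK: `build_query_lineage_request`, `UNSUPPORTED_REGIONS`, and `MAX_DEPTH` are hypothetical names introduced for illustration, and the returned dictionary is assumed to be passed as keyword arguments to the boto3 `query_lineage` call.

```python
# Region codes for the Regions where lineage querying is unavailable.
UNSUPPORTED_REGIONS = {
    "af-south-1",
    "ap-southeast-3",
    "ap-northeast-3",
    "eu-south-1",
    "eu-south-2",
    "il-central-1",
}
MAX_DEPTH = 10  # current maximum relationship depth


def build_query_lineage_request(start_arns, region, direction="Ascendants",
                                max_depth=MAX_DEPTH, created_after=None,
                                lineage_types=None):
    """Hypothetical helper: validate the limits, then build query_lineage kwargs."""
    if region in UNSUPPORTED_REGIONS:
        raise ValueError(f"Lineage querying is not available in {region}")
    if max_depth > MAX_DEPTH:
        raise ValueError(f"max_depth cannot exceed {MAX_DEPTH}")

    # Only the supported filter properties are offered here:
    # creation date and lineage entity type.
    filters = {}
    if created_after is not None:
        filters["CreatedAfter"] = created_after
    if lineage_types:
        filters["LineageTypes"] = list(lineage_types)

    request = {
        "StartArns": list(start_arns),
        "Direction": direction,
        "IncludeEdges": False,
        "MaxDepth": max_depth,
    }
    if filters:
        request["Filters"] = filters
    return request
```

Assuming a boto3 SageMaker client named `sm_client`, the result could be used as `sm_client.query_lineage(**request)`.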

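Getting started with querying lineage entities

The easiest way to get started with querying lineage entities is through the SageMaker Python SDK, whose sagemaker.lineage module is used in the examples on this page.

The following examples show how to use the LineageQuery and LineageFilter APIs to construct queries that answer questions about the lineage graph and extract entity relationships for a few use cases.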

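Example: Using the LineageQuery API to find entity associations

import pprint

import sagemaker
from sagemaker.lineage.context import Context, EndpointContext
from sagemaker.lineage.action import Action
from sagemaker.lineage.association import Association
from sagemaker.lineage.artifact import Artifact, ModelArtifact, DatasetArtifact
from sagemaker.lineage.query import (
    LineageQuery,
    LineageFilter,
    LineageSourceEnum,
    LineageEntityEnum,
    LineageQueryDirectionEnum,
)

# Assumed setup, not shown in the original snippet: the later examples rely on
# a SageMaker session and a pretty-printer named `pp`.
sagemaker_session = sagemaker.session.Session()
pp = pprint.PrettyPrinter()

# Find the endpoint context and model artifact that should be used for the lineage queries.
# `endpoint_arn` is assumed to hold the ARN of an existing endpoint.
contexts = Context.list(source_uri=endpoint_arn)
context_name = list(contexts)[0].context_name
endpoint_context = EndpointContext.load(context_name=context_name)
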
Example: Find all the datasets associated with an endpoint

# Define the LineageFilter to look for entities of type `ARTIFACT` and the source of type `DATASET`.
query_filter = LineageFilter(
    entities=[LineageEntityEnum.ARTIFACT], sources=[LineageSourceEnum.DATASET]
)

# Providing this `LineageFilter` to the `LineageQuery` constructs a query that traverses
# through the given context `endpoint_context` and finds all datasets.
query_result = LineageQuery(sagemaker_session).query(
    start_arns=[endpoint_context.context_arn],
    query_filter=query_filter,
    direction=LineageQueryDirectionEnum.ASCENDANTS,
    include_edges=False,
)

# Parse through the query results to get the lineage objects corresponding to the datasets
dataset_artifacts = []
for vertex in query_result.vertices:
    dataset_artifacts.append(vertex.to_lineage_object().source.source_uri)

pp.pprint(dataset_artifacts)

Example: Find the models associated with an endpoint

# Define the LineageFilter to look for entities of type `ARTIFACT` and the source of type `MODEL`.
query_filter = LineageFilter(
    entities=[LineageEntityEnum.ARTIFACT], sources=[LineageSourceEnum.MODEL]
)

# Providing this `LineageFilter` to the `LineageQuery` constructs a query that traverses
# through the given context `endpoint_context` and finds all models.
query_result = LineageQuery(sagemaker_session).query(
    start_arns=[endpoint_context.context_arn],
    query_filter=query_filter,
    direction=LineageQueryDirectionEnum.ASCENDANTS,
    include_edges=False,
)

# Parse through the query results to get the lineage objects corresponding to the model
model_artifacts = []
for vertex in query_result.vertices:
    model_artifacts.append(vertex.to_lineage_object().source.source_uri)

# The results of the `LineageQuery` API call return the ARN of the model deployed to the
# endpoint along with the S3 URI to the model.tar.gz file associated with the model
pp.pprint(model_artifacts)

Example: Find the trial components associated with an endpoint

# Define the LineageFilter to look for entities of type `TRIAL_COMPONENT` and the source of type `TRAINING_JOB`.
query_filter = LineageFilter(
    entities=[LineageEntityEnum.TRIAL_COMPONENT],
    sources=[LineageSourceEnum.TRAINING_JOB],
)

# Providing this `LineageFilter` to the `LineageQuery` constructs a query that traverses
# through the given context `endpoint_context` and finds all training-job trial components.
query_result = LineageQuery(sagemaker_session).query(
    start_arns=[endpoint_context.context_arn],
    query_filter=query_filter,
    direction=LineageQueryDirectionEnum.ASCENDANTS,
    include_edges=False,
)

# Parse through the query results to get the ARNs of the training jobs associated with this endpoint
trial_components = []
for vertex in query_result.vertices:
    trial_components.append(vertex.arn)

pp.pprint(trial_components)

Example: Changing the focal point of lineage

The LineageQuery can be modified to use different start_arns, which changes the focal point of the lineage. In addition, the LineageFilter can take multiple sources and entities to expand the scope of the query.

Below, the model is used as the lineage focal point to find the endpoints and datasets associated with it.

# Get the ModelArtifact.
# `model_package_arn` is assumed to hold the ARN of an existing model package.
model_artifact_summary = list(Artifact.list(source_uri=model_package_arn))[0]
model_artifact = ModelArtifact.load(artifact_arn=model_artifact_summary.artifact_arn)

query_filter = LineageFilter(
    entities=[LineageEntityEnum.ARTIFACT],
    sources=[LineageSourceEnum.ENDPOINT, LineageSourceEnum.DATASET],
)

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[model_artifact.artifact_arn],  # Model is the starting artifact
    query_filter=query_filter,
    # Find all the entities that descend from the model, i.e. the endpoint
    direction=LineageQueryDirectionEnum.DESCENDANTS,
    include_edges=False,
)

associations = []
for vertex in query_result.vertices:
    associations.append(vertex.to_lineage_object().source.source_uri)

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[model_artifact.artifact_arn],  # Model is the starting artifact
    query_filter=query_filter,
    # Find all the entities that ascend from the model, i.e. the datasets
    direction=LineageQueryDirectionEnum.ASCENDANTS,
    include_edges=False,
)

for vertex in query_result.vertices:
    associations.append(vertex.to_lineage_object().source.source_uri)

pp.pprint(associations)

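Example: Using LineageQueryDirectionEnum.BOTH to find ascendant and descendant relationships

When the direction is set to BOTH, the query traverses the graph to find both ascendant and descendant relationships. This traversal takes place not only from the starting node but from every node that is visited. For example, if a training job is run twice and both models produced by the training job are deployed to endpoints, a query with the direction set to BOTH returns both endpoints. This is because the same image is used for training and deploying the model. Because the image is common to both models, the start_arn and both endpoints appear in the query result.

query_filter = LineageFilter(
    entities=[LineageEntityEnum.ARTIFACT],
    sources=[LineageSourceEnum.ENDPOINT, LineageSourceEnum.DATASET],
)

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[model_artifact.artifact_arn],  # Model is the starting artifact
    query_filter=query_filter,
    # This specifies that the query should look for associations both ascending and descending from the start
    direction=LineageQueryDirectionEnum.BOTH,
    include_edges=False,
)

associations = []
for vertex in query_result.vertices:
    associations.append(vertex.to_lineage_object().source.source_uri)

pp.pprint(associations)
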
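
The effect of BOTH can be reproduced on a small in-memory graph. The following is a toy sketch only, not the service's implementation: the node names, the `EDGES` list, and the `traverse` function are hypothetical, and the graph assumes two training runs whose models share one image.

```python
from collections import deque

# Hypothetical lineage edges, each pointing from an ascendant to a descendant.
EDGES = [
    ("dataset", "training-job-run-1"),
    ("dataset", "training-job-run-2"),
    ("training-job-run-1", "model-1"),
    ("training-job-run-2", "model-2"),
    ("image", "model-1"),
    ("image", "model-2"),
    ("model-1", "endpoint-1"),
    ("model-2", "endpoint-2"),
]


def traverse(start, direction):
    """Return every node reachable from `start`, including `start` itself."""
    children, parents = {}, {}
    for src, dst in EDGES:
        children.setdefault(src, []).append(dst)
        parents.setdefault(dst, []).append(src)

    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        neighbours = []
        # The direction is applied again at every visited node,
        # not only at the starting node.
        if direction in ("DESCENDANTS", "BOTH"):
            neighbours += children.get(node, [])
        if direction in ("ASCENDANTS", "BOTH"):
            neighbours += parents.get(node, [])
        for nxt in neighbours:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(seen)


print(traverse("model-1", "DESCENDANTS"))
print(traverse("model-1", "BOTH"))
```

Starting from `model-1`, DESCENDANTS reaches only `endpoint-1`, while BOTH walks up to the shared `image`, back down to `model-2`, and on to `endpoint-2`, which is why both endpoints show up.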
Example: Directions in LineageQuery - ASCENDANTS vs. DESCENDANTS

To understand direction in the lineage graph, consider the following entity relationship graph: Dataset -> Training Job -> Model -> Endpoint

The endpoint is a descendant of the model, and the model is a descendant of the dataset. Similarly, the model is an ascendant of the endpoint. The direction parameter specifies whether the query should return entities that are descendants or ascendants of the entities in start_arns. If start_arns contains a model and the direction is DESCENDANTS, the query returns the endpoint. If the direction is ASCENDANTS, the query returns the dataset.

# In this example, we'll look at the impact of specifying the direction as ASCENDANTS or DESCENDANTS in a `LineageQuery`.
query_filter = LineageFilter(
    entities=[LineageEntityEnum.ARTIFACT],
    sources=[
        LineageSourceEnum.ENDPOINT,
        LineageSourceEnum.MODEL,
        LineageSourceEnum.DATASET,
        LineageSourceEnum.TRAINING_JOB,
    ],
)

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[model_artifact.artifact_arn],
    query_filter=query_filter,
    direction=LineageQueryDirectionEnum.ASCENDANTS,
    include_edges=False,
)

ascendant_artifacts = []

# The lineage entity returned for the Training Job is a TrialComponent which can't be converted to a
# lineage object using the method `to_lineage_object()` so we extract the TrialComponent ARN.
for vertex in query_result.vertices:
    try:
        ascendant_artifacts.append(vertex.to_lineage_object().source.source_uri)
    except Exception:
        ascendant_artifacts.append(vertex.arn)

print("Ascendant artifacts : ")
pp.pprint(ascendant_artifacts)

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[model_artifact.artifact_arn],
    query_filter=query_filter,
    direction=LineageQueryDirectionEnum.DESCENDANTS,
    include_edges=False,
)

descendant_artifacts = []
for vertex in query_result.vertices:
    try:
        descendant_artifacts.append(vertex.to_lineage_object().source.source_uri)
    except Exception:
        # Handling TrialComponents.
        descendant_artifacts.append(vertex.arn)

print("Descendant artifacts : ")
pp.pprint(descendant_artifacts)

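
The direction semantics on the Dataset -> Training Job -> Model -> Endpoint chain can also be seen without calling the service. This is an illustrative sketch only: `CHAIN` and `related` are hypothetical names, and the chain is modeled as a plain ordered list rather than a real lineage graph.

```python
# Toy chain, ordered from the earliest ascendant to the last descendant.
CHAIN = ["dataset", "training-job", "model", "endpoint"]


def related(start, direction):
    """Return the entities before (ASCENDANTS) or after (DESCENDANTS) `start`."""
    position = CHAIN.index(start)
    if direction == "ASCENDANTS":
        # Everything upstream of the start, nearest ascendant first.
        return CHAIN[:position][::-1]
    if direction == "DESCENDANTS":
        # Everything downstream of the start.
        return CHAIN[position + 1:]
    raise ValueError(f"Unsupported direction: {direction}")


print(related("model", "ASCENDANTS"))
print(related("model", "DESCENDANTS"))
```

With the model as the starting point, ASCENDANTS yields the training job and the dataset, and DESCENDANTS yields the endpoint.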
Example: SDK helper functions to make lineage queries easier

The EndpointContext, ModelArtifact, and DatasetArtifact classes have helper functions that are wrappers over the LineageQuery API and make certain lineage queries easier. The following example shows how to use these helper functions.

# Find all the datasets associated with this endpoint
datasets = []
dataset_artifacts = endpoint_context.dataset_artifacts()
for dataset in dataset_artifacts:
    datasets.append(dataset.source.source_uri)
print("Datasets : ", datasets)

# Find the training jobs associated with the endpoint
training_job_artifacts = endpoint_context.training_job_arns()
training_jobs = []
for training_job in training_job_artifacts:
    training_jobs.append(training_job)
print("Training Jobs : ", training_jobs)

# Get the ARN for the pipeline execution associated with this endpoint (if any).
# The SDK documents `pipeline_execution_arn()` as returning a single ARN string,
# so it is printed directly rather than iterated over.
pipeline_execution_arn = endpoint_context.pipeline_execution_arn()
if pipeline_execution_arn:
    print(pipeline_execution_arn)

# Here we use the `ModelArtifact` class to find all the datasets and endpoints associated with the model
dataset_artifacts = model_artifact.dataset_artifacts()
endpoint_contexts = model_artifact.endpoint_contexts()

datasets = [dataset.source.source_uri for dataset in dataset_artifacts]
endpoints = [endpoint.source.source_uri for endpoint in endpoint_contexts]

print("Datasets associated with this model : ")
pp.pprint(datasets)

print("Endpoints associated with this model : ")
pp.pprint(endpoints)

# Here we use the `DatasetArtifact` class to find all the endpoints hosting models
# that were trained with a particular dataset.
# `training_data` is assumed to hold the S3 URI of the training dataset.
# Find the artifact associated with the dataset
dataset_artifact_arn = list(Artifact.list(source_uri=training_data))[0].artifact_arn
dataset_artifact = DatasetArtifact.load(artifact_arn=dataset_artifact_arn)

# Find the endpoints that used this training dataset
endpoint_contexts = dataset_artifact.endpoint_contexts()
endpoints = [endpoint.source.source_uri for endpoint in endpoint_contexts]

print("Endpoints associated with the training dataset {}".format(training_data))
pp.pprint(endpoints)

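Example: Getting a lineage graph visualization

A helper class, Visualizer, is provided in visualizer.py alongside the sample notebook to help plot the lineage graph. When the query response is rendered, a graph of the lineage relationships from the StartArns is displayed. Starting from the StartArns, the visualization shows the relationships with the other lineage entities returned by the query_lineage API action.

# Graph APIs
# Here we use the boto3 `query_lineage` API to generate the query response to plot.
import boto3

from visualizer import Visualizer

# Assumed setup, not shown in the original snippet: a boto3 SageMaker client.
sm_client = boto3.client("sagemaker")

query_response = sm_client.query_lineage(
    StartArns=[endpoint_context.context_arn], Direction="Ascendants", IncludeEdges=True
)

viz = Visualizer()
viz.render(query_response, "Endpoint")

query_response = sm_client.query_lineage(
    StartArns=[model_artifact.artifact_arn], Direction="Ascendants", IncludeEdges=True
)
viz.render(query_response, "Model")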
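
If visualizer.py is not at hand, the vertices and edges in a `query_lineage` response can be turned into Graphviz DOT text with a few lines of Python. This is a rough sketch and not the Visualizer class from the sample notebook: `to_dot`, `short_name`, and `sample_response` are hypothetical, and the ARNs in the sample are made up for illustration.

```python
def short_name(arn):
    """Return the resource part of an ARN, i.e. the text after the last ':'."""
    return arn.rsplit(":", 1)[-1]


def to_dot(query_response):
    """Build Graphviz DOT text from the Vertices and Edges of a query response."""
    lines = ["digraph lineage {"]
    for vertex in query_response.get("Vertices", []):
        arn = vertex["Arn"]
        label = vertex.get("Type", "?") + ": " + short_name(arn)
        lines.append(f'  "{arn}" [label="{label}"];')
    for edge in query_response.get("Edges", []):
        source = edge["SourceArn"]
        destination = edge["DestinationArn"]
        association = edge.get("AssociationType", "")
        lines.append(f'  "{source}" -> "{destination}" [label="{association}"];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical response shaped like the output of `query_lineage` with IncludeEdges=True.
DATASET_ARN = "arn:aws:sagemaker:us-east-1:111122223333:artifact/example-dataset"
ENDPOINT_ARN = "arn:aws:sagemaker:us-east-1:111122223333:context/example-endpoint"
sample_response = {
    "Vertices": [
        {"Arn": DATASET_ARN, "Type": "DataSet", "LineageType": "Artifact"},
        {"Arn": ENDPOINT_ARN, "Type": "Endpoint", "LineageType": "Context"},
    ],
    "Edges": [
        {
            "SourceArn": DATASET_ARN,
            "DestinationArn": ENDPOINT_ARN,
            "AssociationType": "ContributedTo",
        },
    ],
}

print(to_dot(sample_response))
```

The resulting text can be saved to a file and rendered with any Graphviz-compatible tool.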