Track the Lineage of a SageMaker ML Pipeline

In this tutorial, you use Amazon SageMaker Studio to track the lineage of an Amazon SageMaker ML Pipeline.

The pipeline was created by the Orchestrating Jobs with Amazon SageMaker Model Building Pipelines notebook in the Amazon SageMaker example GitHub repository. For detailed information on how the pipeline was created, see Define a Pipeline.

Lineage tracking in Studio is centered around a directed acyclic graph (DAG). The DAG represents the steps in a pipeline. From the DAG you can track the lineage from any step to any other step. The following diagram displays the steps in the pipeline. These steps appear as a DAG in Studio.
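To make the idea of tracing lineage through a DAG concrete, the following Python sketch models a pipeline as a mapping from each step to the steps it depends on, then walks that mapping to collect everything upstream of a given step. The three step names come from this tutorial. The dependency edges between them are an assumption made for illustration, and `upstream_steps` is a hypothetical helper, not part of any SageMaker API.

```python
# Hypothetical dependency map: step name -> steps it directly depends on.
# The step names appear in this tutorial; the edges are assumed for illustration.
DEPENDS_ON = {
    "AbaloneProcess": [],
    "AbaloneTrain": ["AbaloneProcess"],
    "AbaloneRegisterModel": ["AbaloneTrain"],
}


def upstream_steps(step, depends_on):
    """Return every step that `step` depends on, directly or indirectly."""
    seen = []
    stack = list(depends_on.get(step, []))
    while stack:
        current = stack.pop()
        if current in seen:
            # Already visited through another path in the graph.
            continue
        seen.append(current)
        stack.extend(depends_on.get(current, []))
    return seen


if __name__ == "__main__":
    print(upstream_steps("AbaloneRegisterModel", DEPENDS_ON))
```

Because the graph is acyclic, the walk always terminates. The `seen` check only prevents a step from being listed twice when it is reachable through more than one path.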

Prerequisites

To track the lineage of a pipeline

  1. Sign in to SageMaker Studio.

  2. In the left sidebar of Studio, choose the SageMaker Components and registries icon.

  3. In the drop-down menu, select Pipelines.

  4. Use the Search box to filter the pipelines list. To view all available columns, drag the right border of the pane to the right. For more information, see Search Experiments Using Amazon SageMaker Studio.

    The following screenshot shows the list filtered to pipelines whose name starts with "aba" and that were created on 12/5/20 (December 5, 2020).

  5. Double-click the AbalonePipeline pipeline to view the execution list and other details about the pipeline. The following screenshot shows the TABLE PROPERTIES pane open, where you can choose which properties to view.

  6. Choose the Settings tab and then choose Download pipeline definition file. You can view the file to see how the pipeline graph was defined.

  7. On the Execution tab, double-click the first row in the execution list to view its execution graph and other details about the execution. Note that the graph matches the diagram displayed at the beginning of the tutorial.

    You can drag the graph around (select an area not on the graph itself) or use the resizing icons on the lower-left side of the graph. The inset on the lower-right side of the graph displays your location in the graph.

  8. On the Graph tab, choose the AbaloneProcess step to view details about the step.

  9. Find the Amazon S3 paths to the training, validation, and test datasets in the Output tab, under Files.

    Note

    To get the full paths, right-click the path and then choose Copy cell contents.

    s3://sagemaker-eu-west-1-acct-id/sklearn-abalone-process-2020-12-05-17-28-28-509/output/train
    s3://sagemaker-eu-west-1-acct-id/sklearn-abalone-process-2020-12-05-17-28-28-509/output/validation
    s3://sagemaker-eu-west-1-acct-id/sklearn-abalone-process-2020-12-05-17-28-28-509/output/test
  10. Choose the AbaloneTrain step.

  11. Find the Amazon S3 path to the model artifact in the Output tab, under Files:

    s3://sagemaker-eu-west-1-acct-id/AbaloneTrain/pipelines-6locnsqz4bfu-AbaloneTrain-NtfEpI0Ahu/output/model.tar.gz
  12. Choose the AbaloneRegisterModel step.

  13. Find the ARN of the model package in the Output tab, under Files:

    arn:aws:sagemaker:eu-west-1:acct-id:model-package/abalonemodelpackagegroupname/2
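
The same step-by-step lookup can be approximated outside Studio. The sketch below is one possible approach, not the method Studio itself uses: it takes a SageMaker client (for example, one created with `boto3.client("sagemaker")`), reads the first execution returned for a pipeline, and collects the resource ARN recorded in each step's metadata. The function name `step_resource_arns` is hypothetical, and the sketch assumes the results fit in a single response page, so it does not follow `NextToken`.

```python
def step_resource_arns(sm_client, pipeline_name):
    """Map step names of one pipeline execution to the ARN in each step's metadata.

    `sm_client` is expected to behave like boto3.client("sagemaker").
    Steps whose metadata carries no ARN (for example, condition steps) are skipped.
    """
    # Newest execution first; this sketch simply takes the first one returned.
    executions = sm_client.list_pipeline_executions(
        PipelineName=pipeline_name,
        SortOrder="Descending",
    )["PipelineExecutionSummaries"]
    if not executions:
        return {}
    execution_arn = executions[0]["PipelineExecutionArn"]

    steps = sm_client.list_pipeline_execution_steps(
        PipelineExecutionArn=execution_arn,
    )["PipelineExecutionSteps"]

    arns = {}
    for step in steps:
        # Metadata holds one entry per step type, such as "ProcessingJob",
        # "TrainingJob", or "RegisterModel", each with an "Arn" field.
        for details in step.get("Metadata", {}).values():
            if isinstance(details, dict) and "Arn" in details:
                arns[step["StepName"]] = details["Arn"]
    return arns
```

Calling `step_resource_arns(boto3.client("sagemaker"), "AbalonePipeline")` requires AWS credentials and a configured Region. For the processing and training steps, the job name at the end of each returned ARN can then be passed to `describe_processing_job` or `describe_training_job` to read the output S3 locations that Studio shows on the Output tab.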