Amazon SageMaker Model Building Pipelines

Amazon SageMaker Model Building Pipelines is a tool for building machine learning pipelines that take advantage of direct SageMaker integration. Because of this integration, you can create a pipeline and set up SageMaker Projects for orchestration using a tool that handles much of the step creation and management for you. You can build a pipeline with the SageMaker Python SDK, or author it directly against the SageMaker Pipeline Definition JSON Schema.
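To give a feel for the JSON authoring route, the following is a minimal sketch that assembles a pipeline definition as a Python dictionary and serializes it with the standard library. The top-level keys (Version, Metadata, Parameters, Steps), the {"Get": ...} parameter reference, and DependsOn are written as we understand the schema and should be verified against the published SageMaker Pipeline Definition JSON Schema. The step names, the parameter name, and the S3 location are hypothetical, and the Arguments blocks are deliberately incomplete.

```python
import json

# All names and S3 locations below are hypothetical placeholders.
definition = {
    "Version": "2020-12-01",
    "Metadata": {},
    "Parameters": [
        {
            "Name": "InputDataUrl",
            "Type": "String",
            "DefaultValue": "s3://example-bucket/input/",
        }
    ],
    "Steps": [
        {
            "Name": "PreprocessData",
            "Type": "Processing",
            # Arguments mirror the underlying job request.
            # Most required fields are omitted in this sketch.
            "Arguments": {
                "ProcessingInputs": [
                    {
                        "InputName": "raw",
                        # Resolve the value from the pipeline parameter at run time.
                        "S3Input": {"S3Uri": {"Get": "Parameters.InputDataUrl"}},
                    }
                ]
            },
        },
        {
            "Name": "TrainModel",
            "Type": "Training",
            # Run only after the preprocessing step has finished.
            "DependsOn": ["PreprocessData"],
            "Arguments": {},  # training job settings omitted in this sketch
        },
    ],
}

definition_json = json.dumps(definition, indent=2)
print(definition_json)
```

A definition built with the SageMaker Python SDK serializes to a document of this general shape, so the two authoring routes describe the same pipeline.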

SageMaker Pipelines provides the following advantages over other AWS workflow offerings:

SageMaker Integration

SageMaker Pipelines is integrated directly with SageMaker, so you don't need to interact with any other AWS services. Because SageMaker Pipelines is a fully managed service, it creates and manages resources for you.

SageMaker Python SDK Integration

Because SageMaker Pipelines is integrated with the SageMaker Python SDK, you can create your pipelines programmatically using a high-level Python interface. To view the SageMaker Python SDK API reference, see Pipelines. For SageMaker Python SDK code examples, see Amazon SageMaker Model Building Pipelines.

SageMaker Studio Integration

SageMaker Studio offers an environment to manage the end-to-end SageMaker Pipelines experience. Using Studio, you can bypass the AWS console for your entire workflow management. For more information about managing SageMaker Pipelines from SageMaker Studio, see View, Track, and Execute SageMaker Pipelines in SageMaker Studio.

Data Lineage Tracking

With SageMaker Pipelines, you can track the history of your data within the pipeline execution. Amazon SageMaker ML Lineage Tracking lets you analyze:

  • where the data came from

  • where the data was used as an input

  • the outputs that were generated from the data

For example, you can view the models created from an individual dataset, and view the datasets that went into creating an individual model. For more information, see Amazon SageMaker ML Lineage Tracking.
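The three questions above amount to walking a directed graph of datasets, jobs, and models in either direction. The following toy sketch illustrates that idea with the standard library only. It is not the SageMaker ML Lineage Tracking API; the LineageGraph class and the entity names are hypothetical.

```python
from collections import defaultdict


class LineageGraph:
    """Toy lineage graph: an edge points from an input to what it produced."""

    def __init__(self):
        self._downstream = defaultdict(set)
        self._upstream = defaultdict(set)

    def add_edge(self, source, destination):
        self._downstream[source].add(destination)
        self._upstream[destination].add(source)

    @staticmethod
    def _walk(start, edges):
        # Iterative depth-first traversal that collects every reachable node.
        seen = set()
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbor in edges.get(node, ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        return seen

    def ancestors(self, node):
        """Everything that contributed to this entity."""
        return self._walk(node, self._upstream)

    def descendants(self, node):
        """Everything that was derived from this entity."""
        return self._walk(node, self._downstream)


graph = LineageGraph()
graph.add_edge("dataset:raw", "job:preprocess")
graph.add_edge("job:preprocess", "dataset:train")
graph.add_edge("dataset:train", "job:train")
graph.add_edge("job:train", "model:v1")

# Models created from an individual dataset.
models_from_raw = sorted(
    n for n in graph.descendants("dataset:raw") if n.startswith("model:")
)
# Datasets that went into creating an individual model.
datasets_for_model = sorted(
    n for n in graph.ancestors("model:v1") if n.startswith("dataset:")
)
print(models_from_raw, datasets_for_model)
```

Storing both edge directions keeps each query a single traversal, which is why the sketch maintains an upstream and a downstream map.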

Step Reuse

With SageMaker Pipelines, you can designate steps for caching. When a step is cached, its run is indexed so that it can be reused if the same step is run again. A later execution of the same pipeline can then reuse the output of a previous run of that step instead of running the step again. For more information on step caching, see Caching Pipeline Steps.
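The idea behind step caching can be sketched as memoization keyed on a step's identity and arguments. The following is a conceptual illustration only, under the assumption that a cache hit requires the same step name and the same arguments. It does not show how SageMaker Pipelines implements or configures caching, and StepCache and preprocess are hypothetical names.

```python
import hashlib
import json


class StepCache:
    """Toy cache that reuses a step's output when name and arguments match."""

    def __init__(self):
        self._outputs = {}
        self.hits = 0

    @staticmethod
    def _key(step_name, arguments):
        # Sort keys so that argument order does not change the cache key.
        payload = json.dumps({"step": step_name, "args": arguments}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def run(self, step_name, arguments, step_fn):
        key = self._key(step_name, arguments)
        if key in self._outputs:
            self.hits += 1
            return self._outputs[key]  # reuse the earlier output
        output = step_fn(**arguments)
        self._outputs[key] = output
        return output


calls = []


def preprocess(rows, scale):
    calls.append((rows, scale))  # record each real execution
    return {"rows_out": rows * scale}


cache = StepCache()
first = cache.run("Preprocess", {"rows": 10, "scale": 2}, preprocess)
second = cache.run("Preprocess", {"rows": 10, "scale": 2}, preprocess)  # cache hit
third = cache.run("Preprocess", {"rows": 10, "scale": 3}, preprocess)  # changed arguments, runs again
print(len(calls), cache.hits)
```

In this sketch the second call returns the stored output without executing the step, while the third call runs again because its arguments differ.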