SageMaker Pipelines Overview - Amazon SageMaker

SageMaker Pipelines Overview

An Amazon SageMaker Model Building Pipelines pipeline is a series of interconnected steps defined using the Pipelines SDK. You can also build a pipeline without the SDK by writing a definition that follows the pipeline definition JSON schema. Either way, the pipeline definition encodes the pipeline as a directed acyclic graph (DAG) that can be exported as a JSON definition. The DAG describes the requirements of each step and the relationships between the steps of your pipeline. Its structure is determined by the data dependencies between steps, which are created when the properties of one step's output are passed as the input to another step. The following image is an example of a pipeline DAG:

Figure: An example pipeline directed acyclic graph (DAG).
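
To make the idea of data dependencies concrete, here is a minimal sketch in plain Python. It does not use the SageMaker Python SDK: the step names and the `inputs`/`outputs` fields are hypothetical and chosen only for illustration. An edge is inferred whenever one step consumes an output that another step produces, and the standard-library `graphlib` module (Python 3.9 or later) derives a valid run order from those edges.

```python
from graphlib import TopologicalSorter

# Hypothetical steps: each one lists the named outputs it produces and the
# outputs of other steps that it consumes. These are not SageMaker objects.
steps = {
    "preprocess": {"inputs": [], "outputs": ["train_data", "test_data"]},
    "train": {"inputs": ["train_data"], "outputs": ["model"]},
    "evaluate": {"inputs": ["model", "test_data"], "outputs": ["report"]},
}


def infer_dependencies(steps):
    """Map each step to the set of steps whose outputs it consumes."""
    # Look up which step produces each named output.
    producer = {
        output: name
        for name, spec in steps.items()
        for output in spec["outputs"]
    }
    # A step depends on the producer of every input it declares.
    return {
        name: {producer[item] for item in spec["inputs"]}
        for name, spec in steps.items()
    }


dependencies = infer_dependencies(steps)
# static_order() yields each step only after all of its dependencies.
run_order = list(TopologicalSorter(dependencies).static_order())
print(dependencies)
print(run_order)
```

No step order is declared anywhere in this sketch. The order follows entirely from which outputs feed which inputs, which is the sense in which data dependencies determine the structure of the DAG.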
The example DAG includes the following steps:
  1. AbaloneProcess, an instance of the Processing step, runs a preprocessing script on the data used for training. For example, the script could fill in missing values, normalize numerical data, or split data into the train, validation, and test datasets.

  2. AbaloneTrain, an instance of the Training step, configures hyperparameters and trains a model from the preprocessed input data.

  3. AbaloneEval, another instance of the Processing step, evaluates the model for accuracy. This step shows an example of a data dependency: it uses the test dataset output of the AbaloneProcess step.

  4. AbaloneMSECond, an instance of the Condition step, checks in this example that the mean squared error (MSE) result of model evaluation is below a certain limit. If the model does not meet the criteria, the pipeline run stops.

  5. If the model meets the criteria, the pipeline run proceeds with the following steps:

    1. AbaloneRegisterModel, where SageMaker calls a RegisterModel step to register the model as a versioned model package in a model package group in the Amazon SageMaker Model Registry.

    2. AbaloneCreateModel, where SageMaker calls a CreateModel step to create the model in preparation for batch transform. In AbaloneTransform, SageMaker calls a Transform step to generate model predictions on a dataset you specify.
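
The control flow of the example can be sketched in plain Python. This only illustrates how a Condition step gates the steps after it and is not the SageMaker Python SDK: the function `steps_to_run`, the `mse_threshold` parameter, and the numeric values are hypothetical, because the example does not state the actual limit.

```python
def steps_to_run(mse, mse_threshold):
    """Return the names of the steps that run for a given evaluation result.

    Processing, training, evaluation, and the condition check always run.
    The remaining steps run only when the MSE is below the threshold.
    """
    executed = ["AbaloneProcess", "AbaloneTrain", "AbaloneEval", "AbaloneMSECond"]
    if mse < mse_threshold:
        # Listed in the order the walkthrough describes them.
        executed += ["AbaloneRegisterModel", "AbaloneCreateModel", "AbaloneTransform"]
    return executed


# Hypothetical threshold and evaluation results, for illustration only.
passing = steps_to_run(mse=4.2, mse_threshold=6.0)
failing = steps_to_run(mse=7.5, mse_threshold=6.0)
print(passing)
print(failing)
```

With these assumed values, the first run reaches AbaloneTransform, while the second run ends at AbaloneMSECond without registering or creating a model.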

The following topics describe fundamental SageMaker Pipelines concepts. For a tutorial describing the implementation of these concepts, see Create and Manage SageMaker Pipelines.