Selective execution of pipeline steps
As you use SageMaker Pipelines to create workflows and orchestrate your ML training steps, you might need to undertake multiple experimentation phases. Instead of running the entire pipeline from start to finish, you might only want to iterate over particular steps. SageMaker Pipelines supports selective execution of pipeline steps to help you optimize your ML training. Selective execution is useful in the following scenarios:
You want to restart a specific step with an updated instance type, hyperparameters, or other variables while keeping the parameters from upstream steps.
Your pipeline fails at an intermediate step. Previous steps in the execution, such as data preparation or feature extraction, are expensive to rerun. You might need to introduce a fix and rerun certain steps manually to complete the pipeline.
Using selective execution, you can choose to run any subset of steps as long as they are connected in the directed acyclic graph (DAG) of your pipeline. The following DAG shows an example pipeline workflow:

You can select the AbaloneTrain and AbaloneEval steps in a selective execution, but you cannot select just the AbaloneTrain and AbaloneMSECond steps, because these steps are not connected in the DAG. For non-selected upstream steps in the workflow, the selective execution reuses the outputs from a reference pipeline execution rather than recomputing the steps. Non-selected steps that are downstream from the selected steps do not run in a selective execution.
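To make the "connected" requirement concrete, the following plain-Python sketch checks a proposed selection against a simplified, hypothetical version of the example DAG (AbaloneProcess → AbaloneTrain → AbaloneEval → AbaloneMSECond). It interprets "connected" as: the selected steps form a single piece of the graph when edge direction is ignored and only edges between selected steps count. That interpretation, the PIPELINE_DAG dictionary, and the is_connected_selection helper are assumptions for illustration; SageMaker performs its own validation when you start the execution.

```python
from collections import deque

# Hypothetical, simplified DAG for the example pipeline: each step maps to the
# steps that consume its outputs. The real example pipeline may have more steps.
PIPELINE_DAG = {
    "AbaloneProcess": ["AbaloneTrain"],
    "AbaloneTrain": ["AbaloneEval"],
    "AbaloneEval": ["AbaloneMSECond"],
    "AbaloneMSECond": [],
}


def is_connected_selection(dag, selected):
    """Return True if the selected steps form one connected piece of the DAG.

    Edge direction is ignored, and only edges between two selected steps count,
    so a gap of non-selected steps breaks the connection.
    """
    selected = set(selected)
    if not selected:
        return False
    unknown = selected - set(dag)
    if unknown:
        raise ValueError(f"Unknown steps: {sorted(unknown)}")

    # Undirected adjacency list restricted to the selected steps.
    neighbors = {step: set() for step in selected}
    for parent, children in dag.items():
        for child in children:
            if parent in selected and child in selected:
                neighbors[parent].add(child)
                neighbors[child].add(parent)

    # Breadth-first search from an arbitrary selected step.
    start = next(iter(selected))
    seen = {start}
    queue = deque([start])
    while queue:
        for nxt in neighbors[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen == selected


print(is_connected_selection(PIPELINE_DAG, ["AbaloneTrain", "AbaloneEval"]))    # True
print(is_connected_selection(PIPELINE_DAG, ["AbaloneTrain", "AbaloneMSECond"]))  # False
```

The second selection fails because AbaloneEval, the only step linking AbaloneTrain and AbaloneMSECond, is not selected.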
If you choose to run a subset of intermediate steps in your pipeline, your steps may depend on upstream steps. SageMaker needs a reference pipeline execution from which to source these dependencies. For example, if you choose to run the steps AbaloneTrain and AbaloneEval, you need the outputs of the AbaloneProcess step from a reference pipeline execution. You can either provide a reference execution ARN or direct SageMaker to use the latest pipeline execution, which is the default behavior. If you have a reference execution, you can also build the runtime parameters from your reference run and supply them, with any overrides, to your selective execution run. For details, see Reuse runtime parameter values from a reference execution.
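The dependency lookup described above can be sketched in plain Python. Given a simplified, hypothetical DAG and a list of selected steps, the hypothetical upstream_dependencies helper returns the non-selected steps that directly feed a selected step, which are the steps whose outputs would have to come from a reference execution. This illustrates the idea only; it is not how SageMaker resolves dependencies internally.

```python
# Hypothetical, simplified DAG: each step maps to the steps that consume its outputs.
PIPELINE_DAG = {
    "AbaloneProcess": ["AbaloneTrain"],
    "AbaloneTrain": ["AbaloneEval"],
    "AbaloneEval": ["AbaloneMSECond"],
    "AbaloneMSECond": [],
}


def upstream_dependencies(dag, selected):
    """Return non-selected steps that directly feed at least one selected step.

    The outputs of these steps must be taken from a reference pipeline execution.
    """
    selected = set(selected)
    return {
        parent
        for parent, children in dag.items()
        if parent not in selected and selected.intersection(children)
    }


print(upstream_dependencies(PIPELINE_DAG, ["AbaloneTrain", "AbaloneEval"]))  # {'AbaloneProcess'}

# Selecting the first steps of the pipeline leaves nothing to resolve,
# so no reference execution is needed.
print(upstream_dependencies(PIPELINE_DAG, ["AbaloneProcess", "AbaloneTrain"]))  # set()
```

An empty result corresponds to the case where a selective execution can run without any reference execution.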
You specify the configuration for a selective execution run using SelectiveExecutionConfig.

If you specify an ARN for a reference pipeline execution (with the source_pipeline_execution_arn argument), SageMaker uses the upstream step dependencies from the specified pipeline execution. If you do not specify an ARN and a latest pipeline execution exists, SageMaker uses the latest pipeline execution as a reference by default. If you do not specify an ARN and do not want SageMaker to use your latest pipeline execution, set reference_latest_execution to False.
The pipeline execution that SageMaker ultimately uses as a reference, whether the latest or a user-specified one, must be in the Succeeded or Failed state.
The following table summarizes how SageMaker chooses a reference execution based on your arguments to SelectiveExecutionConfig.
| source_pipeline_execution_arn value | reference_latest_execution value | Reference execution used |
|---|---|---|
| A pipeline execution ARN | True or unspecified | The specified pipeline execution |
| A pipeline execution ARN | False | The specified pipeline execution |
| null or unspecified | True or unspecified | The latest pipeline execution |
| null or unspecified | False | None; in this case, select only steps without upstream dependencies |
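The decision table and the status requirement can be restated as two small Python functions. resolve_reference_execution and check_reference_status are hypothetical helpers, not part of the SageMaker Python SDK; the latest_execution_arn input and the second example ARN are made up, and the status strings are assumed to match the Succeeded and Failed values that SageMaker reports for pipeline executions.

```python
from typing import Optional

# Statuses a reference execution may be in (assumed to match the
# PipelineExecutionStatus values reported by SageMaker).
VALID_REFERENCE_STATUSES = {"Succeeded", "Failed"}


def resolve_reference_execution(
    source_pipeline_execution_arn: Optional[str] = None,
    reference_latest_execution: bool = True,
    latest_execution_arn: Optional[str] = None,
) -> Optional[str]:
    """Return the ARN of the execution used as a reference, or None.

    latest_execution_arn is a hypothetical input standing in for the most
    recent execution of the pipeline (None if the pipeline never ran).
    """
    if source_pipeline_execution_arn:
        # A user-specified ARN is used whatever reference_latest_execution is.
        return source_pipeline_execution_arn
    if reference_latest_execution:
        # No ARN given: fall back to the latest execution.
        return latest_execution_arn
    # No reference at all: only steps without upstream dependencies can run.
    return None


def check_reference_status(status: str) -> None:
    """Raise ValueError if the reference execution is in an unusable state."""
    if status not in VALID_REFERENCE_STATUSES:
        raise ValueError(
            f"Reference execution must be Succeeded or Failed, got {status!r}"
        )


ARN = "arn:aws:sagemaker:us-west-2:123123123123:pipeline/abalone/execution/123ab12cd3ef"
# Hypothetical ARN standing in for the latest execution.
LATEST_ARN = "arn:aws:sagemaker:us-west-2:123123123123:pipeline/abalone/execution/456gh78ij9kl"

print(resolve_reference_execution(None, True, LATEST_ARN) == LATEST_ARN)  # True
print(resolve_reference_execution(None, False, LATEST_ARN))  # None
```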
For more information about selective execution configuration requirements, see the sagemaker.workflow.selective_execution_config.SelectiveExecutionConfig documentation in the SageMaker Python SDK.
The following discussion includes examples for the cases in which you want to specify a pipeline reference execution, use the latest pipeline execution as a reference, or run selective execution without a reference pipeline execution.
Selective execution with a user-specified pipeline reference
The following example uses selective execution to rerun AbaloneTrain and AbaloneEval with a user-specified reference pipeline execution.
```python
from sagemaker.workflow.selective_execution_config import SelectiveExecutionConfig

# Assumes `pipeline` is an existing sagemaker.workflow.pipeline.Pipeline object.
# Rerun AbaloneTrain and AbaloneEval, reusing upstream outputs from the specified execution.
selective_execution_config = SelectiveExecutionConfig(
    source_pipeline_execution_arn="arn:aws:sagemaker:us-west-2:123123123123:pipeline/abalone/execution/123ab12cd3ef",
    selected_steps=["AbaloneTrain", "AbaloneEval"],
)

selective_execution = pipeline.start(
    execution_display_name="Sample-Selective-Execution-1",
    parameters={"MaxDepth": 6, "NumRound": 60},
    selective_execution_config=selective_execution_config,
)
```
Selective execution with the latest pipeline execution as a reference
The following example uses selective execution to rerun AbaloneTrain and AbaloneEval with the latest pipeline execution as the reference. Because SageMaker uses the latest pipeline execution by default, setting the reference_latest_execution argument to True is optional.
```python
from sagemaker.workflow.selective_execution_config import SelectiveExecutionConfig

# Prepare a new selective execution that reruns AbaloneTrain and AbaloneEval
# without providing source_pipeline_execution_arn.
selective_execution_config = SelectiveExecutionConfig(
    selected_steps=["AbaloneTrain", "AbaloneEval"],
    # Optional: True is already the default.
    reference_latest_execution=True,
)

# Start the pipeline execution; SageMaker uses the latest execution as the reference.
pipeline.start(
    execution_display_name="Sample-Selective-Execution-1",
    parameters={"MaxDepth": 6, "NumRound": 60},
    selective_execution_config=selective_execution_config,
)
```
Selective execution without a reference pipeline
The following example uses selective execution to rerun AbaloneProcess and AbaloneTrain without providing a reference ARN, and it disallows using the latest pipeline execution as a reference. SageMaker allows this configuration because this subset of steps has no upstream dependencies.
```python
from sagemaker.workflow.selective_execution_config import SelectiveExecutionConfig

# Prepare a new selective execution. Select the first two steps of the pipeline,
# which have no upstream dependencies, and do not use any reference execution.
selective_execution_config = SelectiveExecutionConfig(
    selected_steps=["AbaloneProcess", "AbaloneTrain"],
    reference_latest_execution=False,
)

# Start the pipeline execution without source_pipeline_execution_arn.
pipeline.start(
    execution_display_name="Sample-Selective-Execution-1",
    parameters={"MaxDepth": 6, "NumRound": 60},
    selective_execution_config=selective_execution_config,
)
```
Reuse runtime parameter values from a reference execution
You can build the parameters from your reference pipeline execution using build_parameters_from_execution and supply the result to your selective execution run. You can use the original parameters from the reference execution as-is, or override specific values using the parameter_value_overrides argument.
The following example shows you how to build parameters from a reference execution and apply an override for the MseThreshold parameter.
```python
from sagemaker.workflow.selective_execution_config import SelectiveExecutionConfig

# Assumes `pipeline` is an existing sagemaker.workflow.pipeline.Pipeline object.
reference_execution_arn = "arn:aws:sagemaker:us-west-2:123123123123:pipeline/abalone/execution/123ab12cd3ef"

# Prepare a new selective execution.
selective_execution_config = SelectiveExecutionConfig(
    source_pipeline_execution_arn=reference_execution_arn,
    selected_steps=["AbaloneTrain", "AbaloneEval", "AbaloneMSECond"],
)

# Define the parameter values to test.
new_parameters_mse = {
    "MseThreshold": 5,
}

# Build parameters from the reference execution and apply the overrides.
new_parameters = pipeline.build_parameters_from_execution(
    pipeline_execution_arn=reference_execution_arn,
    parameter_value_overrides=new_parameters_mse,
)

# Start the pipeline execution with the new parameters.
execution = pipeline.start(
    selective_execution_config=selective_execution_config,
    parameters=new_parameters,
)
```
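The override behavior of build_parameters_from_execution can be pictured as a dictionary merge: start from the reference run's values and replace only the names you pass in. The merge_parameters helper below is a hypothetical plain-Python sketch, not the SDK implementation; the reference parameter values are made up, and rejecting override names that the reference run does not define is an assumption of this sketch.

```python
def merge_parameters(reference_params, overrides=None):
    """Start from the reference run's parameter values and apply overrides.

    Rejecting unknown override names is an assumption of this sketch.
    """
    merged = dict(reference_params)  # copy so the reference values stay untouched
    if overrides:
        unknown = set(overrides) - set(merged)
        if unknown:
            raise ValueError(f"Unknown parameters: {sorted(unknown)}")
        merged.update(overrides)
    return merged


# Hypothetical parameter values recorded for the reference execution.
reference = {"MaxDepth": 6, "NumRound": 60, "MseThreshold": 6}
new_parameters = merge_parameters(reference, {"MseThreshold": 5})
print(new_parameters)  # {'MaxDepth': 6, 'NumRound': 60, 'MseThreshold': 5}
```

Only MseThreshold changes; every other value carries over from the reference run, which is what keeps the selective execution comparable to it.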