CREATE MODEL - Amazon Redshift

CREATE MODEL

The CREATE MODEL statement offers flexibility in the number of parameters used to create the model. Depending on their needs or problem type, users can choose their preferred preprocessors, algorithms, problem types, or hyperparameters.

Before you use the CREATE MODEL statement, complete the prerequisites in Cluster setup for using Amazon Redshift ML. The following is a high-level summary of the prerequisites.

  • Create an Amazon Redshift cluster with the AWS Management Console or the AWS Command Line Interface (AWS CLI).

  • Attach the AWS Identity and Access Management (IAM) policy while creating the cluster.

  • To allow Amazon Redshift and SageMaker to assume the role to interact with other services, add the appropriate trust policy to the IAM role.

For details about the IAM role, trust policy, and other prerequisites, see Cluster setup for using Amazon Redshift ML.

Following, you can find different use cases for the CREATE MODEL statement.

Simple CREATE MODEL

The following summarizes the basic options of the CREATE MODEL syntax.

Simple CREATE MODEL syntax

CREATE MODEL model_name
    FROM { table_name | ( select_query ) }
    TARGET column_name
    FUNCTION prediction_function_name
    IAM_ROLE 'iam_role_arn'
    SETTINGS (
      S3_BUCKET 'bucket',
      [ MAX_CELLS integer ]
    )

Simple CREATE MODEL parameters

model_name

The name of the model. The model name in a schema must be unique.

FROM { table_name | ( select_query ) }

The name of the table, or the query, that specifies the training data. It can be either an existing table in the system or an Amazon Redshift-compatible SELECT query enclosed in parentheses. The table or query result must contain at least two columns.

TARGET column_name

The name of the column that becomes the prediction target. The column must exist in the FROM clause.

FUNCTION prediction_function_name

The name of the Amazon Redshift machine learning function that CREATE MODEL generates and that you use to make predictions with this model. The function is created in the same schema as the model object and can be overloaded.

Amazon Redshift machine learning supports models such as Extreme Gradient Boosting (XGBoost) models for regression and classification.

IAM_ROLE 'iam_role_arn'

The Amazon Resource Name (ARN) for an AWS Identity and Access Management (IAM) role that your cluster uses for authentication and authorization. At a minimum, the IAM role must have permission to perform a LIST operation on the Amazon S3 bucket that is used for unloading training data and staging Amazon SageMaker artifacts. The following shows the syntax for the IAM_ROLE parameter string for a single ARN.

IAM_ROLE 'arn:aws:iam::aws-account-id:role/role-name'
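
A common slip when filling in this clause is a malformed ARN, for example an account ID with the wrong number of digits. The following Python sketch builds and checks the IAM_ROLE clause before it is pasted into SQL. It assumes the standard shape arn:aws:iam::<12-digit account ID>:role/<role name>; the helper names role_arn and iam_role_clause are hypothetical, and the regular expression is a simplified approximation of IAM naming rules, not the authoritative validation.

```python
import re

# Simplified pattern (an assumption, not the full IAM naming rules):
# a 12-digit account ID and a role name that may include a path.
_ROLE_ARN = re.compile(r"^arn:aws:iam::\d{12}:role/[\w+=,.@/-]+$")


def role_arn(account_id: str, role_name: str) -> str:
    """Build an IAM role ARN and reject obviously malformed results."""
    arn = f"arn:aws:iam::{account_id}:role/{role_name}"
    if not _ROLE_ARN.match(arn):
        raise ValueError(f"not a well-formed IAM role ARN: {arn!r}")
    return arn


def iam_role_clause(arn: str) -> str:
    """Render the IAM_ROLE clause with the ARN as a single-quoted SQL literal."""
    if not _ROLE_ARN.match(arn):
        raise ValueError(f"not a well-formed IAM role ARN: {arn!r}")
    return f"IAM_ROLE '{arn}'"


if __name__ == "__main__":
    print(iam_role_clause(role_arn("123456789012", "Redshift-ML")))
```

A 13-digit account ID such as 0123456789012 is rejected, which catches a typo before CREATE MODEL fails at run time.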
S3_BUCKET 'bucket'

The name of the Amazon S3 bucket that you previously created to share training data and artifacts between Amazon Redshift and SageMaker. Amazon Redshift creates a subfolder in this bucket before unloading the training data. When training is complete, Amazon Redshift deletes the subfolder and its contents.

MAX_CELLS integer

The maximum number of cells to export from the FROM clause. The default is 1,000,000.

The number of cells is the number of rows in the training data (produced by the FROM clause table or query) multiplied by the number of columns. If the training data has more cells than MAX_CELLS allows, CREATE MODEL downsamples the FROM clause training data to bring the training set below MAX_CELLS. Allowing larger training datasets can produce higher accuracy, but it can also make the model take longer to train and cost more.
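
As a worked example of the cell arithmetic, a training query that returns 500,000 rows and 8 columns produces 4,000,000 cells, which exceeds the default MAX_CELLS of 1,000,000. The Python sketch below computes this and the largest row count that fits. It assumes downsampling drops whole rows while keeping every column; the documentation does not say how rows are chosen, so treat the row budget as an estimate. The function names are hypothetical.

```python
DEFAULT_MAX_CELLS = 1_000_000  # documented default for MAX_CELLS


def cell_count(rows: int, columns: int) -> int:
    """Number of cells = rows multiplied by columns."""
    return rows * columns


def needs_downsampling(rows: int, columns: int, max_cells: int = DEFAULT_MAX_CELLS) -> bool:
    """True when the training data has more cells than MAX_CELLS allows."""
    return cell_count(rows, columns) > max_cells


def max_rows_within(columns: int, max_cells: int = DEFAULT_MAX_CELLS) -> int:
    """Largest row count whose cell count stays within max_cells.

    Assumes whole rows are sampled and all columns are kept.
    """
    return max_cells // columns


if __name__ == "__main__":
    rows, cols = 500_000, 8
    print(cell_count(rows, cols))          # 4000000
    print(needs_downsampling(rows, cols))  # True
    print(max_rows_within(cols))           # 125000
```

Raising MAX_CELLS to 4,000,000 in this example would avoid downsampling, at the cost of longer and more expensive training.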

For information about costs of using Amazon Redshift, see Costs for using Amazon Redshift ML.

For more information about costs associated with various cell numbers and free trial details, see Amazon Redshift pricing.

CREATE MODEL with user guidance

Following, you can find a description of options for CREATE MODEL in addition to the options described in Simple CREATE MODEL.

By default, CREATE MODEL searches for the best combination of preprocessing and model for your specific dataset. You might want more control over your model, or want to introduce domain knowledge such as the problem type or objective. In a customer churn scenario, if the outcome “customer is not active” is rare, the F1 objective is often preferred over the accuracy objective. A high-accuracy model might predict “customer is active” all the time, which yields high accuracy but little business value. For information about the F1 objective, see AutoMLJobObjective in the Amazon SageMaker API Reference.

CREATE MODEL then follows your guidance on the aspects that you specify, such as the objective, while still automatically discovering the best preprocessors and hyperparameters.
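
The churn argument can be checked numerically. In the following Python sketch, 5 of 100 hypothetical customers are inactive, and a degenerate model always predicts "active". Accuracy is 0.95, but F1 for the rare "inactive" class is 0, which is why F1 is the better objective here. The data and helper functions are illustrative only; SageMaker computes its metrics itself.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1(y_true, y_pred, positive):
    """F1 score for the class `positive` (harmonic mean of precision and recall)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # no true positives: F1 is taken as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 95 active customers and 5 inactive ones (the rare outcome).
y_true = ["active"] * 95 + ["inactive"] * 5
# A degenerate model that always predicts the majority class.
y_pred = ["active"] * 100

if __name__ == "__main__":
    print(accuracy(y_true, y_pred))        # 0.95
    print(f1(y_true, y_pred, "inactive"))  # 0.0
```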

CREATE MODEL with user guidance syntax

CREATE MODEL offers more flexibility on the aspects that you can specify and the aspects that Amazon Redshift automatically discovers.

CREATE MODEL model_name
    FROM { table_name | ( select_statement ) }
    TARGET column_name
    FUNCTION function_name
    IAM_ROLE 'iam_role_arn'
    [ MODEL_TYPE { XGBOOST | MLP } ]
    [ PROBLEM_TYPE ( REGRESSION | BINARY_CLASSIFICATION | MULTICLASS_CLASSIFICATION ) ]
    [ OBJECTIVE ( 'MSE' | 'Accuracy' | 'F1' | 'F1Macro' | 'AUC') ]
    [ PREPROCESSORS 'string' ]
    SETTINGS (
      S3_BUCKET 'bucket', |
      S3_GARBAGE_COLLECT { ON | OFF }, |
      KMS_KEY_ID 'kms_key_id', |
      MAX_CELLS integer, |
      MAX_RUNTIME integer (, ...)
    )

CREATE MODEL with user guidance parameters

MODEL_TYPE { XGBOOST | MLP }

(Optional) Specifies the model type. Use it to train a model of a specific type, such as XGBoost or multilayer perceptron (MLP). MLP is a deep learning algorithm that Amazon SageMaker Autopilot supports. If you don't specify this parameter, all supported model types are searched during training for the best model.

PROBLEM_TYPE ( REGRESSION | BINARY_CLASSIFICATION | MULTICLASS_CLASSIFICATION )

(Optional) Specifies the problem type. If you know the problem type, you can restrict Amazon Redshift to search only for the best model of that specific problem type. If you don't specify this parameter, the problem type is discovered during training, based on your data.

OBJECTIVE ( 'MSE' | 'Accuracy' | 'F1' | 'F1Macro' | 'AUC')

(Optional) Specifies the name of the objective metric used to measure the predictive quality of a machine learning system. This metric is optimized during training to provide the best estimate for model parameter values from data. If you don't specify a metric explicitly, the default is MSE for regression, F1 for binary classification, and Accuracy for multiclass classification. For more information about objectives, see AutoMLJobObjective in the Amazon SageMaker API Reference.
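
The default-objective rule can be written as a lookup. This Python sketch is a hypothetical helper that mirrors the documented defaults and rejects names outside the five objectives listed for CREATE MODEL with user guidance. It assumes the problem type is already known, whereas Amazon Redshift can also discover it from the data.

```python
# Documented defaults when OBJECTIVE is omitted.
DEFAULT_OBJECTIVE = {
    "REGRESSION": "MSE",
    "BINARY_CLASSIFICATION": "F1",
    "MULTICLASS_CLASSIFICATION": "Accuracy",
}

# Objectives listed for CREATE MODEL with user guidance.
AUTO_ON_OBJECTIVES = {"MSE", "Accuracy", "F1", "F1Macro", "AUC"}


def resolve_objective(problem_type, objective=None):
    """Return the objective that applies for a known problem type."""
    if problem_type not in DEFAULT_OBJECTIVE:
        raise ValueError(f"unknown PROBLEM_TYPE: {problem_type!r}")
    if objective is None:
        return DEFAULT_OBJECTIVE[problem_type]
    if objective not in AUTO_ON_OBJECTIVES:
        raise ValueError(f"unsupported OBJECTIVE: {objective!r}")
    return objective


if __name__ == "__main__":
    print(resolve_objective("BINARY_CLASSIFICATION"))  # F1
    print(resolve_objective("REGRESSION", "MSE"))      # MSE
```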

MAX_CELLS integer

(Optional) Specifies the maximum number of cells in the training data. This value is the number of records (in the training query or table) multiplied by the number of columns. The default is 1,000,000.

MAX_RUNTIME integer

(Optional) Specifies the maximum amount of time, in seconds, that training can take. Training jobs often complete sooner, depending on dataset size. The default is 5,400 seconds (90 minutes).

S3_GARBAGE_COLLECT { ON | OFF }

(Optional) Specifies whether Amazon Redshift performs garbage collection on the resulting datasets used to train models and on the models. If set to OFF, the resulting datasets used to train models and the models remain in Amazon S3 and can be used for other purposes. If set to ON, Amazon Redshift deletes the artifacts in Amazon S3 after training completes. The default is ON.

KMS_KEY_ID 'kms_key_id'

(Optional) Specifies the AWS KMS key that Amazon Redshift uses for server-side encryption to protect data at rest. Data in transit is protected with Secure Sockets Layer (SSL).

PREPROCESSORS 'string'

(Optional) Specifies which preprocessors to apply to which sets of columns. The format is a list of ColumnSets, each paired with the transformers to apply to that set of columns. Amazon Redshift applies all the transformers in a specific transformers list to all columns in the corresponding ColumnSet. For example, to apply OneHotEncoder and Imputer to columns t1 and t2, use the following sample command.

CREATE MODEL customer_churn
    FROM customer_data
    TARGET 'Churn'
    FUNCTION predict_churn
    IAM_ROLE 'iam_role'
    PROBLEM_TYPE BINARY_CLASSIFICATION
    OBJECTIVE 'F1'
    PREPROCESSORS '[
    ...
      {"ColumnSet": [ "t1", "t2" ], "Transformers": [ "OneHotEncoder", "Imputer" ] },
      {"ColumnSet": [ "t3" ], "Transformers": [ "OneHotEncoder" ] },
      {"ColumnSet": [ "temp" ], "Transformers": [ "Imputer", "NumericPassthrough" ] }
    ]'
    SETTINGS ( S3_BUCKET 'bucket' )

Amazon Redshift supports the following transformers:

  • OneHotEncoder – Typically used to encode a discrete value into a binary vector with one nonzero value. This transformer is suitable for many machine learning models.

  • OrdinalEncoder – Encodes discrete values into a single integer. This transformer is suitable for certain machine learning models, such as MLP.

  • NumericPassthrough – Passes input as is into the model.

  • Imputer – Fills in missing values and not a number (NaN) values.

  • ImputerWithIndicator – Fills in missing values and NaN values. This transformer also creates an indicator of whether any values were missing and filled in.

  • Normalizer – Normalizes values, which can improve the performance of many machine learning algorithms.

  • DateTimeVectorizer – Creates a vector embedding that represents a column of a datetime data type and can be used in machine learning models.

  • PCA – Projects the data into a lower dimensional space to reduce the number of features while keeping as much information as possible.
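
Because PREPROCESSORS takes a JSON document inside a SQL string literal, hand-editing it is error-prone. The following Python sketch builds the string from (columns, transformers) pairs, rejects transformer names that are not in the list above, and doubles any single quotes so the SQL literal stays well-formed. The function name preprocessors_clause is hypothetical, and the JSON shape is taken from the example in this section.

```python
import json

# Transformer names listed in this section.
SUPPORTED_TRANSFORMERS = {
    "OneHotEncoder", "OrdinalEncoder", "NumericPassthrough", "Imputer",
    "ImputerWithIndicator", "Normalizer", "DateTimeVectorizer", "PCA",
}


def preprocessors_clause(column_sets):
    """Build a PREPROCESSORS clause from (columns, transformers) pairs."""
    entries = []
    for columns, transformers in column_sets:
        unknown = [t for t in transformers if t not in SUPPORTED_TRANSFORMERS]
        if unknown:
            raise ValueError(f"unsupported transformers: {unknown}")
        entries.append({"ColumnSet": list(columns), "Transformers": list(transformers)})
    payload = json.dumps(entries)
    # The JSON sits inside a single-quoted SQL literal, so double any single quotes.
    return "PREPROCESSORS '" + payload.replace("'", "''") + "'"


if __name__ == "__main__":
    print(preprocessors_clause([
        (["t1", "t2"], ["OneHotEncoder", "Imputer"]),
        (["t3"], ["OneHotEncoder"]),
        (["temp"], ["Imputer", "NumericPassthrough"]),
    ]))
```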

Amazon Redshift ML stores the trained transformers and automatically applies them as part of the prediction query. You don't need to specify them when generating predictions from your model.

CREATE XGBoost models with AUTO OFF

CREATE MODEL with AUTO OFF generally serves different goals than the default CREATE MODEL.

If you are an advanced user who already knows the model type and the hyperparameters that you want to use when training, you can use CREATE MODEL with AUTO OFF to turn off the automatic discovery of preprocessors and hyperparameters. To do so, you explicitly specify the model type. XGBoost is currently the only model type supported when AUTO is set to OFF. You can specify hyperparameters; Amazon Redshift uses default values for any hyperparameters that you don't specify.

CREATE MODEL with AUTO OFF syntax

CREATE MODEL model_name
    FROM { table_name | ( select_statement ) }
    TARGET column_name
    FUNCTION function_name
    IAM_ROLE 'iam_role_arn'
    AUTO OFF
    MODEL_TYPE XGBOOST
    OBJECTIVE { 'reg:squarederror' | 'reg:squaredlogerror' | 'reg:logistic' |
                'reg:pseudohubererror' | 'reg:tweedie' | 'binary:logistic' | 'binary:hinge' |
                'multi:softmax' | 'rank:pairwise' | 'rank:ndcg' }
    PREPROCESSORS 'none'
    HYPERPARAMETERS DEFAULT EXCEPT (
      NUM_ROUND '10',
      ETA '0.2',
      NUM_CLASS '10',
      (, ...)
    )
    SETTINGS (
      S3_BUCKET 'bucket', |
      S3_GARBAGE_COLLECT { ON | OFF }, |
      KMS_KEY_ID 'kms_key_id', |
      MAX_CELLS integer, |
      MAX_RUNTIME integer (, ...)
    )

CREATE XGBoost models with AUTO OFF parameters

AUTO OFF

Turns off the CREATE MODEL automatic selection of preprocessors, algorithm, and hyperparameters.

MODEL_TYPE XGBOOST

Specifies to use XGBOOST to train the model.

OBJECTIVE str

Specifies an objective recognized by the algorithm. Amazon Redshift supports reg:squarederror, reg:squaredlogerror, reg:logistic, reg:pseudohubererror, reg:tweedie, binary:logistic, binary:hinge, multi:softmax. For more information about these objectives, see Learning task parameters in the XGBoost documentation.

HYPERPARAMETERS { DEFAULT | DEFAULT EXCEPT ( key 'value' (, ...) ) }

Specifies whether the default XGBoost parameters are used or overridden by user-specified values. The values must be enclosed with single quotes. Following are examples of parameters for XGBoost and their defaults.

Parameter name      Value type   Default value   Notes
num_class           Integer      N/A             Required for multiclass classification
num_round           Integer      100             N/A
tree_method         String       Auto            N/A
max_depth           Integer      6               [0, 10]
min_child_weight    Float        1               MinValue: 0, MaxValue: 120
subsample           Float        1               MinValue: 0.5, MaxValue: 1
gamma               Float        0               MinValue: 0, MaxValue: 5
alpha               Float        0               MinValue: 0, MaxValue: 1000
eta                 Float        0.3             MinValue: 0.1, MaxValue: 0.5
colsample_bylevel   Float        1               MinValue: 0.1, MaxValue: 1
colsample_bynode    Float        1               MinValue: 0.1, MaxValue: 1
colsample_bytree    Float        1               MinValue: 0.5, MaxValue: 1
lambda              Float        1               MinValue: 0, MaxValue: 1000
max_delta_step      Integer      0               [0, 10]
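
When overriding defaults, each value must be single-quoted, and NUM_CLASS is needed for multiclass objectives. The Python sketch below renders a HYPERPARAMETERS clause from a dictionary and checks values against the ranges in the table above. It is a hypothetical client-side helper: the ranges are transcribed from this table, parameters without a stated range are passed through unchecked, and Amazon Redshift performs its own validation.

```python
# Inclusive ranges transcribed from the table above; other parameters are not range-checked.
RANGES = {
    "max_depth": (0, 10),
    "min_child_weight": (0, 120),
    "subsample": (0.5, 1),
    "gamma": (0, 5),
    "alpha": (0, 1000),
    "eta": (0.1, 0.5),
    "colsample_bylevel": (0.1, 1),
    "colsample_bynode": (0.1, 1),
    "colsample_bytree": (0.5, 1),
    "lambda": (0, 1000),
    "max_delta_step": (0, 10),
}


def hyperparameters_clause(overrides, objective):
    """Render HYPERPARAMETERS DEFAULT or DEFAULT EXCEPT (...) for AUTO OFF."""
    names = {key.lower() for key in overrides}
    if "objective" in names:
        raise ValueError("OBJECTIVE is set with its own keyword, not in HYPERPARAMETERS")
    if objective == "multi:softmax" and "num_class" not in names:
        raise ValueError("NUM_CLASS is required when OBJECTIVE is 'multi:softmax'")
    if not overrides:
        return "HYPERPARAMETERS DEFAULT"
    parts = []
    for key, value in overrides.items():
        bounds = RANGES.get(key.lower())
        if bounds is not None and not bounds[0] <= float(value) <= bounds[1]:
            raise ValueError(f"{key}={value} is outside {bounds}")
        # Every value is passed as a single-quoted string.
        parts.append(f"{key.upper()} '{value}'")
    return "HYPERPARAMETERS DEFAULT EXCEPT (" + ", ".join(parts) + ")"


if __name__ == "__main__":
    print(hyperparameters_clause({"NUM_ROUND": 100, "NUM_CLASS": 30}, "multi:softmax"))
```

With the overrides shown, the output matches the HYPERPARAMETERS line of the abalone example that follows.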

The following example prepares data for XGBoost.

DROP TABLE IF EXISTS abalone_xgb;

CREATE TABLE abalone_xgb (
    length_val float,
    diameter float,
    height float,
    whole_weight float,
    shucked_weight float,
    viscera_weight float,
    shell_weight float,
    rings int,
    record_number int);

COPY abalone_xgb
FROM 's3://redshift-downloads/redshift-ml/abalone_xg/'
REGION 'us-east-1'
IAM_ROLE 'arn:aws:iam::467896856988:role/Redshift-ML'
IGNOREHEADER 1
CSV;

The following example creates an XGBoost model with specified advanced options, such as MODEL_TYPE, OBJECTIVE, and PREPROCESSORS.

DROP MODEL IF EXISTS abalone_xgboost_multi_predict_age;

CREATE MODEL abalone_xgboost_multi_predict_age
FROM ( SELECT length_val,
              diameter,
              height,
              whole_weight,
              shucked_weight,
              viscera_weight,
              shell_weight,
              rings
       FROM abalone_xgb
       WHERE record_number < 2500 )
TARGET rings
FUNCTION ml_fn_abalone_xgboost_multi_predict_age
IAM_ROLE 'arn:aws:iam::XXXXXXXXXXXX:role/Redshift-ML'
AUTO OFF
MODEL_TYPE XGBOOST
OBJECTIVE 'multi:softmax'
PREPROCESSORS 'none'
HYPERPARAMETERS DEFAULT EXCEPT (NUM_ROUND '100', NUM_CLASS '30')
SETTINGS (S3_BUCKET 'your-bucket');

The following example uses an inference query to predict the age of the abalone with a record number greater than 2500. It uses the function ml_fn_abalone_xgboost_multi_predict_age created by the preceding command. The query adds 1.5 to the predicted number of rings to estimate age in years.

select ml_fn_abalone_xgboost_multi_predict_age(length_val, diameter, height, whole_weight,
           shucked_weight, viscera_weight, shell_weight) + 1.5 as age
from abalone_xgb
where record_number > 2500;

Bring your own model (BYOM)

Amazon Redshift ML supports using bring your own model (BYOM) for local or remote inference.

The following summarizes the options of the CREATE MODEL syntax for BYOM. You can use a model trained outside of Amazon Redshift with Amazon SageMaker for in-database inference locally in Amazon Redshift.

CREATE MODEL syntax for local inference

The following describes the CREATE MODEL syntax for local inference.

CREATE MODEL model_name
    FROM ('job_name' | 's3_path' )
    FUNCTION function_name ( data_type [, ...] )
    RETURNS data_type
    IAM_ROLE 'iam-role-arn'
    [ SETTINGS (
      S3_BUCKET 'bucket', | --required
      KMS_KEY_ID 'kms_string') --optional
    ];

Amazon Redshift currently supports only pretrained XGBoost and MLP models for BYOM. Using this path, you can import SageMaker Autopilot models and models trained directly in Amazon SageMaker for local inference.

CREATE MODEL parameters for local inference

model_name

The name of the model. The model name in a schema must be unique.

FROM ('job_name' | 's3_path' )

The job_name uses an Amazon SageMaker job name as the input. The job name can either be an Amazon SageMaker training job name or an Amazon SageMaker Autopilot job name. The job must be created in the same AWS account that owns the Amazon Redshift cluster.

The 's3_path' specifies the S3 location of the .tar.gz model artifacts file that is to be used when creating the model.

FUNCTION function_name ( data_type [, ...] )

The name of the function to be created and the data types of the input arguments. You can provide a schema name.

RETURNS data_type

The data type of the value returned by the function.

IAM_ROLE 'iam_role_arn'

The Amazon Resource Name (ARN) for an AWS Identity and Access Management (IAM) role that your cluster uses for authentication and authorization.

The following shows the syntax for the IAM_ROLE parameter string for a single ARN.

IAM_ROLE 'arn:aws:iam::aws-account-id:role/role-name'
SETTINGS ( S3_BUCKET 'bucket', | KMS_KEY_ID 'kms_string')

The S3_BUCKET clause specifies the Amazon S3 location that is used to store intermediate results.

(Optional) The KMS_KEY_ID clause specifies the AWS KMS key that Amazon Redshift uses for server-side encryption to protect data at rest. Data in transit is protected with Secure Sockets Layer (SSL).

For more information, see CREATE MODEL with user guidance.

CREATE MODEL for local inference example

The following example creates a model that has been previously trained in Amazon SageMaker, outside of Amazon Redshift. Because the model type is supported by Amazon Redshift ML for local inference, the following CREATE MODEL creates a function that can be used locally in Amazon Redshift. You can provide a SageMaker training job name.

CREATE MODEL customer_churn
FROM 'training-job-customer-churn-v4'
FUNCTION customer_churn_predict (varchar, int, float, float)
RETURNS int
IAM_ROLE 'arn:aws:iam::123456789012:role/Redshift-ML'
SETTINGS (S3_BUCKET 'your-bucket');

After the model is created, you can use the function customer_churn_predict with the specified argument types to make predictions.

CREATE MODEL syntax for remote inference

The following describes the CREATE MODEL syntax for remote inference.

CREATE MODEL model_name
FUNCTION function_name ( data_type [, ...] )
RETURNS data_type
SAGEMAKER 'endpoint_name'[:'model_name']
IAM_ROLE 'iam-role-arn';

CREATE MODEL parameters for remote inference

model_name

The name of the model. The model name in a schema must be unique.

FUNCTION function_name ( data_type [, ...] )

The name of the function and the data types of the input arguments. You can provide a schema name.

RETURNS data_type

The data type of the value returned by the function.

SAGEMAKER 'endpoint_name'[:'model_name']

The name of the Amazon SageMaker endpoint. If the endpoint name points to a multimodel endpoint, add the name of the model to use. The endpoint must be hosted in the same AWS Region as the Amazon Redshift cluster.

IAM_ROLE 'iam_role_arn'

The Amazon Resource Name (ARN) for an AWS Identity and Access Management (IAM) role that your cluster uses for authentication and authorization. At a minimum, the IAM role must have permission to access the Amazon SageMaker endpoint.

The following shows the syntax for the IAM_ROLE parameter string for a single ARN.

IAM_ROLE 'arn:aws:iam::aws-account-id:role/role-name'

After the model is deployed to a SageMaker endpoint, CREATE MODEL creates the model information in Amazon Redshift, and Amazon Redshift then performs inference by calling the endpoint through the external function. You can use the SHOW MODEL command to view the model information on your Amazon Redshift cluster.

CREATE MODEL for remote inference usage notes

Before using CREATE MODEL for remote inference, consider the following:

  • The model must accept inputs in the format of comma-separated values (CSV) through a content type of text or CSV in SageMaker.

  • The endpoint must be hosted by the same AWS account that owns the Amazon Redshift cluster.

  • The output of the model must be a single value of the type specified when the function is created, in the format of comma-separated values (CSV) through a content type of text or CSV in SageMaker.

  • Models accept nulls as empty strings.

  • Make sure either that the Amazon SageMaker endpoint has enough resources to accommodate inference calls from Amazon Redshift or that the Amazon SageMaker endpoint can be automatically scaled.
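
The CSV contract in these notes can be illustrated in Python. The sketch below is a hypothetical pair of helpers, useful for example when testing a custom endpoint handler: one serializes a row of function arguments as a single CSV line, writing None as an empty string, and the other parses a response that must contain exactly one value. It only illustrates the format described above; it is not how Amazon Redshift implements the call.

```python
import csv
import io


def to_csv_row(values):
    """Serialize one row of function arguments as a single CSV line.

    csv.writer writes None as an empty string, matching the null handling above.
    """
    buf = io.StringIO()
    csv.writer(buf).writerow(values)
    return buf.getvalue().rstrip("\r\n")  # drop the writer's line terminator


def parse_single_value(body, cast):
    """Parse an endpoint response that must hold exactly one CSV value."""
    rows = list(csv.reader(io.StringIO(body.strip())))
    if len(rows) != 1 or len(rows[0]) != 1:
        raise ValueError(f"expected exactly one value, got {body!r}")
    return cast(rows[0][0])


if __name__ == "__main__":
    print(to_csv_row(["abc", 3, None, 2.5]))  # abc,3,,2.5
    print(parse_single_value("1\n", int))     # 1
```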

CREATE MODEL for remote inference example

The following example creates a model that uses a SageMaker endpoint to make predictions. Make sure that the endpoint is running, and specify its name in the CREATE MODEL command.

CREATE MODEL remote_customer_churn
FUNCTION remote_fn_customer_churn_predict (varchar, int, float, float)
RETURNS int
SAGEMAKER 'customer-churn-endpoint'
IAM_ROLE 'arn:aws:iam::123456789012:role/Redshift-ML';

Full CREATE MODEL

The following summarizes the basic options of the full CREATE MODEL syntax.

Full CREATE MODEL syntax

The following is the full syntax of the CREATE MODEL statement. It combines the options of CREATE MODEL with user guidance (AUTO ON) and CREATE XGBoost models with AUTO OFF, and it also includes the options of CREATE MODEL for BYOM.

Important

When creating a model using the CREATE MODEL statement, follow the order of the keywords in the following syntax.

CREATE MODEL model_name
    FROM { table_name | ( select_statement ) | 'job_name' }
    [ TARGET column_name ]
    FUNCTION function_name ( data_type [, ...] )
    IAM_ROLE 'iam_role_arn'
    [ AUTO ON / OFF ]
      -- default is AUTO ON
    [ MODEL_TYPE { XGBOOST | MLP } ]
      -- optional for AUTO ON, default is the list of all supported types
      -- required for AUTO OFF
    [ PROBLEM_TYPE ( REGRESSION | BINARY_CLASSIFICATION | MULTICLASS_CLASSIFICATION ) ]
      -- not supported when AUTO OFF
    [ OBJECTIVE ( 'MSE' | 'Accuracy' | 'F1' | 'F1Macro' | 'AUC' |
                  'reg:squarederror' | 'reg:squaredlogerror' | 'reg:logistic' |
                  'reg:pseudohubererror' | 'reg:tweedie' | 'binary:logistic' | 'binary:hinge' |
                  'multi:softmax' ) ]
      -- for AUTO ON: the first 5 are valid
      -- for AUTO OFF: 6-13 are valid
    [ PREPROCESSORS 'string' ]
      -- required for AUTO OFF, where it must be 'none'
      -- optional for AUTO ON
    [ HYPERPARAMETERS { DEFAULT | DEFAULT EXCEPT ( key 'value' (, ...) ) } ]
      -- supports XGBoost hyperparameters, except OBJECTIVE
      -- required and only allowed for AUTO OFF
      -- default NUM_ROUND is 100
      -- NUM_CLASS is required if OBJECTIVE is multi:softmax (only possible for AUTO OFF)
    [ SETTINGS (
      S3_BUCKET 'bucket', |            -- required
      KMS_KEY_ID 'kms_string', |       -- optional
      S3_GARBAGE_COLLECT ON / OFF, |   -- optional, default is ON
      MAX_CELLS integer, |             -- optional, default is 1,000,000
      MAX_RUNTIME integer (, ...)      -- optional, default is 5400 (1.5 hours)
    ) ]
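
The comments in the full syntax encode several rules that are easy to miss. The following Python sketch is a hypothetical pre-flight check that applies those rules to a dictionary of options for the training variants (AUTO ON and AUTO OFF, not BYOM). It assumes upper-case keyword names, HYPERPARAMETERS given as a dictionary of overrides (empty for DEFAULT), and the objective numbering from the comments above; Amazon Redshift remains the authority on what it accepts.

```python
AUTO_ON_OBJECTIVES = {"MSE", "Accuracy", "F1", "F1Macro", "AUC"}
AUTO_OFF_OBJECTIVES = {
    "reg:squarederror", "reg:squaredlogerror", "reg:logistic",
    "reg:pseudohubererror", "reg:tweedie", "binary:logistic",
    "binary:hinge", "multi:softmax",
}


def check_options(opts):
    """Return rule violations for a CREATE MODEL training call; empty means none found."""
    errors = []
    auto = opts.get("AUTO", "ON")  # default is AUTO ON
    objective = opts.get("OBJECTIVE")
    if "S3_BUCKET" not in opts:
        errors.append("S3_BUCKET is required")
    if auto == "OFF":
        if opts.get("MODEL_TYPE") != "XGBOOST":
            errors.append("AUTO OFF requires MODEL_TYPE XGBOOST")
        if "PROBLEM_TYPE" in opts:
            errors.append("PROBLEM_TYPE is not supported with AUTO OFF")
        if opts.get("PREPROCESSORS") != "none":
            errors.append("AUTO OFF requires PREPROCESSORS 'none'")
        if "HYPERPARAMETERS" not in opts:
            errors.append("AUTO OFF requires HYPERPARAMETERS")
        if objective is not None and objective not in AUTO_OFF_OBJECTIVES:
            errors.append(f"OBJECTIVE {objective!r} is not valid with AUTO OFF")
        if objective == "multi:softmax" and "NUM_CLASS" not in opts.get("HYPERPARAMETERS", {}):
            errors.append("NUM_CLASS is required for multi:softmax")
    else:
        if opts.get("MODEL_TYPE", "XGBOOST") not in {"XGBOOST", "MLP"}:
            errors.append("MODEL_TYPE must be XGBOOST or MLP")
        if "HYPERPARAMETERS" in opts:
            errors.append("HYPERPARAMETERS is only allowed with AUTO OFF")
        if objective is not None and objective not in AUTO_ON_OBJECTIVES:
            errors.append(f"OBJECTIVE {objective!r} is not valid with AUTO ON")
    return errors


if __name__ == "__main__":
    abalone = {
        "AUTO": "OFF", "MODEL_TYPE": "XGBOOST", "OBJECTIVE": "multi:softmax",
        "PREPROCESSORS": "none", "HYPERPARAMETERS": {"NUM_ROUND": "100", "NUM_CLASS": "30"},
        "S3_BUCKET": "your-bucket",
    }
    print(check_options(abalone))  # []
```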

Usage notes

When using CREATE MODEL, consider the following:

  • The CREATE MODEL statement operates in an asynchronous mode and returns upon the export of training data to Amazon S3. The remaining steps of training in Amazon SageMaker occur in the background. While training is in progress, the corresponding inference function is visible but can't be run. You can query STV_ML_MODEL_INFO to see the state of training.

  • By default, training in AUTO mode can run in the background for up to 90 minutes, and you can extend this limit with MAX_RUNTIME. To cancel training, run the DROP MODEL command.

  • The Amazon Redshift cluster that you use to create the model and the Amazon S3 bucket that is used to stage the training data and model artifacts must be in the same AWS Region.

  • During model training, Amazon Redshift and SageMaker store intermediate artifacts in the Amazon S3 bucket that you provide. By default, Amazon Redshift performs garbage collection at the end of the CREATE MODEL operation and removes those objects from Amazon S3. To retain those artifacts on Amazon S3, set the S3_GARBAGE_COLLECT OFF option.

  • You must use at least 500 rows in the training data provided in the FROM clause.

  • You can only specify up to 256 feature (input) columns in the FROM { table_name | ( select_query ) } clause when using the CREATE MODEL statement.

  • For AUTO ON, the column types that you can use as the training set are SMALLINT, INTEGER, BIGINT, DECIMAL, REAL, DOUBLE, BOOLEAN, CHAR, VARCHAR, DATE, TIME, TIMETZ, TIMESTAMP, and TIMESTAMPTZ. For AUTO OFF, the column types that you can use as the training set are SMALLINT, INTEGER, BIGINT, DECIMAL, REAL, DOUBLE, and BOOLEAN.

  • You can't use DECIMAL, DATE, TIME, TIMETZ, TIMESTAMP, TIMESTAMPTZ, GEOMETRY, HLLSKETCH, or SUPER as the target column type.

  • To improve model accuracy, do one of the following:

    • Add as many relevant columns in the CREATE MODEL command as possible when you specify the training data in the FROM clause.

    • Use larger values for MAX_RUNTIME and MAX_CELLS. Larger values for these parameters increase the cost of training a model.

  • The CREATE MODEL statement returns as soon as the training data is computed and exported to the Amazon S3 bucket. After that point, you can check the status of training with the SHOW MODEL command. If a model being trained in the background fails, you can check the error using SHOW MODEL. You can't retry a failed model; use DROP MODEL to remove it, and then create a new model. For more information, see SHOW MODEL.

  • Local BYOM supports the same kind of models that Amazon Redshift ML supports for non-BYOM cases. Amazon Redshift supports plain XGBoost or MLP models (using XGBoost version 1.0 or later) without preprocessors and XGBoost models trained by Amazon SageMaker Autopilot. It supports the latter with preprocessors that Autopilot has specified that are also supported by Amazon SageMaker Neo.

  • If your Amazon Redshift cluster has enhanced VPC routing enabled, make sure to create an Amazon S3 VPC endpoint and a SageMaker VPC endpoint for the virtual private cloud (VPC) that your cluster is in. Doing this enables traffic between these services to run through your VPC during CREATE MODEL. For more information, see SageMaker Clarify Job Amazon VPC Subnets and Security Groups.
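
Several of these notes can be checked before you run CREATE MODEL. The following Python sketch is a hypothetical helper that applies the documented limits (at least 500 rows, at most 256 feature columns, the AUTO ON and AUTO OFF column types, and the disallowed target types) to a row count and a mapping of column names to type names. How you obtain the row count and column types, for example from a COUNT(*) query and the table definition, is left to you; type names are assumed to be upper case and spelled as in the lists above (for example DOUBLE and INTEGER).

```python
AUTO_ON_TYPES = {
    "SMALLINT", "INTEGER", "BIGINT", "DECIMAL", "REAL", "DOUBLE", "BOOLEAN",
    "CHAR", "VARCHAR", "DATE", "TIME", "TIMETZ", "TIMESTAMP", "TIMESTAMPTZ",
}
AUTO_OFF_TYPES = {"SMALLINT", "INTEGER", "BIGINT", "DECIMAL", "REAL", "DOUBLE", "BOOLEAN"}
DISALLOWED_TARGET_TYPES = {
    "DECIMAL", "DATE", "TIME", "TIMETZ", "TIMESTAMP", "TIMESTAMPTZ",
    "GEOMETRY", "HLLSKETCH", "SUPER",
}
MIN_ROWS = 500
MAX_FEATURES = 256


def check_training_data(row_count, columns, target, auto_on=True):
    """Check documented limits; columns maps column name -> type name (upper case)."""
    problems = []
    if row_count < MIN_ROWS:
        problems.append(f"{row_count} rows; at least {MIN_ROWS} are required")
    features = [name for name in columns if name != target]
    if len(features) > MAX_FEATURES:
        problems.append(f"{len(features)} feature columns; at most {MAX_FEATURES} are allowed")
    allowed = AUTO_ON_TYPES if auto_on else AUTO_OFF_TYPES
    for name, col_type in columns.items():
        if col_type not in allowed:
            problems.append(f"column {name} has unsupported type {col_type}")
    if target not in columns:
        problems.append(f"target column {target} is missing")
    elif columns[target] in DISALLOWED_TARGET_TYPES:
        problems.append(f"{columns[target]} can't be used as the target type")
    return problems


if __name__ == "__main__":
    # Columns selected in the abalone AUTO OFF example (float is DOUBLE PRECISION).
    abalone = {name: "DOUBLE" for name in (
        "length_val", "diameter", "height", "whole_weight",
        "shucked_weight", "viscera_weight", "shell_weight")}
    abalone["rings"] = "INTEGER"
    print(check_training_data(2499, abalone, "rings", auto_on=False))  # []
```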