Machine learning for novices and experts - Amazon Redshift

Amazon Redshift ML enables you to train models with a single SQL CREATE MODEL command. The CREATE MODEL command creates a model that Amazon Redshift uses to generate model-based predictions with familiar SQL constructs.

Amazon Redshift ML is especially useful when you don't have expertise in machine learning, tools, languages, algorithms, and APIs. With Amazon Redshift ML, you don't have to perform any of the undifferentiated heavy lifting required for integrating with an external machine learning service. Amazon Redshift saves you the time otherwise spent formatting and moving data, managing permission controls, or building custom integrations, workflows, and scripts. You can use popular machine learning algorithms and simplify training workflows that require frequent iteration from training to prediction. Amazon Redshift automatically discovers the best algorithm and tunes the best model for your problem. You can make predictions from within the Amazon Redshift cluster without moving data out of Amazon Redshift and without interfacing with, or paying for, another service.

While Amazon Redshift ML empowers data analysts and data scientists to use machine learning, it also allows machine learning experts to use their knowledge to guide the CREATE MODEL statement so that it uses only the aspects that they specify. Doing so can shorten the time that CREATE MODEL needs to find the best candidate, improve the accuracy of the model, or both.

The CREATE MODEL statement offers flexibility in how you specify the parameters for the training job. This enables both novice and expert machine learning users to choose their preferred preprocessors, algorithms, problem types, or hyperparameters. For example, a user interested in customer churn might specify in the CREATE MODEL statement that the problem type is binary classification, which works well for customer churn. The CREATE MODEL statement then narrows its search for the best model to binary classification models. Even when the user chooses the problem type, the CREATE MODEL statement still has many options to work with. For example, it discovers and applies the best preprocessing transformations and discovers the best hyperparameter settings.
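
As a rough sketch of the churn example above, the following statement fixes only the problem type and leaves the rest of the search to CREATE MODEL. The table customer_activity, its columns, the model name, the function name, and the bucket name are hypothetical placeholders, and IAM_ROLE default assumes that a default IAM role is associated with the cluster.

```sql
-- Hypothetical churn model: table, columns, names, and bucket are placeholders.
CREATE MODEL customer_churn_model
FROM (
    SELECT state, account_length, total_charge, cust_serv_calls, churn
    FROM customer_activity
)
TARGET churn                          -- column that the model learns to predict
FUNCTION predict_customer_churn       -- SQL function created for inference
IAM_ROLE default                      -- assumes a default IAM role is configured
PROBLEM_TYPE BINARY_CLASSIFICATION    -- narrows the search to binary classifiers
SETTINGS (
    S3_BUCKET 'amzn-s3-demo-bucket'   -- placeholder bucket for training artifacts
);
```

Preprocessing and hyperparameter choices are not listed here, so they remain for CREATE MODEL to discover.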

Amazon Redshift ML makes training easy by automatically finding the best model with Amazon SageMaker Autopilot. Behind the scenes, Amazon SageMaker Autopilot automatically trains and tunes the best machine learning model based on your supplied data. Amazon SageMaker Neo then compiles the trained model and makes it available for prediction in your Amazon Redshift cluster. When you run a machine learning inference query using a trained model, the query can use all of the massively parallel processing capabilities of Amazon Redshift along with the machine learning-based prediction ability.
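
An inference query of this kind might look like the following sketch. It assumes a model named customer_churn_model was created with a prediction function named predict_customer_churn; both names, the table, and the columns are hypothetical. The function arguments are assumed to match the input columns used for training, in the same order and without the target column.

```sql
-- Check that training has finished before running predictions (names are hypothetical).
SHOW MODEL customer_churn_model;

-- Call the prediction function like any other SQL function.
SELECT state,
       account_length,
       predict_customer_churn(state, account_length, total_charge, cust_serv_calls)
           AS predicted_churn
FROM customer_activity;
```

Because the function runs inside an ordinary SELECT, it can be combined with filters, joins, and aggregations in the same query.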

  • As a machine learning beginner with general knowledge of different aspects of machine learning, such as preprocessors, algorithms, and hyperparameters, use the CREATE MODEL statement to specify only the aspects that you want to control. Doing so can shorten the time that CREATE MODEL needs to find the best candidate or improve the accuracy of the model. You can also increase the business value of the predictions by introducing additional domain knowledge, such as the problem type or the objective. For example, in a customer churn scenario, if the outcome “customer is not active” is rare, then the F1 objective is often preferred to the Accuracy objective. A model tuned for Accuracy might predict “customer is active” all the time, which yields high accuracy but little business value. For information about F1 objectives, see AutoMLJobObjective in the Amazon SageMaker API Reference.

    For more information about the basic options for the CREATE MODEL statement, see Simple CREATE MODEL.

  • As an advanced machine learning practitioner, you can specify the problem type and preprocessors for certain (but not all) features. The CREATE MODEL statement then follows your suggestions on the specified aspects while still discovering the best preprocessors for the remaining features and the best hyperparameters. For more information about how you can constrain one or more aspects of the training pipeline, see CREATE MODEL with user guidance.

  • As a machine learning expert, you can take full control of training and hyperparameter tuning. Then the CREATE MODEL statement doesn't attempt to discover the optimal preprocessors, algorithms and hyperparameters because you make all the choices. For more information about how to use the CREATE MODEL statement with AUTO OFF, see CREATE XGBoost models with AUTO OFF.

  • As a data engineer, you can take an XGBoost model pretrained in Amazon SageMaker and import it into Amazon Redshift for local inference. With bring your own model (BYOM), you can use a model trained outside of Amazon Redshift with Amazon SageMaker for in-database inference locally in Amazon Redshift. Amazon Redshift ML supports BYOM for either local or remote inference.

    For more information about how to use the CREATE MODEL statement for local or remote inference, see Bring your own model (BYOM).
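
The preference for F1 over Accuracy when the “customer is not active” outcome is rare can be illustrated with a small worked example. The counts below are invented for illustration only: 1000 customers, of which 50 are not active (the positive class). TP, FP, FN, and TN denote true positives, false positives, false negatives, and true negatives.

```latex
% Hypothetical data: 1000 customers, 50 of them not active (positive class).
F_1 = \frac{2\,TP}{2\,TP + FP + FN}

% Model A always predicts "customer is active": TP = 0, FP = 0, FN = 50, TN = 950
\text{Accuracy}_A = \frac{0 + 950}{1000} = 0.95, \qquad
F_{1,A} = \frac{0}{0 + 0 + 50} = 0

% Model B flags 70 customers, 40 of them correctly: TP = 40, FP = 30, FN = 10, TN = 920
\text{Accuracy}_B = \frac{40 + 920}{1000} = 0.96, \qquad
F_{1,B} = \frac{80}{80 + 30 + 10} \approx 0.67
```

In this example, the two models are nearly indistinguishable by Accuracy, while F1 clearly separates the model that never identifies an inactive customer from the one that finds most of them.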

As an Amazon Redshift ML user, you can choose any of the options listed above to train and deploy your model.
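
For comparison with the automated path, the expert (AUTO OFF) and bring your own model (BYOM) options might be sketched as follows. All model, function, table, training job, and bucket names are hypothetical placeholders. The objective and hyperparameter values are arbitrary illustrations, not recommendations, and customer_activity_numeric is assumed to contain only numeric feature columns because no preprocessing is applied.

```sql
-- Expert path: AUTO OFF, with the algorithm, objective, and hyperparameters chosen by you.
CREATE MODEL customer_churn_xgb
FROM customer_activity_numeric          -- hypothetical table with numeric features only
TARGET churn
FUNCTION predict_customer_churn_xgb
IAM_ROLE default                        -- assumes a default IAM role is configured
AUTO OFF
MODEL_TYPE XGBOOST
OBJECTIVE 'binary:logistic'
PREPROCESSORS 'none'
HYPERPARAMETERS DEFAULT EXCEPT (NUM_ROUND '100')
SETTINGS (S3_BUCKET 'amzn-s3-demo-bucket');

-- BYOM, local inference: import a model from an existing SageMaker training job.
CREATE MODEL customer_churn_byom
FROM 'training-job-customer-churn'      -- hypothetical SageMaker training job name
FUNCTION predict_customer_churn_byom (varchar, int, float, float)
RETURNS int
IAM_ROLE default
SETTINGS (S3_BUCKET 'amzn-s3-demo-bucket');
```

In the BYOM statement there is no TARGET clause because no training takes place in Amazon Redshift; the function signature declares the input and output types of the imported model instead.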

For information about the prerequisites for setting up your Amazon Redshift cluster, permissions, and ownership for using Amazon Redshift ML, read the following sections. These sections also describe how simple training and predictions work in Amazon Redshift ML.