Tutorial: Using Amazon ML to Predict Responses to a Marketing Offer

With Amazon Machine Learning (Amazon ML), you can build and train predictive applications and host your applications in a scalable cloud solution. In this tutorial, we show you how to use Amazon ML to create a datasource, build a machine learning (ML) model, and use the model to generate batch predictions.

Our sample exercise in the tutorial shows how to identify potential customers for targeted marketing campaigns, but you can apply the same principles to create and use a variety of machine learning models. To complete the sample exercise, you use the publicly available banking and marketing dataset from the University of California at Irvine (UCI) Machine Learning Repository. This dataset contains information about customers as well as descriptions of their behavior in response to previous marketing contacts. You use this data to identify which customers are most likely to subscribe to your new product. In the sample dataset, the product is a bank term deposit. A bank term deposit, also known as a certificate of deposit (CD), is a deposit that earns a fixed interest rate and cannot be withdrawn for a certain period of time.

To complete the tutorial, you download sample data and upload the data to Amazon S3 to create a datasource—an Amazon ML object that contains information about your data. Next, you create an ML model from the datasource. You evaluate and adjust the ML model’s performance, and then use it to generate predictions.

Note

You need an AWS account for this tutorial. If you don’t have an AWS account, see Setting Up Amazon Machine Learning.

Complete the following steps to get started using Amazon ML:

Step 1: Download, Edit, and Upload Data

Step 2: Create a Datasource

Step 3: Create an ML Model

Step 4: Review the ML Model’s Performance and Set a Score Threshold

Step 5: Use the ML Model to Generate Batch Predictions

Step 6: Clean Up

Step 1: Download, Edit, and Upload Data

To start, you download the data and check to see if you need to format it before you provide it to Amazon ML. For Amazon ML formatting requirements, see Understanding the Data Format for Amazon ML. To make the download step quick for you, we downloaded the banking and marketing dataset from the UCI Machine Learning Repository, formatted it to conform to Amazon ML guidelines, shuffled the records, and made it available at the location that is shown in the following procedure.

To download and save the data

  1. To open the datasets that we have placed in an Amazon S3 bucket for your use, click https://s3.amazonaws.com/aml-sample-data/banking.csv and https://s3.amazonaws.com/aml-sample-data/banking-batch.csv

  2. Download the files by saving them as banking.csv and banking-batch.csv on your desktop.

    If you open the banking.csv file, you should see rows and columns full of data. The header row contains the attribute names for each column. An attribute is a unique, named property. Each row represents a single observation.

    image0
    You want your ML model to answer the following question: Will this customer subscribe to my new product? In the dataset, the answer to this question is in attribute y, which is located in column U. This column contains the values 1 (yes) or 0 (no). The attribute that you want Amazon ML to learn to predict is known as the target attribute.
    The y attribute that you are going to predict is a binary attribute. For binary classification, Amazon ML understands only 1 or 0. The original UCI dataset records y as yes or no, so in the dataset that you downloaded, we have already changed every yes to 1 and every no to 0. This helps Amazon ML learn to predict which of your customers will subscribe.

The following two screenshots show the data before and after our edits.

image1

image2

The banking-batch.csv data does not contain the binary attribute y. Once you have an ML model, you use it to predict y for each row in the banking-batch.csv file.
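The yes/no to 1/0 edit described above can be reproduced with a few lines of Python. The following is a minimal sketch, not the script that was used to prepare the sample files. The function name encode_target and the two sample rows are made up for illustration, and the sketch assumes a comma-delimited file with a header row.

```python
import csv
import io


def encode_target(csv_text, target="y"):
    """Return CSV text with yes/no in the target column rewritten as 1/0."""
    mapping = {"yes": "1", "no": "0"}
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        value = row[target].strip().lower()
        # Values that are already 1/0 (or anything unexpected) are left untouched.
        row[target] = mapping.get(value, row[target])
        writer.writerow(row)
    return out.getvalue()


# Hypothetical two-row sample, not taken from the real dataset.
sample = "age,education,y\n44,university.degree,yes\n53,high.school,no\n"
print(encode_target(sample))
```

Running the same function on a file that already contains 1 and 0 leaves it unchanged, so it is safe to apply to banking.csv as downloaded.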

Next, upload your banking.csv and banking-batch.csv files to an Amazon S3 bucket that you own. If you have not created a bucket, see the Amazon S3 User Guide to learn how to create one.

To upload the files to an Amazon S3 bucket

  1. Sign in to the AWS Management Console and open the Amazon S3 console at https://console.aws.amazon.com/s3.
  2. In the buckets list, create or choose the bucket where you want to upload the files, and then choose Upload.
  3. Choose Add Files.
  4. In the dialog box that appears, navigate to your desktop, choose banking.csv and banking-batch.csv, and then choose Open.

Note

The datasource does not actually store your data; it only references it. If you move or change the S3 file, Amazon ML cannot access or use it to create an ML model, generate evaluations, or generate predictions.

Now you are ready to create your datasource.

Step 2: Create a Datasource

After you upload banking.csv to your Amazon S3 bucket, you need to provide Amazon ML with the following information:

  • The Amazon S3 location of your data
  • The names of the attributes in the data and the type of each attribute (numeric, text, categorical, or binary type)
  • The name of the attribute that holds the answer that you want Amazon ML to learn to predict

You provide this information to Amazon ML by creating a datasource. A datasource is an Amazon ML object that holds the location of your input data, the attribute names and types, the name of the target attribute, and descriptive statistics for each attribute. Operations like ML model training or ML model evaluations use a datasource ID to reference your data.

In the next step, you reference banking.csv as the input data of your datasource, provide the schema using the Amazon ML console to assign data types, and select a target attribute.

Input Data

Amazon ML uses input data to train ML models. Input data must be in CSV format. To create your targeted marketing campaign, use the banking dataset as input data. Input data for training contains the correct answer for the attribute y that you want Amazon ML to predict. You must provide Amazon ML with a dataset for which you know the correct answer so that Amazon ML can learn the patterns among the input attributes. Learning these patterns helps Amazon ML predict which customers are more likely to subscribe to the new product.

To reference input data for the training datasource

  1. Open the Amazon Machine Learning console at https://console.aws.amazon.com/machinelearning/. On the Amazon ML console, you can create datasources, ML models, evaluations, and batch predictions. You can also view detail pages for these objects, which include information such as the object’s creation status.

  2. On the Entities page, choose Create new, Datasource.

    image3

  3. On the Input Data page, for Where is your data located?, select S3.

    image4

    For S3 Location, type the location of the banking.csv file: example-bucket/banking.csv

  4. For Datasource name, type Banking Data 1.
    image5
  5. Choose Verify.

  6. In the S3 permissions dialog box, choose Yes.

    image6

    Amazon ML validates the location of your data.

  7. If your information is correct, a property page appears with a Validation success message. Review the properties, and then choose Continue.

    image7

Schema

Next, you establish a schema. A schema is composed of attributes and their assigned data types. There are two ways to provide Amazon ML with a schema:

  • Provide a separate schema file when you upload your Amazon S3 data
  • Allow Amazon ML to infer the attribute types and create a schema for you

In this tutorial, Amazon ML infers the schema for you.

For more information about creating a separate schema file, see this link.
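To give a sense of what a separate schema file might contain, the following sketch builds one as JSON. It lists only three attributes for brevity, whereas a real schema would describe every column in the CSV. The field names (version, targetAttributeName, dataFormat, dataFileContainsHeader, attributes) are assumptions here and should be verified against the schema documentation before use.

```python
import json

# Illustrative subset of attributes; a real schema lists every column in the CSV.
attributes = [
    ("age", "NUMERIC"),
    ("education", "CATEGORICAL"),
    ("y", "BINARY"),
]

# Field names are assumptions to check against the Amazon ML schema documentation.
schema = {
    "version": "1.0",
    "targetAttributeName": "y",
    "dataFormat": "CSV",
    "dataFileContainsHeader": True,
    "attributes": [
        {"attributeName": name, "attributeType": kind} for name, kind in attributes
    ],
}

schema_json = json.dumps(schema, indent=2)
print(schema_json)
```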

To create a schema by using Amazon ML

  1. On the Schema page, for Does the first line in your CSV contain the column names?, choose Yes.

    image8

The data type of each attribute is inferred by Amazon ML based on a sample of each attribute’s values. It is important that attributes are assigned the most correct data type possible to help Amazon ML ingest the data correctly and to enable the correct feature processing on the attributes. This step influences the predictive performance of the ML model that is trained on this datasource.

  2. Review the data types identified by Amazon ML by checking the sample values for the attributes on all three pages:

    • Attributes that are numeric quantities for which the order is meaningful should be marked as numeric
    • Attributes that are numbers or strings that are used to denote a category should be marked as categorical
    • Attributes that are expected to take only values 1 or 0 should be marked as binary
    • Attributes that are strings that you would like to treat as words delimited by spaces should be marked as text

    image9

  3. In the preceding example, Amazon ML has correctly identified the data types for all the attributes, so choose Continue.
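The four rules above can be approximated in code to see why sample values alone sometimes lead to the wrong type. This is a rough sketch of rule-of-thumb inference, not the algorithm that Amazon ML uses; the function name and the max_categories cutoff are assumptions made for illustration.

```python
def infer_attribute_type(values, max_categories=10):
    """Guess a data type from sample values using simple rules of thumb."""
    cleaned = [v.strip() for v in values if v.strip() != ""]
    if not cleaned:
        return "text"
    distinct = set(cleaned)
    if distinct <= {"0", "1"}:
        return "binary"
    try:
        for v in cleaned:
            float(v)
        return "numeric"
    except ValueError:
        pass
    # Strings with spaces look like free text; short repeated labels look like categories.
    if any(" " in v for v in cleaned):
        return "text"
    if len(distinct) <= max_categories:
        return "categorical"
    return "text"


print(infer_attribute_type(["0", "1", "1"]))                   # binary
print(infer_attribute_type(["44", "53", "28"]))                # numeric
print(infer_attribute_type(["married", "single", "married"]))  # categorical
```

Note that a column of numeric codes that denote categories would be guessed as numeric by this sketch, which is exactly the kind of case that the manual review step is meant to catch.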

Next, you select a target attribute.

Target Attribute

In this step, you select a target attribute. The target attribute is the attribute that the ML model must learn to predict. Because you want to send the new marketing campaign to the customers who are most likely to subscribe, choose the binary attribute y as your target attribute. This binary attribute indicates whether an individual subscribed in response to a past campaign: 1 (yes) or 0 (no). When you select y as your target attribute, Amazon ML identifies patterns in the training datasource and uses them to create a mathematical model. The model can then generate predictions about data for which you do not know the answer.

For example, if you want to predict your customers’ education levels, you would choose education as your target attribute.

Note

Target attributes are required only if you use the datasource for training ML models and evaluating ML models.

To select y as the target attribute

  1. On the Target page, for Do you want to use this dataset to create and/or evaluate a ML model?, choose Yes.

  2. In the lower right of the table, choose the single arrow until the attribute y appears in the table.

    image10

  3. In the Target column, choose the option next to y.

    image11

    Amazon ML confirms that y is selected as your target.

    image12

  4. Choose Continue.

  5. On the Row ID page, for Do you want to select an identifier?, choose No.

  6. Choose Review.

    image13

  7. On the Review page, choose Finish.

Once you choose Finish, the request to create the datasource is submitted. The datasource moves into Initialized status and takes a few minutes to reach Completed status. You do not need to wait for the datasource to complete, so proceed to the next step.
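The same datasource request can also be submitted programmatically. The sketch below assumes the boto3 machinelearning client and its create_data_source_from_s3 operation; the datasource ID, the s3:// form of the location, and the helper name are assumptions chosen for illustration. The client is passed in as a parameter so that the function can be exercised without AWS credentials.

```python
def create_training_datasource(client, s3_location, schema_json):
    """Request a datasource for the training data and return its ID."""
    datasource_id = "banking-data-1"  # hypothetical ID chosen for this sketch
    client.create_data_source_from_s3(
        DataSourceId=datasource_id,
        DataSourceName="Banking Data 1",
        DataSpec={
            "DataLocationS3": s3_location,
            "DataSchema": schema_json,
        },
        # Statistics are needed when the datasource is used for training.
        ComputeStatistics=True,
    )
    return datasource_id


# Usage (requires boto3 and AWS credentials, so it is left commented out):
# import boto3
# client = boto3.client("machinelearning")
# create_training_datasource(client, "s3://example-bucket/banking.csv", schema_json)
```

As in the console, the call returns before the datasource is ready; the datasource finishes building in the background.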

Step 3: Create an ML Model

After the request to create the datasource has been submitted, you use the datasource to train an ML model. The ML model generates predictions by using your training datasource to identify patterns in the historical data.

To create an ML model

  1. Choose Amazon Machine Learning, ML models.
    On the ML models summary page, choose Create new ML model.

    image14

  2. Because you’ve already created a datasource, choose I already created a datasource pointing to my S3 data.

    image15

  3. In the table, choose Banking Data 1, and then choose Continue.

    image16

  4. On the ML model settings page, for ML model name, type Subscription propensity model.

    image17

    Giving your ML model a human-readable name helps you identify and manage the ML model.

  5. For Training and evaluation settings, choose Default.

    image18

  6. For Name this evaluation, type Subscription propensity evaluation.
  7. Choose Review.
  8. Review your data, and then choose Finish.

Once you choose Finish, the following requests are submitted:

  • Split the input datasource into 70% for training and 30% for evaluation
  • Create the ML model to train on 70% of the input data
  • Create an evaluation to evaluate the ML model on 30% of the input data

The split datasources, ML model, and evaluation move into Pending status and take a few minutes to reach Completed status. You need to wait for the evaluation to complete before proceeding to step 4.
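To make the 70/30 split concrete, the following sketch divides a list of records in order, which is reasonable here because the sample records were already shuffled. Whether the console performs the split in exactly this way is an assumption. The rearrangement helper shows the JSON form that the API is assumed to accept for selecting a percentage range of a datasource; treat both function names as illustrative.

```python
import json


def split_rows(rows, train_percent=70):
    """Split rows sequentially: the first train_percent% train, the rest evaluate."""
    cut = len(rows) * train_percent // 100
    return rows[:cut], rows[cut:]


def rearrangement(percent_begin, percent_end):
    """Build a JSON string that selects a percentage range of the data (assumed format)."""
    return json.dumps(
        {"splitting": {"percentBegin": percent_begin, "percentEnd": percent_end}}
    )


rows = list(range(10))  # stand-in for ten shuffled records
train, evaluation = split_rows(rows)
print(train)       # [0, 1, 2, 3, 4, 5, 6]
print(evaluation)  # [7, 8, 9]
print(rearrangement(0, 70))
print(rearrangement(70, 100))
```

Keeping the evaluation rows separate from the training rows is what makes the evaluation an honest estimate of how the model behaves on data it has not seen.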

For more information, see Training models and evaluating models.

Step 4: Review the ML Model’s Performance and Set a Score Threshold

Now that the ML model has been created and evaluated, let’s see if it is good enough to put to use. Amazon ML has already computed an industry-standard quality metric, the Area Under the Curve (AUC), that expresses the performance quality of your ML model. Start by reviewing and interpreting it.

Reviewing the AUC Metric

An evaluation describes whether or not your ML model is better than making random guesses. Amazon ML interprets the AUC metric to tell you if the quality of the ML model is adequate for most machine learning applications. Learn more about AUC in the Amazon Machine Learning Concepts.
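One way to build intuition for AUC is to compute it by hand on a tiny example. AUC equals the probability that a randomly chosen positive record receives a higher score than a randomly chosen negative record, so 0.5 corresponds to random guessing and 1.0 to a perfect ranking. The sketch below uses that pairwise definition directly; it is meant for illustration on small inputs and is not how Amazon ML computes the metric. The labels and scores are made up.

```python
def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))


labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.7, 0.2]
print(auc(labels, scores))  # 5 of the 6 positive/negative pairs are ranked correctly
```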

Next, let’s look at the AUC metric of your ML model.

To view the AUC metric of your ML model

  1. Choose Amazon Machine Learning, ML models.

    image19

  2. In the ML models table, select Subscription propensity model.

    image20

  3. On the ML model report page, choose Evaluations, Subscription propensity evaluation.

    image21

  4. Choose Summary.

    image22

  5. On the Evaluation summary page, review your information. This page includes a summary of your evaluation, including the AUC performance metric of the ML model.

    image23

Next, you set a score threshold to adjust the trade-off between the kinds of mistakes that the ML model makes.

Setting a Score Threshold

The ML model works by generating numeric prediction scores, and then applying a threshold to convert these scores into binary 0/1 labels. By changing the score threshold, you can adjust which records the ML model predicts as 1 and which as 0.
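The conversion from scores to labels is simple enough to sketch directly. The example below labels a record 1 when its score is above the threshold; how a score exactly equal to the threshold is treated is an assumption, and the scores are made up.

```python
def apply_threshold(scores, threshold):
    """Label a record 1 when its score is above the threshold, otherwise 0."""
    return [1 if s > threshold else 0 for s in scores]


scores = [0.12, 0.55, 0.81, 0.49]  # made-up prediction scores
print(apply_threshold(scores, 0.5))  # [0, 1, 1, 0]
print(apply_threshold(scores, 0.8))  # [0, 0, 1, 0]
```

Raising the threshold from 0.5 to 0.8 leaves the scores unchanged but reduces the number of records labeled 1 from two to one.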

To set a score threshold for your ML model

  1. On the Evaluation summary page, choose Adjust Score Threshold.

    Amazon ML displays the ML model performance results page. This page includes a chart that shows the score distribution of your predictions. You use this page to view advanced metrics and the effect of different score thresholds on the performance of your model. You can fine-tune your ML model performance metrics by adjusting the score threshold value.

    image24

  2. Let’s say you want to target the top 3% of the customers that are most likely to subscribe to the product. Slide the vertical selector to set the score threshold to a value that corresponds to 3% of the records predicted as “1”.

    image25

    You can review the impact of this score threshold on the ML model’s performance. Now let’s say the false positive rate of 0.007 is acceptable for your application.

  3. Choose Save Score Threshold.

    image26

The score threshold is saved for this ML model.

image27

Each time you use this ML model to make predictions, it predicts records with scores greater than 0.77 as “1” and all other records as “0”.

Remember, machine learning is an iterative process that requires you to discover what score threshold is most appropriate for you. You can adjust the predictions by adjusting your score threshold based on your use case.
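The slider in the console does two things that can be sketched in code: it finds the threshold that marks a chosen share of records as 1, and it reports the resulting false positive rate. The following is an illustrative sketch under stated assumptions, not the console’s implementation. The function names are hypothetical, the data is synthetic, and with tied scores the share of records above the threshold can be smaller than requested.

```python
def threshold_for_top_percent(scores, top_percent):
    """Return a threshold such that about top_percent% of scores lie strictly above it."""
    ranked = sorted(scores, reverse=True)
    keep = len(ranked) * top_percent // 100
    if keep >= len(ranked):
        return float("-inf")
    # The first score that is NOT kept becomes the threshold (scores > threshold are 1).
    return ranked[keep]


def false_positive_rate(labels, scores, threshold):
    """Fraction of true 0 records that are predicted as 1."""
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not negatives:
        raise ValueError("need at least one true 0 record")
    return sum(1 for s in negatives if s > threshold) / len(negatives)


# Synthetic data: 100 records with scores 0.00 to 0.99; only the two highest are true 1s.
scores = [i / 100 for i in range(100)]
labels = [0] * 98 + [1, 1]

threshold = threshold_for_top_percent(scores, 3)
flagged = sum(1 for s in scores if s > threshold)
print(threshold, flagged)
print(false_positive_rate(labels, scores, threshold))
```

In this synthetic example, three records are flagged and one of them is a true 0, so the false positive rate is 1 out of the 98 true 0 records.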

To learn more about the score threshold, see the Amazon Machine Learning Concepts.

Step 5: Use the ML Model to Generate Batch Predictions

In Amazon ML, there are two ways to get predictions: batch and online. If your application requires predictions to be generated in real time, you first need to mount the ML model to get online predictions. When you mount an ML model, you make it available to generate predictions on demand and at low latency. These real-time predictions are usually used in interactive web, mobile, or desktop applications.

For this tutorial, you choose the method that generates predictions for a large batch of input records without using the Enable for Real-time Prediction interface.

A batch prediction is useful when you want to generate predictions for a set of observations all at once, and you do not have a low latency requirement. For your targeted marketing campaign, you want a single file with all of the answers included in it. In this sample problem, you are scoring, as a batch, the customers to whom you have not yet marketed your new product, and you don’t need to predict who will subscribe to the new product in real time.

Batch Predictions

When creating batch predictions, you select your banking data ML model as well as the prediction data from which you want to generate predictions. When the request is complete, your batch predictions are sent to an Amazon S3 bucket that you define. After Amazon ML makes the predictions, you can plan and execute your targeted marketing campaign more effectively.

To create batch predictions

  1. Choose Amazon Machine Learning, Batch predictions.

    image28

  2. Choose Create new batch prediction.

    image29

  3. On the ML Model for batch predictions page, choose Subscription propensity model from the list.

    The ML model name, ID, creation time, and the associated datasource ID appear.

  4. Choose Continue.

    To generate predictions, you need to show Amazon ML the data that you need answers to. This is called the input data.

  5. For Locate the input data, choose My data is in S3, and I need to create a datasource.

    image30

  6. For Datasource name, type Banking Data 2.

  7. For S3 Location, enter the location of your banking-batch.csv.

  8. For Does the first line in your CSV contain the column names?, choose Yes.

  9. Choose Verify.

  10. In the S3 permissions dialog box, choose Yes.

    Amazon ML validates the location of your data.

  11. Choose Continue.
  12. For S3 destination, type the location of an easily accessible Amazon S3 bucket for your prediction files.
  13. For Batch prediction name, type Subscription propensity predictions.
  14. In the S3 permissions dialog box, choose Yes.

    image31

  15. Choose Review.
  16. On the Review page, choose Finish.

The batch prediction request is sent to Amazon ML and entered into a queue. At first, the status of your batch prediction is set as Pending. The time it takes for a batch prediction to complete depends on the size of your datasource and the complexity of your ML model.

After the batch prediction has successfully completed, its status changes to Completed.

To view the predictions

  1. Choose Amazon Machine Learning, Batch predictions.

    image32

  2. In the list of batch predictions, choose Subscription propensity predictions. The Batch prediction info page appears.

    image33

  3. Navigate to the Output S3 URL in your Amazon S3 console to view the batch prediction.

    image34

    The prediction is stored in a compressed .gz file.

  4. Download the file to your desktop, and then uncompress and open the prediction file.

    image35

The file includes two columns: bestAnswer and score. The bestAnswer column is based on the score threshold that you set in step 4.

Prediction Examples

The following examples show a positive and negative prediction based on the score threshold.

Positive prediction: image36

In the positive prediction example, the value for bestAnswer is 1, and the value of score is 0.88682. The value for bestAnswer is 1 because the score value is above the score threshold of 0.77 that you saved.

Negative prediction: image37

The value of bestAnswer in the negative prediction example is 0 because the score value is 0.76525, which is less than the score threshold of 0.77.
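If you prefer to inspect the predictions in code rather than by uncompressing the file manually, the following sketch reads a gzip-compressed CSV with the bestAnswer and score columns described above. The file name and the helper names are assumptions; the demo writes a small stand-in file that contains only the two example rows so that the sketch runs without the real output.

```python
import csv
import gzip
import os
import tempfile


def read_predictions(path):
    """Read a gzip-compressed prediction CSV into (bestAnswer, score) tuples."""
    with gzip.open(path, mode="rt", newline="") as handle:
        reader = csv.DictReader(handle)
        return [(int(row["bestAnswer"]), float(row["score"])) for row in reader]


def count_targets(predictions):
    """Count the records predicted as 1, that is, the customers to contact."""
    return sum(1 for best_answer, _ in predictions if best_answer == 1)


# Stand-in file containing the two example rows discussed above.
path = os.path.join(tempfile.mkdtemp(), "predictions.csv.gz")
with gzip.open(path, mode="wt", newline="") as handle:
    handle.write("bestAnswer,score\n1,0.88682\n0,0.76525\n")

predictions = read_predictions(path)
print(predictions)                 # [(1, 0.88682), (0, 0.76525)]
print(count_targets(predictions))  # 1
```

Both rows are consistent with the saved threshold: 0.88682 is above 0.77 and is labeled 1, while 0.76525 is below it and is labeled 0.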

Step 6: Clean Up

You have now successfully completed the tutorial. To prevent your account from accruing additional S3 charges, you should clean up the data stored in S3 for this tutorial.

To delete the input data used for training, evaluation, and batch prediction steps

  1. Open the Amazon S3 console.
  2. Navigate to the S3 bucket where you stored the banking.csv and banking-batch.csv.
  3. Select the two files and the .writePermissionCheck.tmp file.
  4. Choose Actions, Delete.
  5. When prompted for confirmation, choose OK.

To delete the predictions generated from the batch prediction step

  1. Open the Amazon S3 console.
  2. Navigate to the bucket where you stored the output of the batch predictions.
  3. Select the batch-prediction folder.
  4. Choose Actions, Delete.
  5. When prompted for confirmation, choose OK.

To learn how to use the API, see the Amazon Machine Learning API Reference.