We are no longer updating the Amazon Machine Learning service or accepting new users for it. This documentation is available for existing users, but we are no longer updating it. For more information, see What is Amazon Machine Learning.
Step 1: Prepare Your Data
In machine learning, you typically obtain the data and ensure that it is well
formatted before starting the training process. For the purposes of this tutorial, we
obtained a sample dataset from the UCI Machine Learning Repository.
For Amazon ML formatting requirements, see Understanding the Data Format for Amazon ML.
To download the datasets
- Download the file that contains the historical data for customers who have purchased products similar to your bank term deposit by clicking banking.zip. Unzip the folder and save the banking.csv file to your computer.
- Download the file that you will use to predict whether potential customers will respond to your offer by clicking banking-batch.zip. Unzip the folder and save the banking-batch.csv file to your computer.
- Open banking.csv. You will see rows and columns of data. The header row contains the attribute names for each column. An attribute is a unique, named property that describes a particular characteristic of each customer; for example, nr_employed indicates the customer's employment status. Each row represents the collection of observations about a single customer.
You want your ML model to answer the question "Will this customer subscribe to my new product?" In the banking.csv dataset, the answer to this question is attribute y, which contains the values 1 (for yes) or 0 (for no). The attribute that you want Amazon ML to learn how to predict is known as the target attribute.
Note
Attribute y is a binary attribute. It can contain only one of two values, in this case 0 or 1. In the original UCI dataset, the y attribute is either Yes or No. We have edited the original dataset for you. All values of attribute y that mean yes are now 1, and all values that mean no are now 0. If you use your own data, you can use other values for a binary attribute. For more information about valid values, see Using the AttributeType Field.
The following examples show the data before and after we changed the values in attribute y to the binary attributes 0 and 1.
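The relabeling described in the note can be sketched in Python. This is a minimal illustration, not the procedure used to prepare the tutorial files: the helper name `binarize_target`, the mapping table, and the sample rows (including the `age` and `job` columns) are assumptions made up for this example.

```python
import csv
import io

# Hypothetical mapping from the original labels to binary values.
YES_NO_TO_BINARY = {"yes": "1", "no": "0"}


def binarize_target(rows, target="y"):
    """Return copies of the rows with the target's yes/no label replaced by 1/0."""
    converted = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows stay unchanged
        label = row[target].strip().lower()
        # Raises KeyError if a label is neither "yes" nor "no".
        new_row[target] = YES_NO_TO_BINARY[label]
        converted.append(new_row)
    return converted


# Made-up sample rows; these are not taken from banking.csv.
sample = io.StringIO("age,job,y\n44,technician,No\n53,blue-collar,Yes\n")
before = list(csv.DictReader(sample))
after = binarize_target(before)
```

Here `before` keeps the original Yes/No labels and `after` holds the same rows with y set to 1 or 0. Failing on unexpected labels is a deliberate choice in this sketch, so that a misspelled value is not silently turned into a wrong target.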


The banking-batch.csv file doesn’t contain the y attribute. After you have created an ML model, you will use the model to predict y for each record in that file.
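Before uploading, you may want to confirm that the training file has the target attribute and the batch file does not. The following is a minimal sketch using Python's standard csv module. The helper names `read_header` and `target_counts` are hypothetical, and the in-memory rows are made-up stand-ins for the real files.

```python
import csv
import io
from collections import Counter


def read_header(csv_file):
    """Return the attribute names from the first row of an open CSV file."""
    return next(csv.reader(csv_file))


def target_counts(csv_file, target="y"):
    """Count how often each value of the target attribute occurs."""
    return Counter(row[target] for row in csv.DictReader(csv_file))


# In-memory stand-ins with made-up rows, not the actual tutorial data.
training = io.StringIO("age,nr_employed,y\n44,5191,0\n53,5228.1,1\n28,4991.6,0\n")
batch = io.StringIO("age,nr_employed\n39,5191\n")

has_target_training = "y" in read_header(training)
has_target_batch = "y" in read_header(batch)

training.seek(0)  # rewind so the header is read again by DictReader
counts = target_counts(training)
```

With the real files, you would pass file objects opened with `open("banking.csv", newline="")` and `open("banking-batch.csv", newline="")` instead of the in-memory samples.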
Next, upload the banking.csv and banking-batch.csv files to Amazon S3.
To upload the files to an Amazon S3 location
- Sign in to the AWS Management Console and open the Amazon S3 console at https://console.aws.amazon.com/s3/.
- In the All Buckets list, create a bucket or choose the location where you want to upload the files.
- In the navigation bar, choose Upload.
- Choose Add Files.
- In the dialog box, navigate to your desktop, choose banking.csv and banking-batch.csv, and then choose Open.
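The console steps above can also be done from a script. The following is a minimal sketch, assuming an S3 client object that provides an `upload_file(Filename, Bucket, Key)` method, as the boto3 S3 client does. The helper `upload_files` and its `prefix` parameter are hypothetical names introduced for this example.

```python
import os


def upload_files(s3_client, bucket, paths, prefix=""):
    """Upload each local file to the bucket and return the object keys used.

    The key for each file is the optional prefix followed by the file name.
    """
    keys = []
    for path in paths:
        key = prefix + os.path.basename(path)
        # Assumes a boto3-style client: upload_file(Filename, Bucket, Key).
        s3_client.upload_file(path, bucket, key)
        keys.append(key)
    return keys
```

With boto3 installed and AWS credentials configured, a call might look like `upload_files(boto3.client("s3"), "your-bucket-name", ["banking.csv", "banking-batch.csv"])`, where `your-bucket-name` is a placeholder for a bucket you own. Passing the client in as an argument keeps the helper easy to try with a stub object before touching a real bucket.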
Now you are ready to create your training datasource.