Amazon Personalize
Developer Guide

Formatting Your Input Data

The files you use to import data into Amazon Personalize must map to the schema you are using.

Amazon Personalize imports data from files that are in CSV format. In CSV files, individual values are separated by commas. Amazon Personalize requires the first row of your CSV file to contain column headers. The column headers in your CSV file need to map to the schema used by the dataset. The headers should not be surrounded by quotation marks (").
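
As a minimal sketch of producing a file in this shape, Python's standard csv module can write the header row first and leave it unquoted. The helper name to_csv and the sample rows are hypothetical and used only for illustration. This is a local formatting aid, not part of any Amazon Personalize API.

```python
import csv
import io


def to_csv(headers, rows):
    """Return CSV text whose first row is the unquoted column headers."""
    buffer = io.StringIO()
    # QUOTE_MINIMAL quotes a value only if it contains a comma, a quotation
    # mark, or a line break, so plain header names are written without quotes.
    writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerow(headers)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample rows, for illustration only.
csv_text = to_csv(
    ["USER_ID", "ITEM_ID", "TIMESTAMP"],
    [["196", "242", 881250949], ["186", "302", 891717742]],
)
print(csv_text)
```

Writing to an in-memory buffer keeps the sketch self-contained. In practice the same writer can be pointed at a file opened with newline="" so that the csv module controls line endings.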

For example, the following CSV data sample maps to the Interactions schema created previously in Datasets and Schemas. This data might represent historical user activity from a website that sells movie tickets. The data can be used to train a model that gives a user movie recommendations based on the activity of other users.

USER_ID,ITEM_ID,EVENT_TYPE,EVENT_VALUE,TIMESTAMP
196,242,click,15,881250949
186,302,click,13,891717742
22,377,click,10,878887116
244,51,click,20,880606923
166,346,click,10,886397596
298,474,click,40,884182806
115,265,click,20,881171488
253,465,click,50,891628467
305,451,click,30,886324817

The associated Interactions schema is repeated below.

{
  "type": "record",
  "name": "Interactions",
  "namespace": "com.amazonaws.personalize.schema",
  "fields": [
    { "name": "USER_ID", "type": "string" },
    { "name": "ITEM_ID", "type": "string" },
    { "name": "EVENT_TYPE", "type": "string" },
    { "name": "EVENT_VALUE", "type": "string" },
    { "name": "TIMESTAMP", "type": "long" }
  ],
  "version": "1.0"
}

The USER_ID, ITEM_ID, and TIMESTAMP fields are required by Amazon Personalize. USER_ID is the identifier for a user of your application. ITEM_ID is the identifier for a movie. EVENT_TYPE and EVENT_VALUE describe user activities. In the sample data, click might represent a movie purchase event and 15 might represent the purchase price of the movie. TIMESTAMP represents the Unix time at which the movie purchase took place.
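
Before uploading, it can help to check locally that the CSV header lines up with the schema. The following is a rough sketch of such a check and not the validation that Amazon Personalize itself performs. The function name check_header is hypothetical, and the sketch assumes that header names must match schema field names exactly, including case.

```python
import csv
import io
import json

# Fields this page lists as required for the Interactions dataset.
REQUIRED_FIELDS = {"USER_ID", "ITEM_ID", "TIMESTAMP"}


def check_header(csv_text, schema_json):
    """Return a list of problems found in the CSV header (empty if none)."""
    schema_fields = {field["name"] for field in json.loads(schema_json)["fields"]}
    # The first CSV row is expected to hold the column headers.
    header = next(csv.reader(io.StringIO(csv_text)), [])
    problems = []
    for name in sorted(REQUIRED_FIELDS - set(header)):
        problems.append(f"missing required column: {name}")
    for name in header:
        if name not in schema_fields:
            problems.append(f"column not in schema: {name}")
    return problems


schema_json = """
{
  "type": "record",
  "name": "Interactions",
  "namespace": "com.amazonaws.personalize.schema",
  "fields": [
    { "name": "USER_ID", "type": "string" },
    { "name": "ITEM_ID", "type": "string" },
    { "name": "EVENT_TYPE", "type": "string" },
    { "name": "EVENT_VALUE", "type": "string" },
    { "name": "TIMESTAMP", "type": "long" }
  ],
  "version": "1.0"
}
"""

good_csv = "USER_ID,ITEM_ID,EVENT_TYPE,EVENT_VALUE,TIMESTAMP\n196,242,click,15,881250949\n"
bad_csv = "USER_ID,ITEM_ID,EVENT_TYPE\n196,242,click\n"

print(check_header(good_csv, schema_json))
print(check_header(bad_csv, schema_json))
```

The check looks only at the header row. It does not verify that each value matches the type declared in the schema, such as a long for TIMESTAMP.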

Categorical Data

To include multiple categories for a single item when using categorical string data, separate the values with the vertical bar character (|). For example, to match the Items schema from the previous section using two categories, a data row would resemble the following:

ITEM_ID,GENRE
item_123,horror|comedy
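
A row like this can be built by joining the category values with the vertical bar before writing the CSV field. The sketch below is illustrative only, and the helper name format_categories is hypothetical. Rejecting values that already contain a vertical bar or a comma is a precaution chosen for this sketch, not a documented Amazon Personalize rule.

```python
def format_categories(categories):
    """Join category values into a single field separated by vertical bars."""
    for value in categories:
        # Assumption for this sketch: a category containing '|' or ','
        # would be ambiguous in the output, so it is rejected outright.
        if "|" in value or "," in value:
            raise ValueError(f"category contains a reserved character: {value!r}")
    return "|".join(categories)


header = "ITEM_ID,GENRE"
row = ",".join(["item_123", format_categories(["horror", "comedy"])])
print(header)
print(row)  # item_123,horror|comedy
```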

After you have formatted your data, upload the data to an Amazon S3 bucket for import into Amazon Personalize. For more information, see Uploading to an Amazon S3 Bucket.
