Preparing and importing data
Amazon Personalize uses data that you provide to train a model. When you import data, you can choose to import records in bulk, individually, or both. With individual imports, you can import historical records or data from live events. As your catalog grows, we recommend that you complete additional imports to keep your data in Amazon Personalize up to date. For real-time recommendations, keep your Interactions dataset current with your users' behavior by recording live interaction events with an event tracker and the PutEvents operation.
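As a rough illustration of recording a live event, the sketch below wraps a single PutEvents call. It assumes `events_client` is a boto3 `personalize-events` client and that you already have a tracking ID from an event tracker; the helper name `record_event` and its arguments are hypothetical and chosen only for this example.

```python
from datetime import datetime, timezone


def record_event(events_client, tracking_id, user_id, session_id, item_id, event_type):
    """Send one live interaction event with the PutEvents operation.

    events_client is assumed to be boto3.client("personalize-events").
    """
    events_client.put_events(
        trackingId=tracking_id,  # ID of your event tracker
        userId=user_id,
        sessionId=session_id,    # your application's ID for the user's session
        eventList=[
            {
                "eventType": event_type,              # for example "click"
                "itemId": item_id,
                "sentAt": datetime.now(timezone.utc),  # when the event occurred
            }
        ],
    )
```

With boto3 installed and AWS credentials configured, you could pass `boto3.client("personalize-events")` as `events_client`. Taking the client as a parameter keeps the helper easy to exercise with a stand-in object.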
The minimum data requirements to train a model are as follows:
- 1000 records of combined interaction data (after filtering by eventType and eventValueThreshold, if provided).
- 25 unique users with at least 2 interactions each.
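Before importing, you can check these minimums locally. The following is a minimal sketch, not an official validation tool: it assumes your interaction records are already loaded as dictionaries keyed by column name (`USER_ID`, plus optional `EVENT_TYPE` and `EVENT_VALUE`), and it assumes that the threshold keeps events whose value is greater than or equal to `eventValueThreshold`. The function name is hypothetical.

```python
from collections import Counter


def meets_minimum_requirements(interactions, event_type=None, event_value_threshold=None):
    """Return True if the records satisfy both training minimums.

    interactions: iterable of dicts with a USER_ID key and, optionally,
    EVENT_TYPE and EVENT_VALUE keys.
    """
    kept = []
    for row in interactions:
        # Keep only the requested event type, if one is given.
        if event_type is not None and row.get("EVENT_TYPE") != event_type:
            continue
        # Assumption: events below the threshold (or with no value) are dropped.
        if event_value_threshold is not None:
            value = row.get("EVENT_VALUE")
            if value is None or float(value) < event_value_threshold:
                continue
        kept.append(row)

    # Count interactions per user after filtering.
    per_user = Counter(row["USER_ID"] for row in kept)
    users_with_two = sum(1 for count in per_user.values() if count >= 2)

    return len(kept) >= 1000 and users_with_two >= 25
```

The check runs on the filtered records because the 1000-record minimum applies after filtering by event type and value threshold.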
To import your training data into Amazon Personalize, you do the following:
- Create an empty dataset group. Dataset groups are containers for Amazon Personalize components. For more information, see Step 1: Creating a Custom dataset group.
- For each type of dataset you are using, create an empty dataset with an associated schema. Datasets are Amazon Personalize containers for data, and schemas tell Amazon Personalize about the structure of your data. For more information, see Step 2: Creating a dataset and a schema.
- Import your data:
  - Import bulk records stored in an Amazon S3 bucket using a dataset import job. See Importing bulk records.
  - Import historical records individually with the Amazon Personalize console or the PutUsers and PutItems API operations. See Importing individual records.
  - Import data from user interactions in real time with the PutEvents API operation.
This section provides information about importing historical data into Amazon Personalize. For information about recording data from live events in real time, see Recording events.
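The bulk import path described above (dataset group, schema and dataset, then a dataset import job) could be scripted roughly as follows. This is a sketch under stated assumptions: `personalize` is a boto3 `personalize` client, the schema covers only the three required Interactions fields, and the helper names, resource name suffixes, and polling interval are hypothetical choices for this example.

```python
import json
import time

# Minimal Interactions schema (Avro format) with only the required fields.
INTERACTIONS_SCHEMA = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "USER_ID", "type": "string"},
        {"name": "ITEM_ID", "type": "string"},
        {"name": "TIMESTAMP", "type": "long"},
    ],
    "version": "1.0",
}


def wait_for_active(describe_call, response_key, poll_seconds, **arn):
    """Poll a describe_* call until the resource is ACTIVE or creation fails."""
    while True:
        status = describe_call(**arn)[response_key]["status"]
        if status == "ACTIVE":
            return
        if status == "CREATE FAILED":
            raise RuntimeError(f"creation failed for {arn}")
        time.sleep(poll_seconds)


def import_bulk_interactions(personalize, name, s3_uri, role_arn, poll_seconds=15):
    """Create a dataset group, schema, and dataset, then start a bulk import."""
    # Step 1: create an empty dataset group and wait until it is usable.
    group_arn = personalize.create_dataset_group(name=name)["datasetGroupArn"]
    wait_for_active(personalize.describe_dataset_group, "datasetGroup",
                    poll_seconds, datasetGroupArn=group_arn)

    # Step 2: create a schema and an empty Interactions dataset that uses it.
    schema_arn = personalize.create_schema(
        name=f"{name}-schema", schema=json.dumps(INTERACTIONS_SCHEMA)
    )["schemaArn"]
    dataset_arn = personalize.create_dataset(
        name=f"{name}-interactions",
        schemaArn=schema_arn,
        datasetGroupArn=group_arn,
        datasetType="Interactions",
    )["datasetArn"]
    wait_for_active(personalize.describe_dataset, "dataset",
                    poll_seconds, datasetArn=dataset_arn)

    # Step 3: start a dataset import job that reads the file in Amazon S3.
    return personalize.create_dataset_import_job(
        jobName=f"{name}-import",
        datasetArn=dataset_arn,
        dataSource={"dataLocation": s3_uri},
        roleArn=role_arn,  # IAM role that Amazon Personalize can use to read the bucket
    )["datasetImportJobArn"]
```

With boto3 installed and credentials configured, you could call `import_bulk_interactions(boto3.client("personalize"), ...)` with your own name, S3 location, and role ARN. The function returns as soon as the import job is created; the job itself runs asynchronously, so you would still need to check its status before training.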