Step 2: Preparing and importing data - Amazon Personalize

Amazon Personalize uses your data to generate recommendations for your users and user segments. Amazon Personalize stores your data in datasets until you delete the datasets. For all use cases (Domain dataset groups) and recipes (custom resources), your interactions data must have the following:

  • At least 1,000 item interaction records from users interacting with items in your catalog. These interactions can come from bulk imports, streamed events, or both.

  • At least 25 unique user IDs, each with at least two item interactions.

For quality recommendations, we recommend that you have at least 50,000 item interactions from at least 1,000 users, each with two or more item interactions.
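Before importing, it can help to check a dataset against these thresholds locally. The following is a minimal sketch, not part of Amazon Personalize: the function name `check_interaction_minimums` and its parameters are hypothetical, and it assumes the interactions are available as `(user_id, item_id)` pairs.

```python
from collections import Counter


def check_interaction_minimums(interactions, min_records=1000,
                               min_users=25, min_per_user=2):
    """Check (user_id, item_id) pairs against the minimums described above.

    The defaults mirror the required minimums. Pass min_records=50000 and
    min_users=1000 to check against the recommended amounts instead.
    """
    per_user = Counter()
    total = 0
    for user_id, _item_id in interactions:
        per_user[user_id] += 1
        total += 1

    # Only users with enough interactions count toward the user minimum.
    qualifying_users = sum(1 for n in per_user.values() if n >= min_per_user)

    return {
        "total_records": total,
        "qualifying_users": qualifying_users,
        "meets_minimums": total >= min_records and qualifying_users >= min_users,
    }
```

For example, calling `check_interaction_minimums(pairs)["meets_minimums"]` on a list of pairs returns `True` only when both the record count and the qualifying user count are met.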

When you import data, you can choose to import records in bulk, individually, or both.

  • Bulk imports involve importing a large number of historical records at once. You can prepare and import your item interaction, user, and item bulk data with SageMaker Data Wrangler and multiple data sources. Or you can prepare bulk data yourself, and import it directly into Amazon Personalize from a CSV file in Amazon S3. For information about how to format your bulk data for Amazon Personalize, see Data format guidelines.

  • With individual imports, you import individual records with the Amazon Personalize console and API operations. Or you can import interactions data from live events in real time.
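If you prepare bulk data yourself, the CSV file might be produced along these lines. This is a sketch under stated assumptions: it assumes an Item interactions schema whose fields are `USER_ID`, `ITEM_ID`, and `TIMESTAMP` (Unix epoch time in seconds), that the header row must match the schema's field names, and that the input records are dictionaries with the hypothetical keys `user_id`, `item_id`, and `timestamp`. Check Data format guidelines for the exact requirements of your schema.

```python
import csv
import io

# Assumed column names for an Item interactions schema; adjust to your schema.
REQUIRED_COLUMNS = ["USER_ID", "ITEM_ID", "TIMESTAMP"]


def interactions_to_csv(records):
    """Serialize interaction records to CSV text with a header row.

    Each record is a dict with the (hypothetical) keys user_id, item_id,
    and timestamp, where timestamp is Unix epoch time in seconds.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(REQUIRED_COLUMNS)
    for record in records:
        writer.writerow([
            record["user_id"],
            record["item_id"],
            int(record["timestamp"]),  # drop any fractional seconds
        ])
    return buffer.getvalue()
```

The resulting text can be written to a file, uploaded to an Amazon S3 bucket, and then imported into Amazon Personalize with a bulk import.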

After you import data into an Amazon Personalize dataset, you can analyze it, export it to an Amazon S3 bucket, update it, or delete it by deleting the dataset. For more information, see Managing the training data in your datasets.

As your catalog grows, update your historical data with additional bulk or individual data import operations. For real-time recommendations, keep your Item interactions dataset up to date with your users' behavior. You do this by recording real-time interaction events with an event tracker and the PutEvents operation. For more information, see Recording real-time events to influence recommendations.
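Recording a single event with PutEvents might look like the following sketch. It assumes the AWS SDK for Python (Boto3), whose `personalize-events` client exposes a `put_events` method. The helper name `record_interaction`, the default event type `"click"`, and all argument values are hypothetical. The client is passed in so that the helper itself needs no AWS credentials until it is called.

```python
from datetime import datetime, timezone


def record_interaction(events_client, tracking_id, user_id, session_id,
                       item_id, event_type="click"):
    """Send one item interaction event through the PutEvents operation.

    events_client is expected to be a Boto3 client created with
    boto3.client("personalize-events"). tracking_id is the ID of the
    event tracker associated with your dataset group.
    """
    return events_client.put_events(
        trackingId=tracking_id,
        userId=user_id,
        sessionId=session_id,
        eventList=[
            {
                "eventType": event_type,
                "itemId": item_id,
                # Time the interaction occurred; here, the current UTC time.
                "sentAt": datetime.now(timezone.utc),
            }
        ],
    )
```

In an application, you would create the client once with `boto3.client("personalize-events")` and call `record_interaction` each time a user interacts with an item.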