Items dataset

The item data that you can import into Amazon Personalize includes numerical and categorical metadata such as creation timestamp, price, genre, description, and availability. You import metadata about your items into an Amazon Personalize Items dataset.

Amazon Personalize doesn't use non-categorical string item data, such as item titles or author data, when training. However, some Amazon Personalize features do use this data to enhance recommendations. For more information, see Non-categorical string data.

The maximum number of metadata columns is 100. A model considers at most 750,000 items during training, and Amazon Personalize considers only these items when generating recommendations. Some domains and recipes require an Items dataset. For more information on recipe requirements, see Choosing a recipe.

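This topic provides information about the following types of item data:

  • Creation timestamp data

  • Categorical metadata

  • Unstructured text metadata

  • Non-categorical string data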

Creation timestamp data

Amazon Personalize uses creation timestamp data (in Unix epoch time format, in seconds) to calculate the age of an item and adjust recommendations accordingly.
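As a minimal sketch of producing a value in this format, the following Python converts a datetime to Unix epoch time in whole seconds. The helper name to_epoch_seconds is hypothetical, and treating naive datetimes as UTC is an assumption made for this example.

```python
from datetime import datetime, timezone


def to_epoch_seconds(dt: datetime) -> int:
    """Convert a datetime to Unix epoch time in whole seconds."""
    if dt.tzinfo is None:
        # Assumption for this sketch: naive datetimes are interpreted as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return int(dt.timestamp())


# Example: an item created on 2024-01-15 at 12:30 UTC.
created = to_epoch_seconds(datetime(2024, 1, 15, 12, 30, tzinfo=timezone.utc))
print(created)  # 1705321800
```

A common mistake is supplying milliseconds instead of seconds. A 13-digit value for a recent date usually indicates milliseconds.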

If creation timestamp data is missing for one or more items, Amazon Personalize infers it from interaction data, if any, and uses the timestamp of the item's oldest interaction as the item's creation timestamp. If an item has no interaction data, its creation timestamp is set to the timestamp of the latest interaction in the training set, and Amazon Personalize considers it a new item.
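The fallback rules above can be illustrated with a short Python sketch. This is not the service's implementation. It only mirrors the described behavior so that you can predict which timestamp an item would receive. The function name and the input shapes are assumptions.

```python
from typing import Dict, List, Optional, Tuple


def infer_creation_timestamps(
    items: Dict[str, Optional[int]],
    interactions: List[Tuple[str, int]],
) -> Dict[str, int]:
    """Fill in missing creation timestamps.

    items maps item ID to a creation timestamp, or None when it is missing.
    interactions is a list of (item ID, timestamp) pairs.
    """
    # Oldest interaction timestamp seen for each item.
    oldest: Dict[str, int] = {}
    for item_id, ts in interactions:
        oldest[item_id] = min(ts, oldest.get(item_id, ts))

    # Latest interaction timestamp across the whole training set.
    latest = max((ts for _, ts in interactions), default=None)

    result: Dict[str, int] = {}
    for item_id, created in items.items():
        if created is not None:
            result[item_id] = created          # provided value wins
        elif item_id in oldest:
            result[item_id] = oldest[item_id]  # oldest interaction for the item
        elif latest is not None:
            result[item_id] = latest           # no interactions: treated as new
        # With no interactions at all, the text above does not say what
        # happens, so this sketch leaves the item out.
    return result


items = {"a": 100, "b": None, "c": None}
interactions = [("b", 500), ("b", 300), ("a", 50), ("d", 900)]
inferred = infer_creation_timestamps(items, interactions)
print(inferred)  # {'a': 100, 'b': 300, 'c': 900}
```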

Categorical metadata

With certain recipes and domains, Amazon Personalize uses categorical metadata, such as an item's genre or color, when identifying underlying patterns that reveal the most relevant items for your users. You define your own range of values based on your use case. Categorical metadata can be in any language.

With all recipes and domains, you can import categorical data and use it to filter recommendations based on an item's attributes. For information about filtering recommendations, see Filtering recommendations and user segments.

Categorical values can have a maximum of 1000 characters. If you have an item with a categorical value with more than 1000 characters, your dataset import job will fail.
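Because one oversized value fails the whole import job, it can help to check your CSV data before you import it. The following Python sketch is one way to do that. The function name, the column names, and the sample data are hypothetical, and it counts length with Python's len, which might not match exactly how the service counts characters.

```python
import csv
import io

MAX_CATEGORICAL_CHARS = 1000  # limit stated above


def find_oversized_categoricals(csv_text, categorical_columns):
    """Return (record_number, column, length) for each value over the limit."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Record 1 is the header row, so data records start at 2.
    for record_number, row in enumerate(reader, start=2):
        for column in categorical_columns:
            value = row.get(column) or ""
            if len(value) > MAX_CATEGORICAL_CHARS:
                problems.append((record_number, column, len(value)))
    return problems


sample = "ITEM_ID,GENRES\nitem1,Action\nitem2," + "x" * 1001 + "\n"
print(find_oversized_categoricals(sample, ["GENRES"]))  # [(3, 'GENRES', 1001)]
```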

For Domain dataset groups, both VIDEO_ON_DEMAND and ECOMMERCE domains use categorical metadata. For Custom dataset groups and custom solutions, recipes that use categorical metadata include the following:

Unstructured text metadata

With certain recipes and domains, Amazon Personalize can extract meaningful information from unstructured text metadata, such as product descriptions, product reviews, or movie synopses. Amazon Personalize uses unstructured text to identify relevant items for your users, particularly when items are new or have less interaction data. Include unstructured text data in your Items dataset to increase click-through rates and conversion rates for new items in your catalog.

To use unstructured text data, add a field with type string to your Items schema and set the field's textual attribute to true. Then include the text data in your bulk CSV file and individual item imports. For bulk CSV files, wrap the text in double quotes, and use the \ character to escape any double quotes or \ characters in your data. You can add at most one textual field. For an example of an Items schema with a field for unstructured text data, see Items dataset schema example (custom). Amazon Personalize truncates text fields at the character limit, so make sure that the most relevant information in the text is at the start of the field. For information about importing data into Amazon Personalize, see Step 2: Preparing and importing data.
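The following Python sketch illustrates both steps: a schema with one textual field, and a helper that quotes and escapes text for a bulk CSV file. The record name, namespace, field names, and the helper quote_text_field are assumptions for illustration, so confirm the schema layout against the schema example referenced above.

```python
import json

# Hypothetical Items schema with a single textual field (DESCRIPTION).
items_schema = {
    "type": "record",
    "name": "Items",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "ITEM_ID", "type": "string"},
        {"name": "DESCRIPTION", "type": "string", "textual": True},
    ],
    "version": "1.0",
}
schema_json = json.dumps(items_schema, indent=2)


def quote_text_field(text: str) -> str:
    """Wrap text in double quotes, escaping \\ and " with a backslash."""
    # Escape backslashes first so the backslashes added for quotes
    # are not escaped a second time.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


description = 'Holds a 6" tablet; see docs\\setup'
csv_line = "item-42," + quote_text_field(description)
print(csv_line)  # item-42,"Holds a 6\" tablet; see docs\\setup"
```

The escaping is written by hand rather than delegated to a CSV library so that the handling of both characters is explicit.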

Before using unstructured text values, Amazon Personalize removes the following from the text:

  • HTML and XML tags and entities

  • New line, tab, and extra space characters
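To preview roughly what text remains after this cleanup, you could approximate it locally. The following Python sketch is an approximation only, not the service's implementation. The regular expressions, and the choice to replace removed tags and entities with a space, are assumptions.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")  # HTML and XML tags
ENTITY_RE = re.compile(r"&(?:#\d+|#x[0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")
WHITESPACE_RE = re.compile(r"\s+")  # newlines, tabs, runs of spaces


def preview_cleaned_text(text: str) -> str:
    """Approximate the cleanup described above for a single text value."""
    text = TAG_RE.sub(" ", text)
    text = ENTITY_RE.sub(" ", text)
    # Collapse all whitespace runs to one space and trim the ends.
    return WHITESPACE_RE.sub(" ", text).strip()


raw = "<p>Soft&nbsp;cotton\n\tshirt</p>  <br/>Machine   washable &amp; durable"
print(preview_cleaned_text(raw))  # Soft cotton shirt Machine washable durable
```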

Unstructured text values can have at most 20,000 characters in all languages except Chinese and Japanese. For Chinese and Japanese, you can have at most 7,000 characters. Amazon Personalize truncates values that exceed the limit.
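To see which part of a value would survive truncation, you can apply the same limits locally. In the following Python sketch, the language labels and function names are hypothetical, and Python's character counting might differ from the service's.

```python
DEFAULT_LIMIT = 20_000           # all languages except Chinese and Japanese
CHINESE_JAPANESE_LIMIT = 7_000   # Chinese and Japanese


def text_limit(language: str) -> int:
    """Return the character limit for a language label (labels are hypothetical)."""
    if language.lower() in {"chinese", "japanese"}:
        return CHINESE_JAPANESE_LIMIT
    return DEFAULT_LIMIT


def truncate_text(text: str, language: str) -> str:
    """Keep only the leading characters that fit within the limit."""
    return text[: text_limit(language)]


print(len(truncate_text("a" * 25_000, "english")))   # 20000
print(len(truncate_text("あ" * 8_000, "japanese")))  # 7000
```

Because only the start of the value is kept, put the most relevant information first.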

Text can be in the following languages:

  • Chinese (Simplified)

  • Chinese (Traditional)

  • English

  • French

  • German

  • Japanese

  • Portuguese

  • Spanish

You can submit unstructured text items in multiple languages, but each item's text should be in only one language.

For Domain dataset groups, both VIDEO_ON_DEMAND and ECOMMERCE domains use textual metadata. For Custom dataset groups and custom solutions, recipes that use textual metadata include the following:

Non-categorical string data

Except for item IDs, Amazon Personalize doesn't use non-categorical string data, such as item titles or author data, when training. However, Amazon Personalize can use this data with the following features:

  • Amazon Personalize can include item metadata in recommendations, including non-categorical string values. You might use metadata to enrich recommendations in your user interface, such as adding the director's name to a movie recommendations carousel. For more information, see Metadata with recommendations.

  • If you use Similar-Items, you can generate batch recommendations with themes. When you generate batch recommendations with themes, you must specify an item name column in the batch inference job. For more information, see Batch recommendations with themes from Content Generator.

  • You can create filters to include or remove items from recommendations based on non-categorical string data. For more information about filters, see Filtering recommendations and user segments.