Amazon Personalize
Developer Guide

SIMS Recipe

Item-to-item similarities (SIMS) is based on the concept of collaborative filtering. A SIMS model leverages user-item interaction data to recommend items similar to a given item. In the absence of sufficient user behavior data for an item, this recipe recommends popular items.

This predefined recipe has the following properties:

  • Name: aws-sims

  • Recipe ARN: arn:aws:personalize:::recipe/aws-sims

  • Algorithm ARN: arn:aws:personalize:::algorithm/aws-sims

  • Feature Transformation ARN: arn:aws:personalize:::feature-transformation/sims

  • Recipe type: RELATED_ITEMS
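
As a minimal sketch of how the recipe ARN above might be used, the following builds the keyword arguments for a CreateSolution request. The helper names and the dataset group ARN in the usage note are hypothetical. The `client` argument is assumed to be any object that exposes `create_solution(**kwargs)`, such as `boto3.client("personalize")`.

```python
# Recipe ARN copied from the recipe properties listed above.
SIMS_RECIPE_ARN = "arn:aws:personalize:::recipe/aws-sims"


def build_sims_solution_request(name, dataset_group_arn):
    """Return keyword arguments for a CreateSolution call that selects SIMS."""
    return {
        "name": name,
        "datasetGroupArn": dataset_group_arn,
        "recipeArn": SIMS_RECIPE_ARN,
    }


def create_sims_solution(client, name, dataset_group_arn):
    """Send the request through `client` and return the new solution ARN.

    `client` is assumed to behave like boto3.client("personalize").
    """
    request = build_sims_solution_request(name, dataset_group_arn)
    response = client.create_solution(**request)
    return response["solutionArn"]
```

With boto3 installed and credentials configured, a call might look like `create_sims_solution(boto3.client("personalize"), "sims-demo", dataset_group_arn)`, where `dataset_group_arn` is the ARN of your own dataset group.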

The following table lists the hyperparameters used in the recipe. For each hyperparameter the name, default value, and description are given, as well as the following properties:

  • Range: [lower bound, upper bound]

  • Value type: Integer, Continuous (float), Categorical (boolean, list, string)

  • HPO tunable: Can the parameter participate in hyperparameter optimization (HPO)?

Name | Default Value | Range | Value Type | HPO Tunable | Description
Algorithm
popularity_discount_factor | 0.5 | [0.0, 1.0] | Float | Yes

Affects the balance between popularity and correlation when you calculate similarity. If you calculate similarities to a specific item, a value of 0 makes most popular items appear as recommendations regardless of their correlation. A value of 1 makes most items that have co-interactions (shared interactions) with the specific item appear as recommendations regardless of their popularity. Using either extreme might create an overly long list of recommended items. For most cases, a value around 0.5 works best.
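
The guide does not give the scoring formula behind popularity_discount_factor, so the following is only an illustration of the trade-off described above, not the SIMS algorithm. It assumes a hypothetical linear blend of two made-up signals per candidate item: a popularity share and a co-interaction share with the item being queried. All names and numbers are invented for the example.

```python
def blended_score(popularity, co_share, factor):
    """Hypothetical blend: factor=0 uses only popularity, factor=1 only co-interactions."""
    if not 0.0 <= factor <= 1.0:
        raise ValueError("factor must be in [0.0, 1.0]")
    return (1.0 - factor) * popularity + factor * co_share


def rank_candidates(candidates, factor):
    """Order item ids by blended score, highest first.

    `candidates` maps item id -> (popularity share, co-interaction share).
    """
    return sorted(
        candidates,
        key=lambda item: blended_score(*candidates[item], factor),
        reverse=True,
    )


# Invented data: one very popular item, one item that co-occurs strongly.
candidates = {
    "bestseller": (0.90, 0.10),
    "companion": (0.05, 0.80),
}
```

Under this assumed blend, `rank_candidates(candidates, 0.0)` puts the popular item first and `rank_candidates(candidates, 1.0)` puts the strongly co-interacting item first, which mirrors the behavior the description attributes to the two extremes.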

min_cointeraction_count | 3 | [0, 10] | Integer | Yes

The minimum number of co-interactions you need to calculate the similarity between a pair of items. For example, a value of 3 means that you need three or more users who interacted with both items for the algorithm to calculate their similarity.
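
To make the threshold concrete, here is a small sketch that counts, for every pair of items, how many distinct users interacted with both, and keeps only the pairs that reach the minimum. It assumes interactions arrive as `(user_id, item_id)` tuples. The function names are hypothetical, and the code illustrates the counting rule only, not how Amazon Personalize implements it.

```python
from collections import defaultdict
from itertools import combinations


def cointeraction_counts(interactions):
    """Count, for each item pair, the distinct users who interacted with both items."""
    items_by_user = defaultdict(set)
    for user_id, item_id in interactions:
        items_by_user[user_id].add(item_id)

    counts = defaultdict(int)
    for items in items_by_user.values():
        # Sort so each pair has one canonical (a, b) ordering.
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return dict(counts)


def eligible_pairs(interactions, min_cointeraction_count=3):
    """Keep only pairs with enough shared users to compute a similarity."""
    return {
        pair: count
        for pair, count in cointeraction_counts(interactions).items()
        if count >= min_cointeraction_count
    }


# Three users share items A and B; only two of them also touched C.
sample = [
    ("u1", "A"), ("u1", "B"), ("u1", "C"),
    ("u2", "A"), ("u2", "B"), ("u2", "C"),
    ("u3", "A"), ("u3", "B"),
]
```

With the default of 3, only the pair `("A", "B")` is eligible in `sample`. Lowering the threshold to 2 also admits `("A", "C")` and `("B", "C")`.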

Featurization
min_user_history_length_percentile | 0.005 | [0.0, 1.0] | Float | No

The minimum percentile of user history lengths to include in model training. History length is the total amount of available data on a user. Use min_user_history_length_percentile to exclude a percentage of users with short history lengths. Users with a short history often show patterns based on item popularity instead of the user's personal needs or wants. Removing them lets the model focus more on the underlying patterns in your data. Choose an appropriate value after you review user history lengths, using a histogram or similar tool. We recommend setting a value that retains the majority of users but removes the edge cases.

max_user_history_length_percentile | 0.995 | [0.0, 1.0] | Float | No

The maximum percentile of user history lengths to include in model training. History length is the total amount of available data on a user. Use max_user_history_length_percentile to exclude a percentage of users with long history lengths. The histories of such users tend to contain noise. For example, a robot might have a long list of automated interactions. Removing these users limits the noise introduced in training. Choose an appropriate value after you review user history lengths, using a histogram or similar tool. We recommend setting a value that retains the majority of users but removes the edge cases.

For example, min_user_history_length_percentile = 0.05 and max_user_history_length_percentile = 0.95 include all users except those with history lengths in the bottom or top 5%.
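
The percentile cutoffs can be previewed on your own data before training. The sketch below is a rough approximation under stated assumptions: history length is taken to be the number of `(user_id, item_id)` interaction records per user, and percentiles use linear interpolation. The guide does not say which percentile method the service applies, and the function names are hypothetical.

```python
from collections import Counter


def percentile(sorted_values, q):
    """Linear-interpolation percentile of an ascending list, with q in [0, 1]."""
    if not sorted_values:
        raise ValueError("sorted_values must not be empty")
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def users_within_percentiles(interactions, min_pct=0.005, max_pct=0.995):
    """Return the users whose history length lies between the two percentile cutoffs."""
    lengths = Counter(user_id for user_id, _ in interactions)
    ordered = sorted(lengths.values())
    low = percentile(ordered, min_pct)
    high = percentile(ordered, max_pct)
    return {user for user, length in lengths.items() if low <= length <= high}


# Invented data: 20 typical users, one near-empty history, one bot-like history.
sample = [(f"user{i}", f"item{j}") for i in range(20) for j in range(10)]
sample += [("newcomer", "item0")]
sample += [("bot", f"item{j}") for j in range(1000)]
```

Calling `users_within_percentiles(sample, 0.05, 0.95)` keeps the 20 typical users and drops both `newcomer` and `bot`. Reviewing `Counter` output or a histogram of the lengths first, as the text recommends, helps confirm that the chosen cutoffs remove only edge cases.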

min_item_interaction_count_percentile | 0.01 | [0.0, 1.0] | Float | No

The minimum percentile of item interaction counts to include in model training. Use min_item_interaction_count_percentile to exclude a percentage of items with a short history of interactions. Items with a short history are often new items. Removing them lets the model focus more on items with a known history. Choose an appropriate value after you review item interaction counts, using a histogram or similar tool. We recommend setting a value that retains the majority of items but removes the edge cases.

max_item_interaction_count_percentile | 0.9 | [0.0, 1.0] | Float | No

The maximum percentile of item interaction counts to include in model training. Use max_item_interaction_count_percentile to exclude a percentage of items with a long history of interactions. Items with a long history tend to be older and might be out of date. For example, a movie release might be out of print. Removing these items focuses training on more relevant items. Choose an appropriate value after you review item interaction counts, using a histogram or similar tool. We recommend setting a value that retains the majority of items but removes the edge cases.

For example, min_item_interaction_count_percentile = 0.05 and max_item_interaction_count_percentile = 0.95 include all items except those with an interaction count in the bottom or top 5%.
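
The hyperparameters in this table might be supplied through the `solutionConfig` argument of CreateSolution, sketched below. The two HPO-tunable algorithm hyperparameters are given as search ranges that match the Range column. The sketch assumes the featurization hyperparameters are passed as strings in `featureTransformationParameters`. The helper names and the training-job limits are hypothetical choices for illustration.

```python
def build_sims_hpo_config(max_training_jobs=10, max_parallel_jobs=2):
    """Search ranges for the two HPO-tunable SIMS hyperparameters."""
    return {
        "hpoResourceConfig": {
            # The API expects these limits as strings.
            "maxNumberOfTrainingJobs": str(max_training_jobs),
            "maxParallelTrainingJobs": str(max_parallel_jobs),
        },
        "algorithmHyperParameterRanges": {
            "continuousHyperParameterRanges": [
                {"name": "popularity_discount_factor", "minValue": 0.0, "maxValue": 1.0},
            ],
            "integerHyperParameterRanges": [
                {"name": "min_cointeraction_count", "minValue": 0, "maxValue": 10},
            ],
        },
    }


def build_sims_solution_config(featurization=None):
    """Combine the HPO ranges with optional fixed featurization values."""
    config = {"hpoConfig": build_sims_hpo_config()}
    if featurization:
        # Assumption: featurization values are sent as a string-to-string map.
        config["featureTransformationParameters"] = {
            name: str(value) for name, value in featurization.items()
        }
    return config


solution_config = build_sims_solution_config(
    {
        "min_item_interaction_count_percentile": 0.05,
        "max_item_interaction_count_percentile": 0.95,
    }
)
```

The resulting dictionary would be passed as `solutionConfig=solution_config` together with `performHPO=True` in a `create_solution` call. The featurization hyperparameters are marked as not HPO tunable in the table, so they appear only as fixed values.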