Item interactions dataset schema requirements (custom) - Amazon Personalize

An Item interactions dataset stores historical and real-time data from interactions between users and items in your catalog. For information on the types of interactions data Amazon Personalize can use, see Item interactions dataset.

The data you provide for each interaction must match your schema. Depending on your schema, interaction metadata can include empty/null values. At minimum, you must provide the following for each interaction:

  • User ID

  • Item ID

  • Timestamp (in Unix epoch time format)
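As a minimal sketch of the three required values, the following Python builds one interaction record and converts a timestamp to Unix epoch time. The helper names `to_epoch_seconds` and `make_interaction` are hypothetical, and the choice of whole seconds is an assumption based on the sample data later on this page.

```python
from datetime import datetime, timezone


def to_epoch_seconds(moment: datetime) -> int:
    """Convert a timezone-aware datetime to Unix epoch time in whole seconds."""
    if moment.tzinfo is None:
        # A naive datetime would be interpreted in local time, which is ambiguous.
        raise ValueError("expected a timezone-aware datetime")
    return int(moment.timestamp())


def make_interaction(user_id, item_id, moment: datetime) -> dict:
    """Build a record holding only the three required fields (hypothetical helper)."""
    return {
        "USER_ID": str(user_id),
        "ITEM_ID": str(item_id),
        "TIMESTAMP": to_epoch_seconds(moment),
    }


if __name__ == "__main__":
    when = datetime(2020, 4, 12, 22, 46, 46, tzinfo=timezone.utc)
    print(make_interaction(35, 73, when))
```

Rejecting naive datetimes is a design choice in this sketch, so that the same input always yields the same epoch value regardless of the machine's time zone.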

You are free to add additional fields depending on your use case and your data. As long as the fields aren't listed as required or reserved, and the data types are listed in Schema data types, the field names and data types are up to you.

The total number of optional metadata fields you can add to an Item interactions dataset, combined with the total number of distinct event types in your item interactions data, can't exceed 10. The metadata fields included in this count are the EVENT_TYPE and EVENT_VALUE fields, along with any custom metadata fields you add to your schema. The maximum number of metadata fields excluding reserved fields, such as IMPRESSION, is 5. Categorical values can have at most 1000 characters. If an interaction has a categorical value with more than 1000 characters, your dataset import job will fail.
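These limits could be checked locally before starting an import. The following Python is a rough sketch, not an official validator: the function name `find_limit_violations` is hypothetical, and the way fields are counted (EVENT_TYPE, EVENT_VALUE, and custom fields count toward the limit of 10, while IMPRESSION does not) is one reading of the paragraph above.

```python
REQUIRED_FIELDS = {"USER_ID", "ITEM_ID", "TIMESTAMP"}
RESERVED_OPTIONAL_FIELDS = {"EVENT_TYPE", "EVENT_VALUE", "IMPRESSION"}
MAX_CATEGORICAL_CHARS = 1000
MAX_CUSTOM_FIELDS = 5
MAX_FIELDS_PLUS_EVENT_TYPES = 10


def find_limit_violations(field_names, categorical_fields, rows):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    optional = [name for name in field_names if name not in REQUIRED_FIELDS]

    # Custom metadata fields: optional fields that are not reserved keywords.
    custom = [name for name in optional if name not in RESERVED_OPTIONAL_FIELDS]
    if len(custom) > MAX_CUSTOM_FIELDS:
        problems.append(
            f"{len(custom)} custom metadata fields (limit {MAX_CUSTOM_FIELDS})"
        )

    # Assumption: IMPRESSION is left out of the combined count.
    counted = [name for name in optional if name != "IMPRESSION"]
    event_types = {row["EVENT_TYPE"] for row in rows if row.get("EVENT_TYPE")}
    total = len(counted) + len(event_types)
    if total > MAX_FIELDS_PLUS_EVENT_TYPES:
        problems.append(
            f"{total} metadata fields plus event types "
            f"(limit {MAX_FIELDS_PLUS_EVENT_TYPES})"
        )

    # Categorical values longer than 1000 characters would fail the import job.
    for index, row in enumerate(rows):
        for name in categorical_fields:
            value = row.get(name) or ""
            if len(value) > MAX_CATEGORICAL_CHARS:
                problems.append(
                    f"row {index}: {name} has {len(value)} characters "
                    f"(limit {MAX_CATEGORICAL_CHARS})"
                )
    return problems
```

Here `rows` is assumed to be a list of dictionaries keyed by field name, with every value held as a string or `None`.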

For more information on minimum requirements and maximum data limits for an Item interactions dataset, see Service quotas.

Interactions schema example (custom)

The following example shows a schema for an Item interactions dataset. The USER_ID, ITEM_ID, and TIMESTAMP fields are required. The EVENT_TYPE, EVENT_VALUE, and IMPRESSION fields are optional reserved keywords recognized by Amazon Personalize. EVENT_TYPE must be of type string and can't be categorical. LOCATION and DEVICE are optional contextual metadata fields. For information on schema requirements, see Custom dataset and schema requirements.

{
  "type": "record",
  "name": "Interactions",
  "namespace": "com.amazonaws.personalize.schema",
  "fields": [
    {
      "name": "USER_ID",
      "type": "string"
    },
    {
      "name": "ITEM_ID",
      "type": "string"
    },
    {
      "name": "EVENT_TYPE",
      "type": "string"
    },
    {
      "name": "EVENT_VALUE",
      "type": [
        "float",
        "null"
      ]
    },
    {
      "name": "LOCATION",
      "type": "string",
      "categorical": true
    },
    {
      "name": "DEVICE",
      "type": [
        "string",
        "null"
      ],
      "categorical": true
    },
    {
      "name": "TIMESTAMP",
      "type": "long"
    },
    {
      "name": "IMPRESSION",
      "type": "string"
    }
  ],
  "version": "1.0"
}
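Before registering a schema like this one, the rules stated above can be checked locally. The following Python is a minimal sketch under that assumption. The function name `check_interactions_schema` is hypothetical, and it covers only the two rules named in this section: the required fields are present, and EVENT_TYPE is a non-categorical string.

```python
import json

REQUIRED_FIELDS = ("USER_ID", "ITEM_ID", "TIMESTAMP")


def check_interactions_schema(schema_json: str) -> list:
    """Return a list of problems found in a schema given as a JSON string."""
    schema = json.loads(schema_json)
    # Index the field definitions by name for easy lookup.
    fields = {field["name"]: field for field in schema.get("fields", [])}

    problems = [
        f"missing required field {name}"
        for name in REQUIRED_FIELDS
        if name not in fields
    ]

    # EVENT_TYPE is optional, but when present it must be a plain string.
    event_type = fields.get("EVENT_TYPE")
    if event_type is not None:
        if event_type.get("type") != "string":
            problems.append("EVENT_TYPE must be of type string")
        if event_type.get("categorical"):
            problems.append("EVENT_TYPE can't be categorical")
    return problems
```

An empty list means neither rule was violated. It does not guarantee that the service will accept the schema.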

For this schema, the first few lines of historical data in a CSV file might look like the following. Note that some values for EVENT_VALUE are null.

USER_ID,ITEM_ID,EVENT_TYPE,EVENT_VALUE,LOCATION,DEVICE,TIMESTAMP,IMPRESSION
35,73,click,,Ohio,Tablet,1586731606,73|70|17|95|96|92|55|45|16|97|56|54|33|94|36|10|5|43|19|13|51|90|65|59|38
54,35,watch,0.75,Indiana,Cellphone,1586735164,35|82|78|57|20|63|1|90|76|75|49|71|26|24|25|6|37|85|40|98|32|13|11|54|48
9,33,click,,Oregon,Cellphone,1586735158,68|33|62|6|15|57|45|24|78|89|90|40|26|91|66|31|47|17|99|29|27|41|77|75|14
23,10,watch,0.25,California,Tablet,1586735697,92|89|36|10|39|77|4|27|79|18|83|16|28|68|78|40|50|3|99|7|87|49|12|57|53
27,11,watch,0.55,Indiana,Tablet,1586735763,11|7|39|95|71|1|6|40|41|28|99|53|68|76|0|65|69|36|22|42|34|67|24|20|66
...
...
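To illustrate how rows in this layout might be read, the following Python sketch parses CSV text with the standard `csv` module. The function name `parse_interactions` is hypothetical. Treating an empty EVENT_VALUE as null and splitting IMPRESSION on the `|` character are assumptions drawn from the sample rows, not documented parsing rules.

```python
import csv
import io


def parse_interactions(csv_text: str) -> list:
    """Parse interactions CSV text into a list of typed dictionaries."""
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            "USER_ID": raw["USER_ID"],
            "ITEM_ID": raw["ITEM_ID"],
            "EVENT_TYPE": raw["EVENT_TYPE"],
            # An empty cell is treated as a null EVENT_VALUE.
            "EVENT_VALUE": float(raw["EVENT_VALUE"]) if raw["EVENT_VALUE"] else None,
            "LOCATION": raw["LOCATION"],
            # DEVICE is nullable in the example schema, so empty becomes None.
            "DEVICE": raw["DEVICE"] or None,
            "TIMESTAMP": int(raw["TIMESTAMP"]),
            # Assumption: IMPRESSION holds item IDs separated by "|".
            "IMPRESSION": raw["IMPRESSION"].split("|") if raw["IMPRESSION"] else [],
        })
    return rows


if __name__ == "__main__":
    sample = (
        "USER_ID,ITEM_ID,EVENT_TYPE,EVENT_VALUE,LOCATION,DEVICE,TIMESTAMP,IMPRESSION\n"
        "35,73,click,,Ohio,Tablet,1586731606,73|70|17\n"
        "54,35,watch,0.75,Indiana,Cellphone,1586735164,35|82|78\n"
    )
    for row in parse_interactions(sample):
        print(row)
```

The IMPRESSION lists in the usage sample are shortened for readability. IDs are kept as strings to match the string types in the example schema.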