Schemas - Amazon Personalize


A schema describes the structure of your data so that Amazon Personalize can parse it. A schema has a name key whose value must match the dataset type. After you create a schema, you can't change it.
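As a concrete illustration, the following Python sketch builds a minimal Interactions schema and serializes it to the JSON string form of an Avro record. The field set (USER_ID, ITEM_ID, TIMESTAMP) and the namespace and version values follow the pattern of Amazon Personalize's published example schemas, but treat them as an assumption and check the required fields for your own domain and dataset type.

```python
import json

# A minimal Interactions schema as a Python dict. The "name" value matches the
# dataset type (Interactions); the fields are an assumed minimal set.
interactions_schema = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "USER_ID", "type": "string"},
        {"name": "ITEM_ID", "type": "string"},
        {"name": "TIMESTAMP", "type": "long"},
    ],
    "version": "1.0",
}

# Amazon Personalize receives the schema as a JSON string.
schema_json = json.dumps(interactions_schema, indent=2)
print(schema_json)
```

To register the schema, you would pass schema_json as the schema argument of the boto3 Personalize client's create_schema call, together with a name of your choosing (for example, a hypothetical "my-interactions-schema"); for a Domain dataset group you also pass a domain. The call is left out here because it requires AWS credentials.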

For Domain dataset groups, each dataset type has a default schema with required fields and reserved keywords. Each time you create a dataset, you can either use the existing default schema for the domain or create a new schema by modifying the default one. Use the default schema as a guide to the data to import for your domain. After you define the schema and create the dataset, you can't change the schema.

Schema formatting requirements

When you create a schema for a dataset in either a Domain dataset group or a Custom dataset group, you must follow these guidelines:

  • You must define the schema in Avro format. For information on the Avro data types we support, see Schema data types.

  • You can list the schema fields in any order, but that order must match the order of the corresponding column headers in your CSV file.

  • Schemas must be flat JSON files without nested structures. For example, a field cannot be the parent of multiple sub-fields.

  • Amazon Personalize schemas don't support complex types such as arrays and maps.

  • Schema fields must have unique alphanumeric names. For example, you can't add both a GENRES_FIELD_1 field and a GENRESFIELD1 field, because the two names differ only in non-alphanumeric characters.

  • You must define required fields with their required data types. Reserved categorical string fields must have the categorical attribute set to true, while reserved string fields can't be categorical. The keywords can't be in your data.

  • If you add your own metadata field of type string and want Amazon Personalize to use it in training, the field must have either the categorical attribute or the textual attribute (only Items schemas support fields with the textual attribute).

  • Amazon Personalize can use non-categorical string columns, such as item name columns, when generating themes, returning metadata in recommendations, and filtering recommendations. For more information, see Non-categorical string data.

  • Amazon Personalize doesn't use boolean type data when training or filtering recommendations. To have Amazon Personalize use boolean data when training or filtering, use a field of type string with the values "True" and "False" in your data, or a field of type int or long with the values 0 and 1.

  • Textual fields must be of the type string and must have the textual attribute set to true. For more information about unstructured text data, see Unstructured text metadata.
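Several of the guidelines above can be checked mechanically before you submit a schema. The following Python sketch defines a hypothetical helper, check_schema, which is not part of any AWS SDK. It checks for supported primitive types, a flat structure (no records, arrays, or maps), unique alphanumeric names, the textual-field rules, and, optionally, field order against a CSV header. It assumes that names are compared after dropping non-alphanumeric characters and that your CSV headers use the schema field names; Amazon Personalize itself remains the authority when you create the schema.

```python
import re
from typing import Optional

# Primitive Avro types listed under "Schema data types".
SUPPORTED_TYPES = {"float", "double", "int", "long", "string", "boolean", "null"}


def field_types(field: dict) -> list:
    """Return a field's type as a list, so a union like ["null", "string"]
    and a plain type like "string" can be checked the same way."""
    t = field["type"]
    return t if isinstance(t, list) else [t]


def check_schema(schema: dict, csv_header: Optional[list] = None) -> list:
    """Return human-readable problems; an empty list means none were detected.

    Hypothetical helper that covers only some of the guidelines.
    """
    problems = []
    seen = {}  # alphanumeric-only form of a name -> first field that used it
    fields = schema.get("fields", [])

    for field in fields:
        name = field["name"]
        types = field_types(field)

        # Flat schemas only: each type must be a primitive type name.
        # Nested records, arrays, and maps appear as dicts in Avro JSON.
        for t in types:
            if not isinstance(t, str):
                problems.append(f"{name}: nested or complex type is not supported")
            elif t not in SUPPORTED_TYPES:
                problems.append(f"{name}: unsupported type {t!r}")

        # Assumption: names are compared with non-alphanumeric characters
        # removed, so GENRES_FIELD_1 and GENRESFIELD1 collide.
        key = re.sub(r"[^0-9A-Za-z]", "", name)
        if key in seen:
            problems.append(f"{name}: name is not unique (collides with {seen[key]})")
        else:
            seen[key] = name

        # Textual fields must be strings (optionally nullable) and are
        # supported only in Items schemas.
        if field.get("textual"):
            if [t for t in types if t != "null"] != ["string"]:
                problems.append(f"{name}: textual fields must be of type string")
            if schema.get("name") != "Items":
                problems.append(f"{name}: only Items schemas support textual fields")

    # Assumption: CSV column headers use the same names as the schema fields.
    if csv_header is not None and csv_header != [f["name"] for f in fields]:
        problems.append("schema field order does not match the CSV column headers")

    return problems


# Usage: an Items schema with a name collision and an unsupported array type.
items_schema = {
    "type": "record",
    "name": "Items",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "ITEM_ID", "type": "string"},
        {"name": "GENRES_FIELD_1", "type": "string", "categorical": True},
        {"name": "GENRESFIELD1", "type": "string", "categorical": True},
        {"name": "TAGS", "type": {"type": "array", "items": "string"}},
    ],
    "version": "1.0",
}
for problem in check_schema(items_schema):
    print(problem)
# prints:
# GENRESFIELD1: name is not unique (collides with GENRES_FIELD_1)
# TAGS: nested or complex type is not supported
```

The helper returns a list of problems rather than raising on the first one, so a single run reports every issue it can detect.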

Datasets in Domain dataset groups have additional requirements that depend on both the domain and the dataset type. Datasets in Custom dataset groups have additional requirements that depend on the dataset type.

Schema data types

Amazon Personalize schemas support the following Avro types for fields:

  • float

  • double

  • int

  • long

  • string

  • boolean

  • null
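Although boolean appears in the list above, Amazon Personalize doesn't use boolean data for training or filtering, as the formatting requirements note. The following Python sketch shows the two documented workarounds, a categorical string field holding "True" or "False" and an int field holding 1 or 0, plus a hypothetical encode_flag helper that converts Python booleans when writing a CSV file. The IS_PREMIUM field name and the item IDs are made up for illustration; the string field is marked categorical because custom string fields need the categorical (or textual) attribute to be used in training.

```python
import csv
import io

# Option 1: a categorical string field holding "True" or "False".
IS_PREMIUM_AS_STRING = {"name": "IS_PREMIUM", "type": "string", "categorical": True}
# Option 2: an int field holding 1 or 0 (type long also works).
IS_PREMIUM_AS_INT = {"name": "IS_PREMIUM", "type": "int"}


def encode_flag(value: bool, style: str = "string"):
    """Convert a Python bool into the CSV value for the chosen field style."""
    if style == "string":
        return "True" if value else "False"
    if style == "int":
        return 1 if value else 0
    raise ValueError(f"unknown style: {style!r}")


# Hypothetical Items schema fields that use option 1.
fields = [{"name": "ITEM_ID", "type": "string"}, IS_PREMIUM_AS_STRING]
rows = [
    {"ITEM_ID": "item1", "IS_PREMIUM": True},
    {"ITEM_ID": "item2", "IS_PREMIUM": False},
]

buf = io.StringIO()
# Column headers follow the schema field order.
writer = csv.DictWriter(buf, fieldnames=[f["name"] for f in fields])
writer.writeheader()
for row in rows:
    writer.writerow({**row, "IS_PREMIUM": encode_flag(row["IS_PREMIUM"])})
print(buf.getvalue())
```

Switching to option 2 means using IS_PREMIUM_AS_INT in the schema and calling encode_flag(value, "int") when writing rows.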

Some required and reserved fields support null data. Adding a null type to a field allows you to use imperfect data (for example, metadata with blank values) to generate recommendations. For information on which fields support null data, see Domain datasets and schemas or Custom datasets and schemas. The following example shows how to add a null type for a GENDER field.

{ "name": "GENDER", "type": [ "null", "string" ], "categorical": true }
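The following Python sketch builds field definitions with an optional null union, using a hypothetical make_field helper that is not part of any AWS SDK. With nullable=True it produces the same GENDER field as the example above, listing "null" first in the union as that example does; rows with a blank GENDER value can then be used for this field.

```python
import json


def make_field(name, avro_type, nullable=False, **attributes):
    """Build a schema field dict. When nullable is True, the type becomes a
    union of "null" and avro_type, with "null" listed first."""
    field_type = ["null", avro_type] if nullable else avro_type
    return {"name": name, "type": field_type, **attributes}


gender = make_field("GENDER", "string", nullable=True, categorical=True)
print(json.dumps(gender))
# prints {"name": "GENDER", "type": ["null", "string"], "categorical": true}
```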