Appendix C: Update Athena Query to Match Amazon Personalize Interaction Schema - Amazon Personalize Optimizer Using Amazon Pinpoint Events

Appendix C: Update Athena Query to Match Amazon Personalize Interaction Schema

This solution assumes a particular schema for your Amazon Personalize interaction dataset. If your schema is different, the solution’s AWS Lambda function that imports the data will fail to run. To correct the issue, update or rewrite the Amazon Athena query to retrieve the correct values that matches your schema.

The following code snippet shows the SELECT statement in the Amazon Athena query that is deployed with this solution. Running this query generates a CSV file with the following fields: USER_ID, ITEM_ID, EVENT_TYPE, EVENT_VALUE, and TIMESTAMP. These fields must match the interaction dataset schema.

SELECT r.personalize_user_id as USER_ID, r.item_id AS ITEM_ID, b.event_type AS EVENT_TYPE, v.EVENT_VALUE, CAST(to_unixtime(b.arrival_timestamp) AS BIGINT) AS TIMESTAMP FROM endpoint_export a INNER JOIN recs r ON a.id = r.endpoint_id INNER JOIN evs b ON a.id = b.endpoint_id AND r.campaign_id = b.campaign_id INNER JOIN event_value v ON b.event_type = v.event_type

To update the Amazon Athena query, first create a new query. Next, in the AWS Lambda function QueryAugmentStartLambda, update the environmental variable NAMED_QUERY with the query ID of the Lambda function.

For example, if your interaction schema contains only the required fields: USER_ID, ITEM_ID, and TIMESTAMP, then the query’s final SELECT statement may be updated as shown in the following code snippet.

SELECT r.personalize_user_id as USER_ID, r.item_id AS ITEM_ID, CAST(to_unixtime(b.arrival_timestamp) AS BIGINT) AS TIMESTAMP FROM endpoint_export a INNER JOIN recs r ON a.id = r.endpoint_id INNER JOIN evs b ON a.id = b.endpoint_id AND r.campaign_id = b.campaign_id INNER JOIN event_value v ON b.event_type = v.event_type