Appendix: Using Your Own Dataset

Predictive Segmentation Using Amazon Pinpoint and Amazon SageMaker includes a simple example dataset containing sample customer data, engagement data, and endpoint export data.

For customers who want to use this solution with their own data, we recommend following AWS best practices for uploading data to Amazon Simple Storage Service (Amazon S3).

AWS Glue crawls your data daily. With AWS Glue, you pay an hourly rate, billed by the second, for the crawler runtime used to discover your data and populate the AWS Glue Data Catalog. If you have a large dataset, you can reduce costs and improve performance by partitioning, compressing, or converting your data to a columnar format such as Apache Parquet.
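For example, if your raw data is exported as CSV, a lightweight conversion such as the following minimal sketch, which uses the pyarrow library (not part of the solution), produces partitioned, Snappy-compressed Parquet before upload. The file name customers.csv and the partition column signup_month are hypothetical placeholders for your own schema.

    import pyarrow.csv as pv
    import pyarrow.parquet as pq

    # Read the raw CSV export into an Arrow table.
    table = pv.read_csv("customers.csv")

    # Write a partitioned Parquet dataset (Parquet uses Snappy compression by
    # default). "signup_month" is a hypothetical low-cardinality partition key;
    # substitute a column from your own schema.
    pq.write_to_dataset(
        table,
        root_path="customers_parquet",
        partition_cols=["signup_month"],
    )

Partitioning on a low-cardinality column lets the AWS Glue crawler and Amazon Athena read only the partitions a given query needs, which reduces both scan costs and crawl time.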

Use the following steps to modify the solution to use your dataset.

  1. Load your customer data into the customers prefix folder of the solution’s Amazon S3 bucket (datas3bucket), as shown in the first example following these steps.

  2. With the help of an experienced data scientist, train the ML model using your dataset and features from Amazon Pinpoint behavioral data.

    For a list of Amazon Pinpoint events, see Events in the Amazon Pinpoint REST API Reference.

  3. Create a new Amazon Athena query that pulls the applicable ML model features from your dataset (see the second example following these steps). For more information, see Getting Started in the Amazon Athena User Guide.

  4. Update the QueryAugmentStart AWS Lambda function's NAMED_QUERY environment variable with the identifier for the query you created in the previous step (see the third example following these steps).
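The following minimal sketch illustrates step 1 using the AWS SDK for Python (Boto3), assuming the default AWS credential chain. The local file name customers.csv is a hypothetical placeholder, and datas3bucket stands in for the actual bucket name the solution creates; check your CloudFormation stack outputs for the deployed name.

    import boto3

    s3 = boto3.client("s3")

    # Objects must land under the customers/ prefix so the daily AWS Glue
    # crawler picks them up.
    s3.upload_file(
        Filename="customers.csv",      # local file to upload (hypothetical)
        Bucket="datas3bucket",         # placeholder for the solution's data bucket
        Key="customers/customers.csv",
    )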
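For step 3, a named query can be registered with Boto3 as in the following sketch. The query name, database, and column names are hypothetical; replace the QueryString with the features your model expects.

    import boto3

    athena = boto3.client("athena")

    # Register the feature query with Athena so it can be run by ID.
    response = athena.create_named_query(
        Name="ml-feature-query",                    # hypothetical name
        Database="predictive_segmentation_db",      # hypothetical Glue database
        QueryString="SELECT endpoint_id, feature_1, feature_2 FROM customers",
        Description="Pulls ML model features for predictive segmentation",
    )

    # Keep this identifier; step 4 writes it into the Lambda environment.
    print(response["NamedQueryId"])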
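Finally, for step 4, the sketch below updates the NAMED_QUERY variable with Boto3. Because update_function_configuration replaces the entire Variables map, the existing variables are fetched and merged first. The deployed function name may carry a stack-specific prefix or suffix; confirm it in the Lambda console before running.

    import boto3

    lam = boto3.client("lambda")
    function_name = "QueryAugmentStart"  # confirm the deployed name in your stack

    # Fetch the current environment so other variables are preserved.
    config = lam.get_function_configuration(FunctionName=function_name)
    variables = config.get("Environment", {}).get("Variables", {})

    # Point the function at the named query created in step 3.
    variables["NAMED_QUERY"] = "your-named-query-id"  # ID from create_named_query

    lam.update_function_configuration(
        FunctionName=function_name,
        Environment={"Variables": variables},
    )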