AWS Clean Rooms ML

AWS Clean Rooms ML provides a privacy-preserving method for two parties to identify similar users in their data without the need to share their data with each other. The first party brings the training data to AWS Clean Rooms so that they can create and configure a lookalike model and associate it with a collaboration. The second party then brings their seed data to AWS Clean Rooms and generates a lookalike segment: the users in the training data who most closely resemble the seed data.

For a more detailed explanation of how this works, see Cross-account jobs.

  • Training data provider – The party that contributes the training data, creates and configures a lookalike model, and then associates that lookalike model with a collaboration.

  • Seed data provider – The party that contributes the seed data, generates a lookalike segment, and exports their lookalike segment.

  • Training data – The training data provider's data, which is used to generate a lookalike model. The training data is used to measure similarity in user behaviors.

    The training data must contain a user ID, item ID, and timestamp column. Optionally, the training data can contain other interactions as numerical or categorical features. Examples of interactions are a list of videos watched, items purchased, or articles read.

  • Seed data – The seed data provider's data, which is used to create a lookalike segment. The lookalike segment output is a set of users from the training data that most closely resembles the seed users.

  • Lookalike model – A machine learning model of the training data that is used to find similar users in other datasets.

    When using the API, the term audience model is used equivalently to lookalike model. For example, you use the CreateAudienceModel API to create a lookalike model.

  • Lookalike segment – A subset of the training data that most closely resembles the seed data.

    When using the API, you create a lookalike segment with the StartAudienceGenerationJob API.
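The following Python sketch checks a training data schema against the minimum requirements described above: a user ID, item ID, and timestamp column, plus optional categorical or numerical feature columns. It runs offline and does not call AWS. The column type names (`USER_ID`, `ITEM_ID`, `TIMESTAMP`, `CATEGORICAL_FEATURE`, `NUMERICAL_FEATURE`) are assumed to mirror the Clean Rooms ML training dataset schema, and `Column` and `validate_training_schema` are hypothetical helpers; confirm the exact values in the API reference.

```python
from dataclasses import dataclass
from typing import List

# Assumed column type names, modeled on the Clean Rooms ML training dataset
# schema; confirm them in the API reference before relying on them.
REQUIRED_TYPES = {"USER_ID", "ITEM_ID", "TIMESTAMP"}
OPTIONAL_TYPES = {"CATEGORICAL_FEATURE", "NUMERICAL_FEATURE"}


@dataclass(frozen=True)
class Column:
    name: str         # column name in the AWS Glue table
    column_type: str  # one of REQUIRED_TYPES or OPTIONAL_TYPES


def validate_training_schema(columns: List[Column]) -> List[str]:
    """Return a list of problems; an empty list means the minimum columns are present."""
    problems = []
    present = {c.column_type for c in columns}
    # The training data must contain a user ID, item ID, and timestamp column.
    for required in sorted(REQUIRED_TYPES):
        if required not in present:
            problems.append(f"missing required {required} column")
    # Every other column must be an optional numerical or categorical feature.
    for c in columns:
        if c.column_type not in REQUIRED_TYPES | OPTIONAL_TYPES:
            problems.append(f"unknown column type {c.column_type!r} for {c.name!r}")
    return problems


# Hypothetical schema for a table of videos watched.
schema = [
    Column("user_id", "USER_ID"),
    Column("video_id", "ITEM_ID"),
    Column("watched_at", "TIMESTAMP"),
    Column("genre", "CATEGORICAL_FEATURE"),
]
print(validate_training_schema(schema))      # []
print(validate_training_schema(schema[:2]))  # ['missing required TIMESTAMP column']
```

Returning a list of problems rather than raising on the first one lets you report every gap in a table in a single pass.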

The training data provider's data is never shared with the seed data provider and the seed data provider's data is never shared with the training data provider. The lookalike segment output is shared with the training data provider, but never the seed data provider.

For more information about lookalike models, see the following topics.

How AWS Clean Rooms ML works

An overview of how AWS Clean Rooms ML works.

Clean Rooms ML requires that two parties, a training data provider and a seed data provider, work sequentially in AWS Clean Rooms to bring their data into a collaboration. This is the workflow that the training data provider must complete first:

  1. The training data provider's data must be stored in an AWS Glue Data Catalog table of user-item interactions. At a minimum, the training data must contain a user ID column, an item ID column, and a timestamp column.

  2. The training data provider registers the training data with AWS Clean Rooms.

  3. The training data provider creates a lookalike model that can be shared with multiple seed data providers. The lookalike model is a deep neural network that can take up to 24 hours to train. The model is not retrained automatically, so we recommend that you retrain it weekly.

  4. The training data provider configures the lookalike model, including whether to share relevance metrics and the Amazon S3 location of the output segments. The training data provider can create multiple configured lookalike models from a single lookalike model.

  5. The training data provider associates the configured lookalike model with a collaboration that is shared with a seed data provider.
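As a minimal sketch, the training data provider's steps above can be tracked as an ordered checklist, together with a check for the recommended weekly retraining. `TrainingStep`, `next_step`, and `retraining_due` are hypothetical helpers, not part of any AWS SDK; the sketch only tracks progress locally and does not register data, train, or associate anything.

```python
from datetime import datetime, timedelta, timezone
from enum import IntEnum
from typing import Optional, Set


class TrainingStep(IntEnum):
    """Training data provider steps, in the order the workflow requires."""
    STORE_IN_GLUE_TABLE = 1
    REGISTER_TRAINING_DATA = 2
    CREATE_LOOKALIKE_MODEL = 3
    CONFIGURE_LOOKALIKE_MODEL = 4
    ASSOCIATE_WITH_COLLABORATION = 5


def next_step(completed: Set[TrainingStep]) -> Optional[TrainingStep]:
    """Return the earliest step not yet done, or None when every step is done."""
    for step in TrainingStep:  # iterates in definition order
        if step not in completed:
            return step
    return None


# The model is not retrained automatically; weekly retraining is recommended.
RETRAIN_INTERVAL = timedelta(weeks=1)


def retraining_due(last_trained: datetime, now: datetime) -> bool:
    """True once at least one week has passed since the last training run."""
    return now - last_trained >= RETRAIN_INTERVAL


done = {TrainingStep.STORE_IN_GLUE_TABLE, TrainingStep.REGISTER_TRAINING_DATA}
print(next_step(done).name)  # CREATE_LOOKALIKE_MODEL

trained = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(retraining_due(trained, trained + timedelta(days=8)))  # True
```

Because one lookalike model can back several configured models, a real tracker would likely keep one record per configured model; this sketch keeps a single linear path for brevity.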

This is the workflow that the seed data provider must complete next:

  1. The seed data provider's data must be stored in an Amazon S3 bucket.

  2. The seed data provider opens the collaboration that they share with the training data provider.

  3. The seed data provider creates a lookalike segment from the Clean Rooms ML tab of the collaboration page.

  4. The seed data provider can evaluate the relevance metrics, if they were shared, and export the lookalike segment for use outside AWS Clean Rooms.
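The seed data provider's steps can also be sketched in code. The helper below checks that the seed data location is an Amazon S3 URI and lists the remaining steps, including the relevance-metrics review only when the training data provider chose to share those metrics. `parse_s3_uri`, `seed_provider_plan`, and the bucket name are hypothetical; nothing here calls AWS.

```python
from typing import List, Tuple
from urllib.parse import urlparse


def parse_s3_uri(uri: str) -> Tuple[str, str]:
    """Split an s3://bucket/key URI into (bucket, key); reject anything else."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"seed data must be in Amazon S3, got {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def seed_provider_plan(seed_data_uri: str, metrics_shared: bool) -> List[str]:
    """List the seed data provider's remaining steps, in order."""
    bucket, key = parse_s3_uri(seed_data_uri)
    steps = [
        f"use seed data in bucket {bucket!r}, key {key!r}",
        "open the collaboration shared with the training data provider",
        "create a lookalike segment from the Clean Rooms ML tab",
    ]
    # Relevance metrics are only available if the training data provider shared them.
    if metrics_shared:
        steps.append("evaluate the relevance metrics")
    steps.append("export the lookalike segment")
    return steps


for step in seed_provider_plan("s3://example-seed-bucket/seeds/users.csv", metrics_shared=True):
    print(step)
```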