Differential Privacy query tips and examples
AWS Clean Rooms Differential Privacy uses a general-purpose query structure to support a wide variety of SQL constructs such as Common Table
Expressions (CTEs) for data preparation and commonly used aggregate functions such as COUNT
, or
SUM
. In order to obfuscate the contribution of any possible user in your data by adding noise to
aggregate query results at run-time, AWS Clean Rooms Differential Privacy requires that aggregate functions in the final
SELECT statement
are run on user-level data.
The following example uses two tables named socialco_impressions
and socialco_users
from a media publisher who wants to protect data using differential privacy while collaborating with an athletic
brand with athletic_brand_sales
data. The media publisher has configured the user_id
column as the user identifier column while enabling differential privacy in AWS Clean Rooms. The advertiser does not need
differential privacy protection and wants to run a query using CTEs on combined data. Since their CTE uses
differential privacy protected tables, the advertiser includes the user identifier column from those protected
tables in the list of CTE columns and joins the protected tables on the user identifier column.
WITH matches_table AS( SELECT si.user_id, si.campaign_id, s.sale_id, s.sale_price FROM socialco_impressions si JOIN socialco_users su ON su.user_id = si.user_id JOIN athletic_brand_sales s ON s.emailsha256 = su.emailsha256 WHERE s.timestamp > si.timestamp UNION ALL SELECT si.user_id, si.campaign_id, s.sale_id, s.sale_price FROM socialco_impressions si JOIN socialco_users su ON su.user_id = si.user_id JOIN athletic_brand_sales s ON s.phonesha256 = su.phonesha256 WHERE s.timestamp > si.timestamp ) SELECT COUNT (DISTINCT user_id) as unique_users FROM matches_table GROUP BY campaign_id ORDER BY COUNT (DISTINCT user_id) DESC LIMIT 5
Similarly, if you want to run window functions on differential privacy protected data tables, you must include
the user identifier column in the PARTITION BY
clause.
ROW_NUMBER() OVER (PARTITION BY conversion_id, user_id ORDER BY match_type, match_age) AS row