Aggregation analysis rule
In AWS Clean Rooms, an aggregation analysis rule generates aggregate statistics using COUNT, SUM, and/or AVG functions along optional dimensions. When the aggregation analysis rule is added to a configured table, it enables the member who can query to run queries on the configured table.
The aggregation analysis rule supports uses cases such as campaign planning, media reach, frequency measurement, and attribution.
The supported query structure and syntax are defined in Aggregation query structure and syntax.
The parameters of the analysis rule, defined in Aggregation analysis rule - query controls, include query controls and query results controls. Its query controls include the ability to require that a configured table is joined to at least one configured table owned by the member who can query, either directly or transitively. This requirement allows you to ensure that the query is run on the intersection (INNER JOIN) of your table and theirs.
Aggregation query structure and syntax
Queries on tables that have an aggregation analysis rule must adhere to the following syntax.
--
select_aggregate_function_expression
SELECT aggregation_function(column_name) [[AS] column_alias ] [, ...] --select_grouping_column_expression
[, {column_name|scalar_function(arguments)} [[AS] column_alias ]][, ...] --table_expression
FROM table_name [[AS] table_alias ] [[INNER] JOIN table_name [[AS] table_alias] ON join_condition] [...] --where_expression
[WHERE where_condition] --group_by_expression
[GROUP BY {column_name|scalar_function(arguments)}, ...]] --having_expression
[HAVING having_condition] --order_by_expression
[ORDER BY {column_name|scalar_function(arguments)} [{ASC|DESC}]] [,...]]
The following table explains each expression listed in the preceding syntax.
Expression | Definition | Examples |
---|---|---|
|
A comma-separated list containing the following expressions:
NoteThere must be at least one |
|
|
One or more supported aggregation functions applied to one or more columns. Only columns are allowed as arguments of aggregation functions. NoteThere must be at least one |
|
|
An expression that can contain any expression using the following:
Note
|
|
|
A table, or join of tables, connecting join conditional expressions with
The
|
|
|
A conditional expression that returns a Boolean. It may be comprised of the following:
Supported comparison conditions are ( Supported logical operators are ( The |
|
|
A comma-separated list of expressions that match the requirements for the
|
|
|
A conditional expression that returns an Boolean. They have a supported
aggregation function applied to a single column (for example,
Supported conditions are ( Supported logical operators are ( The |
|
|
A comma-separated list of expressions that is compatible with the same
requirements defined in The Note
|
|
For aggregation query structure and syntax, be aware of the following:
-
SQL commands other than SELECT are not supported.
-
Sub-queries and common table expressions (for example, WITH) are not supported.
-
Operators that combine multiple queries (for example, UNION) are not supported.
-
TOP, LIMIT, and OFFSET parameters are not supported.
Aggregation analysis rule - query controls
With aggregation query controls, you can control how the columns in your table are used to query the table. For example, you can control which column is used for joining, which column can be counted, or which column can be used in WHERE statements.
The following sections explain each control.
Aggregation controls
By using aggregation controls, you can define which aggregation functions to allow, and what columns they must to be applied to. Aggregation functions can be used in the SELECT, HAVING, and ORDER BY expressions.
Control | Definition | Usage |
---|---|---|
aggregateColumns |
Columns of configured table columns you allow for use within aggregation functions. |
Some Given |
function |
The COUNT, SUM, and AVG functions you allow for use on top of the
aggregateColumns . |
|
Join controls
A JOIN
clause is used to combine rows from two or more tables, based on a
related column between them.
You can use Join controls to control how your table
can be joined to other tables in the table_expression
. AWS Clean Rooms only supports
INNER
JOIN. INNER
JOIN statements can only use columns that have been explicitly categorized as
a joinColumn
in your analysis rule, subject to the controls that you define.
The INNER
JOIN must operate on a joinColumn
from your configured table and
a joinColumn
from another configured table in the collaboration. You decide
which columns from your table can be used as joinColumn
.
Each match condition within the ON clause is required to use the equality
comparison condition (=
) between two columns.
Multiple match conditions within an ON clauses can be:
-
Combined using the
AND
logical operator -
Separated using the
OR
logical operator
Note
All JOIN match conditions must match one row from each side of the
JOIN. All conditionals connected by an OR
or an
AND
logical operator must adhere to this requirement as well.
The following is an example of a query with an AND
logical operator.
SELECT some_col, other_col FROM table1 JOIN table2 ON table1.id = table2.id AND table1.name = table2.name
The following is an example of a query with an OR
logical operator.
SELECT some_col, other_col FROM table1 JOIN table2 ON table1.id = table2.id OR table1.name = table2.name
Control | Definition | Usage |
---|---|---|
joinColumns |
The columns (if any) that you want to allow the member who can query to use in the INNER JOIN statement. |
A specific The same column can't be used both as Unless it has also been categorized as an |
joinRequired |
Control whether you require an INNER JOIN with a configured table from the member who can query. |
If you enable this parameter, an INNER JOIN is required. If you don't enable this parameter, an INNER JOIN is optional. Assuming you enable this parameter, the member who can query is required to include a table they own in the INNER JOIN. They must JOIN your table with theirs, either directly or transitively (that is, join their table to another table, which itself is joined to your table). |
Following is an example of transitivity.
ON my_table.identifer = third_party_table.identifier .... ON third_party_table.identifier = member_who_can_query_table.id
Note
The member who can query can also use the joinRequired
parameter. In that
case, the query must join their table with at least one other table.
Dimension controls
Dimension controls control the column along which the aggregation columns can be filtered, grouped, or aggregated.
Control | Definition | Usage |
---|---|---|
dimensionColumns |
The columns (if any) that you allow the member who can query to use in SELECT, WHERE, GROUP BY, and ORDER BY. |
A The same column can't be both a |
Scalar functions
Scalar functions control which scalar functions can be used on dimension columns.
Control | Definition | Usage |
---|---|---|
scalarFunctions |
The scalar functions that can be used on |
Specifies the scalar functions (if any) that you allow (for example,
CAST) to be applied on Scalar functions can't be used on top of other functions or within other functions. Arguments of scalar functions can be columns, string literals, or numeric literals. |
The following scalar functions are supported:
-
Math functions – ABS, CEILING, FLOOR, LOG, LN, ROUND, SQRT
-
Data type formatting functions – CAST, CONVERT, TO_CHAR, TO_DATE, TO_NUMBER, TO_TIMESTAMP
-
String functions – LOWER, UPPER, TRIM, RTRIM, SUBSTRING
-
For RTRIM, custom character sets to trim aren't allowed.
-
-
Conditional expressions – COALESCE
-
Date functions – EXTRACT, GETDATE, CURRENT_DATE, DATEADD
-
Other functions – TRUNC
For more details, see the AWS Clean Rooms SQL Reference.
Aggregation analysis rule - query results controls
With the aggregation query results controls, you can control which results are returned by
specifying one or more conditions that each output row must meet for it to be returned.
AWS Clean Rooms supports aggregation constraints in the form of COUNT (DISTINCT column) >=
X
. This form requires that each row aggregates at least X distinct values of a choice
from your configured table (for example, a minimum number of distinct user_id
values). This minimum threshold is automatically enforced, even if the submitted query itself
does not use the specified column. They are enforced collectively across each configured table
in the query from the configured tables from each member in the collaboration.
Each configured table must have at least one aggregation constraint in their analysis
rule. Configured table owners can add multiple columnName
and associated
minimum
and they are enforced collectively.
Aggregation constraints
Aggregation constraints control which rows in the query results are returned. To be returned, a row must meet the specified minimum number of distinct values in each column specified in the aggregation constraint. This requirement applies even if the column isn't explicitly mentioned in the query or in other parts of the analysis rule.
Control | Definition | Usage |
---|---|---|
columnName |
The |
Can be any column in the configured table. |
minimum |
The minimum number of distinct values for the associated
|
The |
Aggregation analysis rule structure
The following example shows a predefined structure for an aggregation analysis rule.
In the following example,
refers to your
data table. You can replace each MyTable
user input placeholder
with your
own information.
{ "aggregateColumns": [ { "columnNames": [
MyTable column names
], "function": [Allowed Agg Functions
] }, ], "joinRequired": ["QUERY_RUNNER"], "joinColumns": [MyTable column names
], "dimensionColumns": [MyTable column names
], "scalarFunctions": [Allowed Scalar functions
], "outputConstraints": [ { "columnName": [MyTable column names
], "minimum": [Numeric value
] }, ] }
Aggregation analysis rule - example
The following example demonstrates how two companies can collaborate in AWS Clean Rooms using aggregation analysis.
Company A has customer and sales data. Company A is interested in understanding product return activity. Company B is one of Company A's retailers and has returns data. Company B also has segment attributes on customers that are useful to Company A (for example, purchased related products, uses customer service from the retailer). Company B doesn't want to provide row-level customer return data and attribute information. Company B only wants to enable a set of queries for Company A to obtain aggregate statistics on overlapping customers at a minimum aggregation threshold.
Company A and Company B decide to collaborate so that Company A can understand product return activity and deliver better products at Company B and other channels.
To create the collaboration and run an aggregation analysis, the companies do the following:
-
Company A creates a collaboration and creates a membership. The collaboration has Company B as another member in the collaboration. Company A enables query logging in the collaboration, and it enables query logging in their account.
-
Company B creates a membership in the collaboration. It enables query logging in its account.
-
Company A creates a sales configured table.
-
Company A adds the following aggregation analysis rule to the sales configured table.
{ "aggregateColumns": [ { "columnNames": [ "identifier" ], "function": "COUNT_DISTINCT" }, { "columnNames": [ "purchases" ], "function": "AVG" }, { "columnNames": [ "purchases" ], "function": "SUM" } ], "joinColumns": [ "hashedemail" ], "dimensionColumns": [ "demoseg", "purchasedate", "productline" ], "scalarFunctions": [ "CAST", "COALESCE", "TRUNC" ], "outputConstraints": [ { "columnName": "hashedemail", "minimum": 2, "type": "COUNT_DISTINCT" }, ] }
aggregateColumns
– Company A wants to count the number of unique customers in the overlap between sales data and returns data. Company A also wants to sum the number ofpurchases
made to compare to number ofreturns
.joinColumns
– Company A wants to useidentifier
to match customers from sales data to customers from returns data. This will help Company A match returns to the right purchases. It also helps Company A segment overlapping customers.dimensionColumns
– Company A usesdimensionColumns
to filter by the specific product, compare purchases and returns over a certain period of time, ensure the return date is after the product date, and help segment overlapping customers.scalarFunctions
– Company A selectsCAST
scalar function to help update data type formats if needed based on the configured table Company A associates to the collaboration. It also adds scalar functions to help formatting columns if needed.outputConstraints
– Company A sets minimum output constraints. It doesn't need to constrain the results because the analyst is allowed to see row-level data from their sales tableNote
Company A doesn't include
joinRequired
in the analysis rule. It provides flexibility for their analyst to query the sales table alone. -
Company B creates a returns configured table.
-
Company B adds the following aggregation analysis rule to the returns configured table.
{ "aggregateColumns": [ { "columnNames": [ "identifier" ], "function": "COUNT_DISTINCT" }, { "columnNames": [ "returns" ], "function": "AVG" }, { "columnNames": [ "returns" ], "function": "SUM" } ], "joinColumns": [ "hashedemail" ], "joinRequired": [ "QUERY_RUNNER" ], "dimensionColumns": [ "state", "popularpurchases", "customerserviceuser", "productline", "returndate" ], "scalarFunctions": [ "CAST", "LOWER", "UPPER", "TRUNC" ], "outputConstraints": [ { "columnName": "hashedemail", "minimum": 100, "type": "COUNT_DISTINCT" }, { "columnName": "producttype", "minimum": 2, "type": "COUNT_DISTINCT" } ] }
aggregateColumns
– Company B enables Company A to sumreturns
to compare to the number of purchases. They have at least one aggregate column because they are enabling an aggregate query.joinColumns
– Company B enables Company A to join onidentifier
to match customers from return data to customers from sales data.identifier
data is particularly sensitive and having it as ajoinColumn
ensures that the data will never be outputted in a query.joinRequired
– Company B requires queries on the return data to be overlapped with the sales data. They don’t want to enable Company A to query all individuals in their dataset. They also agreed on that restriction in their collaboration agreement.dimensionColumns
– Company B enables Company A to filter and group bystate
,popularpurchases
, andcustomerserviceuser
which are unique attributes that could help make the analysis for Company A. Company B enables Company A to usereturndate
to filter output onreturndate
that occurs afterpurchasedate
. With this filtering, the output is more accurate for evaluating the impact of the product change.scalarFunctions
– Company B enables the following:-
TRUNC for dates
-
LOWER and UPPER in case the
producttype
is entered in a different format in their data -
CAST if Company A needs to convert data types in sales to be the same as data types in returns
Company A doesn't enable other scalar functions because they don’t believe they are required for queries.
outputConstraints
– Company B sets minimum output constraints onhashedemail
to help reduce the ability to re-identify customers. It also adds minimum output constraint onproducttype
to reduce the ability to re-identify specific products that were returned. Certain product types could be more dominant based on dimensions of the output (for example,state
). Their output constraints will always be enforced regardless of output constraints added by Company A to their data. -
-
Company A creates a sales table association to collaboration.
-
Company B creates a returns table association to collaboration.
-
Company A runs queries, such as the following example, to better understand the quantity of returns in Company B as compared to total purchases by location in 2022.
SELECT companyB.state, SUM(companyB.returns), COUNT(DISTINCT companyA.hashedemail) FROM sales companyA INNER JOIN returns companyB ON companyA.identifier = companyB.identifier WHERE companyA.purchasedate BETWEEN '2022-01-01' AND '2022-12-31' AND TRUNC(companyB.returndate) > companyA.purchasedate GROUP BY companyB.state;
-
Company A and Company B review query logs. Company B verifies that the query aligns with what was agreed upon in the collaboration agreement.
Troubleshooting aggregation analysis rule issues
Use the information here to help you diagnose and fix common issues when you work with aggregation analysis rules.
My query didn’t return any results
This can happen when there are no matching results or when the matching results don’t meet one or more minimum aggregation thresholds.
For more information about minimum aggregation thresholds, see Aggregation analysis rule - example.