Flattening nested arrays - Amazon Athena

Flattening nested arrays

When working with nested arrays, you often need to expand nested array elements into a single array, or expand the array into multiple rows.

Examples

To flatten a nested array's elements into a single array of values, use the flatten function. This query returns a row for each element in the array.

SELECT flatten(ARRAY[ ARRAY[1,2], ARRAY[3,4] ]) AS items

This query returns:

+-----------+ | items | +-----------+ | [1,2,3,4] | +-----------+

To flatten an array into multiple rows, use CROSS JOIN in conjunction with the UNNEST operator, as in this example:

WITH dataset AS ( SELECT 'engineering' as department, ARRAY['Sharon', 'John', 'Bob', 'Sally'] as users ) SELECT department, names FROM dataset CROSS JOIN UNNEST(users) as t(names)

This query returns:

+----------------------+ | department | names | +----------------------+ | engineering | Sharon | +----------------------| | engineering | John | +----------------------| | engineering | Bob | +----------------------| | engineering | Sally | +----------------------+

To flatten an array of key-value pairs, transpose selected keys into columns, as in this example:

WITH dataset AS ( SELECT 'engineering' as department, ARRAY[ MAP(ARRAY['first', 'last', 'age'],ARRAY['Bob', 'Smith', '40']), MAP(ARRAY['first', 'last', 'age'],ARRAY['Jane', 'Doe', '30']), MAP(ARRAY['first', 'last', 'age'],ARRAY['Billy', 'Smith', '8']) ] AS people ) SELECT names['first'] AS first_name, names['last'] AS last_name, department FROM dataset CROSS JOIN UNNEST(people) AS t(names)

This query returns:

+--------------------------------------+ | first_name | last_name | department | +--------------------------------------+ | Bob | Smith | engineering | | Jane | Doe | engineering | | Billy | Smith | engineering | +--------------------------------------+

From a list of employees, select the employee with the highest combined scores. UNNEST can be used in the FROM clause without a preceding CROSS JOIN as it is the default join operator and therefore implied.

WITH dataset AS ( SELECT ARRAY[ CAST(ROW('Sally', 'engineering', ARRAY[1,2,3,4]) AS ROW(name VARCHAR, department VARCHAR, scores ARRAY(INTEGER))), CAST(ROW('John', 'finance', ARRAY[7,8,9]) AS ROW(name VARCHAR, department VARCHAR, scores ARRAY(INTEGER))), CAST(ROW('Amy', 'devops', ARRAY[12,13,14,15]) AS ROW(name VARCHAR, department VARCHAR, scores ARRAY(INTEGER))) ] AS users ), users AS ( SELECT person, score FROM dataset, UNNEST(dataset.users) AS t(person), UNNEST(person.scores) AS t(score) ) SELECT person.name, person.department, SUM(score) AS total_score FROM users GROUP BY (person.name, person.department) ORDER BY (total_score) DESC LIMIT 1

This query returns:

+---------------------------------+ | name | department | total_score | +---------------------------------+ | Amy | devops | 54 | +---------------------------------+

From a list of employees, select the employee with the highest individual score.

WITH dataset AS ( SELECT ARRAY[ CAST(ROW('Sally', 'engineering', ARRAY[1,2,3,4]) AS ROW(name VARCHAR, department VARCHAR, scores ARRAY(INTEGER))), CAST(ROW('John', 'finance', ARRAY[7,8,9]) AS ROW(name VARCHAR, department VARCHAR, scores ARRAY(INTEGER))), CAST(ROW('Amy', 'devops', ARRAY[12,13,14,15]) AS ROW(name VARCHAR, department VARCHAR, scores ARRAY(INTEGER))) ] AS users ), users AS ( SELECT person, score FROM dataset, UNNEST(dataset.users) AS t(person), UNNEST(person.scores) AS t(score) ) SELECT person.name, score FROM users ORDER BY (score) DESC LIMIT 1

This query returns:

+--------------+ | name | score | +--------------+ | Amy | 15 | +--------------+

Considerations and limitations

If UNNEST is used on one or more arrays in the query, and one of the arrays is NULL, the query returns no rows. If UNNEST is used on an array that is an empty string, the empty string is returned.

For example, in the following query, because the second array is null, the query returns no rows.

SELECT col1, col2 FROM UNNEST (ARRAY ['apples','oranges','lemons']) AS t(col1) CROSS JOIN UNNEST (ARRAY []) AS t(col2)

In this next example, the second array is modified to contain an empty string. For each row, the query returns the value in col1 and an empty string for the value in col2. The empty string in the second array is required in order for the values in the first array to be returned.

SELECT col1, col2 FROM UNNEST (ARRAY ['apples','oranges','lemons']) AS t(col1) CROSS JOIN UNNEST (ARRAY ['']) AS t(col2)