Nested data limitations (preview)
This topic describes limitations for reading nested data with Redshift Spectrum. Nested data is data that contains nested fields. Nested fields are fields that are joined together as a single entity, such as arrays, structs, or objects.
Note
The limitations marked (preview) in the following list only apply to preview clusters and preview workgroups created in the following Regions.
US East (Ohio) (us-east-2)
US East (N. Virginia) (us-east-1)
US West (N. California) (us-west-1)
Asia Pacific (Tokyo) (ap-northeast-1)
Europe (Ireland) (eu-west-1)
Europe (Stockholm) (eu-north-1)
For information about setting up Preview clusters, see Creating a preview cluster in the Amazon Redshift Management Guide. For information about setting up Preview workgroups, see Creating a preview workgroup in the Amazon Redshift Management Guide.
The following limitations apply to nested data:
-
An
array
ormap
type can contain otherarray
ormap
types as long as queries on the nestedarrays
ormaps
don't returnscalar
values. (preview) -
Amazon Redshift Spectrum supports complex data types only as external tables.
-
Subquery result columns must be top-level. (preview)
-
If an
OUTER JOIN
expression refers to a nested table, it can refer only to that table and its nested arrays (and maps). If anOUTER JOIN
expression doesn't refer to a nested table, it can refer to any number of non-nested tables. -
If a
FROM
clause in a subquery refers to a nested table, it can't refer to any other table. -
If a subquery depends on a nested table that refers to a parent table, the subquery can only use the parent table in the
FROM
clause. You can't use the parent in any other clauses, such as aSELECT
orWHERE
clause. For example, the following query doesn't run because the subquery'sSELECT
clause refers to the parent tablec
.SELECT c.name.given FROM spectrum.customers c WHERE (SELECT COUNT(c.id) FROM c.phones p WHERE p LIKE '858%') > 1;
The following query works because the parent
c
is used only in theFROM
clause of the subquery.SELECT c.name.given FROM spectrum.customers c WHERE (SELECT COUNT(*) FROM c.phones p WHERE p LIKE '858%') > 1;
-
A subquery that accesses nested data anywhere other than the
FROM
clause must return a single value. The only exceptions are(NOT) EXISTS
operators in aWHERE
clause. -
(NOT) IN
is not supported. -
The maximum nesting depth for all nested types is 100. This restriction applies to all file formats (Parquet, ORC, Ion, and JSON).
-
Aggregation subqueries that access nested data can only refer to
arrays
andmaps
in theirFROM
clause, not to an external table. -
Querying the pseudocolumns of nested data in a Redshift Spectrum table is not supported. For more information, see Pseudocolumns.
-
When extracting data from array or map columns by specifying them in a
FROM
clause, you can only select values from those columns if the values arescalar
. For example, the following queries both try toSELECT
elements from inside an array. The query that selectsarr.a
works becausearr.a
is ascalar
value. The second query doesn't work becausearray
is an array extracted froms3.nested table
in theFROM
clause. (preview)SELECT array_column FROM s3.nested_table;
array_column ----------------- [{"a":1},{"b":2}]
SELECT arr.a FROM s3.nested_table t, t.array_column arr;arr.a ----- 1
--This query fails to run. SELECT array FROM s3.nested_table tab, tab.array_column array;You can’t use an array or map in the
FROM
clause that itself comes from another array or map. To select arrays or other complex structures that are nested inside other arrays, consider using indexes in theSELECT
statement.