Supported types for partition projection
A table can have any combination of enum
, integer
,
date,
or injected
partition column types.
Enum type
Use the enum
type for partition columns whose values are members of
an enumerated set (for example, airport codes or AWS Regions).
Define the partition properties in the table as follows:
Property name | Example values | Description |
---|---|---|
projection. |
|
Required. The projection type to use for column
columnName . The value must be
enum (case insensitive) to signal the use of the
enum type. Leading and trailing white space is allowed. |
projection. |
|
Required. A comma-separated list of enumerated partition values
for column columnName . Any white space is
considered part of an enum value. |
Note
As a best practice we recommend limiting the use of enum
based
partition projections to a few dozen or less. Although there is no specific
limit for enum
projections, the total size of your table's metadata
cannot exceed the AWS Glue limit of about 1 MB when gzip compressed. Note that this
limit is shared across key parts of your table like column names, location,
storage format, and others. If you find yourself using more than a few dozen
unique IDs in your enum
projection, consider an alternative
approach such as bucketing into a smaller number of unique values in a surrogate
field. By trading off cardinality, you can control the number of unique values
in your enum
field.
Integer type
Use the integer type for partition columns whose possible values are interpretable as integers within a defined range. Projected integer columns are currently limited to the range of a Java signed long (-263 to 263-1 inclusive).
Property name | Example values | Description |
---|---|---|
projection. |
|
Required. The projection type to use for column
columnName . The value must be
integer (case insensitive) to signal the use of the
integer type. Leading and trailing white space is allowed. |
projection. |
|
Required. A two-element comma-separated list that provides the
minimum and maximum range values to be returned by queries on the
column columnName . Note that the values
must be separated by a comma, not a hyphen. These values are
inclusive, can be negative, and can have leading zeroes. Leading and
trailing white space is allowed. |
projection. |
|
Optional. A positive integer that specifies the interval between
successive partition values for the column
columnName . For example, a
range value of "1,3" with an interval
value of "1" produces the values 1, 2, and 3. The same
range value with an interval value of
"2" produces the values 1 and 3, skipping 2. Leading and trailing
white space is allowed. The default is 1. |
projection. |
|
Optional. A positive integer that specifies the number of digits
to include in the partition value's final representation for column
columnName . For example, a
range value of "1,3" that has a digits
value of "1" produces the values 1, 2, and 3. The same
range value with a digits value of "2"
produces the values 01, 02, and 03. Leading and trailing white space
is allowed. The default is no static number of digits and no leading
zeroes. |
Date type
Use the date type for partition columns whose values are interpretable as dates (with optional times) within a defined range.
Important
Projected date columns are generated in Coordinated Universal Time (UTC) at query execution time.
Property name | Example values | Description |
---|---|---|
projection. |
|
Required. The projection type to use for column
columnName . The value must be
date (case insensitive) to signal the use of the
date type. Leading and trailing white space is allowed. |
projection. |
|
Required. A two-element, comma-separated list which provides
the minimum and maximum This column can also contain relative date strings, formatted in this regular expression pattern:
White spaces are allowed, but in date literals are considered part of the date strings themselves. |
projection. |
|
Required. A date format string based on the Java date format
DateTimeFormatterJava.time.* type. |
projection. |
|
A positive integer that specifies the interval between
successive partition values for column
When the provided dates are at single-day or single-month
precision, the |
projection. |
|
A time unit word that represents the serialized form of a
ChronoUnit When the provided dates are at single-day or single-month
precision, the |
Example – Partitioning by month
The following example table configuration partitions data by month from 2015 to the present.
'projection.month.type'='date', 'projection.month.format'='yyyy-MM', 'projection.month.interval'='1', 'projection.month.interval.unit'='MONTHS', 'projection.month.range'='2015-01,NOW', ...
Injected type
Use the injected type for partition columns with possible values that cannot be
procedurally generated within some logical range but that are provided in a query's
WHERE
clause as a single value.
It is important to keep in mind the following points:
-
Queries on injected columns fail if a filter expression is not provided for each injected column.
-
Queries with multiple values for a filter expression on an injected column succeed only if the values are disjunct.
-
Only columns of
string
type are supported.
Property name | Value | Description |
---|---|---|
projection. |
|
Required. The projection type to use for the column
columnName . Only the
string type is supported. The value specified must
be injected (case insensitive). Leading and trailing
white space is allowed. |
For more information, see When to use the injected projection type.