Set up partition projection
Setting up partition projection in a table's properties is a two-step process:
-
Specify the data ranges and relevant patterns for each partition column, or use a custom template.
-
Enable partition projection for the table.
Note
Before you add partition projection properties to an existing table, the partition column for which you are setting up partition projection properties must already exist in the table schema. If the partition column does not yet exist, you must add a partition column to the existing table manually. AWS Glue does not perform this step for you automatically.
This section shows how to set the table properties for AWS Glue. To set them, you can use the AWS Glue console, Athena CREATE TABLE queries, or AWS Glue API operations. The following procedure shows how to set the properties in the AWS Glue console.
To configure and enable partition projection using the AWS Glue console
Sign in to the AWS Management Console and open the AWS Glue console at https://console.aws.amazon.com/glue/
. -
Choose the Tables tab.
On the Tables tab, you can edit existing tables, or choose Add tables to create new ones. For information about adding tables manually or with a crawler, see Working with tables on the AWS Glue console in the AWS Glue Developer Guide.
-
In the list of tables, choose the link for the table that you want to edit.
-
Choose Actions, Edit table.
-
On the Edit table page, in the Table properties section, for each partitioned column, add the following key-value pair:
-
For Key, add
projection.
.columnName
.type -
For Value, add one of the supported types:
enum
,integer
,date
, orinjected
. For more information, see Supported types for partition projection.
-
-
Following the guidance in Supported types for partition projection, add additional key-value pairs according to your configuration requirements.
The following example table configuration configures the
year
column for partition projection, restricting the values that can be returned to a range from 2010 through 2016. -
Add a key-value pair to enable partition projection. For Key, enter
projection.enabled
, and for its Value, entertrue
.Note
You can disable partition projection on this table at any time by setting
projection.enabled
tofalse
. -
When you are finished, choose Save.
-
In the Athena Query Editor, test query the columns that you configured for the table.
The following example query uses
SELECT DISTINCT
to return the unique values from theyear
column. The database contains data from 1987 to 2016, but theprojection.year.range
property restricts the values returned to the years 2010 to 2016.Note
If you set
projection.enabled
totrue
but fail to configure one or more partition columns, you receive an error message like the following:HIVE_METASTORE_ERROR: Table
.database_name
.table_name
is configured for partition projection, but the following partition columns are missing projection configuration: [column_name
] (tabledatabase_name
.table_name
)
How to specify custom S3 storage locations
When you edit table properties in AWS Glue, you can also specify a custom Amazon S3 path
template for the projected partitions. A custom template enables Athena to properly
map partition values to custom Amazon S3 file locations that do not follow a typical
.../column=value/...
pattern.
Using a custom template is optional. However, if you use a custom template, the template must contain a placeholder for each partition column. Templated locations must end with a forward slash so that the partitioned data files live in a "folder" per partition.
To specify a custom partition location template
-
Following the steps to configure and enable partition projection using the AWS Glue console, add an additional a key-value pair that specifies a custom template as follows:
-
For Key, enter
storage.location.template
. -
For Value, specify a location that includes a placeholder for every partition column. Make sure that each placeholder (and the S3 path itself) is terminated by a single forward slash.
The following example template values assume a table with partition columns
a
,b
, andc
.s3://amzn-s3-demo-bucket/table_root/a=${a}/${b}/some_static_subdirectory/${c}/
s3://amzn-s3-demo-bucket/table_root/c=${c}/${b}/some_static_subdirectory/${a}/${b}/${c}/${c}/
For the same table, the following example template value is invalid because it contains no placeholder for column
c
.s3://amzn-s3-demo-bucket/table_root/a=${a}/${b}/some_static_subdirectory/
-
-
Choose Apply.