Amazon Data Firehose example

When you use Firehose to deliver data to Amazon S3, the default configuration writes objects with keys that look like the following example:


s3://amzn-s3-demo-bucket/prefix/yyyy/MM/dd/HH/file.extension

To create an Athena table that finds the partitions automatically at query time, instead of having to add them to the AWS Glue Data Catalog as new data arrives, you can use partition projection.

The following CREATE TABLE example uses the default Firehose configuration.



CREATE EXTERNAL TABLE my_ingested_data (
 ...
)
...
PARTITIONED BY (
 datehour STRING
)
LOCATION "s3://amzn-s3-demo-bucket/prefix/"
TBLPROPERTIES (
 "projection.enabled" = "true",
 "projection.datehour.type" = "date",
 "projection.datehour.format" = "yyyy/MM/dd/HH",
 "projection.datehour.range" = "2021/01/01/00,NOW",
 "projection.datehour.interval" = "1",
 "projection.datehour.interval.unit" = "HOURS",
 "storage.location.template" = "s3://amzn-s3-demo-bucket/prefix/${datehour}/"
)

The TBLPROPERTIES clause in the CREATE TABLE statement tells Athena the following:

Use partition projection when querying the table
The partition key datehour is of type date (which includes an optional time)
How the dates are formatted
The range of date times. Note that the values must be separated by a comma, not a hyphen.
Where to find the data on Amazon S3.

When you query the table, Athena calculates the values for datehour and uses the storage location template to generate a list of partition locations.

Topics

Warning Javascript is disabled or is unavailable in your browser.

To use the Amazon Web Services Documentation, Javascript must be enabled. Please refer to your browser's Help pages for instructions.

Document Conventions

Use dynamic ID partitioning

How to use the date type