Data types in Amazon Athena

When you run CREATE TABLE, you specify column names and the data type of each column. The tables that you create are stored in the AWS Glue Data Catalog.

To facilitate interoperability with other query engines, Athena uses Apache Hive data type names for DDL statements like CREATE TABLE. For DML queries like SELECT, CTAS, and INSERT INTO, Athena uses Trino data type names. The following table shows the data types supported in Athena. Where DDL and DML types differ in terms of name, availability, or syntax, they are shown in separate columns.
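
For example, the same data can be declared with a Hive type name in DDL and referenced with a Trino type name in DML. The following minimal sketch (the table name and Amazon S3 location are hypothetical) declares a column as STRING and an integer column as INT, while the query casts to the Trino type VARCHAR:

CREATE EXTERNAL TABLE my_table (
  id INT,
  name STRING
)
LOCATION 's3://amzn-s3-demo-bucket/my_table/'

SELECT CAST(id AS VARCHAR) AS id_string FROM my_table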

| DDL | DML | Description |
|---|---|---|
| BOOLEAN | BOOLEAN | Values are true and false. |
| TINYINT | TINYINT | An 8-bit signed integer in two's complement format, with a minimum value of -2^7 and a maximum value of 2^7 - 1. |
| SMALLINT | SMALLINT | A 16-bit signed integer in two's complement format, with a minimum value of -2^15 and a maximum value of 2^15 - 1. |
| INT, INTEGER | INT, INTEGER | A 32-bit signed integer in two's complement format, with a minimum value of -2^31 and a maximum value of 2^31 - 1. |
| BIGINT | BIGINT | A 64-bit signed integer in two's complement format, with a minimum value of -2^63 and a maximum value of 2^63 - 1. |
| FLOAT | REAL | A 32-bit signed single-precision floating point number. The range is 1.40129846432481707e-45 to 3.40282346638528860e+38, positive or negative. Follows the IEEE Standard for Floating-Point Arithmetic (IEEE 754). |
| DOUBLE | DOUBLE | A 64-bit signed double-precision floating point number. The range is 4.94065645841246544e-324d to 1.79769313486231570e+308d, positive or negative. Follows the IEEE Standard for Floating-Point Arithmetic (IEEE 754). |
| DECIMAL(precision, scale) | DECIMAL(precision, scale) | precision is the total number of digits. scale (optional) is the number of digits in the fractional part, with a default of 0. For example, use these type definitions: decimal(11,5), decimal(15). The maximum value for precision is 38, and the maximum value for scale is 38. |
| CHAR, CHAR(length) | CHAR, CHAR(length) | Fixed length character data, with a specified length between 1 and 255, such as char(10). If length is specified, strings are truncated at the specified length when read. If the underlying data string is longer, the underlying data string remains unchanged. For more information, see CHAR Hive data type. |
| STRING | VARCHAR | Variable length character data. |
| VARCHAR(length) | VARCHAR(length) | Variable length character data with a maximum read length. Strings are truncated at the specified length when read. If the underlying data string is longer, the underlying data string remains unchanged. |
| BINARY | VARBINARY | Variable length binary data. |
| TIME | TIME | A time of day with millisecond precision. |
| Not available | TIME(precision) | A time of day with a specific precision. TIME(3) is equivalent to TIME. |
| Not available | TIME WITH TIME ZONE | A time of day in a time zone. Time zones should be specified as offsets from UTC. |
| DATE | DATE | A calendar date with year, month, and day. |
| TIMESTAMP | TIMESTAMP, TIMESTAMP WITHOUT TIME ZONE | A calendar date and time of day with millisecond precision. |
| Not available | TIMESTAMP(precision), TIMESTAMP(precision) WITHOUT TIME ZONE | A calendar date and time of day with a specific precision. TIMESTAMP(3) is equivalent to TIMESTAMP. |
| Not available | TIMESTAMP WITH TIME ZONE | A calendar date and time of day in a time zone. Time zones can be specified as offsets from UTC, as IANA time zone names, or using UTC, UT, Z, or GMT. |
| Not available | TIMESTAMP(precision) WITH TIME ZONE | A calendar date and time of day with a specific precision, in a time zone. |
| Not available | INTERVAL YEAR TO MONTH | An interval of one or more whole months. |
| Not available | INTERVAL DAY TO SECOND | An interval of one or more seconds, minutes, hours, or days. |
| ARRAY<element_type> | ARRAY(element_type) | An array of values. All values must have the same type. |
| MAP<key_type, value_type> | MAP(key_type, value_type) | A map where values can be looked up by key. All keys must have the same type, and all values must have the same type. |
| STRUCT<field_name_1:field_type_1, field_name_2:field_type_2, …> | ROW(field_name_1 field_type_1, field_name_2 field_type_2, …) | A data structure with named fields and their values. |
| Not available | JSON | A JSON value, which can be a JSON object, a JSON array, a JSON number, a JSON string, true, false, or null. |
| Not available | UUID | A UUID (Universally Unique IDentifier). |
| Not available | IPADDRESS | An IPv4 or IPv6 address. |
| Not available | HyperLogLog, P4HyperLogLog, SetDigest, QDigest, TDigest | These data types support approximate function internals. For more information about each type, see the corresponding entry in the Trino documentation. |
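
The approximate types in the last row are typically produced and consumed by aggregate functions rather than written as literals. As a minimal sketch (the table and column names are hypothetical), the following query produces a HyperLogLog value internally with approx_set and converts it to a distinct-count estimate with cardinality:

SELECT cardinality(approx_set(user_id)) AS approx_distinct_users
FROM events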

Data type examples

The following table shows example literals for DML data types.

| Data type | Examples |
|---|---|
| BOOLEAN | true, false |
| TINYINT | TINYINT '123' |
| SMALLINT | SMALLINT '123' |
| INT, INTEGER | 123456790 |
| BIGINT | BIGINT '1234567890', 2147483648 |
| REAL | REAL '123456.78' |
| DOUBLE | 1.234 |
| DECIMAL(precision, scale) | DECIMAL '123.456' |
| CHAR, CHAR(length) | CHAR 'hello world', CHAR 'hello ''world''!' |
| VARCHAR, VARCHAR(length) | VARCHAR 'hello world', VARCHAR 'hello ''world''!' |
| VARBINARY | X'00 01 02' |
| TIME, TIME(precision) | TIME '10:11:12', TIME '10:11:12.345' |
| TIME WITH TIME ZONE | TIME '10:11:12.345 -06:00' |
| DATE | DATE '2024-03-25' |
| TIMESTAMP, TIMESTAMP WITHOUT TIME ZONE, TIMESTAMP(precision), TIMESTAMP(precision) WITHOUT TIME ZONE | TIMESTAMP '2024-03-25 11:12:13', TIMESTAMP '2024-03-25 11:12:13.456' |
| TIMESTAMP WITH TIME ZONE, TIMESTAMP(precision) WITH TIME ZONE | TIMESTAMP '2024-03-25 11:12:13.456 Europe/Berlin' |
| INTERVAL YEAR TO MONTH | INTERVAL '3' MONTH |
| INTERVAL DAY TO SECOND | INTERVAL '2' DAY |
| ARRAY(element_type) | ARRAY['one', 'two', 'three'] |
| MAP(key_type, value_type) | MAP(ARRAY['one', 'two', 'three'], ARRAY[1, 2, 3]). Note that maps are created from an array of keys and an array of values. |
| ROW(field_name_1 field_type_1, field_name_2 field_type_2, …) | ROW('one', 'two', 'three'). Note that rows created this way have no column names. To add column names, you can use CAST, as in CAST(ROW(1, 2, 3) AS ROW(one INT, two INT, three INT)). |
| JSON | JSON '{"one": 1, "two": 2, "three": 3}' |
| UUID | UUID '12345678-90ab-cdef-1234-567890abcdef' |
| IPADDRESS | IPADDRESS '10.0.0.1', IPADDRESS '2001:db8::1' |

Considerations for data types

CHAR and VARCHAR

A CHAR(n) value always contains exactly n characters. For example, if you cast 'abc' to CHAR(7), 4 trailing spaces are added.
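
For example (the result is shown as a comment):

SELECT CAST('abc' AS CHAR(7))  -- 'abc    ' (4 trailing spaces added)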

Comparisons of CHAR values include leading and trailing spaces.

If a length is specified for CHAR or VARCHAR, strings are truncated at the specified length when read. If the underlying data string is longer, the underlying data string remains unchanged.

To escape a single quote in a CHAR or VARCHAR, use an additional single quote.

To cast a non-string data type to a string in a DML query, cast to the VARCHAR data type.
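
For example, the following expression returns the string '123':

SELECT CAST(123 AS VARCHAR)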

To use the substr function to return a substring of specified length from a CHAR data type, you must first cast the CHAR value to VARCHAR. In the following example, col1 uses the CHAR data type.

substr(CAST(col1 AS VARCHAR), 1, 4)

DECIMAL

To specify decimal values as literals in SELECT queries, such as when selecting rows with a specific decimal value, specify the DECIMAL type and provide the value as a literal in single quotes, as in the following examples.

SELECT * FROM my_table WHERE decimal_value = DECIMAL '0.12'
SELECT DECIMAL '44.6' + DECIMAL '77.2'

Working with timestamp data

This section describes some considerations for working with timestamp data in Athena.

Note

The treatment of timestamps has changed somewhat between Athena engine version 2 and Athena engine version 3. For information about timestamp-related errors that can occur in Athena engine version 3 and suggested solutions, see Timestamp changes in the Athena engine version 3 reference.

Format for writing timestamp data to Amazon S3 objects

The format in which timestamp data should be written into Amazon S3 objects depends on both the column data type and the SerDe library that you use.

  • If you have a table column of type DATE, Athena expects the corresponding column or property of the data to be a string in the ISO format YYYY-MM-DD, or a built-in date type like those for Parquet or ORC.

  • If you have a table column of type TIME, Athena expects the corresponding column or property of the data to be a string in the ISO format HH:MM:SS, or a built-in time type like those for Parquet or ORC.

  • If you have a table column of type TIMESTAMP, Athena expects the corresponding column or property of the data to be a string in the format YYYY-MM-DD HH:MM:SS.SSS (note the space between the date and time), or a built-in time type like those for Parquet, ORC, or Ion.

    Note

    OpenCSVSerDe timestamps are an exception and must be encoded as millisecond-resolution UNIX epoch values.
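
    For example, under this encoding, the timestamp 2024-03-25 11:12:13 UTC would be stored as the value 1711365133000.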

Ensuring that time-partitioned data matches the timestamp field in a record

The producer of the data must make sure that partition values align with the data within the partition. For example, if your data has a timestamp property and you use Firehose to load the data into Amazon S3, you must use dynamic partitioning, because the default partitioning in Firehose is based on wall-clock time rather than on values in the data.

Use string as the data type for partition keys

For performance reasons, use STRING as the data type for partition keys. Although Athena recognizes partition values in the format YYYY-MM-DD as dates when you use the DATE type, doing so can degrade query performance, so we recommend the STRING data type for partition keys instead.
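
For example, the following table definition stores the partition key as a string (the table, column, and bucket names are hypothetical):

CREATE EXTERNAL TABLE events (
  event_time timestamp,
  payload string
)
PARTITIONED BY (event_date string)
LOCATION 's3://amzn-s3-demo-bucket/events/'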

How to write queries for timestamp fields that are also time-partitioned

How you write queries for timestamp fields that are time-partitioned depends on the type of table that you want to query.

Hive tables

With the Hive tables most commonly used in Athena, the query engine has no knowledge of relationships between columns and partition keys. For this reason, you must always add predicates in your queries for both the column and the partition key.

For example, suppose you have an event_time column and an event_date partition key and want to query events between 23:00 and 03:00. In this case, you must include predicates in your query for both the column and the partition key, as in the following example.

WHERE event_time BETWEEN start_time AND end_time AND event_date BETWEEN start_time_date AND end_time_date
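
With concrete literals and the hypothetical events table sketched earlier, a query for events between 23:00 on March 25 and 03:00 on March 26 might look like the following. Because the time range crosses midnight, the partition key predicate must cover both dates.

SELECT *
FROM events
WHERE event_time BETWEEN TIMESTAMP '2024-03-25 23:00:00' AND TIMESTAMP '2024-03-26 03:00:00'
AND event_date BETWEEN '2024-03-25' AND '2024-03-26'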

Iceberg tables

With Iceberg tables, you can use computed partition values, which simplifies your queries. For example, suppose your Iceberg table was created with a PARTITIONED BY clause like the following:

PARTITIONED BY (month(event_time))

In this case, the query engine automatically prunes partitions based on the values of the event_time predicates. Because of this, your query only needs to specify a predicate for event_time, as in the following example.

WHERE event_time BETWEEN start_time AND end_time
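
For context, a complete Iceberg table definition along these lines might look like the following sketch (the table, column, and bucket names are hypothetical):

CREATE TABLE events_iceberg (
  event_time timestamp,
  payload string
)
PARTITIONED BY (month(event_time))
LOCATION 's3://amzn-s3-demo-bucket/events_iceberg/'
TBLPROPERTIES ('table_type' = 'ICEBERG')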

For more information, see Creating Iceberg tables.