Amazon Athena

Defining a SerDe

A SerDe (Serializer/Deserializer) defines how Apache Hive reads data from and writes data to a given format. Athena supports many SerDe libraries for parsing data in formats such as CSV, JSON, Parquet, and ORC. Athena does not support custom SerDes.

It is the SerDe you specify, and not the Hive DDL, that defines the table schema. In other words, the SerDe can override the Hive DDL configuration you specify in Athena when you create your table.

To define a SerDe

To define a SerDe when creating a table in Athena, use one of these methods:

  • Use Hive DDL statements to describe how to read and write data to the table without naming a SerDe class, as in this example. Because no SerDe type is listed, Athena uses the native LazySimpleSerDe by default.
In general, Athena uses the LazySimpleSerDe if you do not specify a ROW FORMAT, or if you specify ROW FORMAT DELIMITED.
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
ESCAPED BY '\\'
COLLECTION ITEMS TERMINATED BY '|'
MAP KEYS TERMINATED BY ':'
  • Explicitly specify the type of SerDe Athena should use when it reads and writes data to the table. Also, specify additional properties in SERDEPROPERTIES, as in this example.
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = ',',
'field.delim' = ',',
'colelction.delim' = '|',
'mapkey.delim' = ':',
'escape.delim' = '\\'
)
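
Both fragments above are clauses of a CREATE TABLE statement, so it may help to see one in context. The following is a minimal sketch of the first method only. The table name, column names, and S3 location are hypothetical placeholders that do not come from this page, and the STORED AS TEXTFILE clause is an assumption about the underlying data.

```sql
-- Hypothetical table, columns, and bucket, for illustration only.
-- No SerDe class is named, so Athena uses LazySimpleSerDe.
CREATE EXTERNAL TABLE example_delimited (
  id INT,
  tags ARRAY<STRING>,
  attrs MAP<STRING, STRING>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  ESCAPED BY '\\'
  COLLECTION ITEMS TERMINATED BY '|'
  MAP KEYS TERMINATED BY ':'
STORED AS TEXTFILE
LOCATION 's3://amzn-s3-demo-bucket/example/';
```

With these delimiters, a data line such as 1,a|b,k1:v1|k2:v2 would be expected to yield one INT, a two-element array, and a two-entry map. For the second method, the ROW FORMAT SERDE ... WITH SERDEPROPERTIES clause takes the place of the ROW FORMAT DELIMITED clause in the same position.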
