JSON SerDe libraries
In Athena, you can use SerDe libraries to deserialize JSON data. Deserialization converts the JSON data so that it can be serialized (written out) into a different format like Parquet or ORC.
- The native Hive JSON SerDe
- The OpenX JSON SerDe
Note
The Hive and OpenX libraries expect each JSON record to be on a single line (not pretty-printed), with records separated by a newline character. The Amazon Ion Hive SerDe does not have that requirement and can be used as an alternative because the Ion data format is a superset of JSON.
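If your data is pretty-printed, one way to meet the single-line requirement is to rewrite it before uploading it. The following is a minimal Python sketch, assuming the input is a JSON array small enough to load into memory. The function name to_json_lines and the sample records are hypothetical and serve only as illustration.

```python
import json


def to_json_lines(records):
    """Serialize each record as compact JSON on its own line.

    json.dumps without an indent argument never emits a raw newline:
    newlines inside string values are escaped as \\n, so every record
    stays on a single line.
    """
    return "".join(
        json.dumps(record, separators=(",", ":")) + "\n" for record in records
    )


# Hypothetical pretty-printed input: a JSON array spread over several lines.
pretty = """
[
  {
    "id": 1,
    "user": {"name": "a", "tags": ["x", "y"]}
  },
  {
    "id": 2,
    "user": {"name": "b", "tags": []}
  }
]
"""

records = json.loads(pretty)
ndjson = to_json_lines(records)
print(ndjson, end="")
```

Each output line is one complete record, which is the layout that the Hive and OpenX libraries expect. For files too large to load at once, a streaming parser would be a better fit than this sketch.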
Library names
Use one of the following:
org.apache.hive.hcatalog.data.JsonSerDe
org.openx.data.jsonserde.JsonSerDe
com.amazon.ionhiveserde.IonHiveSerDe
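The library name is typically supplied in the ROW FORMAT SERDE clause of a CREATE EXTERNAL TABLE statement. The following Python sketch builds such a statement as a string for the two JSON libraries. The helper build_create_table, the table name, the columns, and the Amazon S3 location are all hypothetical placeholders. The Ion SerDe is left out of this sketch because its table definition can require clauses that are not covered here.

```python
# Class names taken from the list above; the short keys are illustrative.
SERDE_CLASSES = {
    "hive": "org.apache.hive.hcatalog.data.JsonSerDe",
    "openx": "org.openx.data.jsonserde.JsonSerDe",
}


def build_create_table(table, columns, location, serde="openx"):
    """Return a CREATE EXTERNAL TABLE statement that uses a JSON SerDe.

    columns is a list of (name, hive_type) pairs. All arguments are
    inserted as-is, so pass only trusted values.
    """
    column_lines = ",\n".join(
        f"  `{name}` {hive_type}" for name, hive_type in columns
    )
    return (
        f"CREATE EXTERNAL TABLE {table} (\n"
        f"{column_lines}\n"
        f")\n"
        f"ROW FORMAT SERDE '{SERDE_CLASSES[serde]}'\n"
        f"LOCATION '{location}'"
    )


# Hypothetical table, columns, and bucket, for illustration only.
ddl = build_create_table(
    table="example_events",
    columns=[
        ("id", "int"),
        ("user", "struct<name:string,tags:array<string>>"),
    ],
    location="s3://amzn-s3-demo-bucket/events/",
)
print(ddl)
```

The printed statement can be pasted into the Athena query editor after you replace the placeholder names with your own.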
Additional resources
For more information about working with JSON and nested JSON in Athena, see the following resources:
- Create tables in Amazon Athena from nested JSON and mappings using JSONSerDe (AWS Big Data Blog)
- I get errors when I try to read JSON data in Amazon Athena (AWS Knowledge Center article)
- hive-json-schema (GitHub): Tool written in Java that generates CREATE TABLE statements from example JSON documents. The CREATE TABLE statements that are generated use the OpenX JSON SerDe.