JSON SerDe libraries
In Athena, you can use SerDe (serializer/deserializer) libraries to deserialize JSON data. Deserialization parses the JSON data into rows that Athena can query and that can then be serialized (written out) into a different format such as Parquet or ORC.
Note
The Hive and OpenX libraries expect each JSON record to be on a single line (not pretty-printed), with records separated by a newline character.
Because Amazon Ion is a superset of JSON, you can use the Amazon Ion Hive SerDe to query non-Amazon Ion JSON datasets. Unlike the Hive and OpenX JSON SerDe libraries, the Amazon Ion SerDe does not expect each row of data to be on a single line. This is useful if you want to query JSON datasets that are in "pretty print" format or that otherwise break up the fields in a row with newline characters.
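If you want to use the Hive or OpenX libraries with pretty-printed JSON, one option is to flatten the data to one record per line before you upload it. The following Python sketch illustrates the idea using only the standard json module. The sample data and the to_json_lines helper are hypothetical, and the sketch assumes that the input is a JSON array of records small enough to load into memory.

```python
import json


def to_json_lines(records):
    """Serialize each record as compact JSON on its own line."""
    # json.dumps escapes newlines inside string values, so each
    # record is guaranteed to occupy exactly one physical line.
    lines = (json.dumps(record, separators=(",", ":")) for record in records)
    return "\n".join(lines) + "\n"


# Hypothetical pretty-printed input: a JSON array of two records.
pretty = """
[
  {
    "id": 1,
    "name": "alpha"
  },
  {
    "id": 2,
    "name": "beta"
  }
]
"""

records = json.loads(pretty)
json_lines = to_json_lines(records)
print(json_lines, end="")
```

The output contains one compact JSON object per line, which is the layout that the Hive and OpenX libraries expect.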
Library names
Use one of the following:
org.apache.hive.hcatalog.data.JsonSerDe
org.openx.data.jsonserde.JsonSerDe
com.amazon.ionhiveserde.IonHiveSerDe
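The library name is specified in the ROW FORMAT SERDE clause of a CREATE EXTERNAL TABLE statement. The following Python sketch assembles such a statement for the Hive or OpenX library. The build_create_table helper, the table name, the columns, and the Amazon S3 location are hypothetical placeholders. The Amazon Ion SerDe is omitted because its table definitions also involve storage-format clauses that this sketch does not cover.

```python
# Fully qualified SerDe class names, as listed above.
SERDE_CLASSES = {
    "hive": "org.apache.hive.hcatalog.data.JsonSerDe",
    "openx": "org.openx.data.jsonserde.JsonSerDe",
}


def build_create_table(table, columns, location, serde="openx"):
    """Return a CREATE EXTERNAL TABLE statement that uses a JSON SerDe.

    columns is a list of (name, type) pairs. Every argument value used
    below is an illustrative assumption, not a real resource.
    """
    column_list = ",\n".join(f"  {name} {col_type}" for name, col_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} (\n"
        f"{column_list}\n"
        f")\n"
        f"ROW FORMAT SERDE '{SERDE_CLASSES[serde]}'\n"
        f"LOCATION '{location}'"
    )


ddl = build_create_table(
    table="events",                              # hypothetical table name
    columns=[("id", "int"), ("name", "string")],  # hypothetical schema
    location="s3://amzn-s3-demo-bucket/events/",  # hypothetical S3 path
    serde="openx",
)
print(ddl)
```

You can run the resulting statement in the Athena query editor after you replace the placeholder values with your own.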
Additional resources
For more information about working with JSON and nested JSON in Athena, see the following resources:
- Create tables in Amazon Athena from nested JSON and mappings using JSONSerDe (AWS Big Data Blog)
- I get errors when I try to read JSON data in Amazon Athena (AWS Knowledge Center article)
- hive-json-schema (GitHub): Tool written in Java that generates CREATE TABLE statements from example JSON documents. The generated CREATE TABLE statements use the OpenX JSON SerDe.