JSON SerDe libraries

In Athena, you can use SerDe libraries to deserialize JSON data. Deserialization converts the JSON data so that it can be serialized (written out) into a different format like Parquet or ORC.

Note

The Hive and OpenX libraries expect JSON data to be on a single line (not formatted), with records separated by a new line character.

Because Amazon Ion is a superset of JSON, you can use the Amazon Ion Hive SerDe to query non-Amazon Ion JSON datasets. Unlike the Hive and OpenX JSON SerDe libraries, the Amazon Ion SerDe does not expect each row of data to be on a single line. This feature is useful if you want to query JSON datasets that are in "pretty print" format or otherwise break up the fields in a row with newline characters.

Library names

Use one of the following:

org.apache.hive.hcatalog.data.JsonSerDe

org.openx.data.jsonserde.JsonSerDe

com.amazon.ionhiveserde.IonHiveSerDe

Additional resources

For more information about working with JSON and nested JSON in Athena, see the following resources:

Create tables in Amazon Athena from nested JSON and mappings using JSONSerDe (AWS Big Data Blog)
I get errors when I try to read JSON data in Amazon Athena (AWS Knowledge Center article)
hive-json-schema (GitHub) – Tool written in Java that generates CREATE TABLE statements from example JSON documents. The CREATE TABLE statements that are generated use the OpenX JSON Serde.

Warning Javascript is disabled or is unavailable in your browser.

To use the Amazon Web Services Documentation, Javascript must be enabled. Please refer to your browser's Help pages for instructions.

Document Conventions

Grok SerDe

Hive JSON SerDe