Menu
Amazon Athena
User Guide

JSON SerDe Libraries

In Athena, you can use two SerDe libraries for processing JSON files:

SerDe Names

Hive-JsonSerDe

Openx-JsonSerDe

Library Names

Use one of the following:

org.apache.hive.hcatalog.data.JsonSerDe

org.openx.data.jsonserde.JsonSerDe

Hive JSON SerDe

The Hive JSON SerDe is used to process JSON documents, most commonly events. These events are represented as blocks of JSON-encoded text separated by a new line.

You can also use the Hive JSON SerDe to parse more complex JSON documents with nested structures. However, this requires having a matching DDL representing the complex data types. See Example: Deserializing Nested JSON.

This SerDe has a useful property you can specify when creating tables in Athena, to help deal with inconsistencies in the data:

  • 'ignore.malformed.json' if set to TRUE, lets you skip malformed JSON syntax.

Note

You can query data in regions other than the region where you run Athena. Standard inter-region data transfer rates for Amazon S3 apply in addition to standard Athena charges. To reduce data transfer charges, replace myregion in s3://athena-examples-myregion/path/to/data/ with the region identifier where you run Athena, for example, s3://athena-examples-us-east-1/path/to/data/.

The following DDL statement uses the Hive JSON SerDe:

CREATE EXTERNAL TABLE impressions ( requestbegintime string, adid string, impressionid string, referrer string, useragent string, usercookie string, ip string, number string, processid string, browsercookie string, requestendtime string, timers struct < modellookup:string, requesttime:string >, threadid string, hostname string, sessionid string ) PARTITIONED BY (dt string) ROW FORMAT serde 'org.apache.hive.hcatalog.data.JsonSerDe' with serdeproperties ( 'paths'='requestbegintime, adid, impressionid, referrer, useragent, usercookie, ip' ) LOCATION 's3://myregion.elasticmapreduce/samples/hive-ads/tables/impressions';

OpenX JSON SerDe

The following DDL statement uses the OpenX JSON SerDe:

CREATE EXTERNAL TABLE impressions ( requestbegintime string, adid string, impressionId string, referrer string, useragent string, usercookie string, ip string, number string, processid string, browsercokie string, requestendtime string, timers struct< modellookup:string, requesttime:string>, threadid string, hostname string, sessionid string ) PARTITIONED BY (dt string) ROW FORMAT serde 'org.openx.data.jsonserde.JsonSerDe' with serdeproperties ( 'paths'='requestbegintime, adid, impressionid, referrer, useragent, usercookie, ip' ) LOCATION 's3://myregion.elasticmapreduce/samples/hive-ads/tables/impressions';

Example: Deserializing Nested JSON

JSON data can be challenging to deserialize when creating a table in Athena.

When dealing with complex nested JSON, there are common issues you may encounter. For more information about these issues and troubleshooting practices, see the AWS Knowledge Center Article I receive errors when I try to read JSON data in Amazon Athena.

For more information about common scenarios and query tips, see Create Tables in Amazon Athena from Nested JSON and Mappings Using JSONSerDe.

The following example demonstrates a simple approach to creating an Athena table from a nested JSON file.To parse JSON-encoded files in Athena, each JSON document must be on its own line, separated by a new line.

This example presumes a JSON file with the following structure:

{ "DocId": "AWS", "User": { "Id": 1234, "Username": "bob1234", "Name": "Bob", "ShippingAddress": { "Address1": "123 Main St.", "Address2": null, "City": "Seattle", "State": "WA" }, "Orders": [ { "ItemId": 6789, "OrderDate": "11/11/2017" }, { "ItemId": 4352, "OrderDate": "12/12/2017" } ] } }

The following CREATE TABLE command uses the Openx-JsonSerDe with collection data types like struct and array to establish groups of objects. Each JSON document is on its own line, separated by a new line.

CREATE external TABLE complex_json ( docid string, `user` struct< id:INT, username:string, name:string, shippingaddress:struct< address1:string, address2:string, city:string, state:string >, orders:array< struct< itemid:INT, orderdate:string > > > ) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' LOCATION 's3://mybucket/myjsondata/';