JSON

There are two SerDes for JSON: the native Hive/HCatalog JsonSerDe and the OpenX SerDe.

Hive JSON SerDe#

The following DDL statement uses the Hive JsonSerDe:

CREATE EXTERNAL TABLE impressions (
    requestBeginTime string,
    adId string,
    impressionId string,
    referrer string,
    userAgent string,
    userCookie string,
    ip string,
    number string,
    processId string,
    browserCookie string,
    requestEndTime string,
    timers struct<modelLookup:string, requestTime:string>,
    threadId string, hostname string,
    sessionId string
)   PARTITIONED BY (dt string)
ROW FORMAT  serde 'org.apache.hive.hcatalog.data.JsonSerDe'
with serdeproperties ( 'paths'='requestBeginTime, adId, impressionId, referrer, userAgent, userCookie, ip' )
LOCATION 's3://myregion.elasticmapreduce/samples/hive-ads/tables/impressions';

OpenX JSON SerDe#

The following DDL statement uses the OpenX SerDe:

CREATE EXTERNAL TABLE impressions (
    requestBeginTime string,
    adId string,
    impressionId string,
    referrer string,
    userAgent string,
    userCookie string,
    ip string,
    number string,
    processId string,
    browserCookie string,
    requestEndTime string,
    timers struct<modelLookup:string, requestTime:string>,
    threadId string, hostname string,
    sessionId string
)   PARTITIONED BY (dt string)
ROW FORMAT  serde 'org.openx.data.jsonserde.JsonSerDe'
with serdeproperties ( 'paths'='requestBeginTime, adId, impressionId, referrer, userAgent, userCookie, ip' )
LOCATION 's3://myregion.elasticmapreduce/samples/hive-ads/tables/impressions';

Example: Nested JSON#

JSON data can be challenging to deserialize when creating a table in Athena. When dealing with complex nested JSON there are common issues you may encounter. For more information about these issues and troubleshooting practices, see the AWS Knowledge Center Article. For a detailed walkthrough of common scenarios and query tips, see Create Tables in Amazon Athena from Nested JSON and Mappings Using JSONSerDe.

The following example demonstrates a simple approach to creating an Athena table from a nested JSON file. This example presumes a JSON file with the following structure:

{
   "DocId": "AWS",
   "User": {
         "Id": 1234,
         "Username": "bob1234",
         "Name": "Bob",
         "ShippingAddress": {
         "Address1": "123 Main St.",
         "Address2": null,
         "City": "Seattle",
         "State": "WA"
         },
   "Orders": [
         {
            "ItemId": 6789,
            "OrderDate": "11/11/2012"
         },
         {
            "ItemId": 4352,
            "OrderDate": "12/12/2012"
         }
      ]
   }
}

The following CREATE TABLE command uses the Openx-JsonSerDe with collection data types like struct and array to establish groups of objects.

CREATE external TABLE complex_json (
   DocId string,
   USER struct<Id:INT,
               Username:string,
               Name:string,
               ShippingAddress:struct<Address1:string,
                                      Address2:string,
                                      City:string,
                                      State:string>,
               Orders:array<struct<ItemId:INT,
                                   OrderDate:string>>>
   )
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://mybucket/myjsondata/';