Amazon Athena
User Guide  | API Reference

Extracting Data from JSON

You may have source data with containing JSON-encoded strings that you do not necessarily want to deserialize into a table in Athena. In this case, you can still run SQL operations on this data, using the JSON functions available in Presto.

Consider this JSON string as an example dataset.

Examples: extracting properties#

To extract the name and projects properties from the JSON string, use the json_extract function as in the following example. The json_extract function takes the column containing the JSON string, and searches it using a JSONPath-like expression with the dot . notation.

Note

JSONPath performs a simple tree traversal. It uses the $ sign to denote the root of the JSON document, followed by a period and an element nested directly under the root, such as $.name.
WITH dataset AS (
  SELECT '{"name": "Susan Smith",
           "org": "engineering",
           "projects": [{"name":"project1", "completed":false},
           {"name":"project2", "completed":true}]}'
    AS blob
)
SELECT
  json_extract(blob, '$.name') AS name,
  json_extract(blob, '$.projects') AS projects
FROM dataset

The returned value is a JSON-encoded string, and not a native Athena data type.

+-----------------------------------------------------------------------------------------------+
| name           | projects                                                                     |
+-----------------------------------------------------------------------------------------------+
| "Susan Smith"  | [{"name":"project1","completed":false},{"name":"project2","completed":true}] |
+-----------------------------------------------------------------------------------------------+

To extract the scalar value from the JSON string, use the json_extract_scalar function. It is similar to json_extract, but returns only scalar values (Boolean, number, or string).

Note

Do not use the json_extract_scalar function on arrays, maps, or structs.
WITH dataset AS (
  SELECT '{"name": "Susan Smith",
           "org": "engineering",
           "projects": [{"name":"project1", "completed":false},{"name":"project2", "completed":true}]}'
    AS blob
)
SELECT
  json_extract_scalar(blob, '$.name') AS name,
  json_extract_scalar(blob, '$.projects') AS projects
FROM dataset

This query returns:

+---------------------------+
| name           | projects |
+---------------------------+
| Susan Smith    |          |
+---------------------------+

To obtain the first element of the projects property in the example array, use the json_array_get function and specify the index position.

WITH dataset AS (
  SELECT '{"name": "Bob Smith",
           "org": "engineering",
           "projects": [{"name":"project1", "completed":false},{"name":"project2", "completed":true}]}'
    AS blob
)
SELECT json_array_get(json_extract(blob, '$.projects'), 0) AS item
FROM dataset

It returns the value at the specified index position in the JSON-encoded array.

+---------------------------------------+
| item                                  |
+---------------------------------------+
| {"name":"project1","completed":false} |
+---------------------------------------+

To return an Athena string type, use the [] operator inside a JSONPath expression, then Use the json_extract_scalar function. For more information about [], see Accessing Array Elements.

WITH dataset AS (
   SELECT '{"name": "Bob Smith",
             "org": "engineering",
             "projects": [{"name":"project1", "completed":false},{"name":"project2", "completed":true}]}'
     AS blob
)
SELECT json_extract_scalar(blob, '$.projects[0].name') AS project_name
FROM dataset

It returns this result:

+--------------+
| project_name |
+--------------+
| project1     |
+--------------+