Examples: Geospatial Queries - Amazon Athena

Examples: Geospatial Queries

The examples in this topic create two tables from sample data available on GitHub and query the tables based on the data. The sample data, which are for illustration purposes only and are not guaranteed to be accurate, are in the following files:

  • earthquakes.csv – Lists earthquakes that occurred in California. The example earthquakes table uses fields from this data.

  • california-counties.json – Lists county data for the state of California in ESRI-compliant GeoJSON format. The data includes many fields such as AREA, PERIMETER, STATE, COUNTY, and NAME, but the example counties table uses only two: Name (string), and BoundaryShape (binary).

    Note

    Athena uses the com.esri.json.hadoop.EnclosedJsonInputFormat to convert the JSON data to geospatial binary format.

The following code example creates a table called earthquakes:

CREATE external TABLE earthquakes ( earthquake_date string, latitude double, longitude double, depth double, magnitude double, magtype string, mbstations string, gap string, distance string, rms string, source string, eventid string ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE LOCATION 's3://my-query-log/csv/';

The following code example creates a table called counties:

CREATE external TABLE IF NOT EXISTS counties ( Name string, BoundaryShape binary ) ROW FORMAT SERDE 'com.esri.hadoop.hive.serde.JsonSerde' STORED AS INPUTFORMAT 'com.esri.json.hadoop.EnclosedJsonInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 's3://my-query-log/json/';

The following example query uses the CROSS JOIN function on the counties and earthquake tables. The example uses ST_CONTAINS to query for counties whose boundaries include earthquake locations, which are specified with ST_POINT. The query groups such counties by name, orders them by count, and returns them in descending order. The query uses Athena engine version 2.

Note

In Athena engine version 2, functions like ST_CONTAINS no longer support the VARBINARY type as an input. For this reason, the example uses the ST_GeomFromLegacyBinary(varbinary) function to convert the boundaryshape binary value into a geometry. If you are using Athena engine version 1, you do not need to do this conversion. For more information, see Changes to Geospatial Functions in the Athena engine version 2 reference.

SELECT counties.name, COUNT(*) cnt FROM counties CROSS JOIN earthquakes WHERE ST_CONTAINS (ST_GeomFromLegacyBinary(counties.boundaryshape), ST_POINT(earthquakes.longitude, earthquakes.latitude)) GROUP BY counties.name ORDER BY cnt DESC

This query returns:

+------------------------+ | name | cnt | +------------------------+ | Kern | 36 | +------------------------+ | San Bernardino | 35 | +------------------------+ | Imperial | 28 | +------------------------+ | Inyo | 20 | +------------------------+ | Los Angeles | 18 | +------------------------+ | Riverside | 14 | +------------------------+ | Monterey | 14 | +------------------------+ | Santa Clara | 12 | +------------------------+ | San Benito | 11 | +------------------------+ | Fresno | 11 | +------------------------+ | San Diego | 7 | +------------------------+ | Santa Cruz | 5 | +------------------------+ | Ventura | 3 | +------------------------+ | San Luis Obispo | 3 | +------------------------+ | Orange | 2 | +------------------------+ | San Mateo | 1 | +------------------------+

Additional Resources

For additional examples of geospatial queries, see the following blog posts: