Amazon Aurora PostgreSQL user-defined functions for Amazon Location Service - Amazon Location Service

Amazon Aurora PostgreSQL user-defined functions for Amazon Location Service

You can use Amazon Location Service to work with coordinates and addresses stored in database tables to validate and to clean and enrich your geospatial data.

For example:

  • You can use geocoding to convert addresses to coordinates to normalize and fill gaps in data for addresses stored in a database table.

  • You can geocode addresses to obtain their positions and use the coordinates with database spatial functions, such as a function that shows rows in a specified area.

  • You can use enriched data to generate automated reporting, such a generating an automated report that illustrates all devices in a given area, or an automated report for machine learning that illustrates areas with higher failure rates when sending location updates.

This tutorial provides a walk-through for how to validate and format addresses stored in an Amazon Aurora PostgreSQL database table using Amazon Location Service.

  • Amazon Aurora PostgreSQL – A fully managed relational database engine, compatible with MySQL and PostgreSQL, that outputs up to five times the throughput of MySQL and up to three times the throughput of PostgreSQL without changing most of your existing application. For more information, see What is Amazon Aurora? in the Amazon Aurora User Guide.

Important

The resulting application in this tutorial uses a place index that stores geocoding results charged under the Request-Based Usage pricing plan. For more information on applicable charges for storing geocoding results, see Amazon Location Service pricing.

Sample code is available in the Amazon Location Service samples repository on GitHub, which includes an AWS CloudFormation template.

Overview

The architecture involves the following integrations:

  • This solution uses an Amazon Location place index resource to support geocoding queries using the operation SearchPlaceIndexForText.

  • AWS Lambda uses a Python Lambda that geocodes addresses when an IAM policy gives permission to allow AWS Lambda to call the Amazon Location geocoding operation, SearchPlaceIndexForText.

  • Grant permission to Amazon Aurora PostgreSQL to invoke the geocoding Lambda function using an SQL user-defined function.

Prerequisites

Before you begin, you need the following prerequisites:

  • An Amazon Aurora PostgreSQL cluster. For more information about Creating an Amazon Aurora DB cluster, see the Amazon Aurora User Guide.

    Note

    If your Amazon Aurora cluster isn't publicly available, you must also configure Amazon Aurora to connect to AWS Lambda in a virtual private cloud (VPC) in your AWS account. For more information, see Grant Amazon Aurora PostgreSQL access to AWS Lambda.

  • An SQL developer tool to connect to the Amazon Aurora PostgreSQL cluster.

Quick start

As an alternative to going through the steps in this tutorial, you can launch a quick stack to deploy an AWS Lambda function supporting the Amazon Location operation SearchPlaceIndexForText. This automatically configures your AWS account to allow Amazon Aurora to call AWS Lambda.

Once you configure your AWS account, you will need to:

Create a place index resource

Start by creating a place index resource to support geocoding queries.

  1. Open the Amazon Location Service console at https://console.aws.amazon.com/location/.

  2. In the left navigation pane, choose Place indexes.

  3. Fill out the following boxes:

    • Name – Enter a name for the place index resource. For example, AuroraPlaceIndex. Maximum 100 characters. Valid entries include alphanumeric characters, hyphens, periods, and underscores.

    • Description – Enter an optional description. For example, Place index for Amazon Aurora.

  4. Under Data providers, choose an available data provider to use with your place index resource. If you have no preference, we recommend starting with Esri.

  5. Under Data storage options, specify Yes, results will be stored. This indicates that you intend to save the geocoding results in a database.

  6. Under Pricing Plan, the price plan shown is the Request-Based Usage pricing plan. For more information on applicable charges for storing geocoding results, see Amazon Location Service pricing.

  7. (Optional) Under Tags, enter a tag Key and Value. This adds a tag your new place index resource. For more information, see Tagging your resources.

  8. Choose Create place index.

Create an AWS Lambda function for geocoding

To create a connection between Amazon Aurora PostgreSQL and Amazon Location Service, you need an AWS Lambda function to handle requests from the database engine. This function translates the Lambda user-defined function event and calls the Amazon Location operation SearchPlaceIndexForText.

You can create the function using the AWS Lambda console, the AWS Command Line Interface, or the AWS Lambda APIs.

To create a Lambda user-defined function using the console

  1. Open the AWS Lambda console at https://console.aws.amazon.com/lambda/.

  2. From the left navigation, choose Functions.

  3. Choose Create Function, and make sure that Author from scratch is selected.

  4. Fill out the following boxes:

    • Function name – Enter a unique name for your function. Valid entries include alphanumeric characters, hyphens, and underscores with no spaces. For example, AuroraGeocoder.

    • Runtime – Choose Python 3.8.

  5. Choose Create function.

  6. Choose the Code tab to open the editor.

  7. Overwrite the placeholder code in lambda_function.py with the following:

    from os import environ import boto3 from botocore.config import Config # load the place index name from the environment, falling back to a default PLACE_INDEX_NAME = environ.get("PLACE_INDEX_NAME", "AuroraPlaceIndex") location = boto3.client("location", config=Config(user_agent="Amazon Aurora PostgreSQL")) """ This Lambda function receives a payload from Amazon Aurora and translates it to an Amazon Location `SearchPlaceIndex` call and returns the results as-is, to be post-processed by a PL/pgSQL function. """ def lambda_handler(event, context): kwargs = {} if event.get("biasPosition") is not None: kwargs["BiasPosition"] = event["biasPosition"] if event.get("filterBBox") is not None: kwargs["FilterBBox"] = event["filterBBox"] if event.get("filterCountries") is not None: kwargs["FilterCountries"] = event["filterCountries"] if event.get("maxResults") is not None: kwargs["MaxResults"] = event["maxResults"] return location.search_place_index_for_text( IndexName=PLACE_INDEX_NAME, Text=event["text"], **kwargs)["Results"]
  8. If you've named your place index something other than AuroraPlaceIndex, create an environment variable named PLACE_INDEX_NAME to assign the resource name to:

    • From the Configuration tab, choose Environment Variables.

    • Choose Edit, then choose Add environment variable.

    • For Key: Enter PLACE_INDEX_NAME.

    • For Value: Enter the name of your place index resource.

  9. Choose Deploy to save the updated function.

  10. From the Test drop-down menu, choose Configure test Event.

  11. Choose Create new test event.

  12. Enter the following test event:

    { "text": "Baker Beach", "biasPosition": [-122.483, 37.790], "filterCountries": ["USA"] }
  13. Choose Test to test the Lambda function.

  14. Choose the Configuration tab.

  15. Under General configuration: Choose Permissions.

  16. Under Execution role: Choose the hyper linked Role name to grant Amazon Location Service permissions to your Lambda function.

  17. Under the Permissions tab: Choose Add inline policy.

  18. Choose the JSON tab.

  19. Add the following IAM policy:

    • The following policy gives permission to send SearchPlaceIndexForText to the place index resource AuroraPlaceIndex.

      { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": "geo:SearchPlaceIndexForText", "Resource": "arn:aws:geo:<Region>:<AccountId>:place-index/AuroraPlaceIndex" } ] }
  20. Choose Review policy.

  21. Enter a policy name. For example, AuroraPlaceIndexReadOnly.

  22. Choose Create policy.

Grant Amazon Aurora PostgreSQL access to AWS Lambda

Before Amazon Aurora PostgreSQL can invoke an AWS Lambda function, you must grant access permission.

If your Amazon Aurora PostgreSQL cluster isn't publicly accessible, you will need to first create a VPC endpoint for AWS Lambda in order for Amazon Aurora to call your Lambda function.

Create a VPC Endpoint for AWS Lambda

Note

This step is only required if your Amazon Aurora PostgreSQL cluster isn't publicly accessible.

  1. Open the Amazon Virtual Private Cloud Console.

  2. In the left navigation, choose Endpoints.

  3. Choose Create endpoint.

  4. In the Service Name filter, enter "lambda", then choose com.amazonaws.<region>.lambda.

  5. Choose the VPC containing your Aurora cluster.

  6. Choose a subnet for each availability zone.

  7. In the Security group filter, enter "default" or the name of the security group your Aurora cluster is a member of, then choose the security group.

  8. Choose Create endpoint.

Create an IAM policy to grant permission to invoke your AWS Lambda function

  1. Open the IAM console.

  2. In the left navigation, expand Access Management to choose Policies.

  3. Choose Create policy.

  4. On the JSON tab, input the following policy:

    • The following is an example of an IAM policy that grants Amazon Aurora PostgreSQL permission to invoke the AuroraGeocoder AWS Lambda function.

    { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": "lambda:InvokeFunction", "Resource": [ "arn:aws:lambda:<Region>:<AccountId>:function:AuroraGeocoder" ] } ] }
  5. Choose Next: Tags to add optional tags.

  6. Choose Next: Review.

  7. Review your policy and enter the following details for the policy:

    • Name – Use alphanumeric and '+=,.@-_' characters. Maximum 128 characters. For example, AuroraGeocoderInvoke.

    • Description – Enter an optional description. Use alphanumeric and '+=,.@-_' characters. Maximum 1000 characters.

  8. Choose Create policy. Note the ARN for this policy, which you use to attach the policy to an IAM role.

Create an IAM role to give permission to Amazon Relational Database Service (Amazon RDS)

By creating an IAM role, Amazon Aurora PostgreSQL can assume the role on your behalf to accress your Lambda function. For more information, see Creating a role to delegate permissions to an IAM user in the IAM User Guide.

The following example is an AWS CLI command that creates a role named AuroraGeocoderInvokeRole:

aws iam create-role --role-name rds-lambda-role --assume-role-policy-document '{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Service": "rds.amazonaws.com" }, "Action": "sts:AssumeRole" } ] }'

Attach your IAM policy to the IAM role

Once you've have an IAM role, attach the IAM policy you've created.

The following example is an AWS CLI command that attaches the policy AuroraGeocoderInvoke to the role AuroraGeocoderInvokeRole.

aws iam attach-role-policy --policy-arn AuroraGeocoderInvoke --role-name AuroraGeocoderInvokeRole

Add the IAM role to a Amazon Aurora DB cluster

The following example is an AWS CLI command to add an IAM role to a Amazon Aurora PostgreSQL DB cluster named MyAuroraCluster.

aws rds add-role-to-db-cluster \ --db-cluster-identifier MyAuroraCluster \ --feature-name Lambda \ --role-arn AuroraGeocoderInvokeRole \ --region your-region

Invoke the AWS Lambda function

After you grant permission to Amazon Aurora PostgreSQL to invoke your geocoding Lambda function, you can create an Amazon Aurora PostgreSQL user-defined function to invoke the geocoding AWS Lambda function. For more information, see Invoking an AWS Lambda function from an Amazon Aurora PostgreSQL DB cluster in the Amazon Aurora User Guide.

Install the required PostgreSQL extensions

To install the required PostgreSQL extensions aws_lambda and aws _commons extensions, see Overview of using a Lambda function in the Amazon Aurora User Guide.

CREATE EXTENSION IF NOT EXISTS aws_lambda CASCADE;

Install the required PostGIS extensions

PostGIS is an extension to PostgreSQL for storing and managing spatial information. For more information, see Working with the PostGIS extension on the Amazon Relational Database Service User Guide.

CREATE EXTENSION IF NOT EXISTS postgis;

Create an SQL user-defined function that invokes the Lambda function

In an SQL editor, create a new user-defined function f_SearchPlaceIndexForText to invoke the function AuroraGeocoder:

CREATE OR REPLACE FUNCTION f_SearchPlaceIndexForText( text text, bias_position geometry(Point, 4326) DEFAULT NULL, filter_bbox box2d DEFAULT NULL, filter_countries text[] DEFAULT NULL, max_results int DEFAULT 1 ) RETURNS TABLE ( label text, address_number text, street text, municipality text, postal_code text, sub_region text, region text, country text, geom geometry(Point, 4326) ) LANGUAGE plpgsql IMMUTABLE AS $function$ begin RETURN QUERY WITH results AS ( SELECT json_array_elements(payload) rsp FROM aws_lambda.invoke( aws_commons.create_lambda_function_arn('AuroraGeocoder'), json_build_object( 'text', text, 'biasPosition', CASE WHEN bias_position IS NOT NULL THEN array_to_json(ARRAY[ST_X(bias_position), ST_Y(bias_position)]) END, 'filterBBox', CASE WHEN filter_bbox IS NOT NULL THEN array_to_json(ARRAY[ST_XMin(filter_bbox), ST_YMin(filter_bbox), ST_XMax(filter_bbox), ST_YMax(filter_bbox)]) END, 'filterCountries', filter_countries, 'maxResults', max_results ) ) ) SELECT rsp->'Place'->>'Label' AS label, rsp->'Place'->>'AddressNumber' AS address_number, rsp->'Place'->>'Street' AS street, rsp->'Place'->>'Municipality' AS municipality, rsp->'Place'->>'PostalCode' AS postal_code, rsp->'Place'->>'SubRegion' AS sub_region, rsp->'Place'->>'Region' AS region, rsp->'Place'->>'Country' AS country, ST_GeomFromGeoJSON( json_build_object( 'type', 'Point', 'coordinates', rsp->'Place'->'Geometry'->'Point' ) ) geom FROM results; end; $function$;

Call the SQL function to geocode from Aurora

Running the SQL statement invokes the Lambda function AuroraGeocoder, which takes address records from the database table in the Amazon Aurora PostgreSQL database and geocodes them using a place index resource.

Note

Amazon Aurora PostgreSQL invokes the Lambda function for each call to the SQL user-defined function.

If you are geocoding 50 rows, Amazon Aurora PostgreSQL invokes the Lambda function 50 times. One for each row.

The following f_SearchPlaceIndexForText SQL function makes requests to Amazon Location's SearchPlaceIndexForText API through the AuroraGeocoder Lambda function. The function returns a geom column that's a PostGIS geometry, which ST_AsText(geom) converts to text.

SELECT *, ST_AsText(geom) FROM f_SearchPlaceIndexForText('Vancouver, BC');

By default, the return will contain one row. To request additional rows, up to the MaxResults limit, run the following SQL statement while providing a BiasPosition and limiting to results in Canada.

SELECT * FROM f_SearchPlaceIndexForText('Mount Pleasant', ST_MakePoint(-123.113, 49.260), null, '{"CAN"}', 5);

To filter results using a bounding box, then pass a Box2D as filter_bbox:

  • FilterBBox – Filters the results by returning places within a bounding box. An optional parameter.

SELECT * FROM f_SearchPlaceIndexForText('Mount Pleasant', null, 'BOX(-139.06 48.30, -114.03 60.00)'::box2d, '{"CAN"}', 5);

For more information on PostGIS types and functions, see the PostGIS Reference.

Enriching a database containing address data

You can construct a formatted address and simultaneously normalize and geocode using the Amazon Location operation SearchPlaceIndexForText given a database table with the following data broken out into the following columns:

  • id

  • address

  • city

  • state

  • zip

WITH source_data AS ( SELECT id, address || ', ' || city || ', ' || state || ', ' || zip AS formatted_address FROM addresses ), geocoded_data AS ( SELECT *, (f_SearchPlaceIndexForText(formatted_address)).* FROM source_data ) SELECT id, formatted_address, label normalized_address, ST_Y(geom) latitude, ST_X(geom) longitude FROM geocoded_data -- limit the number of rows that will be geocoded; remove this to geocode the entire table LIMIT 1;

The following example illustrates one resulting datatable row:

id | formatted_address | normalized_address | latitude | longitude ----+--------------------------------+--------------------------------------------+------------------+------------------- 42 | 123 Anytown Ave N, Seattle, WA | 123 Anytown Ave N, Seattle, WA, 12345, USA | 47.6223000127926 | -122.336745971039 (1 row)

Update the database table and populate columns

The following example updates the table and populates columns with results of SearchPlaceIndexForText queries:

WITH source_data AS ( -- select rows that have not been geocoded and created a formatted address for each SELECT id, address || ', ' || city || ', ' || state || ', ' || zip AS formatted_address FROM addresses WHERE label IS NULL -- limit the number of rows that will be geocoded; remove this to geocode the entire table LIMIT 1 ), geocoded_data AS ( -- geocode each row and keep it linked to the source's ID SELECT id, (f_SearchPlaceIndexForText(formatted_address)).* FROM source_data ) UPDATE addresses -- populate columns SET normalized_address = geocoded_data.label, latitude = ST_Y(geocoded_data.geom), longitude = ST_X(geocoded_data.geom) FROM geocoded_data -- ensure that rows match WHERE addresses.id = geocoded_data.id;

Next steps

Sample code is available in the Amazon Location Service samples repository on GitHub, which includes an AWS CloudFormation template.