Analyzing exported data with Amazon Athena - AWS IoT SiteWise

Analyzing exported data with Amazon Athena

After you have your asset property data in Amazon S3, you can use several AWS services to generate reports or analyze and query your data:

You can find other AWS services that can interact with your data in Amazon S3 listed under Analytics in the AWS Management Console.

Note

The stack creates an AWS Glue database to format asset property data. You can't query this database for asset data. Follow the steps in this section to create an AWS Glue database that you can query.

In this tutorial, you learn how to configure the prerequisites to use Amazon Athena and how to use Athena to run SQL queries on your exported AWS IoT SiteWise asset data. To query data with Athena, you must first populate the AWS Glue Data Catalog with your asset data. The Data Catalog contains databases and tables, and Athena can access data in the Data Catalog. You can create an AWS Glue crawler that regularly updates the Data Catalog with your exported asset data.

Configuring a crawler to populate the AWS Glue Data Catalog

AWS Glue crawlers crawl data stores to populate tables in the AWS Glue Data Catalog. In this procedure, you create and run an AWS Glue crawler for your S3 bucket that contains exported asset data. The crawler creates a table for asset property updates and a table for asset metadata. Then, you can perform SQL queries on these tables with Athena. For more information, see Populating the AWS Glue Data Catalog and Defining crawlers in the AWS Glue Developer Guide.

To create an AWS Glue crawler

  1. Navigate to the AWS Glue console.

  2. In the navigation pane, choose Crawlers.

  3. Choose Add crawler.

  4. On the Add crawler page, do the following:

    1. Enter a name for your crawler, such as IoTSiteWiseDataCrawler, and then choose Next.

    2. For Crawler source type, choose Data stores, and then choose Next.

    3. On the Add a data store page, do the following:

      1. For Choose a data store, choose S3.

      2. In Include path, enter s3://DOC-EXAMPLE-BUCKET1 to add your asset data bucket as a data store. Replace DOC-EXAMPLE-BUCKET1 with the bucket name that you chose when you created the stack.

      3. Choose Next.

        
                      AWS Glue crawler "Add a data store"
                        screenshot.
    4. On the Add another data store page, choose No, and then choose Next.

    5. On the Choose an IAM role page, do the following:

      1. To create a new service role that allows AWS Glue to access the S3 bucket, choose Create an IAM role.

      2. Enter a suffix for your role's name, such as IoTSiteWiseDataCrawler.

      3. Choose Next.

    6. For Frequency, choose Hourly, and then choose Next. The crawler updates the tables with new data each time it runs, so you can choose any frequency that fits your use case.

    7. On the Configure the crawler's output page, do the following:

      1. Choose Add database to create an AWS Glue database for your asset data.

      2. Enter a name for the database, such as iot_sitewise_asset_database.

      3. Choose Create.

      4. Choose Next.

    8. Review the crawler details, and then choose Finish.

      
                  AWS Glue crawler "Review crawler details" screenshot.

By default, your new crawler doesn't immediately run. You must manually run it or wait until it runs on its configured schedule.

To run a crawler

  1. On the Crawlers page, select the check box for your new crawler, and then choose Run crawler.

    
              AWS Glue "Crawlers" screenshot with "Run crawler" highlighted.
  2. Wait until the crawler finishes and has a status of Ready.

    The crawler can take several minutes to run, and its status updates automatically.

  3. In the navigation pane, choose Tables.

    You should see two new tables: asset_metadata and asset_property_updates.

Querying data with Athena

Athena automatically discovers your asset data tables in the AWS Glue Data Catalog. To perform queries on the intersection of these tables, you can create a view, which is a logical data table. For more information, see Working with views in the Amazon Athena User Guide.

After you create a view that combines asset property data and metadata, you can run queries that output property values with asset and property names attached. For more information, see Running SQL queries using Amazon Athena in the Amazon Athena User Guide.

To query asset data with Athena

  1. Navigate to the Athena console.

    If the Getting started page appears, choose Get Started.

  2. If you're using Athena for the first time, complete the following steps to configure an S3 bucket for query results. Athena stores the results of your queries in this bucket.

    Important

    Use a different bucket than your asset data bucket, so the crawler that you created earlier doesn't crawl query results. We recommend that you create a bucket to use only for Athena query results. For more information, see How do I create an S3 bucket? in the Amazon Simple Storage Service User Guide.

    1. Choose Settings.

    2. In Query result location, enter the S3 bucket for Athena query results. The bucket must end with /.

      
                  Athena "Settings" screenshot with "Query result location"
                    highlighted.
    3. Choose Save.

  3. The left panel contains the data source to query. Do the following:

    1. For Data source, choose AwsDataCatalog to use the AWS Glue Data Catalog.

    2. For Database, choose the AWS Glue database that you created with the crawler.

      
                  Athena "Query Editor" screenshot with "Database" highlighted.

    You should see two tables: asset_metadata and asset_property_updates.

  4. To create a view from the combination of asset property data and metadata, enter the following query, and then choose Run query.

    CREATE OR REPLACE VIEW iot_sitewise_asset_data AS SELECT "from_unixtime"("time_in_seconds" + ("offset_in_nanos" / 1000000000)) "timestamp", "metadata"."asset_name", "metadata"."asset_property_name", "data"."asset_property_value", "metadata"."asset_property_unit", "metadata"."asset_property_alias" FROM ( "iot_sitewise_asset_database".asset_property_updates data INNER JOIN "iot_sitewise_asset_database".asset_metadata metadata ON ( ("data"."asset_id" = "metadata"."asset_id") AND ("data"."asset_property_id" = "metadata"."asset_property_id") ) );

    This query joins the asset property data and metadata tables on asset ID and property ID to create a view. You can run this query multiple times because it replaces the existing view if the view already exists.

  5. To add a new query, choose the + icon.

  6. To view a sample of asset data, enter the following query, and then choose Run query. Replace the timestamps with an interval for which your bucket has data.

    SELECT * FROM "iot_sitewise_asset_database"."iot_sitewise_asset_data" WHERE "timestamp" BETWEEN TIMESTAMP '2020-05-14 12:00:00.000' AND TIMESTAMP '2020-05-14 13:00:00.000' ORDER BY "timestamp" DESC LIMIT 50;

    This query outputs up to 50 data points between two timestamps, with the most recent entries shown first.

    Your query output might look similar to the following results.

    
              Athena "Query Editor" screenshot with "Run query" highlighted.

You can now run queries useful to your AWS IoT SiteWise application. For more information, see SQL reference for Amazon Athena in the Amazon Athena User Guide.