Create an AWS Glue Crawler Create a Data Catalog by Using the AWS Glue Crawler Create and Run Athena Queries Working with Athena Queries

Create Custom Reports and Analyze AppStream 2.0 Usage Data

Amazon Athena is a serverless, interactive query service that you can use to analyze data stored in your S3 buckets using standard SQL queries. You can use Athena to aggregate your usage reports or generate other types of custom reports.

Create an AWS Glue Crawler
Create a Data Catalog by Using the AWS Glue Crawler
Create and Run Athena Queries
Working with Athena Queries

Create an AWS Glue Crawler

AWS Glue is a fully managed extract, transform, and load (ETL) service that lets you create a database from your Amazon S3 data and query that database by using Athena. This database is also referred to as an AWS Glue Data Catalog. An AWS Glue crawler can automatically detect the schema of your Amazon S3 data and create the corresponding database and tables. AppStream 2.0 provides an AWS CloudFormation template that you can use to create the necessary AWS Glue resources.

Important

Completing the steps in the following procedure creates an AWS Glue crawler. However, these steps don’t start the crawler. To start the crawler, you must perform the steps in the next procedure. For more information about AWS Glue crawlers, see Defining Crawlers.

To create an AWS Glue crawler

Open the AppStream 2.0 console at https://console.aws.amazon.com/appstream2.
Choose the AWS Region for which you have subscribed to usage reports.
In the navigation pane, choose Usage Reports, and verify that usage reports logging is enabled.
On the Report Details tab, in the paragraph next to Analytics, choose the CloudFormation template link.

Choosing the link opens the AWS CloudFormation console, where you can review the parameters of the AWS CloudFormation stack specified by the template before you run it. The template, when run, creates an AWS Glue crawler and several sample Athena queries.
On the Specify Details page, next to ScheduleExpression, either keep the default value or specify a different cron expression value for the frequency that you want to run the crawler. Do not change any other default value. When you're done, choose Next.

By default, the crawler is scheduled to run on a daily basis, but you can configure the crawler to run weekly, monthly, or on another frequency. For information about cron syntax, see Cron Expressions.
On the Options page, keep the default values, and choose Next.

On the Review page, select the check box next to "I acknowledge that AWS CloudFormation might create IAM resources with custom names," and then choose Create.

You must have sufficient AWS Glue and AWS Identity and Access Management (IAM) permissions to create and run the AWS CloudFormation stack. If you don't have the required permissions, ask your Amazon Web Services account administrator either to perform these steps in your account or to grant you the following permissions.


{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "athena:CreateNamedQuery",
                "athena:BatchGetNamedQuery",
                "athena:GetNamedQuery",
                "athena:StartQueryExecution",
                "athena:GetQueryResults",
                "athena:GetQueryExecution",
                "athena:ListNamedQueries",
                "cloudformation:DescribeStacks",
                "cloudformation:GetStackPolicy",
                "cloudformation:DescribeStackEvents",
                "cloudformation:CreateStack",
                "cloudformation:GetTemplate",
                "cloudformation:ListChangeSets",
                "cloudformation:ListStackResources",
                "iam:GetRole",
                "iam:CreateRole",
                "iam:GetRolePolicy",
                "s3:GetBucketLocation",
                "s3:ListBucketMultipartUploads",
                "s3:ListBucket",
                "s3:ListMultipartUploadParts",
                "s3:PutObject",
                "s3:GetObject",
                "s3:AbortMultipartUpload"
            ],
            "Resource": [
                "arn:aws:iam::*:role/AppStreamUsageReports-AppStreamUsageReportGlueRole*",
                "arn:aws:cloudformation:*:*:stack/AppStreamUsageReports/*",
                "arn:aws:athena:*:*:workgroup/primary",
                "arn:aws:s3:::aws-athena-query-results-*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "iam:AttachRolePolicy",
                "iam:PutRolePolicy",
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::appstream-logs-*",
                "arn:aws:iam::*:role/AppStreamUsageReports-AppStreamUsageReportGlueRole*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "iam:PassRole"
            ],
            "Resource": [
                "arn:aws:iam::*:role/AppStreamUsageReports-AppStreamUsageReportGlueRole*"
            ],
            "Condition": {
                "StringEquals": {
                    "iam:PassedToService": "glue.amazonaws.com"
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": [
                "cloudformation:GetTemplateSummary",
                "glue:GetResourcePolicy",
                "glue:GetCrawlers",
                "glue:BatchGetCrawlers",
                "glue:GetClassifiers",
                "glue:CreateClassifier",
                "glue:ListCrawlers",
                "glue:GetTags",
                "glue:GetCrawlerMetrics",
                "glue:GetClassifier",
                "tag:GetResources"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": "athena:RunQuery",
            "Resource": "arn:aws:athena:*:*:workgroup/primary"
        },
        {
            "Effect": "Allow",
            "Action": [
                "glue:GetTables",
                "glue:GetPartitions",
                "glue:GetTable"
            ],
            "Resource": [
                "arn:aws:glue:*:*:table/appstream-usage/*",
                "arn:aws:glue:*:*:database/appstream-usage",
                "arn:aws:glue:*:*:catalog"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "glue:GetDatabase",
                "glue:CreateDatabase",
                "glue:GetDatabases"
            ],
            "Resource": [
                "arn:aws:glue:*:*:database/appstream-usage",
                "arn:aws:glue:*:*:catalog"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "glue:GetCrawler",
                "glue:StartCrawler",
                "glue:CreateCrawler"
            ],
            "Resource": "arn:aws:glue:*:*:crawler/appstream-usage*"
        },
        {
            "Effect": "Allow",
            "Action": "glue:GetCatalogImportStatus",
            "Resource": "arn:aws:glue:*:*:catalog"
        }
    ]
}

Create a Data Catalog by Using the AWS Glue Crawler

The AWS Glue crawler, when run, creates a Data Catalog and schema that are mapped to the structure of your sessions and applications reports. Each time a new report is stored in your Amazon S3 bucket, you must run the crawler to update your AWS Glue Data Catalog with the data from the new report.

Note

Charges may apply to the running of your AWS Glue crawler. For more information, see AWS Glue Pricing.

Open the AWS Glue console at https://console.aws.amazon.com/glue/.
Choose the AWS Region for which you have subscribed to usage reports.
Select the check box next to the crawler named appstream-usage-sessions-crawler, and then choose Run crawler. Repeat this step for the crawler named appstream-usage-apps-crawler.

Performing these steps runs the crawlers and schedules them to run automatically according to the schedule specified in the AWS CloudFormation stack.
After both crawlers finish running, in the navigation pane, choose Databases. A database named appstream-usage, which represents your usage reports, displays. This database is an AWS Glue Data Catalog that was created when appstream-usage-sessions-crawler and appstream-usage-apps-crawler were run.
To view the tables in the database, choose appstream-usage, Tables. Two tables, applications and sessions, which represent your applications and sessions usage reports respectively, display. Choose either table to view its schema.

You can now query these tables in Athena by using SQL.

Create and Run Athena Queries

To query your usage reports by using Athena, perform the following steps.

Note

Charges may apply to Athena queries that you run. For more information, see Amazon Athena Pricing.

Open the Athena console at https://console.aws.amazon.com/athena/.
Under Database, choose appstream-usage.
In the query pane, enter a SQL query, and choose Run query.

Working with Athena Queries

This section provides SQL queries that you can run in Athena to analyze the usage reports data in your Amazon S3 bucket.

To create a consolidated report of all sessions in a given month, run the following query:


SELECT *
FROM "appstream-usage"."sessions"
WHERE year='four-digit-year'
AND month='two-digit-month'

You can also perform join operations between the applications and sessions tables in your query. For example, to view the distinct users who launched each application in a given month, run the following query:


SELECT DISTINCT apps.application_name, sessions.user_id
FROM "appstream-usage"."applications" apps
   INNER JOIN "appstream-usage"."sessions" sessions ON (apps.user_session_id = sessions.user_session_id AND sessions.year='four-digit-year' AND sessions.month='two-digit-month')
WHERE apps.year='four-digit-year'
  AND apps.month='two-digit-month'
ORDER BY 1, 2

Athena query results are stored as .csv files in an Amazon S3 bucket in your account that is named aws-athena-query-results-account-id-without-hyphens-region-code. For ease in locating query results, choose Save as and provide a name for your query before you run it. You can also choose the download icon in the Athena Results pane to download the results of the query as a .csv file.

To enhance performance and reduce costs, Athena uses partitioning to reduce the amount of data scanned in queries. For more information, see Partitioning Data. Usage reports are partitioned in your Amazon S3 buckets by year, month, and day. You can restrict your queries to certain date range partitions using the year, month, and day fields as conditions in your queries. For example, the following query ingests only the sessions reports for the week of May 19, 2019.


SELECT SUBSTRING(session_start_time, 1, 10) AS report_date, 
    COUNT(DISTINCT user_session_id) AS num_sessions
FROM "appstream-usage"."sessions"
WHERE year='2019'
   AND month='05'
   AND day BETWEEN '19' and '25'
GROUP BY 1
ORDER BY 1

In contrast, the following query produces identical results, but because it isn't restricted to any partitions, it ingests all sessions reports stored in your Amazon S3 bucket.


SELECT SUBSTRING(session_start_time, 1, 10) AS report_date, 
    COUNT(DISTINCT user_session_id) AS num_sessions
FROM "appstream-usage"."sessions"
WHERE session_end_time BETWEEN '2019-05-19' AND '2019-05-26'
GROUP BY 1
ORDER BY 1

If a session spans more than one day, the session and application records appear in the sessions and applications reports, respectively, corresponding to the day in which the session ended. For this reason, if you need to find records that relate to all sessions that were active during a given date range, consider expanding the partition set of your query by the maximum session length you have configured for your fleets.

For example, to view all sessions that were active for a given fleet during a calendar month, where the fleet had a maximum session duration of 100 hours, run the following query to expand your partition set by five days.


SELECT *
FROM "appstream-usage"."sessions"
WHERE fleet_name = 'fleet_name'
   AND session_start_time BETWEEN '2019-05-01' AND '2019-06-01'
   AND year='2019'
   AND (month='05' OR (month='06' AND day<='05'))
ORDER BY session_start_time

The AWS CloudFormation template that created the AWS Glue crawlers also created and saved several sample queries in your Athena account that you can use to analyze your usage data. These sample queries include the following:

Aggregated monthly session report
Average session length per stack
Number of sessions per day
Total streaming hours per user

Note
On-demand usage charges are rounded up to the next hour for each session.
Distinct users per app

To use any of these queries, perform the following steps.

Open the Athena console at https://console.aws.amazon.com/athena/.
Choose Saved Queries. The five queries noted before this procedure should display. The name of each query begins with "AS2." For example, "AS2_users_per_app_curr_mo."
To run a query, choose the query name rather than the option next to the name.
The text of the query appears in the query pane. Choose Run query.

To view these queries in a separate AWS CloudFormation template, see athena-sample-queries-appstream-usage-data_template.yml in the AWS Code Sample Catalog.

Warning Javascript is disabled or is unavailable in your browser.

To use the Amazon Web Services Documentation, Javascript must be enabled. Please refer to your browser's Help pages for instructions.

Document Conventions

Usage Reports Fields

Logging AppStream 2.0 API Calls

Create Custom Reports and Analyze AppStream 2.0 Usage Data

Contents

Create an AWS Glue Crawler

Important

To create an AWS Glue crawler

Create a Data Catalog by Using the AWS Glue Crawler

Note

Create and Run Athena Queries

Note

Working with Athena Queries

Note