Developer guide - Discovering Hot Topics Using Machine Learning

Developer guide

This section provides the source code for the solution and additional topics.

Source code

Visit our GitHub repository to download the templates and scripts for this solution, and to share your customizations with others. If you require an earlier version of the CloudFormation template, you can request it from the GitHub issues page. The Discovering Hot Topics Using Machine Learning templates are generated using the AWS Cloud Development Kit (AWS CDK). Refer to the README.md file for more information.

Retrieve the Amazon QuickSight Principal ARN

To retrieve the Amazon QuickSight User Principal ARN, you must have access to a shell or terminal with the AWS CLI installed. For installation instructions, refer to What Is the AWS Command Line Interface in the AWS CLI User Guide. Optionally, you can use the AWS CloudShell service to run AWS CLI commands.

Running the following command returns the list of users with their corresponding QuickSight User ARNs.

aws quicksight list-users --region <aws-region> --aws-account-id <account-id> --namespace <namespace-name>

Note

The <namespace-name> is default unless you explicitly created a different namespace in Amazon QuickSight.

Choose an Admin user, or a user who has permissions to create QuickSight resources in that account and AWS Region.
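
As a minimal sketch of the selection step above, the following Python helper filters a list-users response for active users whose role is ADMIN and returns their ARNs. The function name and the sample data are illustrative assumptions, not part of the solution; the sample only mimics the UserList shape returned by the command. Authors may also have permission to create QuickSight resources, so treat the ADMIN filter as one reasonable choice rather than the only one.

```python
from typing import Any, Dict, List


def admin_user_arns(list_users_response: Dict[str, Any]) -> List[str]:
    """Return the ARNs of active users whose Role is ADMIN (hypothetical helper)."""
    return [
        user["Arn"]
        for user in list_users_response.get("UserList", [])
        if user.get("Role") == "ADMIN" and user.get("Active", True)
    ]


# Made-up sample shaped like the list-users output; all values are placeholders.
sample_response = {
    "UserList": [
        {
            "Arn": "arn:aws:quicksight:us-east-1:111122223333:user/default/alice",
            "UserName": "alice",
            "Role": "ADMIN",
            "Active": True,
        },
        {
            "Arn": "arn:aws:quicksight:us-east-1:111122223333:user/default/bob",
            "UserName": "bob",
            "Role": "READER",
            "Active": True,
        },
    ]
}

print(admin_user_arns(sample_response))
```

If the account has many users, the response may be paginated, in which case each page would need to be passed through the same filter.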

Retrieve and manage API Key for YouTube API authentication

You must create a Google Cloud Platform (GCP) account to access YouTube APIs. After creating a GCP account, use the following procedure to retrieve and manage the YouTube API key.

Note

We strongly recommend that you secure your GCP account with Multi-Factor Authentication (MFA) and any other security best practices recommended by GCP.

  1. Log in to the GCP console and create a project. We recommend creating a unique project for this solution rather than using an existing project, which gives you better control over API keys and API access.

  2. Select your project and select APIs & Services from the left navigation menu.

  3. Choose Enable APIs and Services. Search for YouTube Data API v3 and select this API option. On the next page, select Enable to turn on this API.

  4. From the APIs & Services left navigation menu, select Credentials and create a new API key. Restrict the API key for use with YouTube Data API v3.

  5. Copy the new key and store it in AWS Systems Manager Parameter Store under the key path you configured during deployment.

The Credentials section provides additional options to regenerate, delete, or revoke access for API keys.
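
Step 5 of the procedure above could be scripted along the following lines. This is a sketch under stated assumptions: the helper names are hypothetical, the parameter name must be the key path you configured during deployment, and the SecureString type is an assumption that should be checked against what the deployed stack expects. The boto3 import is deferred so that the argument-building helper can be used and tested without AWS access.

```python
from typing import Any, Dict


def youtube_key_put_args(parameter_name: str, api_key: str) -> Dict[str, Any]:
    """Build keyword arguments for the SSM PutParameter call (hypothetical helper)."""
    cleaned_key = api_key.strip()
    if not parameter_name or not cleaned_key:
        raise ValueError("Both the parameter name and the API key are required.")
    return {
        "Name": parameter_name,
        "Value": cleaned_key,
        "Type": "SecureString",  # assumption: confirm the type the stack expects
        "Overwrite": True,       # replace the value when the key is regenerated
    }


def store_youtube_key(parameter_name: str, api_key: str, region: str) -> None:
    """Write the API key to Parameter Store; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the helper above works without boto3

    ssm = boto3.client("ssm", region_name=region)
    ssm.put_parameter(**youtube_key_put_args(parameter_name, api_key))
```

Running the same call again after regenerating the key in the Credentials section overwrites the stored value.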

Retrieve and manage API credentials for Reddit API authentication

The Reddit ingestion uses clientId, clientSecret, refreshToken, and userAgent. The userAgent is generated dynamically using the deployed stack name. The clientId, clientSecret, and refreshToken must be stored in the AWS Systems Manager Parameter Store as a JSON string. A sample JSON string follows.

{"clientId": "clientIdFromReddit", "clientSecret": "clientSecretFromReddit", "refreshToken": "generatedRefreshToken"}
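
To avoid hand-editing mistakes such as a missing quote or an empty field, the JSON string can be generated rather than typed. The following is a minimal sketch; the function name is hypothetical, and the three values are the same placeholders used in the sample above.

```python
import json


def reddit_credentials_json(client_id: str, client_secret: str, refresh_token: str) -> str:
    """Return the credentials as a JSON string, rejecting empty values."""
    values = {
        "clientId": client_id,
        "clientSecret": client_secret,
        "refreshToken": refresh_token,
    }
    missing = [key for key, value in values.items() if not value]
    if missing:
        raise ValueError(f"Missing values for: {', '.join(missing)}")
    return json.dumps(values)


print(
    reddit_credentials_json(
        "clientIdFromReddit", "clientSecretFromReddit", "generatedRefreshToken"
    )
)
```

The printed string is what you would paste into, or write to, the Parameter Store value.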

Use the Reddit app to retrieve your clientId and clientSecret. If you don’t have them, sign up for a Reddit app.

Figure: Reddit app to retrieve a client ID and client secret

There are various options to retrieve the refreshToken. The recommended options are:

—Or—

Note

The refreshToken requires only the read scope.

Important

Security considerations

The primary reason for using a refreshToken with clientId and clientSecret to authenticate with the Reddit API is that this option does not require Reddit sign-in credentials.

You can revoke the generated refreshToken or delete the Reddit app at any time, after which the solution no longer has access to the Reddit APIs.

We recommend that you rotate the refreshToken on a regular basis (every 30 to 90 days, or as required by your organization's security policy). You must also update the Parameter Store JSON string with every rotation.
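
One way to keep track of the rotation schedule is to compare the parameter's last-modified time against a maximum age. The following is a sketch only: the function names and the 90-day default are assumptions, and the parameter name is whatever path holds the Reddit JSON string in your deployment. The boto3 import is deferred so that the date check can be run without AWS access.

```python
from datetime import datetime, timedelta, timezone


def rotation_due(last_modified: datetime, now: datetime, max_age_days: int = 90) -> bool:
    """Return True when the stored value is at least max_age_days old.

    Both datetimes must be timezone-aware.
    """
    return now - last_modified >= timedelta(days=max_age_days)


def reddit_parameter_needs_rotation(parameter_name: str, region: str) -> bool:
    """Look up the parameter's LastModifiedDate; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so rotation_due works without boto3

    ssm = boto3.client("ssm", region_name=region)
    parameter = ssm.get_parameter(Name=parameter_name)["Parameter"]
    return rotation_due(parameter["LastModifiedDate"], datetime.now(timezone.utc))
```

A check like this only reports when the stored value was last updated; generating the new refreshToken and updating the JSON string remain manual steps.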

Database schema information

The following diagram displays a high-level schema structure for the various tables created in AWS Glue with their entity relationships. The data model is not normalized and includes redundant attributes for reporting performance.

Figure: Database schema structure