Launch the stack - Discovering Hot Topics using Machine Learning

Launch the stack

This automated AWS CloudFormation template deploys the Discovering Hot Topics using Machine Learning solution in the AWS Cloud.

Note

You are responsible for the cost of the AWS services used while running this solution. For more details, refer to the Cost section in this guide and the pricing webpage for each AWS service.

  1. Sign in to the AWS Management Console and select the button below to launch the discovering-hot-topics-using-machine-learning.template AWS CloudFormation template.

    
    [Discovering Hot Topics using Machine Learning launch button]

    You can also download the template as a starting point for your own implementation.

  2. The template launches in the US East (N. Virginia) Region by default. To launch the solution in a different AWS Region, use the Region selector in the console navigation bar.

Note

This solution uses the Amazon Rekognition, Amazon Translate, and Amazon Comprehend services, which are not currently available in all AWS Regions. You must launch this solution in an AWS Region where these services are available. For the most current availability by Region, refer to the AWS Regional Services List.
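
The Region requirement in the note above can be approximated in code before you launch. The following is a minimal sketch, assuming the boto3 SDK is installed. It relies on the endpoint data bundled with the SDK, which can lag behind the AWS Regional Services List, so treat the list as the authoritative source. The helper names regions_with_all_services and lookup_regions are illustrative and are not part of this solution.

```python
# Service names as boto3 identifies them (assumed to match the three
# services named in the note above).
REQUIRED_SERVICES = ("rekognition", "translate", "comprehend")


def regions_with_all_services(regions_by_service):
    """Return the sorted Regions that appear in every service's Region list."""
    region_sets = [set(regions) for regions in regions_by_service.values()]
    if not region_sets:
        return []
    return sorted(set.intersection(*region_sets))


def lookup_regions(session, services=REQUIRED_SERVICES):
    """Ask a boto3 session which Regions it lists for each service."""
    return {name: session.get_available_regions(name) for name in services}


# Usage (requires boto3; not executed here):
#   import boto3
#   session = boto3.session.Session()
#   print(regions_with_all_services(lookup_regions(session)))
```

The session is passed in rather than created inside the helpers so that the intersection logic can be exercised without AWS credentials.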

  3. On the Create stack page, verify that the correct template URL is in the Amazon S3 URL text box and choose Next.

  4. On the Specify stack details page, assign a name to your solution stack. For information about naming character limitations, refer to IAM and STS Limits in the AWS Identity and Access Management User Guide.

  5. Under Parameters, review the parameters for this solution template and modify them as necessary. This solution uses the following default values.

Parameter: DeployTwitter
Default: Yes
Description: Select Yes to ingest tweets. If you select No, the CloudFormation template does not deploy the Twitter ingestion components.

Parameter: TwitterSearchQuery
Default: entertainment
Description: The query to search on Twitter. Refer to the Twitter developer guide for details on how to build a query. Example:

  • AWS OR CDK OR "Solution Constructs"

    You can add geo coordinates to further filter the search by the location of the tweet. For details, refer to Using the Twitter Search API with geo coordinates.

    Note

    You can change this search parameter after deployment by updating the CloudFormation template or by updating the environment variable of the Lambda (ingestion-producer) function. The new query applies only from the time of the change onward; it is not applied retroactively.

Parameter: TwitterSSMPathForBearerToken
Default: /discovering-hot-topics-using-machine-learning/discovering-hot-topics-using-machine-learning/twitter
Description: The Systems Manager Parameter Store key path to the credentials where the Twitter bearer token is stored as an encrypted string.

Parameter: TwitterIngestQueryFrequency
Default: cron(0/10 * * * ? *)
Description: The ingestion schedule, as a cron expression supported by a CloudWatch Event Rule, that schedules the ingestion of social media feeds.

Parameter: SupportedLanguages
Default: en,es
Description: Restricts tweets to the given languages, each specified by an ISO 639-1 code (example: de,en,es,it,pt,fr,ja,ko,hi,ar,zh-cn,zh-tw). Language detection is best-effort, as supported by Twitter’s search API. For more details, refer to the Twitter Search API documentation.

Parameter: DeployNewsFeeds
Default: Yes
Description: Select Yes to ingest RSS news feeds. If you select No, the CloudFormation template does not deploy the news feed ingestion components.

Parameter: NewsFeedIngestFrequency
Default: cron(0 18 * * ? *)
Description: The ingestion schedule, as a cron expression supported by a CloudWatch Event Rule, that schedules the ingestion of news feeds.

Parameter: NewsSearchQuery
Default: Optional input
Description: (Optional) Comma-separated list of keywords to filter news feeds. Only feeds that contain at least one of the keywords are processed. If no keyword is provided, feeds are not filtered and all news feeds are processed.

Parameter: NewsFeedIngestConfig
Default: {"country":"ALL", "language":"ALL", "topic":"ALL"}
Description: Configuration for RSS feeds, provided as a JSON string. Example: {"country":"ALL", "language":"ALL", "topic":"ALL"}.

For country and language, use ISO codes. The superset of all supported topics is: "tech", "news", "business", "science", "finance", "food", "politics", "economics", "travel", "entertainment", "music", "sport", "world".

Note
  • Not all topics are supported by every RSS provider.

  • Setting a value to ALL is treated as a wildcard search.

Parameter: TopicAnalysisFrequency
Default: cron(10 0 * * ? *)
Description: The schedule on which topic modeling jobs run. Because a topic modeling job takes approximately 35 minutes to run, the minimum duration between jobs must be one hour. Additionally, because the data is stored in a folder structure that follows Apache Hive naming conventions, we recommend invoking the job a few minutes after the hour.

Parameter: NumberOfTopics
Default: 10
Description: The number of topics to detect in the topic modeling job, between 1 and 100.

Parameter: QuickSightPrincipalArn
Default: <blank>
Description: The ARN of the Amazon QuickSight user that must have permissions to view and edit the datasets, analysis, and dashboard created by AWS CloudFormation. For details on how to retrieve the QuickSight User ARN, refer to Retrieve Amazon QuickSight Principal ARN.
Note

If you leave this parameter blank, the solution deploys without the Amazon QuickSight resources that help visualize its various inferences.
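
The parameter values described above can be sanity-checked before you submit them. The sketch below is illustrative only: parse_cron and build_parameters are hypothetical helper names, not part of the solution. It assumes the six-field cron format used by CloudWatch Events (minutes, hours, day-of-month, month, day-of-week, year, with a ? in one of the two day fields), serializes NewsFeedIngestConfig as a JSON string, and produces the ParameterKey/ParameterValue list shape that the boto3 create_stack call accepts.

```python
import json
import re

# Field order of a CloudWatch Events cron expression.
CRON_FIELDS = ("minutes", "hours", "day_of_month", "month", "day_of_week", "year")
CRON_WRAPPER = re.compile(r"^cron\((.*)\)$")


def parse_cron(expression):
    """Split a cron(...) schedule into named fields, or raise ValueError."""
    match = CRON_WRAPPER.match(expression.strip())
    if match is None:
        raise ValueError(f"not a cron(...) expression: {expression!r}")
    parts = match.group(1).split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 6 fields, found {len(parts)}: {expression!r}")
    fields = dict(zip(CRON_FIELDS, parts))
    # A value or * in one day field requires ? in the other.
    if "?" not in (fields["day_of_month"], fields["day_of_week"]):
        raise ValueError("day-of-month or day-of-week must be '?'")
    return fields


def build_parameters(overrides):
    """Validate overrides and convert them to ParameterKey/ParameterValue dicts."""
    parameters = []
    for key, value in overrides.items():
        if key.endswith("Frequency"):
            parse_cron(value)  # raises ValueError on a malformed schedule
        if key == "NumberOfTopics" and not 1 <= int(value) <= 100:
            raise ValueError("NumberOfTopics must be between 1 and 100")
        if isinstance(value, dict):
            value = json.dumps(value)  # NewsFeedIngestConfig expects a JSON string
        parameters.append({"ParameterKey": key, "ParameterValue": str(value)})
    return parameters


# The values below are the defaults listed for this template.
example = build_parameters({
    "TwitterIngestQueryFrequency": "cron(0/10 * * * ? *)",
    "TopicAnalysisFrequency": "cron(10 0 * * ? *)",
    "NewsFeedIngestConfig": {"country": "ALL", "language": "ALL", "topic": "ALL"},
    "NumberOfTopics": 10,
})
```

These checks cover only the shape of each value. They do not confirm that a schedule leaves the recommended one hour between topic modeling jobs, or that a given RSS provider supports a topic.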

  6. Choose Next.

  7. On the Configure stack options page, choose Next.

  8. On the Review page, review and confirm the settings. Check the box acknowledging that the template will create AWS Identity and Access Management (IAM) resources.

  9. Choose Create stack to deploy the stack.

    You can view the status of the stack in the AWS CloudFormation console in the Status column. You should receive a CREATE_COMPLETE status in approximately 10 minutes.

    Note

    In addition to the primary AWS Lambda functions <function(s)..>, this solution includes the solution-helper Lambda function, which runs only during initial configuration or when resources are updated or deleted.

    When you run this solution, you will notice both Lambda functions in the AWS Management Console. Only the <function> function is regularly active. However, do not delete the solution-helper function, as it is needed to manage associated resources.

    You can verify that the application is ingesting Twitter information by checking the DynamoDB table status. The table contains an entry for every API call that the application makes to Twitter. The STATUS_COUNT column in the table also shows the number of tweets (records) retrieved by each API invocation.
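
The CREATE_COMPLETE check from the final step can also be scripted instead of watched in the console. The following is a minimal sketch, assuming a boto3 CloudFormation client is passed in. The function name wait_for_stack, the polling interval, and the timeout are illustrative choices, not values defined by this solution.

```python
import time


def wait_for_stack(client, stack_name, poll_seconds=30, timeout_seconds=1800,
                   sleep=time.sleep):
    """Poll describe_stacks until the stack leaves an *_IN_PROGRESS state.

    Returns the final status string (for example CREATE_COMPLETE or
    ROLLBACK_COMPLETE). Raises TimeoutError if the stack is still in
    progress after roughly timeout_seconds of waiting.
    """
    waited = 0
    while True:
        response = client.describe_stacks(StackName=stack_name)
        status = response["Stacks"][0]["StackStatus"]
        if not status.endswith("_IN_PROGRESS"):
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"{stack_name} still {status} after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds


# Usage (requires boto3 and AWS credentials; the stack name is a placeholder):
#   import boto3
#   client = boto3.client("cloudformation", region_name="us-east-1")
#   print(wait_for_stack(client, "my-hot-topics-stack"))
```

Any status other than CREATE_COMPLETE means the deployment did not finish cleanly, and the Events tab of the stack in the AWS CloudFormation console shows the cause. boto3 also offers a built-in stack_create_complete waiter; the explicit loop is used here only to make the status transitions visible.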