Cost - Discovering Hot Topics Using Machine Learning

Cost

You are responsible for the cost of the AWS services used while running this solution, which can vary based on the following factors:

  • The volume of data ingested (based on the configuration for the ingestion source — Reddit, RSS feeds, and/or YouTube comments).

    When ingesting Reddit comments, the number of subreddits in SubRedditsToFollow and the ingestion frequency contribute to the volume of data that the solution ingests.

  • SubRedditsToFollow – Adding more subreddits increases the volume of data.

    When ingesting RSS feeds, the number of news sites and the search query string both contribute to the volume of data that the solution ingests.

  • Search query string – Broader and more generic terms result in a larger volume of ingested data. Specific and precise terms result in a smaller volume of ingested data.

  • News feed configuration – A broad configuration (topics, languages, and countries) increases the volume of ingested data.

    When ingesting YouTube comments, the ingestion works in two stages: (1) it retrieves the videos based on search criteria, and (2) it retrieves the comments for each of the videos. Ingestion search can be based on a search query, a channel ID, or both.

  • Search query – A generic search query returns a larger list of videos and, in turn, a larger volume of comments for NLP processing.

  • Channel ID – The number of videos in the YouTube channel and the number of comments on each video both contribute to the volume of ingested data.

  • Ingestion window – By default, the search filter excludes videos published more than seven days ago. Increasing the window size can increase the number of videos searched and the volume of comments ingested. You can customize this filter through the Lambda function's environment variable.

  • The number of queries for visualization.

  • The number of records that require language detection. For RSS feeds, the solution uses Amazon Comprehend to detect the data's source language before processing.

  • The number of records that must be translated into English.

  • The number of images (media assets) ingested with the RSS feeds.
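The two-stage YouTube ingestion and the ingestion window described above can be sketched as a rough volume estimate. This is an illustration only: the `Video` record, the sample data, and `estimate_comment_volume` are hypothetical names and are not part of the solution's code. The seven-day default mirrors the window mentioned above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Video:
    """Hypothetical record for a video returned by stage 1 (the search)."""
    video_id: str
    published_at: datetime
    comment_count: int


def estimate_comment_volume(videos, now, window_days=7):
    """Return (videos kept, comments to ingest) for the given window.

    Stage 1 keeps only videos published within `window_days` of `now`.
    Stage 2 would then retrieve the comments of each remaining video,
    so the comment counts of those videos are summed here.
    """
    cutoff = now - timedelta(days=window_days)
    recent = [v for v in videos if v.published_at >= cutoff]
    return len(recent), sum(v.comment_count for v in recent)


# Made-up sample data, for illustration only.
now = datetime(2024, 1, 15, tzinfo=timezone.utc)
sample = [
    Video("a", now - timedelta(days=2), 400),
    Video("b", now - timedelta(days=6), 1200),
    Video("c", now - timedelta(days=10), 900),
]

print(estimate_comment_volume(sample, now))      # (2, 1600)
print(estimate_comment_volume(sample, now, 14))  # (3, 2500)
```

Widening the window from 7 to 14 days brings the third video back into scope, which is why a larger window can raise both the number of videos searched and the number of comments sent for NLP processing.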

As of this revision, the cost for running this solution in the US East (N. Virginia) Region using Reddit, RSS news feed, and YouTube comments ingestion, with the default values, and reports queried sporadically, is approximately $375 per week. We recommend creating a budget through AWS Cost Explorer to help manage costs. Prices are subject to change. For full details, refer to the pricing webpage for each AWS service used in this solution.
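If you prefer to script the budget rather than create it in the console, the following is a minimal sketch. It assumes you use the AWS Budgets `CreateBudget` API through boto3, which is one possible approach and not something this guide prescribes. The budget name, the limit, and the account ID are placeholders; choose a limit that matches your own expected usage.

```python
def build_monthly_cost_budget(name, limit_usd):
    """Build the `Budget` parameter for the AWS Budgets CreateBudget API."""
    return {
        "BudgetName": name,
        # The API expects the amount as a string.
        "BudgetLimit": {"Amount": f"{limit_usd:.2f}", "Unit": "USD"},
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
    }


def create_budget(account_id, budget):
    """Create the budget. Requires boto3 and valid AWS credentials."""
    import boto3  # imported here so the builder above works without boto3

    client = boto3.client("budgets")
    client.create_budget(AccountId=account_id, Budget=budget)


# Placeholder name and limit, for illustration only.
budget = build_monthly_cost_budget("hot-topics-monthly", 503)
print(budget["BudgetLimit"])  # {'Amount': '503.00', 'Unit': 'USD'}

# Uncomment to create the budget in your account (placeholder account ID):
# create_budget("111122223333", budget)
```

Keeping the request builder separate from the API call lets you review the budget definition before anything is created in your account.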

Example cost tables

The following tables provide an example cost breakdown for deploying this solution with the default parameters in the US East (N. Virginia) Region for one week with different volume scenarios (excludes free tier).

Example 1: Ingesting RSS news feeds (~100 items/day) + comments from 2 subreddits (~1000 comments/day) + YouTube comments (~1600 comments/day) + custom ingestion (~600 items/day)

AWS service | Dimensions | Cost [USD/month]
Amazon Athena | 70 queries/week and 50 GB data scanned/query | $75.00
Amazon CloudWatch Event Rule – 1 | 5,040 events/week (runs every 2 mins) | $0.05
Amazon CloudWatch Event Rule – 2 | 14 events/week (runs every day) | $0.0004
Amazon Comprehend | ~100 news items/day + ~1000 Reddit comments/day + ~1600 YouTube comments/day + ~600 custom ingestion items/day | $210.00
Amazon DynamoDB | Records inserted with TTL 7-day expiry to keep state for Reddit ingestion, YouTube ingestion, and news feeds, on-demand capacity | $0.50
Amazon Data Firehose | 21 Firehose, total 3 GB/week | $0.50
Amazon Kinesis Data Streams | 1 data stream | $10.00
Amazon Simple Queue Service (Amazon SQS) | 15 queues (regular queues + DLQ) | $2.00
Amazon Rekognition – Label and text detection | 20 images/day | $1.20
Amazon Simple Storage Service (Amazon S3) | 5 buckets, 75 GB | $20.00
Amazon Translate | Assuming 280 characters/item with 4,000 items, assumes 50% of 4,000 items in non-English language = 2,000 items = 560,000 characters/week | $40.00
AWS Lambda | 30 Lambda functions | $4.00
AWS Step Functions | 2 workflow definitions ~ 30 states (in all) | $30.00
AWS Key Management Service | Using AWS managed keys with DynamoDB, Amazon S3 (SSE-S3), Kinesis Data Streams, SQS | $50.00
Amazon QuickSight | 10 readers reading twice/day | $60.00
Total: ~$503.00/month
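To see how the Example 1 total is composed, the line items above can be summed directly. The sketch below copies the amounts from the table (service names are shortened for readability) and uses `Decimal` to avoid floating-point rounding. The exact sum is about $503.25, which the table rounds to ~$503.00.

```python
from decimal import Decimal

# Monthly line items (USD) copied from the Example 1 table.
EXAMPLE_1 = {
    "Amazon Athena": "75.00",
    "Amazon CloudWatch Event Rule 1": "0.05",
    "Amazon CloudWatch Event Rule 2": "0.0004",
    "Amazon Comprehend": "210.00",
    "Amazon DynamoDB": "0.50",
    "Amazon Data Firehose": "0.50",
    "Amazon Kinesis Data Streams": "10.00",
    "Amazon SQS": "2.00",
    "Amazon Rekognition": "1.20",
    "Amazon S3": "20.00",
    "Amazon Translate": "40.00",
    "AWS Lambda": "4.00",
    "AWS Step Functions": "30.00",
    "AWS KMS": "50.00",
    "Amazon QuickSight": "60.00",
}

costs = {service: Decimal(amount) for service, amount in EXAMPLE_1.items()}
total = sum(costs.values())
largest = max(costs, key=costs.get)

print(f"Total: ${total:.2f}/month")     # Total: $503.25/month
print(f"Largest line item: {largest}")  # Largest line item: Amazon Comprehend
```

In this scenario Amazon Comprehend is the largest single line item, so the volume of ingested records is the main lever on the total.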

Example 2: Ingesting ~1000 Reddit comments/day

AWS service | Dimensions | Cost [USD/month]
Amazon Athena | 70 queries/week and 100 GB data scanned/query | $75.00
Amazon CloudWatch Event Rule – 1 | 5,040 events/week (runs every 2 mins) | $0.05
Amazon CloudWatch Event Rule – 2 | 7 events/week (runs every day) | $0.0004
Amazon Comprehend | ~1000 Reddit comments/day | $56.00
Amazon DynamoDB | Records inserted with TTL 7-day expiry, on-demand capacity | $0.20
Amazon Data Firehose | 21 Firehose, total 3 GB/week | $0.20
Amazon Kinesis Data Streams | 1 data stream | $11.00
Amazon Simple Queue Service (Amazon SQS) | 15 queues (regular queues + DLQ) processing | $1.00
Amazon Simple Storage Service (Amazon S3) | 5 buckets, 75 GB | $20.00
Amazon Translate | Assuming 280 characters/item with 4,000 items, assumes 50% of 4,000 items in non-English language = 2,000 items = 560,000 characters/week | $40.00
AWS Lambda | 30 Lambda functions | $4.00
AWS Step Functions | 2 workflow definitions ~ 30 states (in all) | $9.00
AWS Key Management Service | Using AWS managed keys with DynamoDB, Amazon S3 (SSE-S3), Kinesis Data Streams, SQS | $30.00
Amazon QuickSight | 10 readers reading twice/day | $60.00
Total: ~$306.00/month
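The Amazon Translate row above rests on simple arithmetic: 4,000 items per week, half of them in a non-English language, at 280 characters each. The sketch below reproduces that character count. The function names and the `price_per_million_chars` parameter are hypothetical, and the price in the usage line is a placeholder rather than a quoted rate, so check the Amazon Translate pricing page before relying on any dollar figure.

```python
def translate_characters(items, non_english_share, chars_per_item):
    """Characters sent for translation, given the share of non-English items."""
    non_english_items = round(items * non_english_share)
    return non_english_items * chars_per_item


def translate_cost(characters, price_per_million_chars):
    """Cost for a character count at a per-million-character price."""
    return characters / 1_000_000 * price_per_million_chars


# Inputs taken from the table: 4,000 items/week, 50% non-English, 280 chars/item.
weekly_chars = translate_characters(4000, 0.5, 280)
print(weekly_chars)  # 560000

# Placeholder price (assumption); substitute the current rate for your Region.
print(f"{translate_cost(weekly_chars, 15.0):.2f}")
```

Because the character count scales linearly with both the item count and the non-English share, halving either input halves the Translate line item.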

Example 3: Ingesting ~1600 YouTube comments/day

AWS service | Dimensions | Cost [USD/month]
Amazon Athena | 70 queries/week and 50 GB data scanned/query | $75.00
Amazon CloudWatch Event Rule – 1 | 5,040 events/week (runs every 2 mins) | $0.05
Amazon CloudWatch Event Rule – 2 | 14 events/week (runs every day) | $0.0004
Amazon Comprehend | ~1600 YouTube comments/day | $56.00
Amazon DynamoDB | Records inserted with TTL 7-day expiry to keep state for Reddit ingestion, on-demand capacity | $0.20
Amazon Data Firehose | 21 Firehose, total 3 GB/week | $0.20
Amazon Kinesis Data Streams | 1 data stream | $10.00
Amazon Simple Queue Service (Amazon SQS) | 15 queues (regular queues + DLQ) | $1.00
Amazon Simple Storage Service (Amazon S3) | 5 buckets, 75 GB | $20.00
Amazon Translate | Assuming 280 characters/item with 4,000 items, assumes 50% of 4,000 items in non-English language = 2,000 items = 560,000 characters/week | $40.00
AWS Lambda (128 MB) | 30 Lambda functions | $4.00
AWS Step Functions | 2 workflow definitions ~ 30 states (in all) | $15.00
AWS Key Management Service | Using AWS managed keys with DynamoDB, Amazon S3 (SSE-S3), Kinesis Data Streams, SQS | $30.00
Amazon QuickSight | 10 readers reading twice/day | $60.00
Total: ~$311.00/month
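The event counts in the CloudWatch rows follow from the schedule interval. As a small worked example, the helper below (a hypothetical name, not part of the solution) converts an interval in minutes into the number of invocations per week. A rule that runs every 2 minutes yields the 5,040 events/week shown in the tables.

```python
def events_per_week(interval_minutes):
    """Number of scheduled events in one week for a fixed interval."""
    minutes_per_week = 7 * 24 * 60  # 10,080 minutes
    return minutes_per_week // interval_minutes


print(events_per_week(2))        # 5040
print(events_per_week(24 * 60))  # 7
```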

Example 4: Ingesting ~100 news feeds per day

AWS service | Dimensions | Cost [USD/month]
Amazon Athena | 70 queries/week and 20 GB data scanned/query | $30.00
Amazon CloudWatch Event Rule – 1 | 5,040 events/week (runs every 2 mins) | $0.04
Amazon CloudWatch Event Rule – 2 | 14 events/week (runs every day) | $0.000336
Amazon Comprehend | ~100 news feeds/day | $30.00
Amazon DynamoDB | Records inserted with TTL 7-day expiry to keep state for news feeds, on-demand capacity | $1.20
Amazon Data Firehose | 21 Firehose, total 3 GB/week | $1.00
Amazon Kinesis Data Streams | 1 data stream | $11.00
Amazon Simple Queue Service (Amazon SQS) | 14 queues (regular queues + DLQ) | $0.60
Amazon Rekognition – Label and text detection | 20 images/day | $12.00
Amazon Simple Storage Service (Amazon S3) | 5 buckets, 20 GB | $12.00
Amazon Translate | Assuming 280 characters/item with 4,000 items, assumes 50% of 4,000 items in non-English language = 2,000 items = 560,000 characters/week | $40.00
AWS Lambda | 23 Lambda functions | $4.00
AWS Step Functions | 2 workflow definitions ~ 30 states (in all) | $2.00
AWS Key Management Service | Using AWS managed keys with DynamoDB, Amazon S3 (SSE-S3), Kinesis Data Streams, SQS | $20.00
Amazon QuickSight | 10 readers reading twice/day | $60.00
Total: ~$222.00/month