Pre-built Amazon QuickSight dashboard details - Discovering Hot Topics using Machine Learning


The solution uses machine learning algorithms to identify the most dominant topics referenced in ingested text and image data. It generates a list of topics, ‘000’, ‘001’, ‘002’, and so on, where ‘000’ is the most dominant topic within the dataset. Each topic consists of a collection of relevant phrases.
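The guide does not say how dominance is measured. As a minimal Python sketch, assuming a topic's dominance is the number of records in which it carries the highest weight, ranked identifiers could be assigned as follows. The input shape and the `rank_topics` name are hypothetical and do not reflect the solution's actual pipeline.

```python
from collections import Counter


def rank_topics(doc_topic_weights):
    """Map each topic key to a zero-padded rank ID such as '000'.

    doc_topic_weights: list of dicts, one per record, mapping a topic key
    (a string) to its weight in that record (hypothetical input shape).
    Dominance here is the number of records in which a topic has the
    highest weight; topics that never win a record get no ID.
    """
    dominant = Counter(
        max(weights, key=weights.get) for weights in doc_topic_weights if weights
    )
    # Most dominant first; ties broken by topic key for stable output.
    ordered = sorted(dominant, key=lambda topic: (-dominant[topic], topic))
    return {topic: f"{rank:03d}" for rank, topic in enumerate(ordered)}
```

For example, if topic "a" has the highest weight in two records and topic "b" in one, "a" becomes ‘000’ and "b" becomes ‘001’.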



Figure 2: Controls on the Amazon QuickSight dashboard

The following controls allow you to filter the data on the various charts:

  1. Select platform: Allows selection of one or more platforms and filters the data to the selected platforms. The control lists both newsfeed and Twitter regardless of the ingestion option selected when deploying the solution.

  2. Select topic: Allows selection of multiple topics as detected by the solution.

  3. Search text: Provides a text box for filtering the ingested data to records that contain a keyword or phrase. This control searches the translated text field.

  4. Search name: Provides a way to search by user name (for tweets) or topic (for news feeds). The search is exact and case-sensitive.

  5. Search source: Provides a way to search by Twitter user ID (for tweets) or website path (for news feeds). The search is exact and case-sensitive.
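The matching rules of these controls can be summarized in a small Python sketch. The `Record` fields and the `apply_controls` function are hypothetical stand-ins for the dashboard's dataset and filters, not QuickSight APIs. The guide does not say whether Search text ignores case, so this sketch assumes a plain, case-sensitive substring match.

```python
from dataclasses import dataclass


@dataclass
class Record:
    platform: str         # e.g. "twitter" or "newsfeed" (hypothetical values)
    topic: str            # topic ID such as "000"
    translated_text: str  # text translated to English
    name: str             # user name (tweets) or topic (news feeds)
    source: str           # Twitter user ID (tweets) or website path (news feeds)


def apply_controls(records, platforms=None, topics=None, text=None,
                   name=None, source=None):
    """Filter records the way the dashboard controls are described.

    Multi-select controls keep records whose value is in the selected set.
    Search text is a substring test on the translated text (case handling
    assumed). Search name and Search source are exact, case-sensitive
    matches. A control left as None applies no filter.
    """
    result = []
    for r in records:
        if platforms is not None and r.platform not in platforms:
            continue
        if topics is not None and r.topic not in topics:
            continue
        if text is not None and text not in r.translated_text:
            continue
        if name is not None and r.name != name:
            continue
        if source is not None and r.source != source:
            continue
        result.append(r)
    return result
```

Because Search name uses exact matching, `name="alice"` does not match a record whose name is `"Alice"`.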


        

Figure 3: Example Amazon QuickSight dashboard for aggregating topic analyses

The example QuickSight dashboard in Figure 3 is a topic analysis dashboard for aggregating and contextualizing data from an ingestion source. The ingestion source can be selected from the Source dropdown in the Controls section. All the charts on this analysis worksheet render based on the selected Source. The first row in Figure 3 has four visuals:

  1. A table displaying the 10 most dominant topics in the dataset. Selecting a specific topic filters the word cloud of phrases (visual #5), the donut chart representing the sentiment for the selected topic and phrase (visual #6), the heat map of topics (visual #7), and a tabular view of ingested data (visual #8) to render information for the selected topic.

  2. A donut chart representing overall sentiment analysis of the dominant topics (positive, negative, neutral, or mixed sentiment). Selecting a specific sentiment filters the table with a list of the most dominant topics (visual #1), the word cloud of phrases (visual #5), the donut chart representing the sentiment for the selected topic and phrase (visual #6), the heat map of topics (visual #7), and the table (visual #8) to render information for the selected sentiment.

  3. An area line chart plotting a brand’s sentiment trend over the last seven days. Selecting a specific plot point (a sentiment on a given date) filters the table of the most dominant topics (visual #1), the word cloud of phrases (visual #5), the donut chart representing the sentiment for the selected topic and phrase (visual #6), the heat map of topics (visual #7), and the table (visual #8) to render information for the selected sentiment and date.

  4. An area line chart plotting a brand’s sentiment trend over the last 30 days.
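The two trend charts differ only in their time window. The following is a minimal sketch of the underlying aggregation. It assumes each record carries a date and one of the four sentiment values, and that the window ends on, and includes, the current day. The function name and input shape are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def sentiment_trend(records, today, days):
    """Count records per (day, sentiment) for the `days` days ending at `today`.

    records: iterable of (record_date, sentiment) pairs, where sentiment is
    POSITIVE, NEGATIVE, NEUTRAL, or MIXED (hypothetical input shape).
    Every day in the window gets an entry, even with no records, so an
    area chart drawn from the result has no gaps.
    """
    start = today - timedelta(days=days - 1)
    trend = {start + timedelta(days=i): Counter() for i in range(days)}
    for record_date, sentiment in records:
        if start <= record_date <= today:
            trend[record_date][sentiment] += 1
    return trend
```

Calling it with `days=7` and `days=30` yields the data for visuals #3 and #4, respectively.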

You can use the dashboard to filter information and gain insights on sentiment context and customer perception. In Figure 3, the visuals in the second row explore the most dominant topic, ‘000’. The second row contains the following visuals:

  5. A word cloud aggregating all of the detected phrases in the dominant topics. Selecting a specific phrase filters the list of dominant topics (visual #1), the donut chart representing the sentiment for the selected topic and phrase (visual #6), the heat map of topics (visual #7), and the table (visual #8).

  6. A donut chart representing the sentiment analysis of the selected topic and phrase. Selecting a specific sentiment (Negative in Figure 3) filters the list of dominant topics to the selected sentiment (visual #1), the word cloud to phrases with the selected sentiment (visual #5), the heat map of topics (visual #7), and the table (visual #8).

  7. A heat map of daily tweet counts for each topic. Selecting a heat map cell filters the table of the most dominant topics (visual #1), the word cloud of phrases within the dominant topics (visual #5), the donut chart to show the sentiment for the selected cell (visual #6), the heat map of topics (visual #7), and the table (visual #8).

  8. A tabular view of the ingested records.
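Cross-filtering from the heat map amounts to counting records per (topic, day) cell, then recomputing the other visuals from the records in the selected cell. The following is a minimal sketch with hypothetical record dictionaries and function names. QuickSight performs this filtering itself, so the sketch only illustrates the logic.

```python
from collections import Counter


def topic_heat_map(records):
    """Count records in each (topic, day) cell of a daily heat map.

    records: list of dicts with 'topic', 'date', and 'sentiment' keys
    (hypothetical shape).
    """
    return Counter((r["topic"], r["date"]) for r in records)


def select_cell(records, topic, day):
    """Keep only the records that fall in the selected heat map cell."""
    return [r for r in records if r["topic"] == topic and r["date"] == day]


def sentiment_breakdown(records):
    """Recompute a sentiment donut (counts per sentiment) from filtered records."""
    return Counter(r["sentiment"] for r in records)
```

Chaining `select_cell` and `sentiment_breakdown` mirrors how selecting a cell updates the donut chart (visual #6). The same filtered subset would feed the topic table, word cloud, and record table.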


        

Figure 4: Example Amazon QuickSight dashboard for text analyses

The ingested records are subjected to entity detection (detection of commercial items, dates, events, locations, organizations, persons, titles, and quantities) and key phrase detection (detection of descriptive noun phrases). Figure 4 shows the included text analysis dashboard, which displays the following elements:

  1. A horizontal stacked bar chart displaying the top 50 entities grouped by sentiment (visual #1). Selecting a specific entity and sentiment group filters the tabular view of ingested data (visual #4).

  2. A horizontal stacked bar chart displaying the top 50 key phrases grouped by sentiment (visual #2). Selecting a specific phrase and sentiment group filters the table (visual #4).

  3. A geographical map showing the origin of the tweets (applies only to Twitter, not RSS feeds, and only when the tweets contain geo-coordinate information). The size of each circle represents the count of tweets originating from that location. Selecting a location on the map filters the visuals on the page to display tweets belonging to those geo coordinates. Tweets that do not have public geo coordinates are also filtered out.

  4. A tabular view of ingested data (visual #4) containing the date (Date), the ID (for example, the tweet ID), the text, and the text translated to English. Selecting a record (row) in the table opens a new browser window that navigates to its source (twitter.com or the news site).
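The stacked bars in visuals #1 and #2 can be read as per-item counts split by sentiment, truncated to the top 50. The following is a minimal sketch. It assumes each detected entity occurrence is paired with the sentiment of the record it came from. The input shape and function name are hypothetical. The same function applies unchanged to key phrases.

```python
from collections import Counter, defaultdict


def top_entities_by_sentiment(pairs, n=50):
    """Return the n most frequent entities with per-sentiment counts.

    pairs: iterable of (entity_text, sentiment) tuples, one per detected
    entity occurrence (hypothetical shape). Each result row is one bar:
    its total length is the entity's count, and the stacked segments are
    the per-sentiment counts.
    """
    by_entity = defaultdict(Counter)
    for entity, sentiment in pairs:
        by_entity[entity][sentiment] += 1
    # Rank by total occurrences; ties broken alphabetically.
    ranked = sorted(by_entity.items(), key=lambda kv: (-sum(kv[1].values()), kv[0]))
    return [(entity, dict(counts)) for entity, counts in ranked[:n]]
```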

This solution also uses Amazon Rekognition to analyze images, detect entities in images, and extract embedded text from images. The service provides an unsafe image detection feature that creates moderation labels for images containing negative or unsafe content, for example, explicit adult content or content with violent elements.
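The following is a minimal sketch of the two Amazon Rekognition calls this step relies on, using the boto3 `detect_text` and `detect_moderation_labels` operations. The function name, the assumption that images are stored in Amazon S3, and the confidence threshold are illustrative and do not represent the solution's actual code or settings.

```python
def analyze_image(client, bucket, key, min_confidence=50.0):
    """Extract embedded text and moderation labels for one image in Amazon S3.

    client: a boto3 Rekognition client, e.g. boto3.client("rekognition").
    The S3 location and the confidence threshold are illustrative assumptions.
    """
    image = {"S3Object": {"Bucket": bucket, "Name": key}}
    text = client.detect_text(Image=image)
    moderation = client.detect_moderation_labels(
        Image=image, MinConfidence=min_confidence
    )
    # Keep whole lines; Rekognition also returns individual WORD detections.
    lines = [
        d["DetectedText"]
        for d in text.get("TextDetections", [])
        if d["Type"] == "LINE"
    ]
    labels = sorted({label["Name"] for label in moderation.get("ModerationLabels", [])})
    return {"embedded_text": " ".join(lines), "moderation_labels": labels}
```

Passing the client in as a parameter, instead of creating it inside the function, keeps the function easy to exercise with a stub object in place of a live AWS connection.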

The data generated from ingested images is used to produce topic modeling, key phrase, and sentiment analysis inferences. These inferences can be visualized using donut or pie charts, word clouds, or stacked bar charts.


        

Figure 5: Example Amazon QuickSight dashboard for aggregating image analyses

In the example dashboard in Figure 5, the visuals in the second and third rows render information for the selected topic, ‘004’. This lets you focus your analysis on the specific context extracted from images. The dashboard provides the following visualizations:

  1. A word cloud aggregating the entities found in the text embedded in the images. Selecting a word or phrase in the word cloud filters the other visuals in the first row, the stacked horizontal bar chart of key phrases (visual #2) and the tabular view of ingested data (visual #3), to reflect the selection.

  2. A stacked horizontal bar chart displaying key phrases in the text embedded in the images. Selecting a phrase and sentiment refreshes the other visuals in the first row: the word cloud (visual #1) and the table (visual #3).

  3. A tabular view of the text embedded within the images and the URLs of the images. Selecting a record (row) in the table opens a new browser window that navigates to its source (twitter.com or the news website).

  4. The moderation labels associated with the images in tweets related to a brand. Selecting a specific label filters the table (visual #5) to the images flagged with that label.

  5. A tabular view of records containing the moderation labels detected in the images. Selecting a record (row) in the table opens a new browser window that navigates to its source (twitter.com or the news website).

You can use the geo-coordinate information from tweets to analyze tweets originating from a location (when geo coordinates are publicly available). This analysis worksheet can only be used with social media records that contain geo coordinates.
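The circle sizes on the map come from counting tweets per location and skipping tweets without coordinates. The following is a minimal sketch. It assumes each tweet dictionary has an optional `coordinates` pair in GeoJSON's (longitude, latitude) order and that nearby points are grouped by rounding. The rounding precision and function name are hypothetical, not documented solution settings.

```python
from collections import Counter


def tweets_by_location(tweets, precision=2):
    """Count tweets per rounded (latitude, longitude) point for a bubble map.

    tweets: list of dicts with an optional 'coordinates' key holding a
    (longitude, latitude) pair (hypothetical shape). Tweets without
    coordinates are skipped, matching the worksheet's behavior. Rounding
    to `precision` decimal places merges nearby points into one circle.
    """
    counts = Counter()
    for tweet in tweets:
        coords = tweet.get("coordinates")
        if not coords:
            continue  # no public geo coordinates: not shown on the map
        lon, lat = coords
        counts[(round(lat, precision), round(lon, precision))] += 1
    return counts
```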


        

Figure 6: Example Amazon QuickSight dashboard to analyze inferences by geo coordinates

The Analysis by Geography worksheet is only applicable to Twitter (because only tweets have geo-coordinates). Selecting circles on the map filters the visuals on the geo coordinate tab.

  1. A geographical map displaying the origin of the tweets (only when the tweets provide public geo-coordinate information). The size of each circle represents the count of tweets originating from that location. Selecting a location on the map filters the visuals on the page to display tweets belonging to those geo coordinates. Tweets that do not provide public geo coordinates are filtered out.

  2. A table displaying the 10 most dominant topics in the dataset. Selecting a topic filters the rest of the visuals on the tab to display content associated with that topic.

  3. A word cloud of terms found in the most dominant topics.

  4. A word cloud of dominant phrases. Selecting the phrase filters the visuals on the tab to display content that is relevant to that phrase.

  5. A word cloud of dominant entities. Selecting the entity filters the visuals on the tab to display content that is relevant to that entity.

  6. A sentiment trend over the last week.

  7. A tabular view of tweets. Selecting a record (row) in the table opens a new browser window that navigates to twitter.com and displays the selected tweet. The table displays the retweet count, quote count, reply count, and favorite count at the time the tweet was ingested (as returned by the Twitter Search API call).