Joining data - Amazon QuickSight

Joining data

You can use the join interface in Amazon QuickSight to join objects from one or more data sources. By using Amazon QuickSight to join the data, you can merge disparate data without duplicating the data from different sources.

Types of joined datasets

A join is performed between two QuickSight logical tables, where each logical table contains information about how to fetch data. When editing a dataset in QuickSight, the join diagram at the top half of the page shows each logical table as a rectangular block.

There are two different types of joined datasets in QuickSight: same-source and cross-source. A dataset is considered same-source when it doesn't have any joins, or when all of the following conditions are met:

  • If any of the logical tables refer to a QuickSight data source:

    • All of the logical tables in this dataset must refer to the same QuickSight data source. This doesn't apply if two separate QuickSight data sources refer to the same underlying database. It must be the exact same QuickSight data source. For more information about using a single data source, see Creating a dataset using an existing data source.

  • If any of the logical tables refer to a QuickSight dataset that is a parent dataset:

    • The parent dataset must use direct query.

    • The parent dataset must refer to the same QuickSight data source.

If the above conditions aren't met, the dataset is considered a cross-source join.

Facts about joining datasets

Both same-source and cross-source dataset joins have the following limitations.

What's the maximum number of tables a joined dataset can contain?

All joined datasets can contain up to 32 tables.

How large can joined data be?

Same-source datasets have no restrictions on data size.

With cross-source datasets QuickSight automatically detects the largest logical table in the dataset, and the total size of the other logical tables must sum up to less than 1 GB. The logical size of these tables is recomputed each time your dataset is refreshed into SPICE.

Note that the logical size as computed during a join is separate from the size logic used by SPICE. A SPICE dataset with a size of 1.5 GB could have a logical size of 1 GB.

Additionally, because cross-source datasets must use SPICE, they must adhere to the quotas and limits for SPICE. For more information, see SPICE quotas for imported data.

Can a joined dataset use direct query?

Same-source datasets support direct query, assuming there are no other restrictions on using direct query. For example, S3 data sources don't support direct query, so a same-source S3 dataset must still use SPICE.

Cross-source datasets must use SPICE.

Can calculated fields be used in a join?

All joined datasets can use calculated fields, but calculated fields can't be used in any on-clauses.

Can geographical data be used in a join?

Same-source datasets support geographical data types, but geographical fields can't be used in any on-clauses.

Cross-source datasets don't support geographical data in any form.

For some examples of joining tables across data sources, see the Joining across data sources on Amazon QuickSight post on the AWS Big Data Blog.

Creating a join

Use the following procedure to join tables to use in a dataset. Before you begin, import or connect to your data. You can create a join between any of the data sources supported by Amazon QuickSight, except Internet of Things (IoT) data. For example, you can add comma-separated value (.csv) files, tables, views, SQL queries, or JSON objects in an Amazon S3 bucket.

To add one or more joins

  1. Open the dataset that you want to work with.

  2. (Optional) Before you get started, decide if you want to disable the autogenerated preview based on of a sample of your data. To turn that off, choose Auto-preview at top right. It's turned on by default.

  3. If you haven't already chosen a query mode, choose Query mode.

    Choose SPICE to store your dataset in SPICE, or choose Direct query to pull live data every time. If your dataset contains one ore more manually uploaded file, your dataset is automatically stored in SPICE.

    If you choose SPICE, the data is ingested into QuickSight. Visuals that use the dataset run queries in SPICE, instead of on the database.

    If you choose Direct query, the data isn't ingested into SPICE. Visuals that use the dataset run queries on the database, instead of in SPICE.

    If you choose Query mode, make sure to set unique keys in the join, if applicable, to improve performance when loading visuals.

  4. On the data preparation page, choose Add data.

  5. In the Add data page that opens, choose one of the following options and complete the steps following:

    • Add data from a dataset:

      1. Choose Dataset.

      2. Select a dataset from the list.

      3. Choose Select.

    • Add data from a data source:

      1. Choose Data source.

      2. Select a data source from the list.

      3. Choose Select.

      4. Select a table from the list.

      5. Choose Select.

    • Create self-joins by adding a table multiple times. A counter appears after the name. An example is Product, Product (2), and Product (3). Field names in the Fields or Filters sections include the same counter so you can know which instance of the table a field came from.

    • Add a new file by choosing Upload a file, and then choose the file that you want to join.

  6. (Optional) Choose Use custom SQL to open the query editor and write a query for a SQL data source.

  7. (Optional) After you add data, interact with each table by choosing its menu icon. Rearrange the tables by dragging and dropping them.

    An icon with red dots appears to indicate that you need to configure this join. Two red dots appear for joins that aren't yet configured. To create joins, choose the first join configuration icon.

  8. (Optional) To change an existing join, reopen Join configuration by choosing the join icon between two tables.

    The Join Configuration pane opens. On the join interface, specify the join type and the fields to use to join the tables.

  9. At the bottom of the screen, you can see options to set a field in one table equal to a field in another table.

    1. In the Join clauses section, choose the join column for each table.

      (Optional) If the tables that you selected join on multiple columns, choose Add a new join clause. Doing this adds another row to the join clauses, so you can specify the next set of columns to join. Repeat this process until you have identified all of the join columns for the two data objects.

  10. In the Join configuration pane, choose the kind of join to apply. If the join fields are a unique key for one or both tables, enable the unique key setting. Unique keys only apply to direct queries, not to SPICE data.

    For more information about joins, see Join types.

  11. Choose Apply to confirm your choices.

    To cancel without making changes, choose Cancel.

  12. The join icon in the workspace changes to show the new relationship.

  13. (Optional) In the Fields section, you can use each field's menu to do one or more of the following:

    • Add a hierarchy to a geospatial field.

    • Include or Exclude the field.

    • Edit name & description of the field.

    • Change data type.

    • Add a calculation (a calculated field).

    • Restrict access to only me, so only you can see it. This can be helpful when you are adding fields to a dataset that's already in use.

  14. (Optional) In the Filters section, you can add or edit filters. For more information, see Filtering data in Amazon QuickSight.

Join types

Amazon QuickSight supports the following join types:

  • Inner joins

  • Left and right outer joins

  • Full outer joins

Let's take a closer look at what these join types do with your data. For our example data, we're using the following tables named widget and safety rating.

SELECT * FROM safety-rating rating_id safety_rating 1 A+ 2 A 3 A- 4 B+ 5 B SELECT * FROM WIDGET widget_id widget safety_rating_id 1 WidgetA 3 2 WidgetB 1 3 WidgetC 1 4 WidgetD 2 5 WidgetE 6 WidgetF 5 7 WidgetG

Inner joins

Use an inner join ( ) when you want to see only the data where there is a match between two tables. For example, suppose that you perform an inner join on the safety-rating and widget tables.

In the following result set, widgets without safety ratings are removed, and safety ratings without associated widgets are removed. Only the rows that match perfectly are included.

SELECT * FROM safety-rating INNER JOIN widget ON safety_rating.rating_id = widget.safety_rating_id rating_id safety_rating widget_id widget safety_rating_id 3 A- 1 WidgetA 3 1 A+ 2 WidgetB 1 1 A+ 3 WidgetC 1 2 A 4 WidgetD 2 5 B 6 WidgetF 5

Left and right outer joins

These are also known as left or right outer joins. Use a left ( ) or right ( ) outer join when you want to see all the data from one table, and only the matching rows from the other table.

In a graphical interface, you can see which table is on the right or the left. In a SQL statement, the first table is considered to be on the left. Therefore, choosing a left outer join as opposed to a right outer join depends only on how the tables are laid out in your query tool.

For example, suppose that you perform a left outer join ( ) on safety-rating (the left table) and widgets (the right table). In this case, all safety-rating rows are returned, and only matching widget rows are returned. You can see blanks in the result set where there is no matching data.

SELECT * FROM safety-rating LEFT OUTER JOIN widget ON safety_rating.rating_id = widget.safety_rating_id rating_id safety_rating widget_id widget safety_rating_id 1 A+ 2 WidgetB 1 1 A+ 3 WidgetC 1 2 A 4 WidgetD 2 3 A- 1 WidgetA 3 4 B+ 5 B 6 WidgetF 5

If you instead use a right outer join ( ), call the tables in the same order so safety-rating is on the left and widgets is on the right. In this case, only matching safety-rating rows are returned, and all widget rows are returned. You can see blanks in the result set where there is no matching data.

SELECT * FROM safety-rating RIGHT OUTER JOIN widget ON safety_rating.rating_id = widget.safety_rating_id rating_id safety_rating widget_id widget safety_rating_id 3 A- 1 WidgetA 3 1 A+ 2 WidgetB 1 1 A+ 3 WidgetC 1 2 A 4 WidgetD 2 5 WidgetE 5 B 6 WidgetF 5 7 WidgetG

Full outer joins

These are sometimes called just outer joins, but this term can refer to either a left outer, right outer, or full outer join. To define the meaning, we use the complete name: full outer join.

Use a full outer join ( ) to see data that matches, plus data from both tables that doesn't match. This type of join includes all rows from both tables. For example, if you perform a full outer join on the safety-rating and widget tables, all rows are returned. The rows are aligned where they matched, and all extra data is included on separate rows. You can see blanks in the result set where there is no matching data.

SELECT * FROM safety-rating FULL OUTER JOIN widget ON safety_rating.rating_id = widget.safety_rating_id rating_id safety_rating widget_id widget safety_rating_id 1 A+ 2 WidgetB 1 1 A+ 3 WidgetC 1 2 A 4 WidgetD 2 3 A- 1 WidgetA 3 4 B+ 5 B 6 WidgetF 5 5 WidgetE 7 WidgetG