Joining Data - Amazon QuickSight

Joining Data

You can use the join interface in Amazon QuickSight to join objects from one or more data sources. By using Amazon QuickSight to join the data, you can merge disparate data without duplicating the data from different sources.

The following limitations apply to joins:

  • For joins of multiple data sources, the result set is a SPICE dataset.

  • For joins of multiple data sources, there's no size restriction on the data that you use to create the dataset. Make sure to always begin the join with your largest table. Often, this is a fact table. The rest of the tables combined must total less than 1 GB in size.

  • You can't join on calculated fields that you created in Amazon QuickSight.

  • You can't join on fields that use the geospatial data type.

Important

If you choose a table and make changes to the fields, these changes are discarded when you add tables by using the join interface. Examples of changes are changing a field name or adding a calculated field.

For some examples of joining tables across data sources, see the Joining across data sources on Amazon QuickSight post on the AWS Big Data Blog.

Creating a Join

Use the following procedure to join tables to use in a dataset. Before you begin, import or connect to your data.

  1. On the data preparation page, choose Add data near the top of the page.

    (Optional) If you're using a SQL-based data source, you can write a custom query by choosing Switch to custom SQL mode. Choosing this opens the following screen, where you can paste or enter a query.

    However, switching between modes erases your work in the other view. If you switch to custom SQL mode, choose Switch to diagram mode to switch back. The option to switch between SQL mode and diagram mode is available only when you are creating a new dataset. To switch modes for an existing dataset, create a new one instead.

    The following steps use the diagram mode.

    After you choose Add data, the following screen appears.

    By using the graphic editor, you can create a join between any of the data sources supported by Amazon QuickSight, except Internet of Things (IoT) data. For example, you can add comma-separated value (.csv) files, tables, views, SQL queries, or JSON objects in an Amazon S3 bucket.

  2. Choose Switch data source to use an existing data source, or choose Upload a file to add a new file.

    Then choose the item that you want to add. To search for it, enter text in the search box. Doing this searches the schema that you selected for the partial table name.

    When you add new data, you don't use a dataset that you created previously. Instead, you are adding a new instance of the data to the dataset that you are currently editing. When you choose Upload a file, you navigate to the file you want to add. If you choose Switch data source, you choose a data source you already have access to, but you don't have to enter credentials for it.

  3. Choose your data from the list, and then choose Select. To exit without adding this data, choose Cancel.

    After you add data, each item appears as a table icon in the workspace. You can drag and drop these icons to rearrange the tables. You can interact with each table by hovering the cursor over it until some interaction icons appear. To see information about the table, choose the information icon ( ). To remove the data from your dataset, choose the delete icon ( ).

    To create self-joins, add a table multiple times. To help you distinguish between two instances of the same table, a counter appears after the name. An example is Product, Product (2), and Product (3). Field names in the Fields or Filters sections include a label so you can see which instance of the table a particular field is from.

    An icon with red dots appears to indicate that you need to configure this join.

  4. After you add all your data objects, create joins by choosing each join configuration icon ( ). The icon opens the Join Configuration pane. On the join interface, specify the join type and the fields to use to join the tables.

  5. At the bottom of the screen, you can see options to set a field in one table equal to a field in another table.

    1. In the Join clauses section, choose the join column for each table.

      (Optional) If the tables that you selected join on multiple columns, choose Add a new join clause. Doing this adds another row to the join clauses, so you can specify the next set of columns to join. Repeat this process until you have identified all of the join columns for the two data objects.

  6. In the Join configuration pane, choose the kind of join to apply in the Join type section at right.

    To learn more about each join type, choose the join type. For more information about joins, see Join Types.

    If the join type that you choose isn't highlighted, check your selection by noting the descriptive text beneath the join type. Also, when you apply your changes, the join icon in the workspace changes to show the new relationship. To change it, choose the icon to reopen the Join configuration pane.

  7. Choose Apply to confirm your choices. To cancel without making changes, choose the X at top right of the Join configuration panel.

    The join icon updates to indicate that the join type and columns have been chosen.

  8. (Optional) Choose one or more options at left to help you configure your dataset:

    • For Data source, you can choose SPICE to store your dataset in SPICE, or choose Query to pull live data. If your dataset contains a file, your dataset is stored in SPICE.

      • If you choose SPICE, the data is ingested into SPICE, and visuals that use the dataset run queries in SPICE, instead of on the database.

      • If you choose Query, the data isn't ingested into SPICE, and visuals that use the dataset run queries on the database, instead of in SPICE.

    • In the Fields section, you can choose fields that you want to include or exclude.

    • In the Filters section, you can add or edit filters.

Join Types

Amazon QuickSight supports the following join types:

  • Inner joins

  • Left and right outer joins

  • Full outer joins

Let's take a closer look at what these join types do with your data. For our example data, we're using the following tables named widget and safety rating.

SELECT * FROM safety-rating rating_id safety_rating 1 A+ 2 A 3 A- 4 B+ 5 B SELECT * FROM WIDGET widget_id widget safety_rating_id 1 WidgetA 3 2 WidgetB 1 3 WidgetC 1 4 WidgetD 2 5 WidgetE 6 WidgetF 5 7 WidgetG

Inner Joins

Use an inner join ( ) when you want to see only the data where there is a match between two tables. For example, suppose that you perform an inner join on the safety-rating and widget tables.

In the following result set, widgets without safety ratings are removed, and safety ratings without associated widgets are removed. Only the rows that match perfectly are included.

SELECT * FROM safety-rating INNER JOIN widget ON safety_rating.rating_id = widget.safety_rating_id rating_id safety_rating widget_id widget safety_rating_id 3 A- 1 WidgetA 3 1 A+ 2 WidgetB 1 1 A+ 3 WidgetC 1 2 A 4 WidgetD 2 5 B 6 WidgetF 5

Left and Right Outer Joins

These are also known as left or right outer joins. Use a left ( ) or right ( ) outer join when you want to see all the data from one table, and only the matching rows from the other table.

In a graphical interface, you can see which table is on the right or the left. In a SQL statement, the first table is considered to be on the left. Therefore, choosing a left outer join as opposed to a right outer join depends only on how the tables are laid out in your query tool.

For example, suppose that you perform a left outer join ( ) on safety-rating (the left table) and widgets (the right table). In this case, all safety-rating rows are returned, and only matching widget rows are returned. You can see blanks in the result set where there is no matching data.

SELECT * FROM safety-rating LEFT OUTER JOIN widget ON safety_rating.rating_id = widget.safety_rating_id rating_id safety_rating widget_id widget safety_rating_id 1 A+ 2 WidgetB 1 1 A+ 3 WidgetC 1 2 A 4 WidgetD 2 3 A- 1 WidgetA 3 4 B+ 5 B 6 WidgetF 5

If you instead use a right outer join ( ), call the tables in the same order so safety-rating is on the left and widgets is on the right. In this case, only matching safety-rating rows are returned, and all widget rows are returned. You can see blanks in the result set where there is no matching data.

SELECT * FROM safety-rating RIGHT OUTER JOIN widget ON safety_rating.rating_id = widget.safety_rating_id rating_id safety_rating widget_id widget safety_rating_id 3 A- 1 WidgetA 3 1 A+ 2 WidgetB 1 1 A+ 3 WidgetC 1 2 A 4 WidgetD 2 5 WidgetE 5 B 6 WidgetF 5 7 WidgetG

Full Outer Joins

These are sometimes called just outer joins, but this term can refer to either a left outer, right outer, or full outer join. To define the meaning, we use the complete name: full outer join.

Use a full outer join ( ) to see data that matches, plus data from both tables that doesn't match. This type of join includes all rows from both tables. For example, if you perform a full outer join on the safety-rating and widget tables, all rows are returned. The rows are aligned where they matched, and all extra data is included on separate rows. You can see blanks in the result set where there is no matching data.

SELECT * FROM safety-rating FULL OUTER JOIN widget ON safety_rating.rating_id = widget.safety_rating_id rating_id safety_rating widget_id widget safety_rating_id 1 A+ 2 WidgetB 1 1 A+ 3 WidgetC 1 2 A 4 WidgetD 2 3 A- 1 WidgetA 3 4 B+ 5 B 6 WidgetF 5 5 WidgetE 7 WidgetG