Joining Data - Amazon QuickSight

Joining Data

You can use the join interface in Amazon QuickSight to join objects from one or more data sources. By using Amazon QuickSight to join the data, you can merge disparate data without duplicating the data from different sources.

The following limitations apply to joins:

  • For joins of multiple data sources, the result set is a SPICE dataset.

  • For joins of multiple data sources, there's no size restriction on the data that you use to create the dataset. Make sure to always begin the join with your largest table. Often, this is a fact table. The rest of the tables combined must total less than 1 GB in size.

  • You can't join on calculated fields that you created in Amazon QuickSight.

  • You can't join on fields that use the geospatial data type.

For some examples of joining tables across data sources, see the Joining across data sources on Amazon QuickSight post on the AWS Big Data Blog.

Creating a Join

Use the following procedure to join tables to use in a dataset. Before you begin, import or connect to your data. You can create a join between any of the data sources supported by Amazon QuickSight, except Internet of Things (IoT) data. For example, you can add comma-separated value (.csv) files, tables, views, SQL queries, or JSON objects in an Amazon S3 bucket.

To add one or more joins

  1. Open the dataset that you want to work with.

  2. (Optional) Before you get started, decide if you want to disable the autogenerated preview based on of a sample of your data. To disable it, choose Auto-preview at top right. It's enabled by default.

  3. If you haven't already done so, choose a Query mode. You can choose SPICE to store your dataset in SPICE or choose Direct query to pull live data every time. If your dataset contains one ore more manually uploaded file, your dataset is automatically stored in SPICE.

    • If you choose SPICE, the data is ingested into QuickSight, and visuals that use the dataset run queries in SPICE, instead of on the database.

    • If you choose Direct query, the data isn't ingested into SPICE, and visuals that use the dataset run queries on the database, instead of in SPICE.

      If you choose this option, make sure to set unique keys in the join, if applicable, to improve performance when loading visuals.

  4. On the data preparation page, choose the button labeled Add data.

  5. Follow the interface prompts to choose the applicable settings from the following:

    • To add data from the same data source, choose a schema and one or more tables. The options that display can vary depending on your data source.

    • To add data from a different data source, choose Switch data source and follow the screen prompts.

    • To add a new file, choose Upload a file.

    • To write a query for a SQL data source, open the query editor by choosing Use custom SQL.

    • To create self-joins, add a table multiple times. A counter appears after the name. An example is Product, Product (2), and Product (3). Field names in the Fields or Filters sections include the same counter so you can know which instance of the table a field came from.

    After you add data, you can interact with each table by choosing its menu icon. You can rearrange the tables by dragging and dropping them. An icon with red dots appears to indicate that you need to configure this join.

  6. Two red dots appear for joins that aren't yet configured. To create joins, choose the first join configuration icon.

  7. (Optional) To change an existing join, reopen Join configuration by choosing the join icon between two tables.

  8. The Join Configuration pane opens. On the join interface, specify the join type and the fields to use to join the tables.

  9. At the bottom of the screen, you can see options to set a field in one table equal to a field in another table.

    1. In the Join clauses section, choose the join column for each table.

      (Optional) If the tables that you selected join on multiple columns, choose Add a new join clause. Doing this adds another row to the join clauses, so you can specify the next set of columns to join. Repeat this process until you have identified all of the join columns for the two data objects.

  10. In the Join configuration pane, choose the kind of join to apply. If the join fields are a unique key for one or both tables, enable the unique key setting. Unique keys only apply to direct queries, not to SPICE data.

    For more information about joins, see Join Types.

  11. Choose Apply to confirm your choices.

    To cancel without making changes, choose Cancel.

  12. The join icon in the workspace changes to show the new relationship.

  13. (Optional) In the Fields section, you can use each field's menu to do one or more of the following:

    • Add a hierarchy to a geospatial field.

    • Include or Exclude the field.

    • Edit name & description of the field.

    • Change data type.

    • Add a calculation (a calculated field).

    • Restrict access to only me, so only you can see it. This can be helpful when you are adding fields to a dataset that's already in use.

  14. (Optional) In the Filters section, you can add or edit filters. For more information, see Adding a Filter.

Join Types

Amazon QuickSight supports the following join types:

  • Inner joins

  • Left and right outer joins

  • Full outer joins

Let's take a closer look at what these join types do with your data. For our example data, we're using the following tables named widget and safety rating.

SELECT * FROM safety-rating rating_id safety_rating 1 A+ 2 A 3 A- 4 B+ 5 B SELECT * FROM WIDGET widget_id widget safety_rating_id 1 WidgetA 3 2 WidgetB 1 3 WidgetC 1 4 WidgetD 2 5 WidgetE 6 WidgetF 5 7 WidgetG

Inner Joins

Use an inner join ( ) when you want to see only the data where there is a match between two tables. For example, suppose that you perform an inner join on the safety-rating and widget tables.

In the following result set, widgets without safety ratings are removed, and safety ratings without associated widgets are removed. Only the rows that match perfectly are included.

SELECT * FROM safety-rating INNER JOIN widget ON safety_rating.rating_id = widget.safety_rating_id rating_id safety_rating widget_id widget safety_rating_id 3 A- 1 WidgetA 3 1 A+ 2 WidgetB 1 1 A+ 3 WidgetC 1 2 A 4 WidgetD 2 5 B 6 WidgetF 5

Left and Right Outer Joins

These are also known as left or right outer joins. Use a left ( ) or right ( ) outer join when you want to see all the data from one table, and only the matching rows from the other table.

In a graphical interface, you can see which table is on the right or the left. In a SQL statement, the first table is considered to be on the left. Therefore, choosing a left outer join as opposed to a right outer join depends only on how the tables are laid out in your query tool.

For example, suppose that you perform a left outer join ( ) on safety-rating (the left table) and widgets (the right table). In this case, all safety-rating rows are returned, and only matching widget rows are returned. You can see blanks in the result set where there is no matching data.

SELECT * FROM safety-rating LEFT OUTER JOIN widget ON safety_rating.rating_id = widget.safety_rating_id rating_id safety_rating widget_id widget safety_rating_id 1 A+ 2 WidgetB 1 1 A+ 3 WidgetC 1 2 A 4 WidgetD 2 3 A- 1 WidgetA 3 4 B+ 5 B 6 WidgetF 5

If you instead use a right outer join ( ), call the tables in the same order so safety-rating is on the left and widgets is on the right. In this case, only matching safety-rating rows are returned, and all widget rows are returned. You can see blanks in the result set where there is no matching data.

SELECT * FROM safety-rating RIGHT OUTER JOIN widget ON safety_rating.rating_id = widget.safety_rating_id rating_id safety_rating widget_id widget safety_rating_id 3 A- 1 WidgetA 3 1 A+ 2 WidgetB 1 1 A+ 3 WidgetC 1 2 A 4 WidgetD 2 5 WidgetE 5 B 6 WidgetF 5 7 WidgetG

Full Outer Joins

These are sometimes called just outer joins, but this term can refer to either a left outer, right outer, or full outer join. To define the meaning, we use the complete name: full outer join.

Use a full outer join ( ) to see data that matches, plus data from both tables that doesn't match. This type of join includes all rows from both tables. For example, if you perform a full outer join on the safety-rating and widget tables, all rows are returned. The rows are aligned where they matched, and all extra data is included on separate rows. You can see blanks in the result set where there is no matching data.

SELECT * FROM safety-rating FULL OUTER JOIN widget ON safety_rating.rating_id = widget.safety_rating_id rating_id safety_rating widget_id widget safety_rating_id 1 A+ 2 WidgetB 1 1 A+ 3 WidgetC 1 2 A 4 WidgetD 2 3 A- 1 WidgetA 3 4 B+ 5 B 6 WidgetF 5 5 WidgetE 7 WidgetG