Using Databricks in QuickSight

Use this section to learn how to connect from QuickSight to Databricks.

To connect to Databricks
  1. Begin by creating a new dataset. Choose Datasets from the navigation pane on the left, and then choose New Dataset.

  2. Choose the Databricks data source card.

  3. For Data source name, enter a descriptive name for your Databricks data source connection, for example, Databricks CS. Because you can create many datasets from a connection to Databricks, it's best to keep the name simple.

    The following screenshot shows the connection screen for Databricks.

    (Screenshot: the new Databricks data source screen, showing all of the fields described in this section and the Create data source button at bottom right.)

  4. For Connection type, select the type of network you're using.

    • Public network – if your data is shared publicly.

    • VPC – if your data is inside a VPC.

    If you're using a VPC and your VPC connection isn't listed, check with your administrator.

  5. For Database server, enter the workspace hostname specified in your Databricks connection details.

  6. For HTTP Path, enter the partial URL of the Spark instance specified in your Databricks connection details.

  7. For Port, enter the port specified in your Databricks connection details.

  8. For Username and Password, enter your connection credentials.

  9. To verify that the connection is working, choose Validate connection.

  10. To finish and create the data source, choose Create data source.
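
For scripted setups, the same connection can be described through the QuickSight CreateDataSource API. The following Python sketch builds the request that boto3's `create_data_source` expects for a Databricks source and maps each console field above to a request field. The helper names are hypothetical (not part of any AWS SDK), and the account ID, data source ID, hostname, HTTP path, and credentials are placeholders to replace with your own values.

```python
from typing import Any, Dict, Optional


def build_databricks_data_source_request(
    account_id: str,
    data_source_id: str,
    name: str,
    host: str,
    http_path: str,
    port: int,
    username: str,
    password: str,
    vpc_connection_arn: Optional[str] = None,
) -> Dict[str, Any]:
    """Return keyword arguments for the QuickSight create_data_source call (hypothetical helper)."""
    request: Dict[str, Any] = {
        "AwsAccountId": account_id,
        "DataSourceId": data_source_id,  # an ID you choose; placeholder here
        "Name": name,  # console: Data source name
        "Type": "DATABRICKS",
        "DataSourceParameters": {
            "DatabricksParameters": {
                "Host": host,  # console: Database server
                "Port": port,  # console: Port
                "SqlEndpointPath": http_path,  # console: HTTP Path
            }
        },
        # console: Username and Password
        "Credentials": {
            "CredentialPair": {"Username": username, "Password": password}
        },
    }
    # console: Connection type. A public network needs no extra settings;
    # a VPC connection is referenced by the ARN of an existing QuickSight VPC connection.
    if vpc_connection_arn is not None:
        request["VpcConnectionProperties"] = {"VpcConnectionArn": vpc_connection_arn}
    return request


def create_databricks_data_source(request: Dict[str, Any]) -> Dict[str, Any]:
    """Send the request to QuickSight; needs boto3 and AWS credentials at run time."""
    import boto3  # imported here so building a request works without boto3 installed

    client = boto3.client("quicksight")
    return client.create_data_source(**request)


if __name__ == "__main__":
    # Placeholder values only; copy the real ones from your Databricks connection details.
    example = build_databricks_data_source_request(
        account_id="111122223333",
        data_source_id="databricks-cs",
        name="Databricks CS",
        host="dbc-example.cloud.databricks.com",
        http_path="/sql/1.0/warehouses/example",
        port=443,
        username="example-user",
        password="example-password",
    )
    print(example["DataSourceParameters"])
```

Building the request separately from sending it keeps the field mapping easy to check without AWS access. After creation, `describe_data_source` reports the data source's `Status` and any `ErrorInfo`, which serves roughly the same purpose as Validate connection in the console.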

Adding a new QuickSight dataset for Databricks

After you have an existing data source connection for Databricks data, you can create Databricks datasets to use for analysis.
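
Outside the console, locating an existing Databricks connection comes down to listing the account's data sources and filtering them by type and name. The following Python sketch assumes a case-insensitive substring match as a stand-in for the console's partial-name search; the helper names are hypothetical, and `list_all_data_sources` needs boto3 and AWS credentials when it runs.

```python
from typing import Any, Dict, Iterable, List


def find_databricks_sources(
    data_sources: Iterable[Dict[str, Any]], name_fragment: str = ""
) -> List[Dict[str, Any]]:
    """Keep Databricks data sources whose name contains name_fragment, ignoring case."""
    fragment = name_fragment.lower()
    return [
        source
        for source in data_sources
        if source.get("Type") == "DATABRICKS"
        and fragment in source.get("Name", "").lower()
    ]


def list_all_data_sources(account_id: str) -> List[Dict[str, Any]]:
    """Collect every data source in the account, following NextToken pagination."""
    import boto3  # needs AWS credentials at run time

    client = boto3.client("quicksight")
    kwargs: Dict[str, Any] = {"AwsAccountId": account_id}
    sources: List[Dict[str, Any]] = []
    while True:
        page = client.list_data_sources(**kwargs)
        sources.extend(page.get("DataSources", []))
        token = page.get("NextToken")
        if not token:
            return sources
        kwargs["NextToken"] = token
```

The `Arn` of a matching entry is the value a dataset definition uses to refer back to this data source.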

To create a dataset using Databricks
  1. Choose Datasets on the left, and then scroll down to find the data source card for your Databricks connection. If you have many data sources, you can find yours by entering part of its name in the search bar at the top of the page.

  2. Choose the Databricks data source card, and then choose Create data set. The following dialog box appears:

    (Screenshot: the Choose your table screen, showing the fields described in this section, buttons at bottom left to edit and preview the data or to use custom SQL, and a Select button at bottom right.)

  3. To specify the table you want to connect to, first select the Catalog and Schema you want to use. Then, for Tables, select the table that you want to use. If you prefer to use your own SQL statement, select Use custom SQL.

  4. Choose Edit/Preview.

  5. (Optional) To add more data, use the following steps:

    1. Choose Add data at top right.

    2. To connect to different data, choose Switch data source, and choose a different dataset.

    3. Follow the UI prompts to finish adding data.

    4. After adding new data to the same dataset, choose Configure this join (the two red dots). Set up a join for each additional table.

    5. If you want to add calculated fields, choose Add calculated field.

    6. To add a model from Amazon SageMaker, choose Augment with SageMaker. This option is available only in QuickSight Enterprise edition.

    7. Clear the check box for any fields that you want to omit.

    8. Update any data types that you want to change.

  6. When you are done, choose Save to save and close the dataset.
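
The dataset steps above can also be expressed as a CreateDataSet request. The following Python sketch is one possible mapping under stated assumptions: you supply the table's column names and QuickSight types yourself (the console discovers them for you), the physical and logical table keys are arbitrary hypothetical IDs, and calculated fields reuse their names as column IDs for simplicity. It covers catalog/schema/table selection, custom SQL, changed data types, calculated fields, and omitted fields; joins and SageMaker augmentation are not modeled.

```python
from typing import Any, Dict, List, Optional


def build_databricks_dataset_request(
    account_id: str,
    data_set_id: str,
    name: str,
    data_source_arn: str,
    columns: List[Dict[str, str]],
    catalog: str = "",
    schema: str = "",
    table: str = "",
    custom_sql: Optional[str] = None,
    omit: Optional[List[str]] = None,
    casts: Optional[Dict[str, str]] = None,
    calculated: Optional[Dict[str, str]] = None,
    import_mode: str = "SPICE",  # or "DIRECT_QUERY"
) -> Dict[str, Any]:
    """Return keyword arguments for the QuickSight create_data_set call (hypothetical helper).

    columns: {"Name": ..., "Type": ...} dicts using QuickSight input types
    such as STRING, INTEGER, DECIMAL, or DATETIME.
    """
    if custom_sql is not None:
        # Step 3, Use custom SQL: the query replaces the catalog/schema/table choice.
        physical: Dict[str, Any] = {"CustomSql": {
            "DataSourceArn": data_source_arn, "Name": name,
            "SqlQuery": custom_sql, "Columns": columns}}
    else:
        # Step 3: the Catalog, Schema, and Tables selections.
        relational: Dict[str, Any] = {
            "DataSourceArn": data_source_arn, "Name": table, "InputColumns": columns}
        if catalog:
            relational["Catalog"] = catalog
        if schema:
            relational["Schema"] = schema
        physical = {"RelationalTable": relational}

    transforms: List[Dict[str, Any]] = []
    # Step 5.8: change the data types of existing fields.
    for column_name, new_type in (casts or {}).items():
        transforms.append({"CastColumnTypeOperation": {
            "ColumnName": column_name, "NewColumnType": new_type}})
    # Step 5.5: add calculated fields.
    if calculated:
        transforms.append({"CreateColumnsOperation": {"Columns": [
            {"ColumnName": col, "ColumnId": col, "Expression": expr}
            for col, expr in calculated.items()]}})
    # Step 5.7: keep only the fields whose check boxes stay selected.
    if omit:
        dropped = set(omit)
        kept = [c["Name"] for c in columns if c["Name"] not in dropped]
        kept += list(calculated or {})
        transforms.append({"ProjectOperation": {"ProjectedColumns": kept}})

    logical: Dict[str, Any] = {"Alias": name, "Source": {"PhysicalTableId": "databricks-table"}}
    if transforms:  # DataTransforms, when present, must hold at least one operation
        logical["DataTransforms"] = transforms

    return {
        "AwsAccountId": account_id,
        "DataSetId": data_set_id,
        "Name": name,
        "PhysicalTableMap": {"databricks-table": physical},
        "LogicalTableMap": {"databricks-logical": logical},
        "ImportMode": import_mode,
    }
```

Passing the result to `boto3.client("quicksight").create_data_set(**request)` would create the dataset, given AWS credentials and a valid data source ARN. In QuickSight's data model, joins are expressed as a `JoinInstruction` source in the logical table map, which this sketch leaves out to stay short.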