Appendix: Microsoft Power BI supported AWS data sources - Using Microsoft Power BI with the AWS Cloud

Appendix: Microsoft Power BI supported AWS data sources

The full list of supported data sources is provided by Microsoft (refer to Power BI data sources); however, the following sections for each AWS data source provide usage and configuration guidance that may be helpful for some readers.

Amazon Redshift

Amazon Redshift is a fully-managed, petabyte-scale data warehouse service in the AWS Cloud. An Amazon Redshift data warehouse is a collection of computing resources called nodes, which are organized into a group called a cluster. Each cluster runs an Amazon Redshift engine and contains one or more databases.

You should consider using Amazon Redshift when:

  • You are building or migrating to a native cloud data warehouse.

  • You might need to scale from a few to hundreds of Terabytes.

  • You want to allow the Power BI users to transparently access data from the data lake stored in Amazon S3 and join it with tables in the data warehouse.

  • Your query workload includes:

    • Queries which compute aggregation on large (multi-gigabyte and multi-terabyte) tables.

    • Extremely complex SQL with multiple joins and sub-queries.

    • A mix of complex analytical queries and simple, highly-filtered queries used in Dashboards.

When using Amazon Redshift with Microsoft Power BI, keep the following points in mind:

  • Amazon Redshift is natively-supported as a Power BI data source in both Microsoft Power BI Desktop and Power BI services, and each supports import and direct query modes.

  • While a Redshift cluster can be launched in a public subnet and configured to allow access from the internet, the majority of customers prefer to launch it in a private subnet to increase security. When using a private subnet, make use of the on-premises data gateway to connect from the Power BI service to Amazon Redshift.

  • The Redshift connector supports Azure AD Authentication in Power BI Desktop and service.

  • External tables accessed via Spectrum are not treated any differently than native Redshift tables, and Power BI has no means to differentiate them. When accessing data in external tables, make sure that:

    • Columns that contain strings of characters are catalogued as ‘VARCHAR’ in the AWS Glue Data Catalog and not as ‘STRING’, otherwise Power BI will throw the following error: Exception: OLE DB or ODBC error: [Expression.Error] We couldn't fold the expression to the data source. Please try a simpler expression..

    • Columns containing complex data types such as ARRAY, are not supported. When columns containing complex data types are used, Power BI will throw the following error: Exception: ODBC: ERROR [42703] [Microsoft]Amazon Redshift Error occurred while trying to execute a query

      If you need to include them in your model, you can either enable (in Amazon Redshift) the JSON serialization at the user level or store the complex data types in a SUPER column in a native table.

Amazon RDS

Amazon RDS makes it easy to set up, operate, and scale a relational database in the cloud. Amazon RDS is available on several database instance types (optimized for memory, performance, or I/O) and provides you with six familiar database engines to choose from, including Amazon Aurora, PostgreSQL, MySQL, MariaDB, Oracle Database, and SQL Server.

You should consider using RDS when:

  • You are building an operational data store.

  • You are migrating SQL Server or Oracle Database data warehouse to the cloud but not interested in refactoring.

  • Your query workload includes:

    • Queries which access highly-filtered data on tables that can be easily indexed.

    • Analytics queries on small-to-medium sized tables (gigabytes).

    • A mix of medium-complexity analytical queries and simple, highly-filtered queries used in Dashboards.

When using Amazon RDS with Microsoft Power BI, keep the following points in mind:

  • Amazon RDS provides multiple database engines including SQL Server, MariaDB, MySQL, Oracle Database, and PostgreSQL. Note that the database engines are listed in Power BI Desktop and Power BI service, not the Amazon RDS service.

  • For Amazon Aurora, use the My SQL or PostgreSQL connection type, depending on your selected database engine.

  • While an Amazon RDS instance can be launched in a public subnet and configured to allow access from the internet, the majority of customers prefer to launch it in a private subnet to increase security. When using a private subnet make use of the on-premises data gateway to connect from the Power BI service to RDS.

  • With Amazon RDS, you can deploy multiple editions of SQL Server (2012, 2014, 2016, 2017, and 2019) including Express, Web, Standard, and Enterprise.

Amazon Athena

Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is out-of-the-box integrated with AWS Glue Data Catalog, allowing you to create a unified metadata repository across various services, crawl data sources to discover schemas, populate your Data Catalog with new and modified table and partition definitions, and maintain schema versioning.

You should consider Athena as a data source when:

  • You want to query your data lake directly.

  • Your query workload includes:

    • Queries which compute aggregation on large (multi-gigabyte and multi-terabyte) tables

    • Interactive ad hoc SQL, for exploratory purposes.

When using Amazon Athena with Microsoft Power BI, keep the following points in mind:

  • With the July 2021 release of Microsoft Power BI, a Microsoft-certified connector has been introduced for Amazon Athena. You can use the Microsoft Power BI connector for Amazon Athena to analyze data from Amazon Athena in Microsoft Power BI Desktop. After you publish content to the Power BI service, you can use the Microsoft on-premises data gateway to keep the content up to date through on-demand or scheduled refreshes.

  • The Microsoft Power BI connector for Amazon Athena supports both Import and Direct Query data connectivity modes. With the Import mode, selected tables and columns are imported into Power BI Desktop for querying. With the Direct Query mode, no data is imported or copied into Power BI Desktop, and instead Power BI Desktop queries the underlying data source directly.

  • For more information on the Microsoft Power BI connector for Amazon Athena, refer to Using the Amazon Athena Power BI Connector.

  • Note that the Microsoft Power BI connector for Amazon Athena requires the use of the Amazon Athena ODBC driver and a valid ODBC DSN configuration on your system to query Amazon Athena. To download the latest ODBC driver and for configuration information, refer to Connecting to Amazon Athena with ODBC.

  • For a tutorial on the configuration steps and best practices when using the Microsoft Power BI connector for Amazon Athena, refer to Creating dashboards quickly on Microsoft Power BI using Amazon Athena.

Amazon OpenSearch Service (successor to Amazon Elasticsearch Service)

You can use SQL to query your Amazon OpenSearch Service, rather than using the JSON-based search query DSL. Querying with SQL is useful if you're already familiar with the language or want to integrate your domain with an application that uses it, such as Microsoft Power BI.

You should consider Amazon OpenSearch Service as a data source when:

  • You have semi-structured data such as log files or JSON output, and need to search, analyze, or visualize the information quickly.

When using Amazon OpenSearch Service with Microsoft Power BI, keep the following points in mind:

AWS Lake Formation

Lake Formation helps you collect and catalog data from databases and object storage, move the data into your new Amazon S3 data lake, clean and classify your data using machine learning algorithms, and secure access to your sensitive data. Your users can access a centralized data catalog which describes available data sets and their appropriate usage. Your users then utilize these data sets with their choice of analytics and machine learning services, like Amazon Redshift, Amazon Athena, and (in beta) Amazon EMR for Apache Spark. Lake Formation builds on the capabilities available in AWS Glue.

You should consider Lake Formation if you need fine-grained (row and column) level access to your data lake instead of the traditional IAM based controls.

When using Lake Formation with Microsoft Power BI, keep the following points in mind:

  • To query data from the Lake Formation Data Catalog with Power BI Desktop or Power BI service, use the same process and configuration as querying data in Athena. If you are making use of the Lake Formation permission model, ensure that the ODBC DSN configuration for Amazon Athena has the “LakeformationEnabled” property key set to a value of “true”. This tells the Amazon Athena ODBC driver to use the Lake Formation service for authorization, instead of AWS Security Token Service directly. For more information, refer to the documentation for Connecting to Amazon Athena with ODBC.

  • The "Use only IAM access control" setting enabled for compatibility with existing Data Catalog behavior will provide full compatibility.

  • Upgrading AWS Glue Data Permissions to the Lake Formation Model may introduce incompatibilities and should tested before use. Preliminary testing indicates that column level grant or deny are being honored, but row and cell level filtering have not been tested by the authors, as this is still in preview and subject to change.