Connecting the Microsoft Power BI service to AWS data sources
The Microsoft Power BI service (SaaS) can connect directly to internet-accessible data sources, or to private data sources in an Amazon VPC. Connecting to private data sources requires an application component called the Microsoft on-premises data gateway. You download the gateway, install it on an Amazon EC2 instance in the VPC, and configure it with Microsoft Power BI credentials. The gateway establishes an outbound connection to the Microsoft Azure Service Bus over the internet, and is configured in Microsoft Power BI to connect to the data sources that it can access. Larger deployments can use multiple on-premises data gateways to balance load or increase fault tolerance.
Using the Microsoft on-premises data gateway provides several substantial benefits that AWS customers have reported:
- Improved security posture: The Microsoft on-premises data gateway does not accept inbound connections from the Microsoft Azure Cloud, and only initiates outbound connections to the Azure Service Bus. This one-way traffic model lets you keep your data sources private instead of exposing them on the internet.
- Reduced data transfer out: When connecting to a data source, the gateway retrieves the entire result set and stores it locally in a process called spooling. The gateway then compresses the data before transmitting the results to the Power BI service. Users commonly report 10:1 compression ratios, which reduce both the time needed to transmit the data across the internet and data transfer out (egress) charges.
- Reduced solution costs: When the Microsoft on-premises data gateway is used, the gateway performs some of the data processing that the service would otherwise require. Running the gateway on Amazon EC2, in combination with cost-reduction options such as Savings Plans or Reserved Instances, may help reduce your overall BI solution costs.
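As a rough illustration of the data transfer out benefit, the following Python sketch estimates how gateway compression changes the billable egress volume. It assumes the customer-reported 10:1 ratio applies to your result sets (actual ratios depend on the data), and it uses a hypothetical per-GB rate (`price_per_gb`) rather than actual AWS pricing. The function and class names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransferEstimate:
    """Data transfer out with and without gateway compression."""
    uncompressed_gb: float
    transferred_gb: float
    uncompressed_cost: float
    transferred_cost: float

    @property
    def savings(self) -> float:
        # Cost avoided by sending compressed rather than raw result sets.
        return self.uncompressed_cost - self.transferred_cost


def estimate_transfer_out(result_set_gb: float,
                          compression_ratio: float,
                          price_per_gb: float) -> TransferEstimate:
    """Estimate egress volume and cost for spooled, compressed result sets.

    compression_ratio is uncompressed:compressed (10.0 means 10:1).
    price_per_gb is a placeholder; substitute the rate for your Region.
    """
    if result_set_gb < 0 or price_per_gb < 0:
        raise ValueError("volume and price must be non-negative")
    if compression_ratio < 1:
        raise ValueError("compression_ratio must be >= 1")
    transferred_gb = result_set_gb / compression_ratio
    return TransferEstimate(
        uncompressed_gb=result_set_gb,
        transferred_gb=transferred_gb,
        uncompressed_cost=result_set_gb * price_per_gb,
        transferred_cost=transferred_gb * price_per_gb,
    )


if __name__ == "__main__":
    # Hypothetical workload: 500 GB of query results per month at a
    # placeholder rate of 0.05 USD per GB.
    est = estimate_transfer_out(500, compression_ratio=10.0, price_per_gb=0.05)
    print(f"transferred: {est.transferred_gb:.1f} GB")
    print(f"estimated savings: {est.savings:.2f} USD")
```

Because the cost scales linearly with volume, the savings fraction is simply 1 - 1/ratio (90% at 10:1), whatever rate you plug in.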
Recommended configuration
AWS recommends that you install the Microsoft on-premises data gateway on an Amazon EC2 instance in the private subnet that contains your data sources. Configure this subnet to route internet-bound requests through an Amazon VPC NAT gateway in a public subnet. A network address translation (NAT) gateway enables instances in a private subnet to connect to the internet or to other AWS services, while preventing the internet from initiating connections to those instances. If you require a highly available data gateway implementation, we recommend a cluster of on-premises data gateways installed across multiple EC2 instances that span different AWS Availability Zones. For more information, see Add another gateway to create a cluster.
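The recommended layout can be expressed as a small planning step: each private subnet's route table sends internet-bound traffic (0.0.0.0/0) to a NAT gateway, and gateway cluster members are spread across Availability Zones. The Python sketch below only builds request parameters. Its dictionary keys mirror the `RouteTableId`, `DestinationCidrBlock`, and `NatGatewayId` arguments of the boto3 EC2 client's `create_route` call, while all resource IDs, the `PrivateSubnet` type, and the helper names are hypothetical. It also assumes you supply a NAT gateway ID for each Availability Zone; with a single NAT gateway, map every zone to the same ID.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass(frozen=True)
class PrivateSubnet:
    """A private subnet that hosts data sources (hypothetical model)."""
    subnet_id: str
    availability_zone: str
    route_table_id: str


def default_route_params(subnet: PrivateSubnet, nat_by_az: dict[str, str]) -> dict:
    """Parameters for a 0.0.0.0/0 route through the NAT gateway.

    Keys match the boto3 EC2 client's create_route() arguments.
    """
    return {
        "RouteTableId": subnet.route_table_id,
        "DestinationCidrBlock": "0.0.0.0/0",
        "NatGatewayId": nat_by_az[subnet.availability_zone],
    }


def place_gateway_cluster(subnets: list[PrivateSubnet], members: int) -> list[PrivateSubnet]:
    """Assign gateway cluster members to subnets round-robin across AZs."""
    # Keep the first subnet seen in each AZ so consecutive members land in different AZs.
    one_per_az: dict[str, PrivateSubnet] = {}
    for s in subnets:
        one_per_az.setdefault(s.availability_zone, s)
    if not one_per_az:
        raise ValueError("at least one private subnet is required")
    if members > 1 and len(one_per_az) < 2:
        raise ValueError("a highly available cluster needs subnets in at least two AZs")
    zones = cycle(sorted(one_per_az.values(), key=lambda s: s.availability_zone))
    return [next(zones) for _ in range(members)]


if __name__ == "__main__":
    subnets = [
        PrivateSubnet("subnet-aaaa", "us-east-1a", "rtb-1111"),
        PrivateSubnet("subnet-bbbb", "us-east-1b", "rtb-2222"),
    ]
    nat_by_az = {"us-east-1a": "nat-0001", "us-east-1b": "nat-0002"}
    for s in subnets:
        print(default_route_params(s, nat_by_az))
    for i, s in enumerate(place_gateway_cluster(subnets, members=3), start=1):
        print(f"gateway {i}: {s.subnet_id} ({s.availability_zone})")
```

Each route dictionary could then be passed to a boto3 EC2 client as `create_route(**params)`. Keeping planning separate from the API calls makes the layout easy to review before anything is created.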
The options presented in this section illustrate Amazon RDS, Amazon Redshift, and Amazon Athena. For a full discussion of all AWS data sources, refer to Appendix: Microsoft Power BI supported AWS data sources.
Connecting AWS data sources to the Microsoft Power BI service
Additional considerations
Table 5 — Considerations for Microsoft Power BI service with data sources in the AWS Cloud
Criteria | Considerations for Microsoft Power BI service with data sources in the AWS Cloud |
---|---|
Network connectivity | Microsoft on-premises data gateway connectivity to data sources is straightforward because both the data consumer and the data sources reside within the AWS Cloud. Data sources that live in an Amazon VPC, such as Amazon RDS and Amazon Redshift, can be accessed directly. Data sources that use Regional endpoints, such as Amazon Athena, can be accessed through the VPC internet gateway (by way of the NAT gateway in the recommended configuration) or through an Amazon VPC endpoint. Gateway connectivity to the Microsoft Power BI service occurs over the internet and uses outbound connections only. |
Security | IP access control: You can use a combination of routing and security groups to control access to data sources stored within the AWS Cloud. Because the Microsoft on-premises data gateway is installed on an Amazon EC2 instance, the instance has an associated security group that can be used to limit inbound access to the operating system. The gateway itself does not accept inbound requests. The instance does not need a public IP address, and should not be configured with one. Encryption in transit: We recommend configuring data sources within an Amazon VPC to encrypt data in transit. Regional services already use TLS encryption. The gateway can be configured to connect to the Microsoft Azure Service Bus using HTTPS instead of TCP. We recommend HTTPS mode, which has also been the default for new gateway installations since the June 2019 gateway software release. Authentication: AWS recommends that you authenticate with AWS data sources using an identity that has read-only access to only the required datasets. The credentials that you enter for a data source are encrypted and stored in the gateway cloud service, and are decrypted at the gateway. Make sure that Microsoft Power BI credentials are securely controlled, because access to the service permits access to AWS data sources and any sensitive information they might contain. |
Performance | The Microsoft on-premises data gateway in the AWS Cloud typically performs well because you can size and scale up the Amazon EC2 instance, and because it benefits from fast in-Region networking and connectivity to the internet. |
Cost | Three factors need to be considered: Amazon EC2 instance charges, data transfer charges, and NAT gateway charges. Size your Amazon EC2 instances according to Microsoft's requirements. Data transferred from the gateway to the Microsoft Power BI service incurs data transfer out (egress) charges. Customers report 10:1 compression when using the gateway, which reduces the amount of traffic, but we recommend that you limit queries and use filters so that only relevant data is transferred. If the gateway connects to data sources in different Availability Zones or different AWS Regions, data transfer charges also apply. If the gateways are located in private subnets and use a NAT gateway, hourly and data processing charges apply. For more information, see Amazon VPC pricing. |
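The security guidance in the table can be captured as security group rules. This Python sketch builds egress permissions in the `IpPermissions` shape used by the boto3 EC2 client's `authorize_security_group_egress` call, and it deliberately defines no ingress rules for the gateway because the gateway only initiates outbound connections. It assumes the gateway runs in HTTPS mode, so outbound TCP 443 is treated as the only port needed to reach the Azure Service Bus; confirm this against Microsoft's current gateway port requirements. It uses default engine ports (5439 for Amazon Redshift, 5432 for PostgreSQL) as examples. Security group IDs, the `DataSource` type, and the helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSource:
    """A data source the gateway must reach (hypothetical model)."""
    name: str
    port: int
    security_group_id: str  # security group attached to the data source


def gateway_egress_permissions(sources: list[DataSource]) -> list[dict]:
    """Outbound rules for the gateway instance's security group.

    Shape matches the IpPermissions argument of boto3's
    EC2 client authorize_security_group_egress().
    """
    rules = [{
        # Assumption: HTTPS mode, so only TCP 443 to the Azure Service Bus
        # (reached over the internet through the NAT gateway).
        "IpProtocol": "tcp",
        "FromPort": 443,
        "ToPort": 443,
        "IpRanges": [{"CidrIp": "0.0.0.0/0", "Description": "Power BI gateway HTTPS mode"}],
    }]
    for src in sources:
        # Reach each data source only on its listener port, scoped to its security group.
        rules.append({
            "IpProtocol": "tcp",
            "FromPort": src.port,
            "ToPort": src.port,
            "UserIdGroupPairs": [{"GroupId": src.security_group_id,
                                  "Description": f"to {src.name}"}],
        })
    return rules


def data_source_ingress_permission(gateway_sg_id: str, src: DataSource) -> dict:
    """Inbound rule for a data source's security group, allowing only the gateway."""
    return {
        "IpProtocol": "tcp",
        "FromPort": src.port,
        "ToPort": src.port,
        "UserIdGroupPairs": [{"GroupId": gateway_sg_id}],
    }


# The gateway accepts no inbound connections, so no ingress rules are defined for it.
GATEWAY_INGRESS_PERMISSIONS: list[dict] = []


if __name__ == "__main__":
    sources = [
        DataSource("Amazon Redshift", 5439, "sg-redshift-example"),
        DataSource("Amazon RDS for PostgreSQL", 5432, "sg-rds-example"),
    ]
    for rule in gateway_egress_permissions(sources):
        print(rule)
```

Note that a new VPC security group starts with a default allow-all outbound rule, so for these egress rules to be restrictive that default rule would need to be revoked.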