- Published on
How to securely query your AWS RDS database from PowerBI
Recently I worked with a client with a Data Analytics team using the data visualisation tool PowerBI. PowerBI is a Business Intelligence desktop application for Microsoft Windows that allows users to generate graphics and reports for a variety of data sources.
The client wished to generate reports on the data in their production PostgreSQL database that we provisioned for them in AWS RDS. They therefore required a solution that would allow their Analytics team to query the database from their local machines.
The solution had to balance of number of priorities:
- Security - The production data was highly sensitive and it was important to not risk a data breach.
- Convenience - Ideally the solution would require minimal effort on the side of the PowerBI team to implement on their local machines.
- Speed of implementation - We had a two day goal for the integration to be completed.
- Cost - AWS bill should not increase too much.
I found that provisioning an AWS Client VPN for the PowerBI team to connect to and query the database through was the ideal balance of these four priorities.
Decision Process
The AWS whitepaper on the topic of using PowerBI in the AWS Cloud provides four main options for connecting PowerBI and RDS, which are as follows:
Whitelist IP access for certain users
Pros
- ✅ Easy to set up
- ✅ Cheap
Cons
- ❌ Not very secure
❌ Requires manual maintenance of IP whitelist in AWS (annoying if IPs rotate, or some members of the team sometimes work from home)
Configure SSH access for certain users
Pros
- ✅ Pretty easy to set up
- ✅ Cheap
Cons
- ❌ Not very secure
❌ Users have to install an SSH client such as PuTTY which has a fairly complicated configuration process
Provision an AWS Client VPN endpoint that users can connect to
Pros
- ✅ Pretty secure
- ✅ Pretty easy to set up
- ✅ Fairly Cheap (70$/mo for associating with a single subnet)
Cons
- ❌ Users have to install VPN client and connect while using PowerBI
Provision a Microsoft On-premises Data Gateway in AWS
Pros
- ✅ Pretty secure
- ✅ Users don’t have to install anything
Cons
- ❌ Very expensive (~$400/mo)
- ❌ Difficult to set up
From the above analysis I concluded that provisioning an AWS Client VPN satisfied three out of our four priorities. I mitigated the downside of VPN configuration on the client side by providing good documentation to the PowerBI users for connecting to the VPN.
Provisioning an AWS Client VPN using Terraform
Ideally PowerBI users should not be querying your actual production database. This is in case any of them manage to write an expensive query and lock the database, causing downtime for your production users.
Therefore we have our first step:
Step 1: Provision a read replica of your database in RDS.
I am now in the habit of using Terraform to provision infrastructure in AWS which brings all the benefits of infrastructure as code, including versioning, automation and many more.
In Terraform provisioning a read replica is as easy as specifying another RDS resource block and setting the replicate_source_db
option to the id of your production database.
Step 2: Provision an AWS Client VPN
To use mutual authentication two certificates must be defined in Terraform:
resource "aws_acm_certificate" "server" {
domain_name = "server"
}
resource "aws_acm_certificate" "client" {
domain_name = "client1.domain.tld"
}
We then define a aws_ec2_client_vpn_endpoint
resource:
resource "aws_ec2_client_vpn_endpoint" "vpn" {
description = "PowerBI VPN"
client_cidr_block = "10.0.0.0/22"
split_tunnel = true
server_certificate_arn = aws_acm_certificate.server.arn
connection_log_options {
enabled = false
}
authentication_options {
type = "certificate-authentication"
root_certificate_chain_arn = aws_acm_certificate.client.arn
}
dns_servers = [
"1.1.1.1",
"8.8.8.8",
]
}
Where we use split_tunnel
to ensure only RDS traffic gets routed through the VPN. We don’t want to pay the data transfer costs for all other traffic!
We also need an authorisation rule:
resource "aws_ec2_client_vpn_authorization_rule" "default" {
client_vpn_endpoint_id = aws_ec2_client_vpn_endpoint.vpn.id
target_network_cidr = "10.0.0.0/16"
authorize_all_groups = true
}
And finally a network association:
resource "aws_ec2_client_vpn_network_association" "default" {
client_vpn_endpoint_id = aws_ec2_client_vpn_endpoint.vpn.id
subnet_id = var.vpc_private_subnet
security_groups = [var.security_group_id]
}
Make sure the subnet_id
is the subnet containing your RDS instance! Also, make sure to allow traffic from the security group you specify in security_groups
as ingress to your production database security group. A secure implementation will only allow ingress via the database port e.g. for Postgres it would be as follows:
ingress {
from_port = 5432
to_port = 5432
protocol = "tcp"
security_groups = [var.security_group_id]
}
Step 3: Create a VPN Configuration File
Steps to do this can be found in the AWS documentation (step 7).
Step 4: Download a VPN Client and connect to the VPN
AWS provides a VPN client for Windows, Mac and Linux. Once downloaded you should be able to connect to your VPN!
Step 5: On Windows, add the RDS Public Key to our trusted root certificates
- Download the RDS Public Key
- Convert the key to P7B/PKCS#7
- Open
mmc
from the cmd prompt - File > Add/Remove Snap In
- Certificates > Add > Computer Account > Local Computer > OK
- Right click on Trusted Root Certification Authorities > All Tasks > Import
- Select the certificate you just converted > Place all certificates in the following store
Step 6: Add your RDS Database as a data source in PowerBI
- Obtain the RDS credentials
- In PowerBI Get Data > PostgreSQL Database (or your DB type)
- Fill in the credentials
- Query away!
Make sure you share the database credentials and VPN configuration file securely between users. Now you are all set up!