Security - Automated Data Analytics on AWS

Security

When you build systems on AWS infrastructure, security responsibilities are shared between you and AWS. This shared responsibility model reduces your operational burden because AWS operates, manages, and controls the components including the host operating system, the virtualization layer, and the physical security of the facilities in which the services operate. For more information about AWS security, visit AWS Cloud Security.

IAM roles

AWS Identity and Access Management (IAM) roles allow customers to assign granular access policies and permissions to services and users on the AWS Cloud. Automated Data Analytics on AWS creates IAM roles that grant the solution’s constructs access to Regional resources provisioned by the solution, such as:

  • IAM roles used by the Lambda functions that implement the APIs to read and write data in S3 buckets and DynamoDB tables.

  • IAM roles used by AWS Glue crawlers and jobs to read and write data in S3 buckets.

Although the solution follows the principle of least privilege for all IAM roles that it provisions, its complexity means that it requires more exclusive control of, and access to, the resources within the account where it is deployed. It is recommended to deploy and operate this solution in its own dedicated AWS account instead of sharing an AWS account with other cloud workloads.

IAM resource policies

AWS Identity and Access Management (IAM) resource-based policies are JSON policy documents that you attach to a resource such as an Amazon S3 bucket. These policies grant the specified principal permission to perform specific actions on that resource and define the conditions under which the permission applies. Automated Data Analytics on AWS supports granting the solution access to resources that are not provisioned by the solution (external resources), such as granting the solution read access to an Amazon S3 bucket to import source data. Automated Data Analytics on AWS uses session tagging so that resource policies on external resources can manage granular access for the solution, groups, and users through PrincipalTag conditions.

Principal | Policy Condition
Solution | "Condition": { "StringLike": { "aws:PrincipalTag/ada:service": "*" } }
Service (query or data-product) | "Condition": { "StringLike": { "aws:PrincipalTag/ada:service": "data-product" } }
Group | "Condition": { "ForAnyValue:StringLike": { "aws:PrincipalTag/ada:groups": [ "*:admin:*", "*:power-user:*" ] } }
User | "Condition": { "ForAnyValue:StringLike": { "aws:PrincipalTag/ada:user": [ "user-id-1", "user-id-2" ] } }

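To make the wildcard patterns in these conditions concrete, the following Python sketch approximates how a StringLike comparison treats a tag value. It is a local illustration only, not the AWS policy engine. The function names are hypothetical, and the colon-delimited layout of the groups tag value is an assumption inferred from patterns such as "*:admin:*".

```python
import re
from typing import Optional, Sequence


def string_like(pattern: str, value: str) -> bool:
    """Approximate IAM StringLike: case-sensitive, where '*' matches any
    run of characters and '?' matches exactly one character."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def any_pattern_matches(patterns: Sequence[str], tag_value: Optional[str]) -> bool:
    """Return True when the tag is present and at least one pattern matches."""
    if tag_value is None:
        return False
    return any(string_like(pattern, tag_value) for pattern in patterns)


# Hypothetical session tag value (assumed colon-delimited group list).
groups_tag = ":default:power-user:"

print(any_pattern_matches(["*:admin:*", "*:power-user:*"], groups_tag))  # True
print(any_pattern_matches(["*:admin:*"], groups_tag))                    # False
print(string_like("*", "data-product"))                                  # True
```

Surrounding each group name with colons, as the patterns do, keeps a pattern for one group from matching a longer group name that merely contains it.
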
The following example policy grants users in the power-user group read access to an Amazon S3 bucket for creating data products, and grants Automated Data Analytics on AWS delegated read access to the same bucket when querying the data product.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Grant Automated Data Analytics on AWS User access",
      "Effect": "Allow",
      "Action": "s3:Get*",
      "Resource": "arn:aws:s3:::<bucket-name>/*",
      "Principal": { "AWS": [ "arn:aws:iam::<ada-account>:root" ] },
      "Condition": { "StringEquals": { "aws:PrincipalTag/ada:user": "<ada-user>" } }
    },
    {
      "Sid": "Grant Automated Data Analytics on AWS Group access",
      "Effect": "Allow",
      "Action": "s3:Get*",
      "Resource": "arn:aws:s3:::<bucket-name>/*",
      "Principal": { "AWS": [ "arn:aws:iam::<ada-account>:root" ] },
      "Condition": { "StringLike": { "aws:PrincipalTag/ada:groups": "*:power-user:*" } }
    },
    {
      "Sid": "Grant Automated Data Analytics on AWS Group access in one of group",
      "Effect": "Allow",
      "Action": "s3:Get*",
      "Resource": "arn:aws:s3:::<bucket-name>/*",
      "Principal": { "AWS": [ "arn:aws:iam::<ada-account>:root" ] },
      "Condition": { "ForAnyValue:StringLike": { "aws:PrincipalTag/ada:groups": [ "*:power-user:*", "*:some-custom-group:*" ] } }
    },
    {
      "Sid": "Grant Automated Data Analytics on AWS Federated Query access",
      "Effect": "Allow",
      "Action": "s3:Get*",
      "Resource": "arn:aws:s3:::<bucket-name>/*",
      "Principal": { "AWS": [ "arn:aws:iam::<ada-account>:root" ] },
      "Condition": { "StringEquals": { "aws:PrincipalTag/ada:service": "query" } }
    },
    {
      "Sid": "Grant access from any Automated Data Analytics on AWS microservice",
      "Effect": "Allow",
      "Action": "s3:Get*",
      "Resource": "arn:aws:s3:::<bucket-name>/*",
      "Principal": { "AWS": [ "arn:aws:iam::<ada-account>:root" ] },
      "Condition": { "StringLike": { "aws:PrincipalTag/ada:service": "*" } }
    }
  ]
}
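
Statements of this shape differ only in the Sid, the tag key, and the tag values, so they can be generated rather than written by hand. The following Python sketch shows one possible way to do that. The function name, the bucket name, and the account ID are hypothetical placeholders. The sketch selects StringLike whenever a value contains a wildcard, because StringEquals compares values literally.

```python
import json


def build_read_statement(sid, bucket_name, ada_account, tag_key, tag_values):
    """Build one Allow statement gated on an ada:* principal tag."""
    values = [tag_values] if isinstance(tag_values, str) else list(tag_values)
    # StringEquals is a literal comparison; wildcards need StringLike.
    uses_wildcard = any("*" in v or "?" in v for v in values)
    operator = "StringLike" if uses_wildcard else "StringEquals"
    if len(values) > 1:
        operator = "ForAnyValue:" + operator
    return {
        "Sid": sid,
        "Effect": "Allow",
        "Action": "s3:Get*",
        "Resource": f"arn:aws:s3:::{bucket_name}/*",
        "Principal": {"AWS": [f"arn:aws:iam::{ada_account}:root"]},
        "Condition": {
            operator: {
                f"aws:PrincipalTag/{tag_key}": values if len(values) > 1 else values[0]
            }
        },
    }


# Placeholder bucket name and account ID, for illustration only.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        build_read_statement(
            "GroupAccess", "my-source-bucket", "111122223333",
            "ada:groups", ["*:power-user:*", "*:some-custom-group:*"],
        ),
        build_read_statement(
            "QueryAccess", "my-source-bucket", "111122223333",
            "ada:service", "query",
        ),
    ],
}
print(json.dumps(policy, indent=2))
```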

Amazon Cognito

Automated Data Analytics on AWS uses Amazon Cognito user pools and identity pools. User pools are user directories that provide sign-in functionality for web users. Identity pools provide AWS credentials that grant web users access to other AWS services, such as the ability to access data stored in Amazon S3. After a successful user pool sign-in, the solution’s web UI receives user pool tokens from Amazon Cognito. These tokens are used to control access to server-side resources. For example, the API Gateway instance is configured with a Cognito authorizer that checks each web request for a valid token (such as one that is signed by the user pool and has not expired).
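
The Cognito authorizer performs these token checks on behalf of the solution. Purely as an illustration of what "a valid token" involves, the following Python sketch decodes a token's claims and checks the issuer, token type, and expiry. It deliberately does not verify the signature, which a real validator must do against the user pool's published signing keys. The function names and the demo user pool ID are hypothetical.

```python
import base64
import json
import time


def decode_jwt_payload(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def claims_look_valid(claims: dict, region: str, user_pool_id: str, now=None) -> bool:
    """Check issuer, token type, and expiry. Signature checking is omitted."""
    now = time.time() if now is None else now
    expected_issuer = f"https://cognito-idp.{region}.amazonaws.com/{user_pool_id}"
    expiry = claims.get("exp")
    return (
        claims.get("iss") == expected_issuer
        and claims.get("token_use") in ("id", "access")
        and isinstance(expiry, (int, float))
        and expiry > now
    )


def _b64url(data: dict) -> str:
    """Encode a dict as an unpadded base64url JSON segment (demo helper)."""
    raw = json.dumps(data).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Fabricated demo token with a hypothetical user pool ID; not a real credential.
demo_claims = {
    "iss": "https://cognito-idp.us-east-1.amazonaws.com/us-east-1_EXAMPLE",
    "token_use": "id",
    "exp": 2_000_000_000,
}
demo_token = ".".join([_b64url({"alg": "RS256"}), _b64url(demo_claims), "unchecked"])

claims = decode_jwt_payload(demo_token)
print(claims_look_valid(claims, "us-east-1", "us-east-1_EXAMPLE", now=1_700_000_000))  # True
```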

When first deployed, this solution provisions a root_admin user that is managed by the user pool and uses the Amazon Cognito Hosted UI to sign in. All other users are managed by external identity providers that the root_admin configures through federated sign-in. The Cognito user pool handles the federated sign-in with the identity provider to authenticate users and return tokens based on user identity.

This solution also supports machine-to-machine access through Cognito app clients, which are managed by the user pool and enable the OAuth 2.0 client credentials flow.
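
As a rough sketch of what the client credentials flow looks like from the calling machine's side, the following Python code builds a token request for the `/oauth2/token` endpoint of a Cognito user pool domain. The domain URL, client ID, client secret, and scope name are hypothetical placeholders, and the function name is illustrative. The request is constructed but not sent.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(domain_url, client_id, client_secret, scopes=()):
    """Build (but do not send) a client credentials token request."""
    credentials = f"{client_id}:{client_secret}".encode()
    form = {"grant_type": "client_credentials"}
    if scopes:
        form["scope"] = " ".join(scopes)
    return urllib.request.Request(
        url=domain_url.rstrip("/") + "/oauth2/token",
        data=urllib.parse.urlencode(form).encode(),
        headers={
            # The app client authenticates with HTTP Basic credentials.
            "Authorization": "Basic " + base64.b64encode(credentials).decode(),
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


# All values below are placeholders for illustration.
request = build_token_request(
    "https://example.auth.us-east-1.amazoncognito.com",
    "my-client-id",
    "my-client-secret",
    scopes=["example-resource-server/read"],
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would return a JSON body that includes an `access_token`, which the caller then presents to the solution's API.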

Amazon CloudFront

This solution deploys a web user interface (UI) hosted in an Amazon S3 bucket. To help reduce latency and improve security, this solution deploys an Amazon CloudFront distribution with an origin access identity. For more information, refer to Restricting Access to Amazon S3 Content by Using an Origin Access Identity in the Amazon CloudFront Developer Guide.

This solution also deploys two APIs in Amazon API Gateway. One is a REST API that provides access to the Automated Data Analytics on AWS backend, and the other is an HTTP API that serves as the Athena Proxy for egress connections with third-party analytics tools.

AWS WAF

This solution deploys AWS WAF, a web application firewall that helps protect the solution against common web exploits that might affect availability, compromise security, or consume excessive resources. AWS WAF provides control over how traffic reaches the solution, such as through security rules that block requests that don’t originate from an allow-list of CIDR IP ranges.

The solution supports specifying the allow-list of CIDR IP ranges for AWS Cloud Development Kit (AWS CDK) deployments to further secure all solution endpoints by restricting access. See CDK Deployment for more details on enabling the AWS WAF IP allow-list.
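
To illustrate what a CIDR allow-list means for an individual request, the following Python sketch checks whether a source address falls inside any allowed range. It mimics the decision locally and is not how AWS WAF is configured or invoked. The function name is hypothetical, and the ranges are reserved documentation addresses.

```python
import ipaddress


def is_allowed(source_ip: str, allowed_cidrs) -> bool:
    """Return True when source_ip falls inside at least one allowed range."""
    address = ipaddress.ip_address(source_ip)
    networks = [ipaddress.ip_network(cidr) for cidr in allowed_cidrs]
    # Compare only against ranges of the same IP version as the address.
    return any(
        address in network
        for network in networks
        if network.version == address.version
    )


# Documentation-only example ranges.
allow_list = ["203.0.113.0/24", "198.51.100.0/24"]

print(is_allowed("203.0.113.25", allow_list))  # True
print(is_allowed("192.0.2.10", allow_list))    # False
```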

Amazon API Gateway

You can access the REST API and the HTTP API deployed by this solution from the web UI or through third-party analytics tools to consume services and query data. Both API endpoints are protected by either Amazon Cognito authentication or the API key issued by the solution. To further restrict access to the APIs, consider the following options:

  • Use AWS WAF to activate an allow-list of IP ranges that API calls can originate from.

  • Use the API Gateway resource policy to create an allow-list of IP ranges that API calls can originate from.

  • Use mutual TLS authentication for API Gateway to allow calls from trusted parties only.

  • Configure the API to be private by using an Amazon VPC endpoint, which limits access to callers within a particular VPC or in on-premises networks that connect through AWS Direct Connect or VPN.

  • Use IAM to restrict access to the API.
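
As one example of the options above, an API Gateway resource policy can express an IP allow-list by allowing invocation in general and denying any caller whose source address falls outside the listed ranges. The following Python sketch generates such a policy document. The function name is hypothetical, the ranges are reserved documentation addresses, and the policy is a starting point to review against your own requirements rather than a policy shipped with the solution.

```python
import json


def build_ip_allow_list_policy(allowed_cidrs):
    """Build a resource policy that denies calls from outside allowed_cidrs."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": "*",
                "Action": "execute-api:Invoke",
                "Resource": "execute-api:/*",
            },
            {
                # Explicit deny for any source IP outside the allow-list.
                "Effect": "Deny",
                "Principal": "*",
                "Action": "execute-api:Invoke",
                "Resource": "execute-api:/*",
                "Condition": {
                    "NotIpAddress": {"aws:SourceIp": list(allowed_cidrs)}
                },
            },
        ],
    }


# Documentation-only example ranges.
policy = build_ip_allow_list_policy(["203.0.113.0/24", "198.51.100.0/24"])
print(json.dumps(policy, indent=2))
```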