Security and compliance - Machine Learning Best Practices for Public Sector Organizations

This whitepaper is for historical reference only. Some content might be outdated and some links might not be available.

Security and compliance

Public sector organizations face a number of security challenges and concerns when hosting ML workloads in the cloud, because these applications can contain sensitive customer data, including personal or proprietary information that must be protected over the entire data lifecycle. Specific concerns include protecting the network and underlying resources such as compute, storage, and databases; user authentication and authorization; and logging, monitoring, and auditing. These objectives are summarized in Figure 6 below.

Figure 6: Security and Compliance objectives for hosting public sector ML workloads

This subsection provides best practices and guidelines to address some of these security and compliance challenges.

Compute and network isolation

One of the major requirements of many public sector ML projects is the ability to keep environments, data, and workloads secure and isolated from internet access. This can be achieved using the following methods:

  • Provision ML components in an isolated VPC with no internet access: SageMaker components, including Studio, notebooks, training jobs, and hosting instances, can be provisioned in an isolated VPC with no internet access. Launching SageMaker Studio in a Virtual Private Cloud (VPC) of your choice restricts traffic from reaching the internet and allows fine-grained control over the network access and internet connectivity of SageMaker Studio notebooks. Direct internet access can be disabled to add an additional layer of security.

To disable direct internet access, specify the VPC-only network access type when onboarding to Studio. The same concept applies to SageMaker notebooks: launching a notebook instance in a VPC restricts which traffic can go through the public internet. When launched with a VPC attached, the notebook instance can be configured with or without direct internet access. Traffic to public endpoints such as S3 or the SageMaker APIs can be routed over VPC endpoints so that it stays within the AWS network. Refer to Building secure ML environments with Amazon SageMaker AI for further details.
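As a minimal sketch of the VPC-only onboarding step, the parameters below mirror the SageMaker CreateDomain API; the VPC, subnet, and role identifiers are hypothetical placeholders. Setting AppNetworkAccessType to VpcOnly is what disables direct internet access for Studio.

```python
import json

# Hypothetical IDs and ARN for illustration only.
domain_params = {
    "DomainName": "secure-ml-domain",
    "AuthMode": "IAM",
    "DefaultUserSettings": {
        "ExecutionRole": "arn:aws:iam::111122223333:role/SageMakerExecutionRole"
    },
    "VpcId": "vpc-0abc123",
    "SubnetIds": ["subnet-0abc123", "subnet-0def456"],
    # VpcOnly routes all Studio traffic through the VPC; no direct internet route.
    "AppNetworkAccessType": "VpcOnly",
}
print(json.dumps(domain_params, indent=2))

# With AWS credentials configured, the request would be submitted as:
# boto3.client("sagemaker").create_domain(**domain_params)
```

For a classic notebook instance, the analogous setting is `DirectInternetAccess="Disabled"` on the CreateNotebookInstance call.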

  • Use VPC endpoints and endpoint policies to further limit access: AWS resources can connect to public endpoints such as S3, CloudWatch, and the SageMaker API and SageMaker Runtime through an interface endpoint in the VPC instead of over the internet. When a VPC interface endpoint is used, communication between the VPC and the SageMaker API or Runtime stays entirely and securely within the AWS network. VPC endpoint policies can be configured to further limit access based on who can perform actions, what actions can be performed, and the resources on which those actions can be performed. For example, access to an S3 bucket can be restricted to a specific SageMaker Studio domain or set of users, and each Studio domain can be restricted to a specific S3 bucket (see Securing Amazon SageMaker AI Studio connectivity using a private VPC). Figure 7 below shows an architecture for setting up SageMaker Studio using a private VPC.
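The bucket restriction described above can be expressed as a VPC endpoint policy. The sketch below, with a hypothetical bucket name, limits an S3 endpoint so that resources in the VPC can only reach a single ML data bucket.

```python
import json

# Hypothetical bucket; attach this document to the S3 VPC endpoint so the
# VPC can only read from and write to the approved ML data bucket.
endpoint_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowMLBucketOnly",
            "Effect": "Allow",
            "Principal": "*",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::example-ml-data-bucket",
                "arn:aws:s3:::example-ml-data-bucket/*",
            ],
        }
    ],
}
print(json.dumps(endpoint_policy, indent=2))
```

Because endpoint policies default to allowing all access, an explicit Allow scoped to specific resources like this one implicitly blocks everything else through that endpoint.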

Figure 7: SageMaker Studio in a private VPC

  • Allow access only from within the VPC: An IAM policy can be created to prevent users outside the VPC from accessing SageMaker Studio or SageMaker notebooks over the internet, ensuring that only connections made from within the VPC are allowed. For example, such a policy can restrict connections to specific VPC endpoints or a specific set of source IP addresses, and it can be attached to every user, group, or role used to access Studio or Jupyter notebooks.
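One common shape for such a policy, sketched below with a hypothetical VPC endpoint ID, denies creation of the presigned URLs used to open Studio or notebook sessions unless the request arrives through a specific SageMaker API VPC endpoint.

```python
import json

# Deny presigned-URL creation unless the call comes through the named
# (hypothetical) VPC endpoint; attach to every user, group, or role.
vpc_only_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyAccessOutsideVpcEndpoint",
            "Effect": "Deny",
            "Action": [
                "sagemaker:CreatePresignedDomainUrl",
                "sagemaker:CreatePresignedNotebookInstanceUrl",
            ],
            "Resource": "*",
            "Condition": {
                "StringNotEquals": {"aws:SourceVpce": "vpce-0123456789abcdef0"}
            },
        }
    ],
}
print(json.dumps(vpc_only_policy, indent=2))
```

An explicit Deny with a StringNotEquals condition is preferred here because it overrides any broader Allow the identity may already carry.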

  • Intrusion detection and prevention: AWS Gateway Load Balancer (GWLB) can be used to deploy, scale, and manage the availability of third-party virtual appliances such as firewalls, intrusion detection and prevention systems, and deep packet inspection systems in the cloud. GWLB allows custom logic or third-party offerings to be inserted into any AWS networking path where traffic needs to be inspected and acted upon. For example, a simple appliance can check whether there is any unencrypted or TLS 1.0/TLS 1.1 traffic between VPCs. Additionally, AWS Partner Network and AWS Marketplace partners can offer their virtual appliances as a service to AWS customers without having to solve the complex problems of scale, availability, and service delivery. Refer to Introducing AWS Gateway Load Balancer – Easy Deployment, Scalability, and High Availability for Partner Appliances for further details on GWLB.

  • Additional security when allowing access to resources outside your VPC: If access is needed to an AWS service that does not support interface VPC endpoints, or to a resource outside of AWS, create a NAT gateway and configure security groups to allow outbound connections. Additionally, AWS Network Firewall can be used to filter outbound traffic, for example to specific GitHub repositories. AWS Network Firewall supports inbound and outbound web filtering for unencrypted web traffic; for encrypted web traffic, the Server Name Indication (SNI) is used to block access to specific sites. AWS Network Firewall can also filter by fully qualified domain name (FQDN).
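The outbound domain filtering described above can be sketched as a stateful Network Firewall rule group. The parameters below mirror the CreateRuleGroup API shape; the rule group name and the allow-listed domain are hypothetical.

```python
import json

# Hypothetical allow-list rule group: only TLS SNI / HTTP Host values
# matching .github.com are permitted; everything else is blocked.
rule_group_params = {
    "RuleGroupName": "allow-approved-domains",
    "Type": "STATEFUL",
    "Capacity": 100,
    "RuleGroup": {
        "RulesSource": {
            "RulesSourceList": {
                "Targets": [".github.com"],
                "TargetTypes": ["TLS_SNI", "HTTP_HOST"],
                "GeneratedRulesType": "ALLOWLIST",
            }
        }
    },
}
print(json.dumps(rule_group_params, indent=2))

# With credentials configured, the request would be submitted as:
# boto3.client("network-firewall").create_rule_group(**rule_group_params)
```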

Data Protection

  • Protect data at rest: AWS Key Management Service (KMS) can be used to encrypt ML data, Studio notebooks, and SageMaker notebook instances. SageMaker encrypts these resources by default with AWS managed keys; customer managed KMS keys can be used for more control over encryption and key management. For Studio notebooks, ML-related data is stored in multiple locations: an S3 bucket hosts notebook snapshots and metadata, EFS volumes contain Studio notebook and data files, and EBS volumes are attached to the instance the notebook runs on. KMS can be used to encrypt all of these storage locations. Encryption keys can be specified for the volumes of all Amazon EC2-based SageMaker resources, such as processing jobs, notebooks, training jobs, and model endpoints. FIPS endpoints can be used if FIPS 140-2 validated cryptographic modules are required to access AWS through a command line interface or an API.
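As an illustration of specifying customer managed keys, the fragment below shows the encryption-related pieces of a SageMaker CreateTrainingJob request: VolumeKmsKeyId encrypts the attached EBS volume, and the OutputDataConfig key encrypts model artifacts written to S3. The key ARN, bucket, and job name are hypothetical.

```python
import json

# Hypothetical key ARN and S3 path; the same key is used for the training
# volume and for the model artifacts, but separate keys could be supplied.
kms_key_arn = "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID"

training_params = {
    "TrainingJobName": "secure-training-job",
    "ResourceConfig": {
        "InstanceType": "ml.m5.xlarge",
        "InstanceCount": 1,
        "VolumeSizeInGB": 50,
        # Encrypts the EBS volume attached to the training instance.
        "VolumeKmsKeyId": kms_key_arn,
    },
    "OutputDataConfig": {
        "S3OutputPath": "s3://example-ml-artifacts/output",
        # Encrypts model artifacts at rest in S3.
        "KmsKeyId": kms_key_arn,
    },
}
print(json.dumps(training_params, indent=2))
```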

  • Protect data in transit: To protect data in transit, AWS makes extensive use of HTTPS communication for its APIs. Requests to the SageMaker API and console are made over a secure (TLS) connection. In addition to passing all API calls through a TLS-encrypted channel, AWS APIs require that requests are signed using the Signature Version 4 signing process. This process uses client access keys to sign every API request, adding authentication information and preventing tampering with the request in flight. Additionally, communication between instances in a distributed training job can be further protected by configuring a private VPC, adding another level of security for training containers and data. SageMaker can be instructed to encrypt inter-node communication automatically for the training job; the data passed between nodes then travels over an encrypted tunnel without the algorithm having to take on responsibility for encrypting and decrypting it.
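In API terms, inter-node encryption and the private VPC described above are two settings on the training job request; a minimal sketch with placeholder security group and subnet IDs:

```python
# Hypothetical VPC settings for a distributed training job: inter-node
# traffic is encrypted automatically, and the job runs in a private VPC.
distributed_training_params = {
    "EnableInterContainerTrafficEncryption": True,
    "VpcConfig": {
        "SecurityGroupIds": ["sg-0abc123"],
        "Subnets": ["subnet-0abc123", "subnet-0def456"],
    },
}
print(distributed_training_params)
```

These keys would be merged into the same CreateTrainingJob request shown for data at rest; note that inter-container encryption can add some overhead to distributed training time.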
  • Secure shared notebook instances: SageMaker notebook instances are designed to work best for individual users. They give data scientists and other users the most power for managing their development environment. A notebook instance user has root access for installing packages and other pertinent software. The recommended best practice is to use IAM policies when granting individuals access to notebook instances that are attached to a VPC that contains sensitive information. For example, allow only specific users access to a notebook instance with an IAM policy.

Authentication and Authorization

AWS IAM enables control of access to AWS resources. IAM administrators control who can be authenticated (signed in) and authorized (have permissions) to use SageMaker resources. IAM can help create preventive controls for many aspects of your ML environment, including access to Amazon SageMaker AI resources, data in Amazon S3, and API endpoints. AWS services can be accessed using a RESTful API, and every API call is authorized by IAM. Explicit permissions can be granted through IAM policy documents, which specify the principal (who), the actions (API calls), and the resources (such as Amazon S3 objects) that are allowed, as well as the conditions under which the access is granted. Access can be controlled by creating policies and attaching them to IAM identities or AWS resources. A policy is an object in AWS that, when associated with an identity or resource, defines their permissions. Two common ways to implement least privilege access to the SageMaker environments are identity-based policies and resource-based policies:

  • Identity-based policies are attached to a user, group, or role. These policies specify what that identity can do. For example, by attaching the AmazonSageMakerFullAccess managed policy to an IAM role for data scientists, they are granted full access to the SageMaker service for model development work.

  • Resource-based policies are attached to a resource. These policies specify who has access to the resource and what actions can be performed on it. For example, a policy can be attached to an Amazon Simple Storage Service (Amazon S3) bucket, granting read-only permissions to data scientists who access the bucket from a specific VPC endpoint. Another typical policy configuration for S3 buckets is to deny public access, preventing unauthorized access to data.
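The resource-based example above can be sketched as an S3 bucket policy; the role ARN, bucket name, and VPC endpoint ID below are hypothetical placeholders.

```python
import json

# Grant a (hypothetical) data-scientist role read-only access, but only when
# requests arrive through the named VPC endpoint.
bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ReadOnlyFromVpcEndpoint",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::111122223333:role/DataScientistRole"
            },
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::example-ml-data-bucket",
                "arn:aws:s3:::example-ml-data-bucket/*",
            ],
            "Condition": {
                "StringEquals": {"aws:SourceVpce": "vpce-0123456789abcdef0"}
            },
        }
    ],
}
print(json.dumps(bucket_policy, indent=2))
```

Pairing a policy like this with S3 Block Public Access on the bucket covers both halves of the bullet: scoped read access and a blanket deny on public access.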

Refer to Configuring Amazon SageMaker AI Studio for teams and groups with complete resource isolation, which outlines how to configure access control for teams or groups within Amazon SageMaker AI Studio using attribute-based access control (ABAC). ABAC is a powerful approach for configuring Studio so that different ML and data science teams have complete isolation of team resources. AWS Single Sign-On (AWS SSO) can also be used for user authentication with an external identity provider such as Ping Identity or Okta; refer to Onboarding Amazon SageMaker AI Studio with AWS SSO and Okta Universal Directory for details.

Artifact and model management

The recommended best practice is to use version control to track code and other model artifacts. If model artifacts are modified or deleted, either accidentally or deliberately, version control allows you to roll back to a previous stable release, for example when an unauthorized user gains access to the environment and makes changes to the model. If model artifacts are stored in Amazon S3, versioning should be enabled and paired with multi-factor authentication (MFA) delete, which helps ensure that only users authenticated with MFA can permanently delete an object version or change the versioning state of the bucket. Another way of enabling version control is to associate Git repositories with new or existing SageMaker notebook instances; SageMaker supports AWS CodeCommit, GitHub, and other Git-based repositories. With CodeCommit, the repository can be further secured by rotating credentials and enabling MFA.
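The versioning-plus-MFA-delete configuration can be sketched as the parameters for an S3 PutBucketVersioning request; the bucket name and MFA device serial/token below are placeholders, and MFA delete can only be enabled by the bucket owner's root credentials.

```python
# Hypothetical bucket and MFA value ("device-serial token-code"); MFA delete
# must be enabled with the root account's MFA device.
versioning_params = {
    "Bucket": "example-model-artifacts",
    "VersioningConfiguration": {
        "Status": "Enabled",      # keep every object version
        "MFADelete": "Enabled",   # require MFA to purge versions
    },
    "MFA": "arn:aws:iam::111122223333:mfa/root-account-mfa-device 123456",
}
print(versioning_params)

# With root credentials configured, the request would be submitted as:
# boto3.client("s3").put_bucket_versioning(**versioning_params)
```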

Additionally, the SageMaker Model registry can also be used to register, deploy, and manage models as discussed in SageMaker Pipelines in the MLOps section earlier.

Security compliance

Third-party auditors assess the security and compliance of Amazon SageMaker AI as part of multiple AWS compliance programs, including FedRAMP, HIPAA, and others. For a list of AWS services in scope of specific compliance programs, see AWS Services in Scope by Compliance Program. Third-party audit reports can be downloaded using AWS Artifact. The customer's compliance responsibility when using Amazon SageMaker AI is determined by the sensitivity of the organization's data, its compliance objectives, and applicable laws and regulations. AWS provides the following resources to help with compliance: