Best practices for AWS CloudHSM

Follow the best practices in this topic to use AWS CloudHSM effectively.

Cluster management

Follow the best practices in this section when creating, accessing, and managing your AWS CloudHSM cluster.

Scale your cluster to handle peak traffic

Several factors can influence the maximum throughput that your cluster can handle, including client instance size, cluster size, network topology, and the cryptographic operations you require for your use case.

As a starting point, refer to the topic AWS CloudHSM Performance for performance estimates on common cluster sizes and configurations. We recommend you load test your cluster with the peak load you anticipate to determine whether your current architecture is resilient and at the right scale.

Architect your cluster for high availability

Add redundancy to account for maintenance: AWS may replace your HSM for scheduled maintenance or if it detects a problem. As a general rule, your cluster size should have at least +1 redundancy. For example, if you require two HSMs for your service to operate at peak times, your ideal cluster size will then be three. If you follow the best practices relating to availability, these HSM replacements should not impact your service. However, in-progress operations on the replaced HSM may fail and must be retried.

Spread your HSMs across many Availability Zones: Consider how your service will be able to operate during an Availability Zone outage. AWS recommends that you spread your HSMs across as many Availability Zones as possible. For a cluster with three HSMs, you should spread HSMs across three Availability Zones. Depending on your system, you may require additional redundancy.
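The sizing and placement guidance above can be sketched in a few lines. This is a minimal illustration, not an official sizing tool: the function names and the round-robin placement policy are assumptions made for this example, and your own redundancy requirements may call for more than one spare HSM.

```python
from itertools import cycle, islice


def recommended_cluster_size(hsms_needed_at_peak: int, spare: int = 1) -> int:
    """Return the peak requirement plus spare HSMs (the +1 rule by default)."""
    if hsms_needed_at_peak < 1:
        raise ValueError("at least one HSM is required")
    return hsms_needed_at_peak + spare


def spread_across_zones(cluster_size: int, zones: list[str]) -> list[str]:
    """Assign each HSM to a zone round-robin so zones are used as evenly as possible."""
    if not zones:
        raise ValueError("at least one Availability Zone is required")
    return list(islice(cycle(zones), cluster_size))
```

For a service that needs two HSMs at peak, `recommended_cluster_size(2)` returns three, and `spread_across_zones` places those three HSMs in three different Availability Zones when three zone names are supplied.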

Have at least three HSMs to ensure durability for newly generated keys

For applications that require durability of newly generated keys, we recommend having at least three HSMs spread across different Availability Zones in a region.

Secure access to your cluster

Use private subnets to limit access to your instance: Launch your HSMs and client instances in the private subnets of your VPC. This limits access to your HSMs from the outside world.

Use VPC endpoints to access APIs: The AWS CloudHSM data plane was designed to operate without needing access to the internet or AWS APIs. If your client instance requires access to the AWS CloudHSM API, you can use VPC endpoints to access the API without requiring internet access on your client instance. See AWS CloudHSM and VPC endpoints for more information.

Reconfigure SSL to secure client-server communication: AWS CloudHSM uses TLS to establish a connection to your HSM. After you have initialized your cluster, you can replace the default TLS certificate and key used to establish the outer TLS connection. For more information, see Improve your web server security with SSL/TLS offload in AWS CloudHSM.

Reduce costs by scaling to your needs

There are no upfront costs to use AWS CloudHSM. You pay an hourly fee for each HSM you launch until you terminate the HSM. If your service does not require continuous usage of AWS CloudHSM, you can reduce costs by scaling down to zero HSMs (deleting them) when they are not needed. When you need HSMs again, you can restore them from a backup. For example, if you have a workload that signs code once a month, on the last day of the month, you can scale up your cluster before the signing run, delete your HSMs after the work is completed, and then restore your cluster to perform signing operations again at the end of the next month.

AWS CloudHSM automatically makes periodic backups of the HSMs in the cluster. When adding a new HSM at a later date, AWS CloudHSM will restore the latest backup onto the new HSM so that you can resume usage from the same place you left it. To calculate your AWS CloudHSM architecture costs, see AWS CloudHSM Pricing.
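As a rough sketch of the scale-down and scale-up cycle described above, the helpers below take a client object that is expected to behave like the boto3 `cloudhsmv2` client (`describe_clusters`, `delete_hsm`, and `create_hsm`). The helper names are hypothetical. The sketch does not wait for state changes to finish or handle errors, which a real script would likely need to do.

```python
def list_hsms(client, cluster_id: str) -> list[dict]:
    """Return the HSM descriptions for one cluster."""
    response = client.describe_clusters(Filters={"clusterIds": [cluster_id]})
    clusters = response.get("Clusters", [])
    if not clusters:
        raise LookupError(f"cluster {cluster_id} not found")
    return clusters[0].get("Hsms", [])


def scale_to_zero(client, cluster_id: str) -> list[str]:
    """Request deletion of every HSM in the cluster and return their IDs."""
    deleted = []
    for hsm in list_hsms(client, cluster_id):
        client.delete_hsm(ClusterId=cluster_id, HsmId=hsm["HsmId"])
        deleted.append(hsm["HsmId"])
    return deleted


def scale_up(client, cluster_id: str, zones: list[str]) -> None:
    """Request one new HSM in each listed Availability Zone."""
    for zone in zones:
        client.create_hsm(ClusterId=cluster_id, AvailabilityZone=zone)
```

With boto3 installed, you could pass `boto3.client("cloudhsmv2")` as `client`. Taking the client as a parameter keeps the sketch free of external dependencies and lets you exercise it with a stub first.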

HSM user management

Follow the best practices in this section to effectively manage users in your AWS CloudHSM cluster. HSM users are distinct from IAM users. IAM users and entities that have an identity-based policy with the appropriate permissions can create HSMs by interacting with resources through the AWS API. After the HSM is created, you must use HSM user credentials to authenticate operations on the HSM. For a detailed guide to HSM users, see Managing HSM users in AWS CloudHSM.

Protect your HSM users' credentials

It is imperative to keep the credentials of your HSM users securely protected as HSM users are the entities that can access and perform cryptographic and management operations on your HSM. AWS CloudHSM does not have access to your HSM user credentials, and will be unable to assist you if you lose access to them.

Have at least two admins to prevent lockout

To avoid being locked out of your cluster, we recommend you have at least two admins in case one admin password is lost. In the event this happens, you can use the other admin to reset the password.

Note

Admins in Client SDK 5 are synonymous with crypto officers (COs) in Client SDK 3.

Enable quorum for all user management operations

Quorum allows you to set a minimum number of admins that must approve a user management operation before that operation can take place. Because of the privileges that admins have, we recommend that you enable quorum for all user management operations. This can limit the potential impact if one of your admin passwords is compromised. For more information, see Managing Quorum.

Create multiple crypto users, each with limited permissions

By separating the responsibilities of crypto users, no one user has total control over the entire system. For this reason, we recommend you create multiple crypto users and limit the permissions of each. Typically, this is done by giving different crypto users distinctly different responsibilities and actions they perform (for example, having one crypto user who is responsible for generating and sharing keys with other crypto users who then utilize them in your application).

HSM key management

Follow the best practices in this section when managing keys in AWS CloudHSM.

Choose the right key type

When using a session key, your transactions per second (TPS) will be limited to the one HSM where the key exists. Extra HSMs in your cluster will not increase the throughput of requests for that key. If you use a token key for the same application, your requests will be load balanced across all available HSMs in your cluster. For more information, see Key synchronization and durability settings in AWS CloudHSM.

Manage key storage limits

HSMs have limits on the maximum number of token and session keys that can be stored on an HSM at a single time. For information on key storage limits, see AWS CloudHSM quotas. If your application requires more than the limit, you can use one or more of the following strategies to effectively manage keys:

Use trusted wrapping to store your keys in an external data store: Using trusted key wrapping, you can overcome the key storage limit by storing all of your keys wrapped inside an external data store. When you are required to use this key, you can unwrap the key into the HSM as a session key, use the key for your required operation, and then discard the session key. The original key data remains safely stored in your data store for use whenever you need it. Using trusted keys to do this maximizes your protection.

Distribute keys across clusters: Another strategy for overcoming the key storage limit is storing your keys in multiple clusters. In this approach, you maintain a mapping of the keys that are stored in each cluster. Use this mapping to route your client requests to the cluster with the required key. For information on how to connect to multiple clusters from the same client application, see the multi-cluster topics for the Client SDK library you use.
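The key-to-cluster mapping described above can be as simple as a lookup table. The sketch below is an illustration only: the `KeyRouter` class and the use of key labels as identifiers are assumptions for this example, and a production system would likely persist the mapping in a durable store.

```python
class KeyRouter:
    """Maps key labels to the ID of the cluster that stores each key."""

    def __init__(self) -> None:
        self._cluster_by_key: dict[str, str] = {}

    def register(self, key_label: str, cluster_id: str) -> None:
        """Record which cluster holds the key with this label."""
        self._cluster_by_key[key_label] = cluster_id

    def cluster_for(self, key_label: str) -> str:
        """Return the cluster to route requests for this key to."""
        try:
            return self._cluster_by_key[key_label]
        except KeyError:
            raise LookupError(f"no cluster registered for key {key_label!r}") from None
```

Your application would call `cluster_for` before each request and then use the connection that belongs to the returned cluster.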

Managing and securing key wrapping

Keys may be marked either extractable or non-extractable through the EXTRACTABLE attribute. By default, HSM keys are marked as extractable.

Extractable keys are keys that are permitted to be exported from the HSM through key wrapping. Keys that are wrapped are encrypted, and must be unwrapped using the same wrapping key before they can be used. Non-extractable keys may not be exported from the HSM under any circumstance. There is no way to make a non-extractable key extractable. For this reason, it is important to consider whether you require your keys to be extractable or not and to set the corresponding key attribute accordingly.

If you require key wrapping in your application, you should utilize trusted key wrapping to limit the ability of your HSM users to only wrap/unwrap keys which have been explicitly marked as trusted by an admin. For more information, see topics on trusted key wrapping in Managing keys in AWS CloudHSM.

Application integration

Follow the best practices in this section to optimize how your application integrates with your AWS CloudHSM cluster.

Bootstrap your Client SDK

Before your Client SDK can connect to your cluster, it must be bootstrapped. When bootstrapping, we recommend using the --cluster-id parameter when possible. This method populates your configuration with all HSM IP addresses in your cluster, so you do not need to keep track of each individual address. Doing this makes your application initialization more resilient if an HSM is undergoing maintenance or during an Availability Zone outage. For more details, see Bootstrap the Client SDK.

Authenticate to perform operations

In AWS CloudHSM, you must authenticate to your cluster before you can perform most operations, including cryptographic operations.

Authenticate with CloudHSM CLI: You can authenticate with CloudHSM CLI using either its single command mode or interactive mode. Use the login command to authenticate in interactive mode. To authenticate in single command mode, you must set the environment variables CLOUDHSM_ROLE and CLOUDHSM_PIN. For details on doing this, refer to Single Command mode. AWS CloudHSM recommends securely storing your HSM credentials when not being used by your application.

Authenticate with PKCS #11: In PKCS #11, you log in using the C_Login API after opening a session using C_OpenSession. You only need to perform one C_Login per slot (cluster). After you have successfully logged in, you can open additional sessions using C_OpenSession without the need to perform additional login operations. For examples of authenticating with PKCS #11, see Code samples for the PKCS #11 library.

Authenticate with JCE: The AWS CloudHSM JCE Provider supports both implicit and explicit login. The method that works for you depends on your use case. When possible, we recommend using implicit login because the SDK will automatically handle authentication if your application becomes disconnected from your cluster and needs to be re-authenticated. Using implicit login also allows you to provide credentials to your application when using an integration that doesn’t give you control over your application code. For more about login methods, see Provide credentials to the JCE provider.

Authenticate with OpenSSL: With the OpenSSL Dynamic Engine, you provide credentials through environment variables. AWS CloudHSM recommends securely storing your HSM credentials when not being used by your application. If possible, you should configure your environment to systematically retrieve and set these environment variables without manual entry. For details on authenticating with OpenSSL, see Installing the OpenSSL Dynamic Engine.
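For the integrations above that read credentials from environment variables, the following sketch shows one way to set them programmatically rather than by hand. It rests on assumptions: `fetch_credentials` is a hypothetical callable that returns a user name and password from wherever you store secrets, and the `<username>:<password>` format for CLOUDHSM_PIN should be confirmed against the documentation for your SDK version.

```python
import os
from typing import Callable, Mapping, Optional


def build_hsm_env(
    fetch_credentials: Callable[[], tuple[str, str]],
    role: Optional[str] = None,
    base_env: Optional[Mapping[str, str]] = None,
) -> dict[str, str]:
    """Return a copy of the environment with HSM credentials added.

    fetch_credentials is a hypothetical callable that returns
    (username, password) from your secret store.
    """
    env = dict(os.environ if base_env is None else base_env)
    username, password = fetch_credentials()
    # Assumption: CLOUDHSM_PIN takes the form "<username>:<password>".
    env["CLOUDHSM_PIN"] = f"{username}:{password}"
    if role is not None:
        # CloudHSM CLI single command mode also reads CLOUDHSM_ROLE.
        env["CLOUDHSM_ROLE"] = role
    return env
```

You can pass the returned dictionary as the `env` argument of `subprocess.run` so that the credentials are visible only to the child process and are never written to a shell profile.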

Effectively manage keys in your application

Use key attributes to control what keys can do: When generating a key, use key attributes to define a set of permissions that will allow or deny specific types of operations for that key. We recommend that keys be generated with the fewest attributes needed to complete their task. For example, an AES key used for encryption should not also be allowed to wrap keys out of the HSM. For more information, see the key attributes pages for the Client SDK you use.

When possible, cache key objects to minimize latency: Key find operations will query every HSM in your cluster. This operation is expensive and does not scale with HSM count in your cluster.

  • With PKCS #11, you find keys using the C_FindObjects API.

  • With JCE, you find keys using the KeyStore.

For optimal performance, AWS recommends that you use key find commands (like findKey and key list) only once during your application start-up and cache the returned key object in application memory. If you require this key object later on, retrieve it from your cache. Querying for the object on each operation adds significant performance overhead.
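A minimal sketch of the caching pattern described above follows. The `find_key` callable is a hypothetical stand-in for whatever key find call your SDK provides. The cache itself is plain application memory guarded by a lock so that multiple threads can share it.

```python
import threading
from typing import Any, Callable


class KeyCache:
    """Caches key objects by label so each key is looked up at most once."""

    def __init__(self, find_key: Callable[[str], Any]) -> None:
        self._find_key = find_key  # hypothetical wrapper around the SDK's find call
        self._keys: dict[str, Any] = {}
        self._lock = threading.Lock()

    def get(self, label: str) -> Any:
        """Return the cached key object, querying the cluster only on a miss."""
        with self._lock:
            if label not in self._keys:
                self._keys[label] = self._find_key(label)
            return self._keys[label]
```

Populating the cache during start-up keeps the expensive find operation out of the request path. Holding the lock during the lookup is a deliberate simplification that prevents duplicate queries for the same key.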

Use multi-threading

AWS CloudHSM supports multi-threaded applications, with the following considerations.

With PKCS #11, you should initialize the PKCS #11 library (calling C_Initialize) only once. Each thread should be assigned its own session (C_OpenSession). Using the same session in multiple threads is not recommended.

With JCE, the AWS CloudHSM provider should be initialized only once. Do not share instances of SPI objects across threads. For example, Cipher, Signature, Digest, Mac, KeyFactory or KeyGenerator objects should only be utilized in the context of their own thread.
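The one-session-per-thread guidance can be illustrated with thread-local storage. In this sketch, `open_session` is a hypothetical callable standing in for the call that opens a session in your library (for example, a wrapper around C_OpenSession). The library itself is assumed to have been initialized once elsewhere.

```python
import threading
from typing import Any, Callable


class PerThreadSessions:
    """Gives each thread its own session, opened on first use."""

    def __init__(self, open_session: Callable[[], Any]) -> None:
        self._open_session = open_session  # hypothetical session factory
        self._local = threading.local()

    def get(self) -> Any:
        """Return the calling thread's session, opening one if needed."""
        session = getattr(self._local, "session", None)
        if session is None:
            session = self._open_session()
            self._local.session = session
        return session
```

Each worker thread calls `get()` and reuses the returned session for its own operations, so no session object is shared between threads.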

Handle throttling errors

You may experience HSM throttling errors under the following circumstances:

  • Your cluster is not properly scaled to manage peak traffic.

  • Your cluster is not sized with a +1 redundancy during maintenance events.

  • Availability Zone outages result in a reduced number of available HSMs in your cluster.

See HSM Throttling for information on how to best handle this scenario.

To ensure your cluster is adequately sized and will not be throttled, AWS recommends you load test in your environment with your expected peak traffic.

Integrate retries on cluster operations

AWS may replace your HSM for operational or maintenance reasons. In order to make your application resilient to such situations, AWS recommends that you implement client-side retry logic on all operations that are routed to your cluster. Subsequent retries on failed operations due to replacements are expected to succeed.
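One common way to implement client-side retries is exponential backoff with jitter, sketched below. The exception types to retry on are an assumption: substitute the errors your SDK actually raises when an operation fails because an HSM was replaced. The `sleep` parameter exists only so that the delay can be replaced in tests.

```python
import random
import time
from typing import Any, Callable


def with_retries(
    operation: Callable[[], Any],
    max_attempts: int = 4,
    base_delay: float = 0.1,
    retry_on: tuple = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Run operation, retrying on the given errors with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            delay = base_delay * (2 ** (attempt - 1))
            # Add random jitter so many clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay))
```

Wrap each cluster operation in a zero-argument callable, for example `with_retries(lambda: sign(data))`, where `sign` is your own function.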

Implement disaster recovery strategies

In response to an event, it may be necessary to shift your traffic away from an entire cluster or region. The following sections describe multiple strategies for doing this.

Use VPC peering to access your cluster from another account or region: You can utilize VPC peering to access your AWS CloudHSM cluster from another account or region. For information on how to set this up, see What is VPC peering? in the VPC Peering Guide. Once you have established your peering connections and configured your security groups appropriately, you can communicate with HSM IP addresses in the same way as you normally would.

Connect to multiple clusters from the same application: The JCE provider, PKCS #11 library, and CLI in Client SDK 5 support connecting to multiple clusters from the same application. Applications that do not use Client SDK 5 (the latest SDK) cannot connect to multiple clusters. For example, you can have two active clusters, each in a different region, and your application can connect to both at once and load balance between the two as part of normal operations. Alternatively, you can keep another cluster up and running and, if there is a regional outage, shift your traffic to the other cluster to minimize downtime. See the multi-cluster documentation for each of these components for details.

Restore a cluster from a backup: You can create a new cluster from a backup of an existing cluster. For more information, see Managing AWS CloudHSM backups.
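To illustrate shifting traffic to a standby cluster, the sketch below tries an operation against clusters in priority order. It is an application-level pattern, not an SDK feature: `operation` is a hypothetical callable that takes a cluster identifier and performs the work against that cluster. A real application would catch only the error types that indicate the cluster is unreachable.

```python
from typing import Any, Callable, Sequence


def run_with_failover(
    operation: Callable[[str], Any],
    clusters: Sequence[str],
) -> tuple[str, Any]:
    """Try each cluster in order and return (cluster, result) from the first success."""
    errors = []
    for cluster in clusters:
        try:
            return cluster, operation(cluster)
        except Exception as exc:  # simplification: narrow this in real code
            errors.append((cluster, exc))
    raise RuntimeError(f"all clusters failed: {errors}")
```

Listing the primary cluster first and the standby second means the standby is used only when the primary fails. This works only if the keys you need exist in both clusters.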

Monitoring

This section describes multiple mechanisms you can use to monitor your cluster and application. For additional details on monitoring, see Monitoring AWS CloudHSM.

Monitor client logs

Every Client SDK writes logs that you can monitor. For information on client logging, see Working with client SDK logs.

On platforms that are designed to be ephemeral, such as Amazon ECS and AWS Lambda, collecting client logs from a file can be difficult. In these situations, it is a best practice to configure your Client SDK logging to write logs to the console. Most services will automatically collect this output and publish it to Amazon CloudWatch Logs for you to keep and view.

If you are using any third-party integration on top of the AWS CloudHSM Client SDK, ensure that you configure that software package to log its output to the console as well. Otherwise, the package may capture the output from the AWS CloudHSM Client SDK and write it to its own log file.

See the Client SDK 5 configure tool for information on how to configure logging options in your application.

Monitor audit logs

AWS CloudHSM publishes audit logs to your Amazon CloudWatch account. Audit logs come from the HSM and track certain operations for auditing purposes.

You can use audit logs to keep track of any management commands that are invoked on your HSM. For example, you can trigger an alarm when you notice an unexpected management operation being performed.

See How HSM audit logging works for more details.

Monitor AWS CloudTrail

AWS CloudHSM is integrated with AWS CloudTrail, a service that provides a record of actions taken by a user, role, or an AWS service in AWS CloudHSM. AWS CloudTrail captures all API calls for AWS CloudHSM as events. The calls captured include calls from the AWS CloudHSM console and code calls to the AWS CloudHSM API operations.

You can use AWS CloudTrail to audit any API call that is made to the AWS CloudHSM control plane to ensure that no unwanted activity is taking place in your account.

See Working with AWS CloudTrail and AWS CloudHSM for details.

Monitor Amazon CloudWatch metrics

You can use Amazon CloudWatch metrics to monitor your AWS CloudHSM cluster in real time. The metrics can be grouped by region, cluster ID, or HSM ID and cluster ID.

Using Amazon CloudWatch metrics, you can configure Amazon CloudWatch alarms to alert you of any potential issue that may arise that could impact your service. We recommend configuring alarms to monitor the following:

  • Approaching your key limit on an HSM

  • Approaching the HSM session count limit on an HSM

  • Approaching the HSM user count limit on an HSM

  • Differences in HSM user or key count to identify synchronization issues

  • Unhealthy HSMs, so that you can scale your cluster up until AWS CloudHSM resolves the issue
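The limit and synchronization checks in the list above reduce to simple comparisons once you have per-HSM counts. The sketch below assumes you have already collected a count (for example, users or token keys) for each HSM from your metrics. The function name, the 80 percent warning ratio, and the message wording are illustrative choices.

```python
def check_hsm_counts(
    counts_by_hsm: dict[str, int],
    limit: int,
    warn_ratio: float = 0.8,
) -> list[str]:
    """Return warnings for HSMs near the limit and for counts that disagree."""
    warnings = []
    for hsm_id, count in sorted(counts_by_hsm.items()):
        if count >= limit * warn_ratio:
            warnings.append(f"{hsm_id}: {count} of {limit} used")
    # Counts that should be identical on every HSM but are not may
    # indicate a synchronization issue.
    if len(set(counts_by_hsm.values())) > 1:
        warnings.append("counts differ across HSMs; possible synchronization issue")
    return warnings
```

In practice you would express the same thresholds as Amazon CloudWatch alarms. The function only makes explicit the logic those alarms encode.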

For more details, see Working with Amazon CloudWatch Logs and AWS CloudHSM Audit Logs.