Best practices for FSx for ONTAP deployments in enterprise environments - AWS Prescriptive Guidance

This section provides some best practices and considerations for deploying and operating Amazon FSx for NetApp ONTAP in enterprise environments. These recommendations are based on the experiences of AWS Professional Services.

In addition to the recommendations in this guide, adhere to the following best practices:

Best practices for storage tiers and tiering policies

Storage tiers are the physical storage media for an Amazon FSx for NetApp ONTAP file system. The following storage tiers are available:

  • The SSD tier is high-performance solid-state drive (SSD) storage designed for active data, and you choose the storage size for this tier.

  • The capacity pool tier is fully elastic storage that is cost-optimized for infrequently accessed data. The SSD tier is significantly faster than the capacity pool tier: FSx for ONTAP SSD storage provides sub-millisecond latencies for file operations, whereas the capacity pool tier provides latencies in the tens of milliseconds.

For more information about these tiers, see FSx for ONTAP storage tiers.

A tiering policy, which you configure at the volume level, determines if and when data that's stored in the SSD tier transitions to the capacity pool tier. FSx for ONTAP offers four different tiering policies: Snapshot Only, Auto, All, and None. For more information about each policy, see Tiering policies in the FSx for ONTAP documentation.

Consider the following recommendations when setting tiering policies for the volumes in your file share:

  • HPC workloads should access data in the SSD tier to prevent performance bottlenecks. For volumes accessed by HPC workloads, we recommend setting the tiering policy to None or Snapshot Only.

  • When migrating data to the file share, we recommend setting the tiering policy of the target volume to All. With this policy, migrated data is written to the SSD tier and is then immediately moved to the capacity pool tier, which reduces costs. In addition, if 98% or more of the SSD tier capacity is utilized, writing to the tier stops. Setting the tiering policy to All helps prevent the SSD tier from reaching this threshold during the migration. After the migration is complete, you can change the tiering policy to balance performance and costs. For more information, see Migrating file shares to Amazon FSx for NetApp ONTAP using AWS DataSync (AWS blog post).
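The migration recommendation above can be scripted. The following is a minimal sketch, assuming you use the AWS SDK for Python (Boto3) and its FSx update_volume operation. The helper names build_tiering_update and set_tiering_policy, and the volume ID in the usage note, are hypothetical and are used only for illustration.

```python
# Policy names as written in this guide, mapped to the API enum values.
TIERING_POLICY_NAMES = {
    "Snapshot Only": "SNAPSHOT_ONLY",
    "Auto": "AUTO",
    "All": "ALL",
    "None": "NONE",
}


def build_tiering_update(volume_id, policy):
    """Return keyword arguments for the FSx UpdateVolume API call."""
    if policy not in TIERING_POLICY_NAMES:
        raise ValueError(f"Unknown tiering policy: {policy!r}")
    return {
        "VolumeId": volume_id,
        "OntapConfiguration": {
            "TieringPolicy": {"Name": TIERING_POLICY_NAMES[policy]}
        },
    }


def set_tiering_policy(fsx_client, volume_id, policy):
    """Apply the policy by using a Boto3 FSx client, such as boto3.client("fsx")."""
    return fsx_client.update_volume(**build_tiering_update(volume_id, policy))
```

For example, you might call set_tiering_policy(boto3.client("fsx"), "fsvol-0123456789abcdef0", "All") before the migration starts, and then call it again with "Auto" or "None" after the migration is complete, depending on the workload.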

Best practices for using the NetApp ONTAP maximum directory size

maxdirsize (NetApp documentation) is a NetApp ONTAP setting that determines the maximum number of files that can be stored in each directory. This setting applies to the volume, so all of the directories in a volume have the same maxdirsize setting. The default value is 320 MB, which allows you to store up to 4.3 million files in each directory.

You can increase the maxdirsize value to support larger directories. After the value has been increased, it cannot be decreased without recreating the directory. Because directories are loaded in memory, there is a tradeoff between the size of the directories and the performance of your file system. The only way to validate a custom setting is to test it with your workload. NetApp recommends that you keep this value at its default. For more information, see Best practices and implementation guide for NetApp ONTAP FlexGroup volumes (NetApp documentation).

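If you customize the maxdirsize setting, you can use the following formula to determine how many files can fit into a single directory. The maxdirsize value is expressed in KB (320 MB = 327,680 KB), which is the unit that yields the approximately 4.3 million files stated for the default value.

max number of files in each directory = maxdirsize in KB × 53 × 0.25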
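As a worked example, the following sketch evaluates the formula for the default setting. It assumes that maxdirsize is expressed in KB, because that is the unit under which the formula reproduces the approximately 4.3 million files quoted for the 320 MB default. The function name max_files_per_directory is hypothetical.

```python
KB_PER_MB = 1024


def max_files_per_directory(maxdirsize_kb):
    """Evaluate maxdirsize (KB) x 53 x 0.25, rounded down to whole files."""
    # Multiplying by 0.25 is the same as integer division by 4.
    return maxdirsize_kb * 53 // 4


default_kb = 320 * KB_PER_MB  # 327,680 KB
print(max_files_per_directory(default_kb))      # 4341760
print(max_files_per_directory(2 * default_kb))  # 8683520
```

Doubling maxdirsize doubles the estimate, but the result is only an upper bound. Validate any custom value with a test before you rely on it.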

Best practices for monitoring FSx for ONTAP file systems

Similar to other AWS services, FSx for ONTAP is integrated with Amazon CloudWatch. CloudWatch helps you monitor the metrics of your AWS resources in near real time. Metrics are available at the file system and volume levels, and detailed monitoring metrics provide more granular reporting for these resources. For more information, see Monitoring with Amazon CloudWatch in the FSx for ONTAP documentation. Consider the following recommendations when monitoring FSx for ONTAP by using CloudWatch:

  • We recommend that you use the StorageUsed file system metric so that you can filter monitoring results by storage tier.

  • Use the StorageCapacity file system metric to configure a CloudWatch alarm that notifies you if more than 80% of the SSD tier capacity is utilized. Keeping utilization below this level helps tiering function properly for the volume, and it helps you maintain capacity for new data. For more information, see Tiering thresholds.
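One way to express the 80% alarm is with CloudWatch metric math that divides StorageUsed by StorageCapacity. The following is a sketch under stated assumptions, not a definitive configuration. The dimension values StorageTier=SSD and DataType=All, the evaluation settings, and the helper name build_ssd_utilization_alarm are assumptions. Confirm the dimensions that your file system publishes in the CloudWatch console before you use it.

```python
def build_ssd_utilization_alarm(file_system_id, sns_topic_arn, threshold_percent=80.0):
    """Return keyword arguments for the CloudWatch PutMetricAlarm API call."""
    # Assumption: both metrics are published with these dimensions.
    dimensions = [
        {"Name": "FileSystemId", "Value": file_system_id},
        {"Name": "StorageTier", "Value": "SSD"},
        {"Name": "DataType", "Value": "All"},
    ]

    def metric(query_id, metric_name):
        # Input series for the expression; not returned by itself.
        return {
            "Id": query_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/FSx",
                    "MetricName": metric_name,
                    "Dimensions": dimensions,
                },
                "Period": 300,
                "Stat": "Average",
            },
            "ReturnData": False,
        }

    return {
        "AlarmName": f"{file_system_id}-ssd-utilization",
        "ComparisonOperator": "GreaterThanThreshold",
        "Threshold": threshold_percent,
        "EvaluationPeriods": 3,
        "AlarmActions": [sns_topic_arn],
        "Metrics": [
            metric("used", "StorageUsed"),
            metric("capacity", "StorageCapacity"),
            {
                "Id": "utilization",
                "Expression": "100 * used / capacity",
                "Label": "SSD utilization (%)",
                "ReturnData": True,
            },
        ],
    }
```

To create the alarm, pass the result to a Boto3 CloudWatch client, for example boto3.client("cloudwatch").put_metric_alarm(**build_ssd_utilization_alarm(file_system_id, topic_arn)), where both arguments are values from your own account.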

Best practices for choosing an Availability Zone deployment option

You can deploy Amazon FSx for NetApp ONTAP in a Single-AZ or Multi-AZ configuration. Each option provides different levels of availability and durability. For more information about these deployment options, see Availability and durability in the FSx for ONTAP documentation.

Multi-AZ deploys the FSx for ONTAP file system in an active-passive configuration. Therefore, all servers that connect to the file share use only the endpoint in the primary Availability Zone. The endpoint in the secondary Availability Zone is for failover only, and it is not used to read or write unless the primary Availability Zone fails.

You cannot change the Availability Zone deployment option after you create the FSx for ONTAP file system. To change the Availability Zone configuration, you have to create a new file system and then migrate the data to the new file system.
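Because the deployment option is fixed at creation, it can be useful to audit what is already deployed before you plan a migration. The following sketch summarizes the response of the Boto3 FSx describe_file_systems operation. The helper name summarize_deployments and the file system IDs in the sample are hypothetical, and the sketch does not handle paginated responses.

```python
def summarize_deployments(describe_response):
    """Map each ONTAP file system ID to its DeploymentType value."""
    summary = {}
    for fs in describe_response.get("FileSystems", []):
        if fs.get("FileSystemType") != "ONTAP":
            continue  # Skip other FSx file system types.
        ontap = fs.get("OntapConfiguration", {})
        summary[fs["FileSystemId"]] = ontap.get("DeploymentType")
    return summary


# Hypothetical response fragment, shaped like describe_file_systems output.
sample = {
    "FileSystems": [
        {"FileSystemId": "fs-0aaa", "FileSystemType": "ONTAP",
         "OntapConfiguration": {"DeploymentType": "MULTI_AZ_1"}},
        {"FileSystemId": "fs-0bbb", "FileSystemType": "ONTAP",
         "OntapConfiguration": {"DeploymentType": "SINGLE_AZ_1"}},
        {"FileSystemId": "fs-0ccc", "FileSystemType": "LUSTRE"},
    ]
}
print(summarize_deployments(sample))
```

In practice, you would pass the result of boto3.client("fsx").describe_file_systems() instead of the sample dictionary.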

However, even if you deployed a file share using the Single-AZ option, you can still access it from other Availability Zones. Your networking configuration, such as security groups and network access control lists (network ACLs), must allow the clients to connect to the file system endpoint. With this approach, there is a charge for cross-AZ traffic in each direction (read and write). For more information, see Amazon FSx for NetApp ONTAP Pricing.

When choosing a deployment option, you must choose between the resiliency of the Multi-AZ configuration and the performance of the Single-AZ configuration. If practical for your use case, we recommend selecting the Multi-AZ option because it provides high availability. However, the Single-AZ option can be more cost-effective and can reduce latency. Consider your HPC workload and whether it can tolerate the additional latency.