Amazon FSx - AWS Prescriptive Guidance

Amazon FSx

Amazon FSx for Windows File Server is a fully managed file storage service that's optimized for Windows workloads. It provides you with a simple and scalable solution to run your Windows-based applications and workloads, without the need for complex storage infrastructure management. You can use FSx for Windows File Server to easily provision and access shared file storage that supports your Windows applications natively, including Microsoft SQL Server, Microsoft SharePoint, and custom .NET applications. Additionally, FSx for Windows File Server helps you manage costs by providing flexible pricing options, such as pay-as-you-go and storage quotas, and automatic data deduplication to reduce storage footprint and optimize performance and cost.

This section covers the following topics:

  • Choose the right SMB file storage

  • Enable data deduplication in Amazon FSx

  • Understand data sharding in Amazon FSx

  • Understand HDD volume usage in Amazon FSx

  • Understand the single Availability Zone implementation in Amazon FSx

Choose the right SMB file storage

Overview

AWS offers a variety of fully managed storage services that give you the rich capabilities of industry-leading file services, while combining the latest AWS infrastructure innovations and security. You can incorporate AWS services into infrastructure as code (IaC) workflows and integrate them with AWS compute, monitoring, and data protection services. For Windows workloads, you can choose from two fully managed file services that can be used to match your application needs: FSx for Windows File Server and Amazon FSx for NetApp ONTAP.

FSx for Windows File Server

Amazon FSx for Windows File Server provides fully managed shared storage built on Windows Server, and delivers a wide range of data access, data management, and administrative capabilities. FSx for Windows File Server integrates easily with Windows environments because it's a Windows-native service. We recommend using FSx for Windows File Server for user and group shares, Always On Failover Cluster Instances for SQL Server, Windows applications, and virtual desktop infrastructure (VDI). FSx for Windows File Server also integrates well with Amazon FSx File Gateway, Amazon Kendra, audit logs for Amazon S3, and Amazon Firehose.

FSx for ONTAP

FSx for ONTAP is based on NetApp’s proprietary ONTAP file system. It takes some level of upskilling and is recommended mostly to existing on-premises NetApp users. Typical use cases include user and group shares, Always On Failover Cluster Instances for SQL Server, Windows applications, and VMware Cloud on AWS. FSx for ONTAP supports multiple protocols, larger than 64 TB file systems (PB scale without a DFS namespace server), cloning, replication, snapshots, compression (storage efficiency), and intelligent tiering of data.

Cost impact

FSx for Windows File Server

FSx for Windows File Server was the first shared storage solution on AWS for deploying Failover Cluster Instances for SQL Server. With FSx for Windows File Server, you can launch Failover Cluster Instances using SQL Server Standard edition licensing. This, however, prevents you from relying on Always On availability groups, which require SQL Server Enterprise edition licenses. By switching from SQL Server Enterprise edition to SQL Server Standard edition, you could save 65–75 percent on your SQL Server licensing.

You can use FSx for Windows File Server for Failover Cluster Instances to offload storage I/O from typical Amazon EBS storage. By offloading I/O to FSx for Windows File Server, you can scale down EC2 instances that rely on high Amazon EBS throughput and IOPS, without affecting storage throughput.

FSx for ONTAP

You can use FSx for ONTAP to run your Microsoft failover cluster on the iSCSI block protocol and benefit from SQL Server instant file initialization, cross-Region replication using SnapMirror, antivirus support, and cloning. If you create multiple copies of databases for testing, cloning can make a significant difference in both space consumption and how quickly those database copies can be created. Additionally, you can use NetApp SnapCenter to manage backup, restore, and clone functionality with your EC2 instances for SQL Server by using FSx for ONTAP. FSx for ONTAP also provides automatic tiering from SSD to low-cost capacity pool storage for a balance of performance and cost efficiency.

FSx for ONTAP supports NetApp's file system (ONTAP), unlike FSx for Windows File Server which supports a Windows native NTFS file system. The minimum size for FSx for ONTAP is 1024 GB, while FSx for Windows File Server can start as low as 32 GB.

Integration with Microsoft Distributed File System

FSx for Windows File Server and FSx for ONTAP integrate with Microsoft's Distributed File System (DFS) for seamless integration into existing deployments. Keep the following in mind when planning your architecture:

  • FSx for Windows File Server and FSx for ONTAP support DFS Namespaces (DFSN) on both deployment types (multiple Availability Zones and single Availability Zones).

  • Only FSx for Windows File Server supports DFS Replication (DFSR), and only with single Availability Zone deployments.

Cost optimization recommendations

Performance for both FSx for Windows File Server and FSx for ONTAP is very configuration dependent, as is their pricing. FSx for Windows File Server pricing primarily depends on storage capacity and storage type, throughput capacity, backup, and data transferred. With FSx for ONTAP, you pay for SSD storage, SSD IOPS, capacity pool usage, throughput capacity, and backup.

| File service | Cost for 5 TB storage | Configuration | Region |
| --- | --- | --- | --- |
| FSx for Windows File Server | $982.78 | Single Availability Zone; SSD (15,000 IOPS); 32 MBps; 5 TB backup (no deduplication savings) | US East (N. Virginia) |
| FSx for ONTAP | $979.28 | Single Availability Zone; 100% SSD; 15,000 read-write capacity tier; 15,000 SSD IOPS; 128 MBps; 5 TB backup (no deduplication savings) | US East (N. Virginia) |
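As a rough illustration of how these pricing components combine, the following sketch models a monthly FSx for Windows File Server bill. The unit rates are illustrative placeholders, not current AWS pricing; use the AWS Pricing Calculator for real estimates.

```python
# Rough monthly cost model for an FSx for Windows File Server file system.
# The unit rates below are illustrative placeholders, not current AWS pricing.
def fsx_windows_monthly_cost(storage_gb, throughput_mbps, backup_gb,
                             storage_rate=0.130,    # $/GB-month (example)
                             throughput_rate=2.20,  # $/MBps-month (example)
                             backup_rate=0.05):     # $/GB-month (example)
    return round(storage_gb * storage_rate
                 + throughput_mbps * throughput_rate
                 + backup_gb * backup_rate, 2)

# 5 TB single-AZ file system, 32 MBps throughput, 5 TB of backups
estimate = fsx_windows_monthly_cost(5 * 1024, 32, 5 * 1024)
```

The point of the sketch is that storage capacity usually dominates the bill, which is why deduplication and storage-type choices matter so much in the sections that follow.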

Keep in mind the following:

  • Deduplication and compression enable you to store more data on physical devices by shrinking data size, but you pay for the provisioned solid state drive (SSD) or hard disk drive (HDD) storage.

  • You can use FSx for ONTAP to tier your data. It's extremely rare for 100 percent of your data to be accessed regularly and require SSD storage. You can move cold and infrequently accessed data to a capacity tier for cost savings.

  • The prices mentioned here are calculated with 100 percent data on the SSD tier and 15,000 IOPS on the SSD tier.

Backup

By default, both FSx for ONTAP and FSx for Windows File Server store their fully managed backups on Amazon S3. However, FSx for ONTAP offers an additional backup option, SnapVault, which can store backups in the capacity tier. Backing up with SnapVault is a self-managed mechanism that's more cost-efficient than the default fully managed backup option. The fully managed backup option costs $0.05 per GB-month. The SnapVault backup on FSx for ONTAP (10:1 SSD to capacity pool storage) costs $0.03221 per GB-month (0.9 × $0.0219 + 0.1 × $0.125).
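The blended SnapVault rate is simple arithmetic and can be reproduced directly:

```python
# Blended SnapVault backup rate on FSx for ONTAP, assuming 90 percent of the
# backup data sits in the capacity pool and 10 percent on SSD, using the
# per-GB-month rates from the example above.
capacity_pool_rate = 0.0219
ssd_rate = 0.125
blended_rate = 0.9 * capacity_pool_rate + 0.1 * ssd_rate

fully_managed_rate = 0.05  # default fully managed backup, $/GB-month
```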

Keep in mind the following:

  • AWS managed backups offer granularity of one hour. SnapVault enables you to go as low as five minutes.

  • You can use NetApp’s tools (such as the CLI and API) to configure the SnapVault relationships and snapshot replication.

  • Enable the all tiering policy on a SnapVault volume to use the capacity tier as storage for the backup data.

  • SnapVault destinations can be in the same AWS Region, in another Region, or on premises. The destination is usually a single Availability Zone or multiple Availability Zone file system. In comparison, AWS Backup is backed by the regional resiliency of Amazon S3.

Right sizing

You can also save on costs and get the most out of your file system by right sizing it and preventing overprovisioning.

To right size, do the following:

  1. Identify your current needs based on data. For typical Windows workloads, you can use built-in operating system tools like Performance Monitor.

  2. In Performance Monitor, use the following counters to gauge your current performance needs. In the following example, the capture interval is set to one second, with a maximum log size of 1,024 MB and overwrite enabled.

    Logman.exe create counter PerfLog-Short -o "c:\perflogs\PerfLog-Short.blg" -f bincirc -v mmddhhmm -max 1024 -c "\LogicalDisk(*)\*" "\Memory\*" "\.NET CLR Memory(*)\*" "\Cache\*" "\Network Interface(*)\*" "\Paging File(*)\*" "\PhysicalDisk(*)\*" "\Processor(*)\*" "\Processor Information(*)\*" "\Process(*)\*" "\Thread(*)\*" "\Redirector\*" "\Server\*" "\System\*" "\Server Work Queues(*)\*" "\Terminal Services\*" -si 00:00:01
  3. To start the log capture, run the logman start PerfLog-Short command. To stop the log capture, run the logman stop PerfLog-Short command. Note: You can find performance log files in c:\perflogs on the server running the capture. For more information, see Windows Performance Monitor Overview in the Microsoft documentation.

  4. After you identify the correct configuration, test whether your estimate is correct on the Amazon FSx file system by using a disk stress tool such as Microsoft DiskSpd.

  5. If you're satisfied with the performance, cut over to the file share.
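The counter capture from the steps above can be analyzed offline to pick a storage and throughput configuration. The sketch below assumes the `.blg` log was converted to CSV (for example, with the Windows relog.exe tool) and that the column names match your counter paths; both are illustrative assumptions.

```python
# Sketch: derive sizing targets from exported Performance Monitor data.
# Assumes a CSV export of the .blg capture; the column names below are
# hypothetical examples of counter paths.
import csv
import io

def percentile(values, pct):
    """Nearest-rank percentile of a list of samples."""
    ordered = sorted(values)
    rank = round(pct / 100 * len(ordered)) - 1
    return ordered[max(0, min(len(ordered) - 1, rank))]

def size_from_perflog(csv_text, iops_col, bytes_col):
    """Return 95th-percentile IOPS and MBps from counter samples."""
    iops, mbps = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        iops.append(float(row[iops_col]))
        mbps.append(float(row[bytes_col]) / (1024 * 1024))
    return {"p95_iops": percentile(iops, 95),
            "p95_mbps": round(percentile(mbps, 95), 1)}

# Hypothetical four-sample capture
sample = """Disk Transfers/sec,Disk Bytes/sec
120,10485760
300,52428800
150,20971520
310,52428800
"""
needs = size_from_perflog(sample, "Disk Transfers/sec", "Disk Bytes/sec")
```

Sizing for a high percentile rather than the absolute peak avoids overprovisioning for rare spikes that burst capacity can absorb.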

We recommend a conservative approach to storage capacity as it can only be scaled up. Throughput capacity can be scaled up and down as needed.

Additional resources

Enable data deduplication in Amazon FSx

Overview

Data deduplication is a feature that enables you to store your data more efficiently and with lower capacity requirements. It involves finding and removing duplication within data without compromising its fidelity or integrity. Data deduplication uses subfile variable-size chunking and compression, which deliver optimization ratios of 2:1 for general file servers and up to 20:1 for virtualization data. Data deduplication is much more effective than NTFS compression. Resiliency during hardware failures is inherent in the deduplication architecture, with full checksum validation on data and metadata, including redundancy for metadata and the most frequently accessed data chunks.

FSx for Windows File Server fully supports data deduplication. Using it can lead to an average savings of 50–60% for general-purpose file shares. Within shares, savings range from 30–50% for user documents and up to 70–80% for software development datasets. It's important to understand that the storage savings that you can achieve with data deduplication depend on the nature of your dataset, including how much duplication exists across files. Deduplication is not a good option if the data stored is dynamic in nature.
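Because you pay for provisioned capacity, an expected deduplication ratio translates directly into a smaller provisioning target. A rough sizing sketch, where the 20 percent growth headroom is an assumption for illustration:

```python
# Sketch: translate an expected deduplication savings rate into a
# provisioning target. The growth headroom is an illustrative assumption.
def provisioned_capacity_gb(dataset_gb, dedup_savings, growth_headroom=0.2):
    after_dedup = dataset_gb * (1 - dedup_savings)
    return round(after_dedup * (1 + growth_headroom))

# 10 TB of general-purpose shares at the 50 percent average savings
target = provisioned_capacity_gb(10 * 1024, 0.5)
```

Because storage capacity can only be scaled up, erring slightly high on headroom is safer than provisioning at the exact post-deduplication size.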

Cost impact

To cope with data storage growth in the enterprise, administrators consolidate servers and make capacity scaling and data optimization key goals. Data deduplication's default settings can provide savings immediately, or administrators can fine-tune the settings to see additional gains. For example, you can configure deduplication to run only on certain file types, or you can create a custom job schedule.

At a high level, deduplication has three types of jobs: optimization, garbage collection, and scrubbing. Be aware that space won't be freed until you run a garbage collection job after optimization. You can schedule the job or you can manually run it. All settings available when you schedule a data deduplication job are also available when you start a job manually (except for those which are scheduling-specific).

Even with only a 25-percent effective savings from deduplication, there’s a significant cost savings for FSx for Windows File Server. These projected savings are based on an estimate in the AWS Pricing Calculator.

Cost optimization recommendations

Deduplication isn't enabled by default on FSx for Windows File Server file systems. To enable it by using remote management over PowerShell, run the Enable-FSxDedup command, and then use the Set-FSxDedupConfiguration command to set the configuration. For more information, see Administering file systems in the Amazon FSx Windows User Guide.

To enable deduplication, run the following command:

PS C:\Users\Admin> Invoke-Command -ComputerName amznfsxzzzzzzzz.corp.example.com -ConfigurationName FSxRemoteAdmin -ScriptBlock { Enable-FSxDedup }

To set a custom deduplication schedule, run the following command:

Invoke-Command -ComputerName amznfsxzzzzzzzz.corp.example.com -ConfigurationName FSxRemoteAdmin -ScriptBlock { Set-FSxDedupSchedule -Name "CustomOptimization" -Type Optimization -Days Mon,Tues,Wed,Sat -Start 09:00 -DurationHours 9 }

By running the PowerShell Measure-DedupFileMetadata cmdlet, you can determine how much potential disk space can be reclaimed on a volume if you delete a group of folders, a single folder, or a single file, and then run a garbage collection job. Specifically, the DedupDistinctSize value tells you how much space you get back if you delete those files. Files often have chunks that are shared across other folders, so the deduplication engine calculates which chunks are unique and would be deleted after the garbage collection job.

The default data deduplication job schedules are designed to work well for recommended workloads and be as non-intrusive as possible (excluding the priority optimization job that's enabled for the backup usage type). If workloads have large resource requirements, we recommend that you schedule jobs to run only during idle hours, or reduce or increase the amount of system resources that a data deduplication job is allowed to consume.

By default, data deduplication uses 25 percent of the available memory. You can increase this by using the -Memory switch. For optimization jobs, we recommend a range of 15 to 50 percent. For scheduled jobs, you can allow higher memory consumption. For example, for garbage collection and scrubbing jobs (which you typically schedule to run in off hours), you can set higher memory consumption (such as 50 percent).

For additional information about data deduplication settings, see the Amazon FSx Windows User Guide.

Additional resources

Understand data sharding in Amazon FSx for Windows File Server

Overview

FSx for Windows File Server performance is configuration dependent. It's primarily based on storage type, storage capacity, and throughput configuration. The throughput capacity that you select determines the performance resources available for the file server—including the network I/O limits, the CPU and memory, and the disk I/O limits imposed by the file server. The storage capacity and storage type that you select determine the performance resources available for the storage volumes—the disk I/O limits imposed by the storage disks. In addition to performance, the configuration choices also influence the cost. FSx for Windows File Server pricing primarily depends on storage capacity and storage type, throughput capacity, backup, and data transferred.

If you have relatively large file storage and performance requirements, you can benefit from data sharding. Data sharding involves dividing your file data into smaller datasets (shards) and storing them across different file systems. Applications accessing your data from multiple instances can achieve high levels of performance by reading and writing to these shards in parallel. At the same time, you can still present a unified view under a common namespace to your applications. Sharding can also help you scale file data storage beyond what each file system supports (64 TB), up to hundreds of petabytes for large file datasets.
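As an illustration of the sharding idea, the following sketch hashes a path's top-level folder to pick one of several file systems, so related files stay together while folders spread evenly across shards. The file system DNS names are hypothetical examples, not real endpoints.

```python
# Sketch: deterministic sharding of SMB paths across several FSx file systems
# by hashing the top-level folder. The DNS names are hypothetical examples.
import hashlib

SHARDS = [
    r"\\amznfsx-shard0.corp.example.com\share",
    r"\\amznfsx-shard1.corp.example.com\share",
    r"\\amznfsx-shard2.corp.example.com\share",
]

def shard_for(path):
    """Pick a shard from the path's top-level folder (case-insensitive)."""
    top_folder = path.strip("\\").split("\\")[0].lower()
    digest = hashlib.sha256(top_folder.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:4], "big") % len(SHARDS)]

# All files under \Projects land on the same file system.
same = shard_for(r"\Projects\q3\report.docx") == shard_for(r"\projects\q1\plan.xlsx")
```

In practice, a DFS Namespace plays the role of this mapping function for end users, presenting the shards under one common namespace.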

Cost impact

For large datasets, it's typically more cost-effective to deploy multiple small FSx for Windows File Server file systems than one large SSD share that achieves the same level of performance. Using a combination of FSx for Windows File Server HDD and SSD storage types enables better cost savings and lets you match each workload with the best underlying disk subsystem. In the following tables, you can compare a single 17 TB file system with multiple smaller file systems that add up to the same capacity.

Large SSD file system with multiple workloads

| Server name | Cost | Configuration | Region |
| --- | --- | --- | --- |
| Amazon FSx for Windows File Server | $5,716 USD | 17 TB SSD; 30% deduplication; 256 MBps; 17 TB backup | US East (N. Virginia) |

Partitioned workload using DFSN

| Server name | Cost | Configuration | Region | Share |
| --- | --- | --- | --- | --- |
| Amazon FSx for Windows File Server | $1,024 USD | 2 TB SSD; 20% deduplication; 128 MBps; 2 TB backup; Multi-AZ | US East (N. Virginia) | Share 1 |
| Amazon FSx for Windows File Server | $2,132 USD | 5 TB SSD; 30% deduplication; 256 MBps; 5 TB backup; Multi-AZ | US East (N. Virginia) | Share 2 |
| Amazon FSx for Windows File Server | $1,036 USD | 10 TB HDD; 40% deduplication; 128 MBps; 10 TB backup; Multi-AZ | US East (N. Virginia) | Share 3 |
| DFSN Windows EC2 instances | $27 USD | t3a.medium; 2 vCPUs; 4 GiB memory | US East (N. Virginia) | DFSN instances |

The annual cost for a large SSD file system is $68,592. The annual cost of a partitioned workload is $50,640. In this example, a 26 percent savings can be achieved while matching the workload to the appropriate backend storage. For more information about pricing estimation, see the AWS Pricing Calculator estimate.
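The savings percentage quoted above follows directly from the annual figures:

```python
# Reproduce the annual comparison above.
large_ssd_annual = 68_592       # single 17 TB SSD file system
partitioned_annual = 50_640     # three right-sized shares plus DFSN instances
savings_pct = round((large_ssd_annual - partitioned_annual) / large_ssd_annual * 100)
```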

Cost optimization recommendations

To deploy a data sharding solution, you must set up a Microsoft DFS Namespace based on the type of data, I/O size, and I/O access pattern. Each namespace supports up to 50,000 file shares and hundreds of petabytes of storage capacity in aggregate.

Choose a sharding convention that distributes I/O evenly across all the file systems that you plan to use. Monitoring your workload helps with additional optimization and cost reduction. If you need help gauging performance information for the Amazon FSx file system, see FSx for Windows File Server performance in the Amazon FSx Windows User Guide.

After you choose a sharding strategy, you can group the file systems for easy access to your shares by using DFS Namespaces. This enables users to see one homogenous file system, when in reality they’re accessing a variety of different file systems with purpose-built use cases. It’s important to create the shares with a proper naming convention so your end users can easily decipher what workload the shares are designed for. It's also important to label production and non-production shares, so end users don’t place files in the wrong file system by mistake.

The following diagram shows how a single DFS Namespace can be used as the access point for multiple FSx file systems.

DFS Namespace access point

Keep in mind the following:

  • You can add existing FSx for Windows File Server shares to a DFS tree.

  • Amazon FSx can't be added at the root of the DFS share path; you can add it only as a subfolder.

  • You must deploy an EC2 instance to serve the DFS namespace configuration.

For more information about DFS-N configuration, see DFS Namespaces overview in the Microsoft documentation. For more information about using DFS namespaces, see the Using DFS Namespaces with Amazon FSx for Windows File Server video on YouTube.

Additional resources

Understand HDD volume usage in Amazon FSx

Overview

Amazon FSx for Windows File Server offers the flexibility to choose throughput independently of file system capacity. Two storage types are available: HDD (hard disk drive) and SSD (solid state drive). HDD file systems use Amazon EBS st1 volumes for storage, and SSD file systems use EBS io1 volumes.

The following diagram shows the relationship between throughput and storage settings.

Relationship between throughput and storage settings

With HDD-based storage, you receive a baseline of 12 IOPS and 12 MBps of throughput per TiB of storage, with bursts of up to 80 IOPS and 80 MBps per TiB. For example, if your file system is 50 TiB, you get 50 × 12 = 600 as a baseline for both IOPS and MBps of throughput.

Amazon FSx for Windows File Server provides 80 burst IOPS per TiB. Burst credits accrue automatically when your utilization is below your baseline rate and are consumed automatically when your utilization is above it. For example, if your workload uses only 10 IOPS/TiB for an hour (2 IOPS/TiB below the baseline), you can then use 14 IOPS/TiB (2 IOPS/TiB above the baseline) for the following hour before running out of burst credits.
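The burst-credit behavior described above can be sketched as a simple hour-by-hour model. This is a simplification for illustration: the real service also caps the credit balance, which this sketch ignores.

```python
# Simplified hour-by-hour model of HDD burst credits per TiB of storage.
# Credits accrue while utilization is below the 12 IOPS/TiB baseline and are
# spent while it's above, up to the 80 IOPS/TiB burst ceiling.
BASELINE_IOPS_PER_TIB = 12
BURST_IOPS_PER_TIB = 80

def credit_balance(usage_by_hour, credits=0.0):
    """Return the burst-credit balance (IOPS-hours per TiB) after the run."""
    for iops in usage_by_hour:
        if iops > BURST_IOPS_PER_TIB:
            raise ValueError("utilization exceeds the burst ceiling")
        credits = max(0.0, credits + (BASELINE_IOPS_PER_TIB - iops))
    return credits

# An hour at 10 IOPS/TiB banks 2 credit-hours, funding one hour at 14 IOPS/TiB.
balance = credit_balance([10, 14])
```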

For file operations, Amazon FSx for Windows File Server provides consistent sub-millisecond latencies with SSD storage and single-digit millisecond latencies with HDD storage. For all file systems, including those with HDD storage, Amazon FSx for Windows File Server provides a fast (in-memory) cache on the file server, so you can get high performance and sub-millisecond latencies for actively accessed data, irrespective of storage type.

When appropriate, the usage of HDD storage can help to reduce the cost of your overall storage capacity and provide a reliable storage platform for your needs.

Cost impact

Amazon FSx for Windows File Server performance depends on three factors: storage capacity, storage type, and throughput. Network I/O performance and in-memory cache size are solely determined by throughput capacity, while the disk I/O performance is determined by a combination of throughput capacity, storage type, and storage capacity.

While SSD is recommended for I/O intensive workloads, there are a variety of workloads whose needs can be met with HDD performance specs. HDD storage is designed for a broad spectrum of workloads, including home directories, user and departmental shares, and content management systems. For example, if your users only need low-latency access to data supporting current projects, then most of the data you're storing is infrequently accessed.

You can use the AWS Pricing Calculator to provide a comparison of a 20 TB SSD to an HDD file system in us-east-1. As the following table shows, even with no deduplication savings, the cost difference is significant when comparing HDD file systems to SSD file systems.

| Amazon FSx file system configuration | Monthly cost |
| --- | --- |
| 20 TB multi-AZ SSD (us-east-1) | $4,699.30 |
| 20 TB multi-AZ HDD (us-east-1) | $542.88 |
| Estimated monthly savings | $4,156.42 |
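The savings figure, and the relative reduction it represents, is straightforward to verify from the monthly costs above:

```python
# Verify the monthly savings and relative reduction from the comparison above.
ssd_monthly = 4699.30
hdd_monthly = 542.88
savings = round(ssd_monthly - hdd_monthly, 2)
savings_pct = round(savings / ssd_monthly * 100)
```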

Note

For additional FSx for Windows File Server savings, see the Enable data deduplication in Amazon FSx section of this guide.

By correctly identifying your performance needs, you can select the right storage for your workload and reduce your costs.

Cost optimization recommendations

If you decide to use HDD storage, test your file system to ensure it can meet your performance requirements. HDD storage comes at a lower cost relative to SSD storage, but with lower levels of disk throughput and disk IOPS per unit of storage. It might be suitable for general-purpose user shares and home directories with low I/O requirements, large content management systems where data is retrieved infrequently, or datasets with small numbers of large files.

The storage type for an existing file system can't be changed. To convert the storage type for an Amazon FSx for Windows File Server file system, you must back up your existing file system and restore it to a new file system with the desired storage type. If you're looking to convert an existing SSD file system to an HDD file system, be aware that HDD has a much higher minimum capacity of 2 TB.

To restore a backup with a different storage type, do the following:

  1. Back up your existing file system.

  2. Create a new Amazon FSx file system with the HDD storage type.

  3. Restore the backup to the new file system with the desired storage type.

  4. Verify the new file system has the correct storage type and your data is intact.

Before moving your changes to production, we recommend that you analyze the performance of your Amazon FSx file system and verify the change is acceptable. For more guidance, see the Optimizing Amazon FSx for Windows File Server performance with new metrics post on the AWS Storage Blog.

Additional resources

Understand the single Availability Zone implementation in Amazon FSx

Overview

This section explains when it's more beneficial to use a single Availability Zone implementation of Amazon FSx for Windows File Server. It covers scenarios where moving to a single Availability Zone reduces costs while still enabling you to use Amazon FSx for Windows File Server as your managed file storage service. For production workloads, we recommend a multiple Availability Zone implementation of Amazon FSx, which gives you redundancy across Availability Zones.

Cost impact

A single Availability Zone file system offers an approximately 40 percent cost reduction compared with a multiple Availability Zone implementation. With a multiple Availability Zone file system, you pay $0.230 per GB-month for SSD and $0.025 per GB-month for HDD, compared with $0.130 per GB-month for SSD and $0.013 per GB-month for HDD on a single Availability Zone file system. You can see a comparison of costs and create your own estimates by using the AWS Pricing Calculator.

For a 10 TB file system, this can be the difference between paying approximately $1,200 per month for multiple Availability Zones and $680 per month for a single Availability Zone. This example uses a 10 TB FSx for Windows File Server file system with SSD storage and an estimated deduplication savings of 50 percent. Overall, a single Availability Zone has a lower cost of entry but comes with a few caveats, covered in the next section.
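A storage-only sketch of this comparison, assuming the 50 percent deduplication savings lets you provision roughly half the raw capacity. Throughput and backup charges are excluded, so real bills run somewhat higher than these figures.

```python
# Storage-only sketch: 10 TB dataset, 50 percent deduplication savings, so
# roughly half the raw capacity is provisioned. Rates are the per-GB-month
# figures quoted above; throughput and backup charges are excluded.
MULTI_AZ_SSD_RATE = 0.230    # $/GB-month
SINGLE_AZ_SSD_RATE = 0.130   # $/GB-month

provisioned_gb = 10 * 1024 * 0.5
multi_az_monthly = round(provisioned_gb * MULTI_AZ_SSD_RATE, 2)
single_az_monthly = round(provisioned_gb * SINGLE_AZ_SSD_RATE, 2)
```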

Cost optimization recommendations

Single Availability Zone deployments

To make sure that a single Availability Zone is the right fit, take into consideration your own internal SLAs for the data being stored on FSx for Windows File Server. This entails understanding whether you have SLAs to provide to your customers (internal and external) and whether the three nines of availability for an Amazon FSx single Availability Zone deployment still allow you to meet those SLAs. FSx for Windows File Server in a single Availability Zone has an uptime SLA of 99.9 percent. The SLA for Amazon FSx with multiple Availability Zones is 99.99 percent. For mission-critical workloads, we recommend that you use multiple Availability Zones over a single Availability Zone, even at additional cost.

Single Availability Zone deployments are ideal for workloads such as backups for SQL Server databases. They can provide low cost storage with an HDD tier, while still providing you with consistent uptime. If you require a higher level of availability for a production workload, such as highly-available SQL servers or production application access, then a single Availability Zone isn't the right fit for your workloads. For backups, non-production testing, and development environments, an Amazon FSx single Availability Zone implementation can reduce your operational costs.

One use case where an Amazon FSx single Availability Zone file system works well in production is as the per-server storage in a highly available SQL Server cluster that uses Always On availability groups, with multiple single Availability Zone file systems in use. For more information, see the Optimizing cost for your high availability SQL Server deployments on AWS post on the AWS Storage Blog.

Multi-Region replication

One scenario where a single Availability Zone file system reduces costs (and where only a single Availability Zone file system works) is multi-Region replication with Amazon FSx. You can deploy single Availability Zone file systems that support native Microsoft DFS Replication (DFS-R), which can automatically replicate data across Regions and multiple sites. For more information about configuring DFS-R with Amazon FSx, see Using Microsoft Distributed File System Replication in the Amazon FSx Windows User Guide.

Another alternative for multi-Region cost savings is using AWS Storage Gateway. This enables you to implement an Amazon FSx File Gateway in another Region for multi-Region access of Amazon FSx. For more information, see the AWS Storage Gateway section of this guide.

If you work across Regions, consider the data transfer cost for cross-Region traffic. Traffic moving across Regions incurs a $0.02/GB charge, so consistent data change at high volumes adds to your overall cost. For example, 1 TB of data transfer costs approximately $20.48.
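The transfer arithmetic is simple to model when estimating replication costs:

```python
# Cross-Region data transfer cost at $0.02 per GB.
def cross_region_transfer_cost(gb, rate_per_gb=0.02):
    return round(gb * rate_per_gb, 2)

monthly_transfer = cross_region_transfer_cost(1024)  # 1 TB of changed data
```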

Maintenance window

The maintenance window is a key consideration if you're using a single Availability Zone with Amazon FSx. During the maintenance window, the Amazon FSx file system is unavailable for roughly 20 minutes due to routine software patching of the underlying Windows Server. If you use the file system for overnight backups, adjust the Amazon FSx maintenance window accordingly to avoid interruptions during your backup. You can adjust the maintenance window after creating your Amazon FSx file system.

Additional resources