Amazon FSx for Lustre
Lustre User Guide

Using Data Repositories

Amazon FSx for Lustre is designed for short-term, compute-intensive workloads where your long-term data is stored in a durable data repository, such as Amazon S3 or an on-premises data store. With Amazon FSx for Lustre, your file system is generally brought up for the duration of your compute job (typically several hours or days). Amazon FSx for Lustre is cost-optimized for these workloads by providing high-performance, nonreplicated file storage that’s designed to be spun up and down as needed. An Amazon FSx for Lustre repository isn't meant to store durable, long-term data.

When you use Amazon FSx for Lustre with a durable storage repository, you can ingest and process large volumes of file data in a high-performance file system. At the same time, you can periodically write intermediate results to your data repository. By taking this approach, you can restart your workload at any time from the latest data you've stored in your data repository. When your workload is done, you can write final results from your file system to your data repository, and delete your file system.

Amazon FSx for Lustre is deeply integrated with Amazon S3. This integration means that you can seamlessly access the objects stored in your Amazon S3 buckets from applications mounting your Amazon FSx for Lustre file system. Amazon FSx for Lustre also supports cloud bursting workloads with on-premises data repositories by enabling you to copy data from on-premises clients using AWS Direct Connect or VPN. When you use Amazon FSx for Lustre with either type of data repository, you can have your data copied into your Amazon FSx for Lustre file system as needed. You can also run your compute-intensive workloads on Amazon EC2 instances in the AWS Cloud and copy your results to your data repository when your workload is done.

Important

In many cases, you link one or more Amazon FSx for Lustre file systems to an Amazon S3 bucket. If so, don't delete the Amazon S3 bucket until all linked file systems have been deleted.

Importing Data from Your Amazon S3 Bucket

During file system creation, you have the option of Amazon FSx for Lustre automatically preloading a listing of file metadata (file name, size, and modification time) into your file system. Doing this allows clients to view the listing of files in your data repository as soon as your file system is available.

Note

Importing data from your Amazon S3 data repository happens during file system creation. If you have a large number of files to import, this affects how long it takes for your file system to be created.

Amazon FSx for Lustre automatically copies file data for a given time the first time you open that file from the Amazon S3 data repository into your file system. This data movement is managed by Amazon FSx for Lustre and occurs transparently to your applications. Subsequent reads of these files are served directly out of the file system with consistent submillisecond latencies.

Amazon FSx for Lustre copies data from your Amazon S3 data repository when a file is first accessed. Because of this approach, the initial read or write to a file incurs a small latency penalty. If your application is sensitive to this latency and you know which file your application needs to access, you can optionally preload contents of an individual file. You do so using the hsm_restore command, as follows.

You can use the hsm_action command to verify that the file's contents have finished loading into the file system. A return value of NOOP indicates that the file has successfully been loaded. Run the following commands from a compute instance with the file system mounted.

sudo lfs hsm_restore path/to/file sudo lfs hsm_action path/to/file

You can preload the entirety of your file system (or the entirety of a directory within your file system) using the following commands. If you request the preloading of multiple files simultaneously, Amazon FSx for Lustre loads your files from your Amazon S3 data repository in parallel.

nohup find local/directory -type f -print0 | xargs -0 -n 1 sudo lfs hsm_restore &

Exporting Data to Your Amazon S3 Bucket

In Amazon FSx for Lustre, you can export files that you have written or modified in your file system to your Amazon S3 data repository at any time. When you export a particular file or directory, your file system exports only files that have been created or modified since the last export, or since file system creation.

Setting the Export Path in the Console

You can specify an export path when you create your file system using the Amazon FSx console.

To specify an export path using the console

  1. Open the Amazon FSx console at https://console.aws.amazon.com/fsx/.

  2. From the dashboard, choose Create file system to start the file system creation wizard.

  3. Choose Amazon FSx for Lustre for the file system type, and then choose Next.

  4. Provide information required for the File system details and Network and security sections. For more information, see Step 1: Create Your Amazon FSx for Lustre File System.

  5. Expand the Data repository integration section and choose Amazon S3 for Data repository type. Specify the Amazon S3 Import bucket and an optional Import prefix as the data repository source.

  6. Choose one of the three Export prefix options:

    • A unique prefix that Amazon FSx creates in your bucket – Choose this option to export new and changed objects using a prefix generated by Amazon FSx for Lustre. The prefix looks like the following: /FSxLustrefile-system-creation-timestamp. The timestamp is in UTC format, for example FSxLustre20181105T222312Z. This option is the default.

    • The same prefix that you imported from (replace existing objects with updated ones) – Choose this option to replace existing objects with updated ones.

    • A prefix you specify – Choose this option to preserve your imported data and to export new and changed objects using a prefix that you specify.

    
              The Data repository integration window
  7. (Optional) Set Maintenance preferences or use the system defaults.

  8. Choose Next and review the file system settings.

  9. Choose Create file system.

Setting the Export Path Using the AWS CLI

You can specify an export path when you create your file system using the AWS CLI or the Amazon FSx API.

The following example use the adminuser as the profile parameter value. To provide your own credentials, use an appropriate user profile. For more information about the AWS CLI, see Getting Set Up with the AWS Command Line Interface in the AWS Command Line Interface User Guide.

To create an Amazon FSx for Lustre file system using the Amazon FSx CLI

  • To create an Amazon FSx for Lustre file system, use the Amazon FSx CLI command create-file-system, as shown following. The corresponding API operation is CreateFileSystem.

    $ aws fsx create-file-system \ --client-request-token CRT1234 \ --file-system-type LUSTRE \ --lustre-configuration ImportPath=s3://lustre-export-test-bucket/ ExportPath=s3://lustre-export-test-bucket/export \ --storage-capacity 3600 \ --subnet-ids subnet-123456 \ --tags Key="Name",Value=Lustre-TEST-1 \ --region us-east-2

After successfully creating the file system, Amazon FSx returns the file system description as JSON, as shown in the following example.

{ "FileSystems": [ { "OwnerId": "owner-id-string", "CreationTime": 1549310341.483, "FileSystemId": "fs-0d32332e8912345", "FileSystemType": "LUSTRE", "Lifecycle": "CREATING", "StorageCapacity": 3600, "VpcId": "vpc-123456", "SubnetIds": [ "subnet-123456" ], "NetworkInterfaceIds": [ "eni-039fcf55123456789" ], "DNSName": "fs-039fcf55123456789.fsx.us-east-2.amazonaws.com", "ResourceARN": "arn:aws:fsx:us-east-2:123456:file-system/fs-0d32332e8912345", "Tags": [ { "Key": "Name", "Value": "Lustre-TEST-1" } ], "LustreConfiguration": { "WeeklyMaintenanceStartTime": "2:04:30", "DataRepositoryConfiguration": { "ImportPath": "s3://lustre-export-test-bucket/", "ExportPath": "s3://lustre-export-test-bucket/export", "ImportedFileChunkSize": 1024 } } } ] }

Determining a File System's Export Path

To determine the export path for your file system, call the aws fsx describe-file-systems command.

To export an individual file to your data repository and verify that the file has successfully been exported to your data repository, run the commands shown following. A return value of NOOP indicates that the file has successfully been exported.

sudo lfs hsm_archive path/to/export/file sudo lfs hsm_action path/to/export/file

To export your entire file system (or an entire directory in your file system), run the following commands. If you export multiple files simultaneously, Amazon FSx for Lustre exports your files to your Amazon S3 data repository in parallel.

nohup find local/directory -type f -print0 | xargs -0 -n 1 sudo lfs hsm_archive &

To determine whether the export has completed, run the following command.

find path/to/export/file -type f -print0 | xargs -0 -n 1 -P 8 sudo lfs hsm_action | grep "ARCHIVE" | wc -l

If the command returns with zero files remaining, then the export is complete.

Using Amazon FSx for Lustre with Your On-Premises Data Repository

You can use Amazon FSx for Lustre to process data stored in your on-premises data repository with in-cloud compute instances. Amazon FSx for Lustre supports access over AWS Direct Connect and VPN, enabling you to mount your file systems from on-premises clients.

To use Amazon FSx for Lustre with your on-premises data

  1. Create a file system. For more information, see Step 1: Create Your Amazon FSx for Lustre File System in the getting started exercise.

  2. Mount the file system from on-premises clients. For more information, see Mounting from On-Premises or a Peered Amazon VPC.

  3. Copy the data that you want to process into your Amazon FSx for Lustre file system.

  4. Run your compute-intensive workload on in-cloud Amazon EC2 instances mounting your file system. If you want to, you can periodically copy intermediate results to your data repository.

  5. When you're finished, copy the final results from your file system back to your on-premises data repository, and delete your Amazon FSx for Lustre file system.