Using network file systems with AWS PCS
You can attach network file systems to nodes launched in an AWS Parallel Computing Service (AWS PCS) compute node group to provide a persistent location where data and files can be written and accessed. You can use file systems provided by AWS services, including Amazon Elastic File System (Amazon EFS), Amazon FSx for OpenZFS, Amazon FSx for Lustre, and Amazon File Cache. You can also use self-managed file systems, such as NFS servers.
This topic covers considerations for and examples of using network file systems with AWS PCS.
Considerations for using network file systems
Implementation details differ between file systems, but the following considerations apply to all of them.
- The relevant file system software must be installed on the instance. For example, to use Amazon FSx for Lustre, the appropriate Lustre package must be present. You can include it in the compute node group AMI or install it with a script that runs at instance boot.
- There must be a network route between the shared network file system and the compute node group instances.
- The security group rules for both the shared network file system and the compute node group instances must allow connections on the relevant ports.
- You must maintain a consistent POSIX user and group namespace across resources that access the file systems. Otherwise, jobs and interactive processes that run on your PCS cluster may encounter permissions errors.
- You mount file systems using EC2 launch templates. Errors or timeouts in mounting a network file system may prevent instances from becoming available to run jobs. This, in turn, may lead to unexpected costs. For more information about debugging launch templates, see Using Amazon EC2 launch templates with AWS PCS.
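Two of these considerations, the installed file system software and the consistent POSIX user namespace, can be checked from a script before a mount is attempted. The following is a minimal sketch, not an AWS PCS feature. The function names `has_mount_helper` and `uid_matches` are hypothetical, and the sketch assumes that mount helpers are installed as `mount.<type>` under `/sbin`, `/usr/sbin`, or `/usr/local/sbin`, which is the usual Linux layout but may differ on your AMI.

```shell
# Return 0 if a mount helper for file system type $1 (for example efs,
# lustre, or nfs) exists in one of the usual sbin directories.
# Assumption: helpers are named mount.<type>, as mount(8) expects.
has_mount_helper() {
    for dir in /sbin /usr/sbin /usr/local/sbin; do
        [ -x "$dir/mount.$1" ] && return 0
    done
    return 1
}

# Return 0 if user $1 exists on this instance with numeric UID $2.
# Running this on every node is one way to spot namespace drift.
uid_matches() {
    actual=$(id -u "$1" 2>/dev/null) || return 1
    [ "$actual" = "$2" ]
}
```

A boot script could, for example, call `has_mount_helper lustre` and log a clear error before attempting the mount, rather than letting the mount command fail with a less specific message.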
Example network mounts
You can create file systems using Amazon EFS, Amazon FSx for Lustre, Amazon FSx for OpenZFS, and Amazon File Cache. The following sections show an example network mount for each.
Amazon EFS
File system setup
Create an Amazon EFS file system. Make sure it has a mount target in each Availability Zone where you will launch PCS compute node group instances. Also ensure each mount target is associated with a security group that allows inbound and outbound access from the PCS compute node group instances. For more information, see Mount targets and security groups in the Amazon Elastic File System User Guide.
Launch template
Add the security group(s) from your file system setup to the launch template you will use for the compute node group.
Include user data that uses the cloud-config mechanism to mount the Amazon EFS file system. Replace the following values in this script with your own details:
- mount-point-directory: The path on each instance where you will mount the Amazon EFS file system
- filesystem-id: The file system ID for the EFS file system

    MIME-Version: 1.0
    Content-Type: multipart/mixed; boundary="==MYBOUNDARY=="

    --==MYBOUNDARY==
    Content-Type: text/cloud-config; charset="us-ascii"

    packages:
    - amazon-efs-utils

    runcmd:
    - mkdir -p /mount-point-directory
    - echo "filesystem-id:/ /mount-point-directory efs tls,_netdev" >> /etc/fstab
    - mount -a -t efs defaults

    --==MYBOUNDARY==--
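The `echo ... >> /etc/fstab` step above appends a new line every time it runs. If you move this logic into a script that can run more than once, you may want the append to be idempotent. The following is a minimal sketch under that assumption. The function name `add_efs_fstab_entry` and its optional third argument (an alternate fstab path, useful for testing) are hypothetical and not part of amazon-efs-utils.

```shell
# Append an EFS entry to an fstab-style file unless the exact line
# is already present.
#   $1 = file system ID, $2 = mount point,
#   $3 = fstab path (optional, defaults to /etc/fstab)
add_efs_fstab_entry() {
    entry="$1:/ $2 efs tls,_netdev"
    fstab="${3:-/etc/fstab}"
    # -x matches whole lines only, -F treats the entry as a fixed string.
    grep -qxF -e "$entry" "$fstab" 2>/dev/null ||
        printf '%s\n' "$entry" >> "$fstab"
}
```

Calling `add_efs_fstab_entry filesystem-id /mount-point-directory` twice leaves a single entry in `/etc/fstab`, so a later `mount -a -t efs defaults` sees the file system only once.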
Amazon FSx for Lustre
File system setup
Create an FSx for Lustre file system in the VPC where you will use AWS PCS. To minimize inter-zone transfers, deploy in a subnet in the same Availability Zone where you will launch the majority of your PCS compute node group instances. Ensure the file system is associated with a security group that allows inbound and outbound access from the PCS compute node group instances. For more information on security groups, see File system access control with Amazon VPC in the Amazon FSx for Lustre User Guide.
Launch template
Include user data that uses cloud-config to mount the FSx for Lustre file system. Replace the following values in this script with your own details:
- mount-point-directory: The path on an instance where you want to mount FSx for Lustre
- filesystem-id: The file system ID for the FSx for Lustre file system
- mount-name: The mount name for the FSx for Lustre file system
- region-code: The AWS Region where the FSx for Lustre file system is deployed (must be the same as your AWS PCS system)
- (Optional) latest: Any version of Lustre supported by FSx for Lustre

    MIME-Version: 1.0
    Content-Type: multipart/mixed; boundary="==MYBOUNDARY=="

    --==MYBOUNDARY==
    Content-Type: text/cloud-config; charset="us-ascii"

    runcmd:
    - amazon-linux-extras install -y lustre=latest
    - mkdir -p /mount-point-directory
    - mount -t lustre filesystem-id.fsx.region-code.amazonaws.com@tcp:/mount-name /mount-point-directory

    --==MYBOUNDARY==--
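The mount source in the script above is assembled from three of the replaceable values. The following sketch shows how the pieces fit together. The function name `lustre_mount_source` is hypothetical, and the values in the usage note are illustrative placeholders, not real resources.

```shell
# Print the Lustre mount source in the form used by the example above:
#   <filesystem-id>.fsx.<region-code>.amazonaws.com@tcp:/<mount-name>
#   $1 = file system ID, $2 = region code, $3 = mount name
lustre_mount_source() {
    if [ -z "${1:-}" ] || [ -z "${2:-}" ] || [ -z "${3:-}" ]; then
        echo "usage: lustre_mount_source FILESYSTEM_ID REGION_CODE MOUNT_NAME" >&2
        return 1
    fi
    printf '%s.fsx.%s.amazonaws.com@tcp:/%s\n' "$1" "$2" "$3"
}
```

With placeholder values, `lustre_mount_source fs-0123456789abcdef0 us-east-1 abcdefgh` prints `fs-0123456789abcdef0.fsx.us-east-1.amazonaws.com@tcp:/abcdefgh`, which is the string passed to `mount -t lustre` ahead of the mount point.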
Amazon FSx for OpenZFS
File system setup
Create an FSx for OpenZFS file system in the VPC where you will use AWS PCS. To minimize inter-zone transfers, deploy in a subnet in the same Availability Zone where you will launch the majority of your AWS PCS compute node group instances. Make sure the file system is associated with a security group that allows inbound and outbound access from the AWS PCS compute node group instances. For more information on security groups, see Managing file system access with Amazon VPC in the FSx for OpenZFS User Guide.
Launch template
Include user data that uses cloud-config to mount the root volume for an FSx for OpenZFS file system. Replace the following values in this script with your own details:
- mount-point-directory: The path on an instance where you want to mount your FSx for OpenZFS share
- filesystem-id: The file system ID for the FSx for OpenZFS file system
- region-code: The AWS Region where the FSx for OpenZFS file system is deployed (must be the same as your AWS PCS system)

    MIME-Version: 1.0
    Content-Type: multipart/mixed; boundary="==MYBOUNDARY=="

    --==MYBOUNDARY==
    Content-Type: text/cloud-config; charset="us-ascii"

    runcmd:
    - mkdir -p /mount-point-directory
    - mount -t nfs -o noatime,nfsvers=4.2,sync,rsize=1048576,wsize=1048576 filesystem-id.fsx.region-code.amazonaws.com:/fsx/ /mount-point-directory

    --==MYBOUNDARY==--
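After the mount command runs, a script can confirm that the file system is actually mounted before the node starts accepting jobs. The following is a minimal sketch that reads a mount table in `/proc/mounts` format. The function name `mounted_as` and its optional third argument (an alternate table path, useful for testing) are hypothetical. The sketch matches the type by prefix because NFSv4 mounts are typically listed with type `nfs4`, and it does not handle mount points that contain spaces.

```shell
# Return 0 if $1 is listed as a mount point in a /proc/mounts-style table.
#   $1 = mount point
#   $2 = expected file system type prefix (optional, e.g. nfs or lustre)
#   $3 = mount table path (optional, defaults to /proc/mounts)
mounted_as() {
    awk -v dir="$1" -v fstype="${2:-}" '
        # Field 2 is the mount point, field 3 is the file system type.
        $2 == dir && (fstype == "" || index($3, fstype) == 1) { found = 1 }
        END { exit (found ? 0 : 1) }
    ' "${3:-/proc/mounts}"
}
```

For the example above, `mounted_as /mount-point-directory nfs` returns success only when an NFS file system is mounted at that path, so a failed mount can be logged instead of going unnoticed.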
Amazon File Cache
File system setup
Create an Amazon File Cache in the VPC where you will use AWS PCS. To minimize inter-zone transfers, choose a subnet in the same Availability Zone where you will launch the majority of your PCS compute node group instances. Ensure the File Cache is associated with a security group that allows inbound and outbound traffic on port 988 between your PCS instances and the File Cache. For more information on security groups, see Cache access control with Amazon VPC in the Amazon File Cache User Guide.
Launch template
Add the security group(s) from your file system setup to the launch template you will use for the compute node group.
Include user data that uses cloud-config to mount the Amazon File Cache. Replace the following values in this script with your own details:
- mount-point-directory: The path on an instance where you want to mount the File Cache
- cache-dns-name: The Domain Name System (DNS) name for the File Cache
- mount-name: The mount name for the File Cache

    MIME-Version: 1.0
    Content-Type: multipart/mixed; boundary="==MYBOUNDARY=="

    --==MYBOUNDARY==
    Content-Type: text/cloud-config; charset="us-ascii"

    runcmd:
    - amazon-linux-extras install -y lustre=2.12
    - mkdir -p /mount-point-directory
    - mount -t lustre -o relatime,flock cache-dns-name@tcp:/mount-name /mount-point-directory

    --==MYBOUNDARY==--
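Because mount errors can keep an instance from becoming available to run jobs, it may help to bound how long a boot script keeps trying. The following is a minimal sketch of a bounded retry wrapper. The function name `retry` is hypothetical, and the attempt count and delay in the usage note are arbitrary illustrative values, not AWS recommendations.

```shell
# Run a command up to $1 times, sleeping $2 seconds between attempts.
# Returns 0 as soon as the command succeeds, or 1 after the last failure.
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while ! "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            echo "giving up after $n attempts: $*" >&2
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
    return 0
}
```

A script could wrap the File Cache mount as `retry 5 10 mount -t lustre -o relatime,flock cache-dns-name@tcp:/mount-name /mount-point-directory`, which stops after five failed attempts and reports the failure instead of retrying indefinitely.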