Running containerized jobs with Pyxis - AWS ParallelCluster

Learn how to create a cluster that can run containerized jobs using Pyxis, a SPANK plugin for managing containerized jobs in Slurm. Containers in Pyxis are managed by Enroot, a tool that turns traditional container/OS images into unprivileged sandboxes. For more information, see NVIDIA Pyxis and NVIDIA Enroot.

Note

This feature is available starting with AWS ParallelCluster v3.11.1.

When using AWS ParallelCluster, you only pay for the AWS resources that are created when you create or update AWS ParallelCluster images and clusters. For more information, see AWS services used by AWS ParallelCluster.

Prerequisites:

Create the cluster

Starting with AWS ParallelCluster 3.11.1, all official AMIs come with Pyxis and Enroot pre-installed. In particular, Slurm is recompiled with Pyxis support and Enroot is installed as a binary on the system. However, you must configure them according to your specific needs. The directories used by Enroot and Pyxis have a critical impact on cluster performance. For more information, see the Pyxis documentation and the Enroot documentation.

For your convenience, sample configurations for Pyxis, Enroot, and SPANK are available in /opt/parallelcluster/examples/.
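Before copying the samples, it can help to confirm that they are present on the node. The following is a minimal sketch, assuming the three file locations used by the configuration scripts on this page. The function name check_sample_configs is hypothetical and is not part of AWS ParallelCluster.

```shell
#!/bin/bash
# Hypothetical helper: report which sample configuration files exist
# under a given examples directory (default: /opt/parallelcluster/examples).
check_sample_configs() {
    local base_dir="${1:-/opt/parallelcluster/examples}"
    local missing=0
    local rel
    # Relative paths taken from the configuration scripts on this page.
    for rel in enroot/enroot.conf spank/plugstack.conf pyxis/pyxis.conf; do
        if [ -f "${base_dir}/${rel}" ]; then
            echo "found:   ${base_dir}/${rel}"
        else
            echo "missing: ${base_dir}/${rel}"
            missing=1
        fi
    done
    # Non-zero exit status when at least one sample file is absent.
    return "${missing}"
}
```

Calling check_sample_configs with no argument inspects the default directory. Because the configuration scripts on this page move the sample files rather than copy them, the samples are expected to be reported as missing on a node where those scripts have already run.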

To deploy a cluster using the sample configurations we have provided, complete the following tutorial.

To create the cluster with sample configuration

Pyxis and Enroot must be configured on the head node by first creating the persistent and volatile directories for Enroot, then creating the runtime directory for Pyxis, and finally enabling Pyxis as a SPANK plugin across the whole cluster.

  1. Run the following script as an OnNodeConfigured custom action on the head node to configure Pyxis and Enroot.

    #!/bin/bash
    set -e

    echo "Executing $0"

    # Configure Enroot
    ENROOT_PERSISTENT_DIR="/var/enroot"
    ENROOT_VOLATILE_DIR="/run/enroot"

    sudo mkdir -p $ENROOT_PERSISTENT_DIR
    sudo chmod 1777 $ENROOT_PERSISTENT_DIR
    sudo mkdir -p $ENROOT_VOLATILE_DIR
    sudo chmod 1777 $ENROOT_VOLATILE_DIR
    sudo mv /opt/parallelcluster/examples/enroot/enroot.conf /etc/enroot/enroot.conf
    sudo chmod 0644 /etc/enroot/enroot.conf

    # Configure Pyxis
    PYXIS_RUNTIME_DIR="/run/pyxis"

    sudo mkdir -p $PYXIS_RUNTIME_DIR
    sudo chmod 1777 $PYXIS_RUNTIME_DIR

    sudo mkdir -p /opt/slurm/etc/plugstack.conf.d/
    sudo mv /opt/parallelcluster/examples/spank/plugstack.conf /opt/slurm/etc/
    sudo mv /opt/parallelcluster/examples/pyxis/pyxis.conf /opt/slurm/etc/plugstack.conf.d/
    sudo -i scontrol reconfigure

  2. Pyxis and Enroot must also be configured on the compute fleet by creating the persistent and volatile directories for Enroot and the runtime directory for Pyxis. Run the following script as an OnNodeStart custom action on the compute nodes.

    #!/bin/bash
    set -e

    echo "Executing $0"

    # Configure Enroot
    ENROOT_PERSISTENT_DIR="/var/enroot"
    ENROOT_VOLATILE_DIR="/run/enroot"

    sudo mkdir -p $ENROOT_PERSISTENT_DIR
    sudo chmod 1777 $ENROOT_PERSISTENT_DIR
    sudo mkdir -p $ENROOT_VOLATILE_DIR
    sudo chmod 1777 $ENROOT_VOLATILE_DIR
    sudo mv /opt/parallelcluster/examples/enroot/enroot.conf /etc/enroot/enroot.conf
    sudo chmod 0644 /etc/enroot/enroot.conf

    # Configure Pyxis
    PYXIS_RUNTIME_DIR="/run/pyxis"

    sudo mkdir -p $PYXIS_RUNTIME_DIR
    sudo chmod 1777 $PYXIS_RUNTIME_DIR
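
The two scripts are typically attached to the cluster through the CustomActions sections of the cluster configuration file. The sketch below writes a partial configuration fragment that shows where each script could be referenced. The bucket name amzn-s3-demo-bucket, the script file names, the queue name queue1, and the output file name are all placeholders, and the fragment omits every other setting a real cluster configuration needs (Region, image, instance types, networking, compute resources).

```shell
#!/bin/bash
# Sketch only: write a partial cluster configuration fragment.
# The bucket, script names, and queue name below are placeholders.
cat > pyxis-custom-actions-fragment.yaml <<'EOF'
HeadNode:
  CustomActions:
    OnNodeConfigured:
      # Head node script (step 1), uploaded to a bucket you own
      Script: s3://amzn-s3-demo-bucket/head-node-pyxis.sh
  Iam:
    S3Access:
      - BucketName: amzn-s3-demo-bucket
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: queue1
      CustomActions:
        OnNodeStart:
          # Compute node script (step 2)
          Script: s3://amzn-s3-demo-bucket/compute-node-pyxis.sh
      Iam:
        S3Access:
          - BucketName: amzn-s3-demo-bucket
EOF
```

The S3Access entries are included on the assumption that the scripts are stored in a private bucket, so that the nodes can download them. Merge the relevant keys into your own configuration rather than using the fragment as is.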

Submit jobs

Now that Pyxis is configured in your cluster, you can submit containerized jobs using the sbatch and srun commands, which now accept container-specific options.

    # Submitting an interactive job
    srun -N 2 --container-image docker://ubuntu:22.04 hostname

    # Submitting a batch job
    sbatch -N 2 --wrap='srun --container-image docker://ubuntu:22.04 hostname'
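
For longer jobs, a batch script file is often easier to maintain than --wrap. The following sketch generates such a script and checks its syntax without submitting it. The file name, the job name, and the /shared mount path are assumptions chosen for illustration. --container-mounts is a Pyxis option for bind-mounting host paths into the container; run srun --help on your cluster to see which container options your installed Pyxis version provides.

```shell
#!/bin/bash
# Sketch only: generate a batch script that runs a containerized step.
# File name, job name, and the /shared mount are placeholders.
cat > pyxis-hostname-job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=pyxis-hostname
#SBATCH --nodes=2
#SBATCH --output=%x-%j.out

# Run hostname inside the container on the allocated nodes; /shared on
# the host is bind-mounted at the same path inside the container.
srun --container-image docker://ubuntu:22.04 \
     --container-mounts /shared:/shared \
     hostname
EOF

# Check the generated script for syntax errors without running it.
bash -n pyxis-hostname-job.sh
```

On the head node, the generated script would then be submitted with sbatch pyxis-hostname-job.sh.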