
Creating a cluster in AWS Parallel Computing Service

This topic provides an overview of the available options and describes what to consider when you create a cluster in AWS Parallel Computing Service (AWS PCS). If this is your first time creating an AWS PCS cluster, we recommend that you follow Get started with AWS Parallel Computing Service. The tutorial helps you create a working HPC system without covering every available option and system architecture.

Prerequisites

Create an AWS PCS cluster

You can use the AWS Management Console or AWS CLI to create a cluster.

AWS Management Console
To create a cluster
  1. Open the AWS PCS console at https://console.aws.amazon.com/pcs/home#/clusters and choose Create cluster.

  2. In the Cluster setup section, enter values for the following fields:

    • Cluster name – A name for your cluster. The name can contain only alphanumeric characters (case-sensitive) and hyphens. It must start with an alphabetic character and can't be longer than 40 characters. The name must be unique within the AWS Region and AWS account that you're creating the cluster in.

    • Scheduler – Choose a scheduler and version. For more information, see Slurm versions in AWS PCS.

    • Controller size – Choose a size for your controller. The controller size determines how many concurrent jobs and compute nodes the AWS PCS cluster can manage. You can set the controller size only when you create the cluster. For more information about sizing, see Cluster size in AWS PCS.

  3. In the Networking section, select values for the following fields:

    • VPC – Choose an existing VPC that meets AWS PCS requirements. For more information, see AWS PCS VPC and subnet requirements and considerations. After you create the cluster, you can't change its VPC. If no VPCs are listed, you must create one first.

    • Subnet – All available subnets in the selected VPC are listed. Choose a subnet that meets the AWS PCS subnet requirements. For more information, see AWS PCS VPC and subnet requirements and considerations. We recommend you select a private subnet to avoid exposing your scheduler endpoints to the public internet.

    • Security groups – Specify the security group(s) that you want AWS PCS to associate with the network interfaces it creates for your cluster. You must select at least one security group that allows communication between your cluster and its compute nodes. You can select Quick create a security group to have AWS PCS create one with the necessary configuration in your selected VPC, or select an existing security group. For more information, see Security group requirements and considerations.

  4. (Optional) In the Slurm accounting configuration section, you can enable Slurm accounting and set accounting parameters. For more information, see Slurm accounting in AWS PCS.

  5. (Optional) In the Slurm configuration section, you can specify Slurm configuration options that override defaults set by AWS PCS:

    • Scale down idle time – Controls how long dynamically provisioned compute nodes stay running after the jobs placed on them complete or terminate. A longer value makes it more likely that a subsequent job can run on an already-provisioned node, but it can increase costs. A shorter value decreases costs, but it can increase the proportion of time your HPC system spends provisioning nodes instead of running jobs on them.

    • Prolog – A fully qualified path to a directory of prolog scripts on your compute node group instances. This corresponds to the Prolog setting in Slurm. The value must be a directory, not the path to a specific executable.

    • Epilog – A fully qualified path to a directory of epilog scripts on your compute node group instances. This corresponds to the Epilog setting in Slurm. The value must be a directory, not the path to a specific executable.

    • Select type parameters – Controls the resource selection algorithm that Slurm uses. Setting this value to CR_CPU_Memory activates memory-aware scheduling, and setting it to CR_CPU activates CPU-only scheduling. This parameter corresponds to the SelectTypeParameters setting in Slurm; AWS PCS sets SelectType to select/cons_tres.

  6. (Optional) Under Tags, add any tags to your AWS PCS cluster.

  7. Choose Create cluster. The Status field shows Creating while AWS PCS creates the cluster. This process can take several minutes.
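The Prolog and Epilog settings in step 5 point to a directory rather than to a single script. The following sketch shows one way to place an executable script in such a directory, for example from bootstrap code that runs on your compute node group instances. The function name install_hook_script and the /opt/slurm/etc/prolog.d path in the commented example are hypothetical, and the sketch assumes that each script in the directory is run directly as an executable.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write an executable hook script into the directory
# that you configure as the cluster's Prolog or Epilog setting.
install_hook_script() {
  local hook_dir="$1"     # directory used as the Prolog or Epilog value
  local script_name="$2"  # file name of the script inside that directory
  local body="$3"         # script contents, without the shebang line

  # The setting must be a fully qualified path, so reject relative paths.
  case "$hook_dir" in
    /*) ;;
    *) echo "hook directory must be an absolute path: $hook_dir" >&2; return 1 ;;
  esac

  mkdir -p "$hook_dir" || return 1
  # Prepend a shebang so the script can be executed directly.
  printf '#!/bin/bash\n%s\n' "$body" > "$hook_dir/$script_name" || return 1
  # Assumption: scripts in the directory are run as programs, so mark it executable.
  chmod 0755 "$hook_dir/$script_name"
}

# Example (hypothetical path; single quotes defer expansion to run time):
# install_hook_script /opt/slurm/etc/prolog.d 10-log-start.sh \
#   'logger "prolog starting for job ${SLURM_JOB_ID:-unknown}"'
```

Whatever directory you use, it must exist on every instance in the compute node group and match the fully qualified path that you enter for Prolog or Epilog.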

Important

Only one cluster at a time can be in the Creating state in each AWS Region for each AWS account. If you try to create a cluster while another cluster is still in the Creating state, AWS PCS returns an error.
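The console and the CLI apply the same cluster name rules: alphanumeric characters (case-sensitive) and hyphens only, starting with an alphabetic character, and no more than 40 characters. If scripts generate cluster names, a local pre-check such as the following sketch can reject invalid names before a request reaches AWS PCS. The function name validate_cluster_name is hypothetical, the check assumes that "alphanumeric" means ASCII letters and digits, and it can't verify that the name is unique within your AWS Region and account.

```bash
#!/usr/bin/env bash
# Hypothetical helper: return 0 if the name satisfies the documented
# cluster name rules, 1 otherwise. Uniqueness is NOT checked here.
validate_cluster_name() {
  local name="$1"
  # Starts with a letter; then only letters, digits, and hyphens (ASCII assumed).
  local pattern='^[A-Za-z][A-Za-z0-9-]*$'
  if [[ ! $name =~ $pattern ]]; then
    echo "invalid cluster name '$name': must start with a letter and contain only letters, digits, and hyphens" >&2
    return 1
  fi
  # Enforce the 40-character maximum.
  if (( ${#name} > 40 )); then
    echo "invalid cluster name '$name': longer than 40 characters" >&2
    return 1
  fi
  return 0
}
```

For example, chaining `validate_cluster_name my-cluster && aws pcs create-cluster ...` submits the request only when the name passes the local check.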

AWS CLI
To create a cluster
  1. Create your cluster with the command that follows. Before running the command, make the following replacements:

    • Replace region with the ID of the AWS Region that you want to create your cluster in, such as us-east-1.

    • Replace my-cluster with a name for your cluster. The name can contain only alphanumeric characters (case-sensitive) and hyphens. It must start with an alphabetic character and can't be longer than 40 characters. The name must be unique within the AWS Region and AWS account where you're creating the cluster.

    • Replace 24.11 with any supported version of Slurm.

      Note

      AWS PCS currently supports Slurm 24.11 and 24.05.

    • Replace SMALL with any supported cluster size. The size determines how many concurrent jobs and compute nodes the AWS PCS cluster can manage, and you can set it only when you create the cluster. For more information about sizing, see Cluster size in AWS PCS.

    • Replace the value for subnetIds with your own. We recommend you select a private subnet to avoid exposing your scheduler endpoints to the public internet.

    • Specify the securityGroupIds that you want AWS PCS to associate with the network interfaces it creates for your cluster. The security groups must be in the same VPC as the cluster. You must select at least one security group that allows communication between your cluster and its compute nodes. For more information, see Security group requirements and considerations.

    • Optionally, to encrypt your controller’s data with a custom KMS key, add --kms-key-id kms-key. Replace kms-key with the ARN, key ID, or alias of an existing KMS key. The account used to create the cluster must have kms:Decrypt permissions on the custom KMS key.

    aws pcs create-cluster --region region \
        --cluster-name my-cluster \
        --scheduler type=SLURM,version=24.11 \
        --size SMALL \
        --networking subnetIds=subnet-ExampleId1,securityGroupIds=sg-ExampleId1
    • Optionally, you can add the --slurm-configuration option to customize Slurm behavior and specify Slurm configuration options. The following example sets the scale-down idle time to 60 minutes (3600 seconds), enables Slurm accounting, and uses slurmCustomSettings to set the slurm.conf parameter SelectTypeParameters to CR_CPU_Memory. For more information, see Slurm accounting in AWS PCS.

      Note

      Accounting is supported for Slurm 24.11 or later.

      aws pcs create-cluster --region region \
          --cluster-name my-cluster \
          --scheduler type=SLURM,version=24.11 \
          --size SMALL \
          --networking subnetIds=subnet-ExampleId1,securityGroupIds=sg-ExampleId1 \
          --slurm-configuration scaleDownIdleTimeInSeconds=3600,accounting='{mode=STANDARD}',slurmCustomSettings='[{parameterName=SelectTypeParameters,parameterValue=CR_CPU_Memory}]'
  2. It can take several minutes to provision the cluster. You can query the status of your cluster with the following command. Don’t create queues or compute node groups until the cluster’s status field is ACTIVE.

    aws pcs get-cluster --region region --cluster-identifier my-cluster
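Instead of rerunning get-cluster by hand, a script can poll it until the cluster is ready. The following sketch assumes that the get-cluster response reports the cluster state in a cluster.status field that uses the CREATING and ACTIVE values described in this topic. The function names, the 30-second interval, and the 60-attempt limit are illustrative choices, not AWS PCS defaults; any status other than CREATING or ACTIVE stops the loop with an error.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the cluster's current status.
# Assumes the response exposes it at cluster.status.
get_cluster_status() {
  aws pcs get-cluster --region "$1" --cluster-identifier "$2" \
    --query 'cluster.status' --output text
}

# Poll until the cluster is ACTIVE, fails, or the attempt limit is reached.
wait_for_cluster_active() {
  local region="$1" cluster="$2"
  local interval="${3:-30}"      # seconds between checks (illustrative)
  local max_attempts="${4:-60}"  # give up after this many checks (illustrative)
  local attempt status
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    status="$(get_cluster_status "$region" "$cluster")" || return 1
    case "$status" in
      ACTIVE)   echo "cluster $cluster is ACTIVE"; return 0 ;;
      CREATING) sleep "$interval" ;;
      *)        echo "cluster $cluster is in unexpected state: $status" >&2; return 1 ;;
    esac
  done
  echo "timed out waiting for cluster $cluster" >&2
  return 1
}
```

For example, `wait_for_cluster_active us-east-1 my-cluster && echo "ready for queues and compute node groups"` blocks until the cluster reaches ACTIVE or the wait fails.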
Important

Only one cluster at a time can be in the Creating state in each AWS Region for each AWS account. If you try to create a cluster while another cluster is still in the Creating state, AWS PCS returns an error.
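Because a create request fails while another cluster is still in the Creating state, a script that creates several clusters might check for that condition first. This sketch assumes that aws pcs list-clusters returns a clusters array whose entries include name and status fields, with CREATING as the status value for clusters being created. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print names of clusters in the Region that are still
# CREATING. Assumes list-clusters returns clusters[].name and clusters[].status.
list_creating_clusters() {
  aws pcs list-clusters --region "$1" \
    --query "clusters[?status=='CREATING'].name" --output text
}

# Return 0 if no cluster in the Region is in the CREATING state, 1 otherwise.
ensure_no_cluster_creating() {
  local region="$1" creating
  creating="$(list_creating_clusters "$region")" || return 1
  # Text output is empty for an empty list and "None" for a null result.
  if [[ -n $creating && $creating != "None" ]]; then
    echo "cluster(s) still creating in $region: $creating" >&2
    return 1
  fi
  return 0
}
```

The check is advisory: another create request can still start between the check and your create-cluster call, so handle an error from create-cluster as well.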

Recommended next steps for your cluster
  • Add compute node groups.

  • Add queues.

  • Enable logging.