Change | Description | Date |
---|---|---|
AWS ParallelCluster UI version 2024.04.0 released | AWS ParallelCluster UI version 2024.04.0 released. | April 17, 2024 |
AWS ParallelCluster version 3.9.1 released | We're excited to announce the release of AWS ParallelCluster 3.9.1. To upgrade, enter the following: sudo pip install --upgrade aws-parallelcluster
Bug fixes | April 11, 2024 |
AWS ParallelCluster UI version 2024.03.0 released | AWS ParallelCluster UI version 2024.03.0 released. For details of the changes, see the CHANGELOG files for the aws-parallelcluster-ui package on GitHub. | March 12, 2024 |
AWS ParallelCluster version 3.9.0 released | We're excited to announce the release of AWS ParallelCluster 3.9.0.
To upgrade, enter the following: sudo pip install --upgrade aws-parallelcluster
Enhancements:
-
Add the configuration parameter DeploymentSettings/DefaultUserHome
to allow users to move the default user's home directory to /local/home
instead of /home (default).
-
Allow updating the MinCount, MaxCount, Queue, and ComputeResource
configuration parameters without stopping the compute fleet. To do so, set
Scheduling/SlurmSettings/QueueUpdateStrategy to TERMINATE.
AWS ParallelCluster terminates only the nodes removed during a resize of the
cluster capacity performed through a cluster update.
-
Allow updating the external shared storage of type Efs, FsxLustre, FsxOntap,
FsxOpenZfs, and FileCache without replacing the compute and login fleet.
-
Add support for RHEL9.
-
Add support for Rocky Linux 9 as CustomAmi created through
build-image process. No public official AWS ParallelCluster Rocky9
Linux AMI is made available at this time.
-
Remove CommunicationParameters from the Custom Slurm Settings
deny list.
-
Add DeploymentSettings/DisableSudoAccessForDefaultUser parameter to
disable sudo access of default user in supported OSes.
-
Changes to FSx for Lustre file systems created by ParallelCluster: Change the
Lustre server version to 2.15.
-
Add possibility to choose between Open and Closed Source Nvidia Drivers when
building an AMI, through the ['cluster']['nvidia']['kernel_open']
cookbook node attribute.
-
Add a clustermgtd config option ec2_instance_missing_max_count to
allow a configurable number of retries while EC2 DescribeInstances results
become eventually consistent with RunInstances.
Changes
-
Upgrade Slurm to 23.11.4 (from 23.02.7).
-
Upgrade NVIDIA driver to version 535.154.05.
-
Add support for Python 3.11, 3.12 in pcluster CLI and
aws-parallelcluster-batch-cli.
-
Build network interfaces using network card index from
NetworkCardIndex list of EC2 DescribeInstances response, instead of
looping over MaximumNetworkCards range.
-
Fail cluster creation when using instance types P3, G3, P2 and G2 because their
GPU architecture is not compatible with Open Source Nvidia Drivers (OpenRM)
introduced as part of 3.8.0 release.
-
Upgrade third-party cookbook dependencies: nfs-5.1.2 (from nfs-5.0.0)
-
Upgrade EFA installer to 1.30.0.
-
Efa-driver: efa-2.6.0-1
-
Efa-config: efa-config-1.15-1
-
Efa-profile: efa-profile-1.6-1
-
Libfabric-aws: libfabric-aws-1.19.0
-
Rdma-core: rdma-core-46.0-1
-
Open MPI: openmpi40-aws-4.1.6-2 and
openmpi50-aws-5.0.0-11
-
Upgrade NICE DCV to version 2023.1-16388.
Bug fixes
-
Fix an issue that made jobs fail when submitted as an Active Directory user from
login nodes. The issue was caused by an incomplete configuration of the integration
with the external Active Directory on the head node.
-
Refactor IAM policies defined in CloudFormation template
parallelcluster-policies.yaml to prevent ParallelCluster API deployment failure
caused by policies exceeding IAM limits.
-
Fix issue making login nodes fail to bootstrap when the head node takes more
time than expected in writing keys.
For details of the changes, see the CHANGELOG files for the
aws-parallelcluster-ui package on GitHub. | March 5, 2024 |
AWS ParallelCluster UI version 2024.02.0 released | AWS ParallelCluster UI version 2024.02.0 released. For details of the changes, see the CHANGELOG files for the aws-parallelcluster-ui package on GitHub. | February 8, 2024 |
AWS ParallelCluster UI version 2023.12.0 released | AWS ParallelCluster UI version 2023.12.0 released.
Features:
-
Added support for PCUI deployment with private networking.
-
Added possibility to optionally apply a Permissions Boundary to every IAM role
created by the PCUI and PCAPI infrastructure.
-
Added possibility to optionally apply a prefix to every IAM role and policy
created by the PCUI and PCAPI infrastructure.
-
Added support for ParallelCluster version 3.8.0, without feature parity in the
wizard.
For details of the changes, see the CHANGELOG files for the
aws-parallelcluster-ui package on GitHub. | December 21, 2023 |
AWS ParallelCluster version 3.8.0 released | AWS ParallelCluster version 3.8.0 released.
Enhancements:
-
Add support for EC2 Capacity Blocks for ML.
-
Add support for Rocky Linux 8 as CustomAmi created through
build-image process. No public official AWS ParallelCluster Rocky8
Linux AMI is made available at this time.
-
Add Scheduling/ScalingStrategy parameter to control the cluster
scaling strategy to use when launching EC2 instances for Slurm compute nodes.
Possible values are all-or-nothing, greedy-all-or-nothing, and
best-effort, with all-or-nothing being the default.
-
Add HeadNode/SharedStorageType parameter to use EFS storage instead
of NFS exports from the head node root volume for intra-cluster shared file system
resources: ParallelCluster, Intel, Slurm, and /home data. This
enhancement reduces the load on the head node networking.
-
Allow for mounting /home as an EFS or FSx external shared storage
via the SharedStorage section of the config file.
-
Add new parameter SlurmSettings/MungeKeySecretArn to allow using
an external user-defined MUNGE key from AWS Secrets Manager.
-
Add Monitoring/Alarms/Enabled parameter to toggle Amazon CloudWatch
Alarms for the cluster.
-
Add head node alarms to monitor EC2 health checks, CPU utilization and the
overall status of the head node, and add them to the CloudWatch Dashboard created
with the cluster.
-
Add support for Data Repository Associations when using
PERSISTENT_2 as DeploymentType for a managed FSx for
Lustre.
-
Add Scheduling/SlurmSettings/Database/DatabaseName parameter to
allow users to specify a custom name for the database on the database server to be
used for Slurm accounting.
-
Make InstanceType an optional configuration parameter when
configuring CapacityReservationTarget/CapacityReservationId in the
compute resource.
-
Add possibility to specify a prefix for IAM roles and policies created by
AWS ParallelCluster API.
-
Add possibility to specify a permissions boundary to be applied for IAM roles
and policies created by AWS ParallelCluster API.
Changes
-
Upgrade Slurm to 23.02.7 (from 23.02.6).
-
Upgrade NVIDIA driver to version 535.129.03.
-
Upgrade CUDA Toolkit to version 12.2.2.
-
Use Open Source NVIDIA GPU drivers (OpenRM) as NVIDIA kernel module for Linux
instead of NVIDIA closed source module.
-
Remove support of all_or_nothing_batch configuration parameter in
the Slurm resume program, in favor of the new
Scheduling/ScalingStrategy cluster configuration.
-
Changed cluster alarms naming convention to
'[cluster-name]-[component-name]-[metric]'.
-
Change default EBS volume types in ADC regions from gp2 to gp3, for both the
root and additional volumes.
-
The optional permissions boundary for the AWS ParallelCluster API is now applied
to every IAM role created by the API infrastructure.
-
Upgrade EFA installer to 1.29.1 .
-
Efa-driver: efa-2.6.0-1
-
Efa-config: efa-config-1.15-1
-
Efa-profile: efa-profile-1.5-1
-
Libfabric-aws: libfabric-aws-1.19.0-1
-
Rdma-core: rdma-core-46.0-1
-
Open MPI: openmpi40-aws-4.1.6-1
-
Upgrade GDRCopy to version 2.4 in all supported OSes, except for Centos 7 where
version 2.3.1 is used.
-
Upgrade aws-cfn-bootstrap to version 2.0-28.
-
Add support for Python 3.10 in aws-parallelcluster-batch-cli.
Bug fixes
-
Fix inconsistent scaling configuration after cluster update rollback when
modifying the list of instance types declared in the Compute Resources.
-
Fix users SSH keys generation when switching users without root privilege in
clusters integrated with an external LDAP server through cluster configuration
files.
-
Fix disabling Slurm power save mode when setting ScaledownIdletime = -1.
-
Fix hard-coded path to Slurm installation dir in
update_slurm_database_password.sh script for Slurm Accounting.
| December 19, 2023 |
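The Scheduling/ScalingStrategy entry in the 3.8.0 notes above lists three allowed values and a default. As a rough illustration only, the following Python sketch resolves that setting from an already-parsed configuration dictionary. The function name, the constants, and the dictionary shape are assumptions made for this example; they are not part of the pcluster CLI or of any AWS ParallelCluster library.

```python
# Hypothetical helper for illustration; not part of AWS ParallelCluster.
# The allowed values and the default are taken from the 3.8.0 release notes.
ALLOWED_SCALING_STRATEGIES = ("all-or-nothing", "greedy-all-or-nothing", "best-effort")
DEFAULT_SCALING_STRATEGY = "all-or-nothing"


def resolve_scaling_strategy(config):
    """Return the Scheduling/ScalingStrategy value, or the default when unset."""
    scheduling = config.get("Scheduling", {})
    strategy = scheduling.get("ScalingStrategy", DEFAULT_SCALING_STRATEGY)
    if strategy not in ALLOWED_SCALING_STRATEGIES:
        raise ValueError(
            "Unsupported ScalingStrategy %r; expected one of %s"
            % (strategy, ", ".join(ALLOWED_SCALING_STRATEGIES))
        )
    return strategy
```

Calling resolve_scaling_strategy({}) falls back to all-or-nothing, while an unknown value raises ValueError instead of being silently accepted.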
AWS ParallelCluster version 3.7.2 released | AWS ParallelCluster version 3.7.2 released.
| October 25, 2023 |
AWS ParallelCluster UI version 2023.10.0 released | AWS ParallelCluster UI version 2023.10.0 released.
For details of the changes, see the CHANGELOG files for the
aws-parallelcluster-ui package on GitHub. | October 20, 2023 |
AWS ParallelCluster version 3.7.1 released | AWS ParallelCluster version 3.7.1 released.
Changes:
-
Upgrade Slurm to 23.02.5 (from 23.02.4).
-
Upgrade EFA installer to 1.26.1, fixing RDMA writedata issue in
P5.
-
Efa-driver: efa-2.5.0-1
-
Efa-config: efa-config-1.15-1
-
Efa-profile: efa-profile-1.5-1
-
Libfabric-aws: libfabric-aws-1.18.2-1
-
Rdma-core: rdma-core-46.0-1
-
Open MPI: openmpi40-aws-4.1.5-4
| September 22, 2023 |
AWS ParallelCluster version 3.7.0 released | AWS ParallelCluster version 3.7.0 released.
Enhancements:
-
Support configuration of static and dynamic node priorities in compute resources
by using an AWS ParallelCluster configuration YAML file.
-
Add support for Ubuntu 22. RSA keys are not supported by default.
-
Add the queue configuration setting JobExclusiveAllocation to
allocate nodes in a partition exclusively to a single job at any given time.
-
Allow overriding the aws-parallelcluster-node package at cluster create
and cluster update time. For the head node, this applies at cluster update. Useful
for development purposes only.
-
Avoid NFS server start on compute nodes.
-
Add support for login nodes.
-
Allow memory-based scheduling when multiple instance types are specified for a
Slurm Compute Resource.
-
Add support to mount existing Amazon File Cache as shared storage.
Changes:
-
Assign Slurm dynamic nodes a priority (weight) of 1000 by default. By doing
this, Slurm can prioritize idle static nodes over idle dynamic nodes.
-
Make aws-parallelcluster-node daemons only handle AWS ParallelCluster
managed Slurm partitions.
-
Increase EFS-utils watchdog poll interval to 10 seconds. This
change applies when EncryptionInTransit is set to true ,
which is the only condition that causes the watchdog to run.
-
Upgrade the EFA installer to 1.25.1 .
-
Efa-driver: efa-2.5.0-1 (from efa-2.1.1g )
-
Efa-config: efa-config-1.15-1 (from
efa-config-1.13-1 )
-
Efa-profile: efa-profile-1.5-1 (no change)
-
Libfabric-aws: libfabric-aws-1.18.1-0 (from
libfabric-aws-1.17.1-1 )
-
Rdma-core: rdma-core-46.0-1 (from
rdma-core-43.0-1 )
-
Open MPI: openmpi40-aws-4.1.5-4 (from
openmpi40-aws-4.1.5-1 )
-
Upgrade Slurm to version 23.02.4.
-
Change the default value of Imds/ImdsSupport from v1.0 to v2.0.
-
Deprecate Ubuntu 18.
-
Update the default root volume size to 40 GB to account for limits on Centos
7.
-
Restrict permission on file /tmp/wait_condition_handle.txt within the head node
so that only root can read it.
-
Create a Slurm partition-nodelist mapping JSON file to be used by the node
package daemons to recognize PC-managed Slurm partitions and nodelists.
-
Upgrade NVIDIA driver to version 535.54.03.
-
Upgrade CUDA library to version 12.2.0.
-
Upgrade NVIDIA Fabric manager to nvidia-fabricmanager-535.
-
Upgrade ARM PL to version 23.04.1 for Ubuntu 22.04 only.
-
Upgrade NICE DCV to version 2023.0-15487 .
Bug fixes:
-
Add validation to the ScaledownIdletime value, to prevent setting a
value lower than -1.
-
Fix cluster create failure with Ubuntu Deep Learning AMI on GPU instances with
DCV enabled.
-
Fix issue causing dangling IAM policies to be created when creating
ParallelCluster CloudFormation custom resource provider with
CustomLambdaRole.
-
Fix an issue that was causing misalignment of compute node DNS names on
instances with multiple network interfaces when
SlurmSettings/Dns/UseEc2Hostnames is set to True.
For details of the changes, see the CHANGELOG files for the
aws-parallelcluster, aws-parallelcluster-cookbook, and aws-parallelcluster-node packages on GitHub. | August 30, 2023 |
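The 3.7.0 notes above state that dynamic nodes get a default priority (weight) of 1000 so that Slurm prefers idle static nodes. Slurm allocates nodes with the lowest weight first, which the following Python sketch mimics on a toy node list. The Node class, the node names, and the static weight of 1 are assumptions made for illustration; only the value 1000 for dynamic nodes comes from the notes.

```python
from dataclasses import dataclass

DYNAMIC_NODE_WEIGHT = 1000  # default stated in the 3.7.0 notes
STATIC_NODE_WEIGHT = 1      # assumed value, for illustration only


@dataclass
class Node:
    """Toy stand-in for a Slurm node; not a real Slurm or ParallelCluster type."""
    name: str
    weight: int
    idle: bool


def pick_idle_nodes(nodes, count):
    """Pick up to `count` idle nodes, lowest weight first."""
    idle_nodes = [node for node in nodes if node.idle]
    # sorted() is stable, so nodes with equal weight keep their input order.
    return sorted(idle_nodes, key=lambda node: node.weight)[:count]


nodes = [
    Node("queue1-dy-cr1-1", DYNAMIC_NODE_WEIGHT, idle=True),
    Node("queue1-st-cr1-1", STATIC_NODE_WEIGHT, idle=True),
    Node("queue1-st-cr1-2", STATIC_NODE_WEIGHT, idle=False),
]
chosen = pick_idle_nodes(nodes, 1)
```

With these inputs, the idle static node is chosen ahead of the idle dynamic node, which is the behavior the default weight is meant to produce.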
Documentation only release | AWS ParallelCluster version 3 specific user guide published.
| July 17, 2023 |
AWS ParallelCluster version 3.6.1 released | AWS ParallelCluster version 3.6.1 released.
Bug fixes:
-
Remove hard coding of root volume device name (/dev/sda1
and /dev/xvda) and retrieve it from the AMI(s) used during
create-cluster.
-
Fix cluster create failure when using CloudFormation custom resource with
ElasticIp set to True .
-
Fix cluster create and update failures when using an AWS CloudFormation custom resource with
large configuration files.
-
Fix an issue that prevented ptrace protection from being disabled
on Ubuntu and that didn't permit Cross Memory Attach (CMA) in libfabric.
-
Fix fast insufficient capacity fail-over logic when using multiple instance
types and no instances are returned.
For details of the changes, see the CHANGELOG files for the
aws-parallelcluster, aws-parallelcluster-cookbook, and aws-parallelcluster-node packages on GitHub. | July 5, 2023 |
AWS ParallelCluster UI version 2023.06.0 released | AWS ParallelCluster UI version 2023.06.0 released.
For details of the changes, see the CHANGELOG files for the
aws-parallelcluster-ui package on GitHub. | June 7, 2023 |
AWS ParallelCluster version 3.6.0 released | AWS ParallelCluster version 3.6.0 released.
Enhancements:
-
Add support for RHEL8.
-
Add an AWS CloudFormation custom resource for
creating and managing clusters with CloudFormation.
-
Add support for customizing the
cluster Slurm configuration in the AWS ParallelCluster configuration YAML
file.
-
Build Slurm with support for LUA.
-
Increase the limit on the maximum number of queues per cluster from 10 to 50.
Each queue can have up to 50 compute resources. Each cluster can have up to 50
compute resources.
-
Add support for specifying a sequence of multiple custom action scripts for an event
configured in OnNodeStart , OnNodeConfigured , and
OnNodeUpdated parameters.
-
Add new configuration section HealthChecks / Gpu , for
applying GPU health checks on a compute node before a job is run.
-
Add support for Tags in the SlurmQueues and
SlurmQueues / ComputeResources configuration.
-
Add support for DetailedMonitoring in the Monitoring
configuration.
-
Add mem_used_percent and disk_used_percent metrics for
head node memory and root volume disk utilization tracking in the AWS ParallelCluster
CloudWatch dashboard, and set up alarms
for monitoring these metrics.
-
Add log rotation support for
AWS ParallelCluster managed logs.
-
Track common compute node errors and dynamic node longest idle time in the CloudWatch Dashboard.
-
Enforce the DCV Authenticator Server to use at least TLS-1.2
protocol when creating the SSL Socket.
-
Install the NVIDIA Data Center GPU Manager (DCGM) package on all supported
operating systems except for aarch64 centos7 and alinux2.
-
Load the kernel module nvidia-uvm by default to provide Unified Virtual Memory (UVM)
functionality to the CUDA driver.
-
Install the NVIDIA
Persistence Daemon as a system service.
Changes:
-
Upgrade Slurm to version 23.02.2 (from version
22.05.8 ).
-
Upgrade munge to version 0.5.15 (from version
0.5.14 ).
-
Set the Slurm TreeWidth to 30.
-
Set the Slurm prolog and epilog configurations to
target directory /opt/slurm/etc/scripts/prolog.d/ and
/opt/slurm/etc/scripts/epilog.d/ respectively.
-
Set Slurm BatchStartTimeout to 3 minutes maximum for running
Prolog scripts during compute node registration.
-
Increase the default RetentionInDays of CloudWatch logs from 14 to 180
days.
-
Upgrade the EFA installer to 1.22.1 .
-
Dkms: 2.8.3-2
-
Efa-driver: efa-2.1.1g (no change)
-
Efa-config: efa-config-1.13-1 (no change)
-
Efa-profile: efa-profile-1.5-1 (no change)
-
Libfabric-aws: libfabric-aws-1.17.1-1 (from
libfabric-aws-1.17.0-1 )
-
Rdma-core: rdma-core-43.0-1 (no change)
-
Open MPI: openmpi40-aws-4.1.5-1 (no change)
-
Upgrade the Lustre client version to 2.12 on Amazon Linux 2. Lustre
client 2.12 has been installed on Ubuntu 20.04, 18.04, and CentOS >=
7.7.
-
Upgrade the Lustre client version to 2.10.8 on CentOS 7.6.
-
Upgrade the NVIDIA driver to version 470.182.03 (from version
470.141.03 ).
-
Upgrade the NVIDIA Fabric Manager to version 470.182.03 (from
version 470.141.03 ).
-
Upgrade the NVIDIA CUDA Toolkit to version 11.8.0 (from version
11.7.1 ).
-
Upgrade the NVIDIA CUDA sample to version 11.8.0 .
-
Upgrade the Intel MPI Library to Version 2021 Update 9 (from Version 2021 Update
6). For more information, see Intel® MPI Library 2021 Update 9.
-
Upgrade NICE DCV to version 2023.0-15022 (from version
2022.2-14521 ).
-
server: 2023.0.15022-1 (from version
2022.2-14521-1 ).
-
xdcv: 2023.0.547-1 (from version
2022.2.519-1 ).
-
gl: 2023.0.1027-1 (from version
2022.2.1012-1 ).
-
web_viewer: 2023.0.15022-1 (from version
2022.2.14521-1 ).
-
Upgrade aws-cfn-bootstrap to version 2.0-24 .
-
Upgrade image used by the CodeBuild environment when building container images
for AWS Batch clusters:
-
aws/codebuild/amazonlinux2-x86_64-standard:4.0 (from
aws/codebuild/amazonlinux2-x86_64-standard:3.0 ).
-
aws/codebuild/amazonlinux2-aarch64-standard:2.0 (from
aws/codebuild/amazonlinux2-aarch64-standard:1.0 ).
Bug fixes:
-
Fix Amazon EFS and Amazon FSx network security group validators to avoid reporting false
errors.
-
Fix missing tagging of resources created by Image Builder during the
build-image operation.
-
Fix update policy for MaxCount to always perform numerical
comparisons on the MaxCount property.
-
Fix IP alignment on compute node instances with multiple network cards.
-
Fix replacement of StoragePass in the
slurm_parallelcluster_slurmdbd.conf when a queue parameter update is
performed and the Slurm accounting configurations are not updated.
-
Fix issue that causes dangling security groups to be created when creating a
cluster with an existing EFS file system.
-
Fix issue causing the cfn-hup daemon to fail when it gets
restarted.
-
Consider dynamic nodes with INVALID_REG flag as bootstrap failures
for Slurm protected mode. Static nodes failing Slurm registration are already
treated as bootstrap failures after the
node_replacement_timeout .
For details of the changes, see the CHANGELOG files for the
aws-parallelcluster, aws-parallelcluster-cookbook, and aws-parallelcluster-node packages on GitHub. | May 22, 2023 |
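The 3.6.0 notes above raise the limits to 50 queues per cluster, 50 compute resources per queue, and 50 compute resources per cluster. A minimal sketch of checking a parsed SlurmQueues list against those three numbers could look like the following. The function name and the input shape (a list of dictionaries with Name and ComputeResources keys) are assumptions for this example and do not reproduce the validators shipped with AWS ParallelCluster.

```python
# Limits as stated in the 3.6.0 release notes.
MAX_QUEUES_PER_CLUSTER = 50
MAX_COMPUTE_RESOURCES_PER_QUEUE = 50
MAX_COMPUTE_RESOURCES_PER_CLUSTER = 50


def check_queue_limits(slurm_queues):
    """Return a list of human-readable limit violations (empty if none)."""
    errors = []
    if len(slurm_queues) > MAX_QUEUES_PER_CLUSTER:
        errors.append("too many queues: %d" % len(slurm_queues))

    total_compute_resources = 0
    for queue in slurm_queues:
        count = len(queue.get("ComputeResources", []))
        total_compute_resources += count
        if count > MAX_COMPUTE_RESOURCES_PER_QUEUE:
            errors.append(
                "queue %s has too many compute resources: %d"
                % (queue.get("Name", "<unnamed>"), count)
            )

    if total_compute_resources > MAX_COMPUTE_RESOURCES_PER_CLUSTER:
        errors.append(
            "too many compute resources in the cluster: %d" % total_compute_resources
        )
    return errors
```

Note that the per-cluster cap on compute resources means a cluster that uses all 50 queues can hold only one compute resource in each of them.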
AWS ParallelCluster UI version 2023.05.0 released | AWS ParallelCluster UI version 2023.05.0 released.
Enhancements:
-
Starting with AWS ParallelCluster version 3.6.0, add support for RHEL 8.
-
Add cluster cost monitoring.
-
Starting with AWS ParallelCluster version 3.6.0, increase queue and compute
resource quotas.
Changes:
-
Improved the cluster creation wizard user interface.
-
Increased the speed of AWS ParallelCluster UI deployment.
-
Improved the interface for adding a new user.
-
Queues are in the head node subnet by default.
Bug fixes:
-
Switch to the correct region after cluster creation completes.
-
Fix the loading indicator display in the "Edit cluster" feature.
-
Fix cluster creation when the EBS SnapshotId property is removed.
For details of the changes, see the CHANGELOG files for the
aws-parallelcluster-ui package on GitHub. | May 16, 2023 |
AWS ParallelCluster UI version 2023.04.0 released | AWS ParallelCluster UI version 2023.04.0 released.
Enhancements:
-
Cluster create wizard re-design.
-
Cluster logs page re-design.
-
Add custom name setting for shared storage.
-
Add multiple storage selection when adding storage to a cluster.
-
Add DeletionPolicy support for Amazon EFS and FSx for Lustre.
-
Add ImdsSupport setting in cluster configuration.
-
Add support for C7 instance types.
-
Added tutorial Reverting to a previous AWS Systems Manager document version.
Changes:
-
Cluster configuration YAML up to 1MB in size.
-
User isn't logged out due to an authorization with Boto3 IAM temporary
credentials.
-
Disabled multi-threading options when an HPC instance is selected.
-
Removed disable rollback on cluster create page.
-
User is prevented from using the AWS ParallelCluster UI until the required information is
provided.
-
Up to 10 queues can be added.
-
The SSM-SessionManagerRunShell document is not overwritten during
AWS ParallelCluster UI installation.
Bug fixes:
-
Fix broken reset password link.
-
Fix broken delete stack caused by EcrPrivateRepository
not being empty.
-
Fixed initialization issue of the Generate SSH Keys check-box in Multiple user
management properties section.
-
Fixed crash caused by a job with undefined properties.
-
Fixed SCRATCH FSx settings.
-
Fixed the Start and Stop instances buttons, which remained enabled after being
clicked once.
For details of the changes, see the CHANGELOG files for the
aws-parallelcluster-ui package on GitHub. | April 17, 2023 |
AWS ParallelCluster version 3.5.1 released | AWS ParallelCluster version 3.5.1 released.
Changes:
-
Upgrade EFA installer to 1.22.0 .
-
Efa-driver: efa-2.1.1g (from efa-2.1.1-1 )
-
Efa-config: efa-config-1.13-1 (from efa-config-1.12-1)
-
Efa-profile: efa-profile-1.5-1 (no change)
-
Libfabric-aws: libfabric-aws-1.17.0-1 (from
libfabric-aws-1.16.1amzn3.0-1 )
-
Rdma-core: rdma-core-43.0-1 (no change)
-
Open MPI: openmpi40-aws-4.1.5-1 (from
openmpi40-aws-4.1.4-3 )
-
Upgrade NICE DCV to version 2022.2-14521.
Bug fixes:
-
Fix potential node launch failures caused by pattern matching between
MountDir and /etc/exports when removing shared Amazon EBS
volumes as part of a cluster update.
-
Fix to prevent compute_console_output log file truncation at every
clustermgtd iteration.
For details of the changes, see the CHANGELOG files for the
aws-parallelcluster, aws-parallelcluster-cookbook, and aws-parallelcluster-node packages on GitHub. | March 29, 2023 |
AWS ParallelCluster version 3.5.0 released | AWS ParallelCluster version 3.5.0 released.
Enhancements:
-
Access and manage clusters with the AWS ParallelCluster UI.
-
Add versioned AWS ParallelCluster policies in a CloudFormation template that you can
reference in your workloads.
-
Add an AWS ParallelCluster Python library that you can use with your own
code.
-
Add logging of compute node console output to Amazon CloudWatch on compute node bootstrap
failure.
-
Add failures field containing failure code and reason to
describe-cluster output when cluster creation fails.
-
Add validators to prevent malicious string injection while calling the
subprocess module.
-
Fail cluster creation if cluster status changes to PROTECTED while
provisioning static nodes.
Bug fixes:
-
Fix cluster database creation by verifying that the cluster name is not longer
than 40 characters when Slurm accounting is enabled.
-
Fix an issue in clustermgtd that caused compute nodes, rebooted
through Slurm, to be replaced if the EC2 instance status checks fail.
-
Fix an issue that prevented compute nodes, with capacity reservations shared by
other accounts, from launching because of an incorrect IAM policy on the head
node.
For details of the changes, see the CHANGELOG files for the
aws-parallelcluster, aws-parallelcluster-cookbook, aws-parallelcluster-node, and aws-parallelcluster-ui packages on GitHub. | February 20, 2023 |
AWS ParallelCluster version 3.4.1 released | AWS ParallelCluster version 3.4.1 released.
For details of the changes, see the CHANGELOG files for the
aws-parallelcluster, aws-parallelcluster-cookbook, and aws-parallelcluster-node packages on GitHub. | January 13, 2023 |
AWS ParallelCluster version 3.4.0 released | AWS ParallelCluster version 3.4.0 released.
Enhancements:
-
Add support for launching nodes across multiple availability zones to increase
capacity availability.
-
Add support for specifying multiple subnets for each queue to increase capacity
availability.
-
Add new configuration parameter in Iam / ResourcePrefix to specify a prefix for path and name of
IAM resources created by AWS ParallelCluster.
-
Add new configuration section DeploymentSettings / LambdaFunctionsVpcConfig for specifying the Vpc config used
by AWS ParallelCluster Lambda functions.
-
Add the ability to specify a custom script to run in the head node during a
cluster update. The script can be specified with HeadNode / CustomActions / OnNodeUpdated when using Slurm as scheduler.
Changes:
-
Remove creation of Amazon EFS mount targets for existing file systems.
-
Mount EFS file systems using amazon-efs-utils. EFS file systems
can be mounted using in-transit encryption and an IAM authorized user.
-
Install stunnel 5.67 on CentOS7 and Ubuntu to support EFS in-transit
encryption.
-
Upgrade EFA installer to 1.20.0 (from
1.18.0 ).
-
Efa-driver: efa-2.1 (from efa-1.16.0-1 )
-
Efa-config: efa-config-1.11-1 (no change)
-
Efa-profile: efa-profile-1.5-1 (no change)
-
Libfabric-aws: libfabric-aws-1.16.1 (from
libfabric-aws-1.16.0~amzn4.0-1 )
-
Rdma-core: rdma-core-43.0-2 (from
rdma-core-41.0-2)
-
Open MPI: openmpi40-aws-4.1.4-3 (from
openmpi40-aws-4.1.4-2)
-
Upgrade Slurm to version 22.05.7 (from
22.05.5 ).
-
Upgrade Python to 3.9.16 and 3.7.16 (from
3.9.15 and 3.7.13).
-
With Slurm 22.05.7 , dynamic nodes in
IDLE+CLOUD+COMPLETING+POWER_DOWN+NOT_RESPONDING status aren't
considered unhealthy.
For details of the changes, see the CHANGELOG files for the
aws-parallelcluster, aws-parallelcluster-cookbook, and aws-parallelcluster-node packages on GitHub. | December 22, 2022 |
AWS ParallelCluster version 3.3.1 released | AWS ParallelCluster version 3.3.1 released.
Changes:
-
Official AWS ParallelCluster product AMIs remain available after the two-year
Amazon EC2 deprecation period.
-
Increase memory size of the AWS ParallelCluster API Lambda to 2048 in order to
reduce cold start penalties and avoid timeouts.
Bug fixes:
-
Prevent replacement of managed FSx for Lustre file systems and loss of data on
cluster updates that include changes to the compute fleet subnet ID.
-
SharedStorage / DeletionPolicy applies to cluster update actions.
For details of the changes, see the CHANGELOG file for the
aws-parallelcluster package on GitHub. | December 2, 2022 |
AWS ParallelCluster documentation only hpc6id note | AWS ParallelCluster documentation-only update
| December 2, 2022 |
AWS ParallelCluster version 3.1.5 released | AWS ParallelCluster version 3.1.5 released.
Changes:
-
Add lambda:ListTags and lambda:UntagResource to the
ParallelClusterUserRole used by the AWS ParallelCluster API stack for a
cluster update.
-
Upgrade Intel MPI Library to Version 2021 Update 6 (from Version 2021 Update 4).
For more information, see Intel® MPI Library 2021 Update 6.
-
Upgrade NVIDIA driver to version 470.141.03 (from 470.103.01).
-
Upgrade NVIDIA Fabric Manager to version 470.141.03 (from 470.103.01).
For details of the changes, see the CHANGELOG files for the
aws-parallelcluster, aws-parallelcluster-cookbook, and aws-parallelcluster-node packages on GitHub. | November 16, 2022 |
AWS ParallelCluster version 3.3.0 released | AWS ParallelCluster version 3.3.0 released.
Enhancements:
-
Add support for multiple instance allocation configuration for a compute
resource when using Slurm as a scheduler. For more information, see Multiple instance type allocation with Slurm.
-
Add support for adding and removing SharedStorage with a cluster update, using an updated
configuration. For more information, see Shared storage.
-
Add new configuration parameter DeletionPolicy for Efs and FsxLustre shared
storage settings to support storage retention.
-
Add support for Slurm accounting with new configuration parameter Scheduling / SlurmSettings / Database. For
more information, see Slurm accounting with AWS ParallelCluster.
-
Add support for on-demand capacity reservations and capacity reservation
resource groups. For more information, see Launch instances with ODCR (On-Demand Capacity Reservations).
-
Add new configuration parameter Imds / ImdsSupport to specify the IMDS version
to support in a cluster or build image infrastructure. The parameter is available
in both the cluster and the build image configurations.
-
Add support for Networking / PlacementGroup in the SlurmQueues / ComputeResources section.
-
Add support for instances with multiple network interfaces that are limited to
only one ENI per device.
-
Improve validation of networking for external Amazon EFS file systems by checking the
CIDR block in the attached security group.
-
Add validator to check if configured instance types support placement
groups.
-
Configure NFS threads to be min(256, max(8, num_cores * 4)) to ensure better
stability and performance.
-
Move NFS installation at build time to reduce configuration time.
-
Enable server-side encryption for the EcrImageBuilder SNS topic that's created
when deploying AWS ParallelCluster API and is used to notify on docker image build
events.
Changes:
-
Change the behavior of SlurmQueues / Networking /
PlacementGroup / Enabled . It now creates a
unique managed placement group for each compute resource instead of a single managed
placement group for all compute resources.
-
Add support for SlurmQueues / Networking /
PlacementGroup / Name as the preferred naming method.
-
Move head node tags from Launch Template to instance definition to avoid head
node replacement on tags updates.
-
Disable multithreading through script executed by cloud-init and
not through CpuOptions set in the Launch Template.
-
Upgrade Python to version 3.9 and NodeJS to version 16 in the API
infrastructure, API Docker container, and cluster Lambda resources.
-
Remove support for Python 3.6 in
aws-parallelcluster-batch-cli .
-
Upgrade Slurm to version 22.05.5 (from
21.08.8-2 ).
-
Upgrade NVIDIA driver to version 470.141.03 (from
470.129.06 ).
-
Upgrade NVIDIA Fabric Manager to version 470.141.03 (from
470.129.06 ).
-
Upgrade NVIDIA CUDA Toolkit to version 11.7.1 (from 11.4.4 ).
-
Upgrade Python used in AWS ParallelCluster virtualenvs from 3.7.13 to
3.9.15 .
-
Upgrade EFA installer to version 1.18.0.
-
Efa-driver: efa-1.16.0-1 (no change)
-
Efa-config: efa-config-1.11-1 (from
efa-config-1.10-1 )
-
Efa-profile: efa-profile-1.5-1 (no change)
-
Libfabric-aws: libfabric-aws-1.16.0~amzn4.0-1 (from
libfabric-aws-1.16.0~amzn2.0-1 ).
-
Rdma-core: rdma-core-41.0-2 (from
rdma-core-37.0 )
-
Open MPI: openmpi40-aws-4.1.4-2 (from
openmpi40-aws-4.1.1-2 )
-
Upgrade NICE DCV to version 2022.1-13300 (from
2022.0-12760 ).
-
Enable suppression of the SingleSubnetValidator for
Queues .
-
Do not replace DRAIN nodes when nodes are in
COMPLETING state as Epilog may be still running.
Bug fixes:
-
Fix validation of filters parameter in the AWS ParallelCluster
ListClusterLogStreams command to fail when incorrect filters are
passed.
-
Fix validation of parameter SharedStorage / EfsSettings to fail validation when
FileSystemId is specified along with other SharedStorage / EfsSettings parameters. Previously,
FileSystemId wasn't included.
-
Fix cluster update when changing the order of SharedStorage together with other changes in the
configuration.
-
Fix UpdateParallelClusterLambdaRole in the AWS ParallelCluster API to
upload logs to CloudWatch.
-
Fix Cinc not using the local CA certificates bundle when installing packages
before any cookbooks are executed.
-
Fix a hang when upgrading Ubuntu with pcluster build-image when
Build:UpdateOsPackages:Enabled:true is set.
-
Fix parsing of YAML cluster configuration by failing on duplicate keys.
For details of the changes, see the CHANGELOG files for the
aws-parallelcluster, aws-parallelcluster-cookbook, and aws-parallelcluster-node packages on GitHub. | November 2, 2022 |
AWS ParallelCluster documentation-only API reference added | AWS ParallelCluster documentation-only update
| October 27, 2022 |
AWS ParallelCluster version 3.2.1 released | AWS ParallelCluster version 3.2.1 released.
Changes:
-
Upgrade NVIDIA driver to version 470.141.03.
-
Upgrade NVIDIA Fabric Manager to version 470.141.03.
-
Disable cron job tasks man-db and
mlocate, which may have a negative impact on node performance.
-
Upgrade Intel MPI Library to 2021.6.0.602.
-
Upgrade Python from 3.7.10 to 3.7.13 in response to a security risk.
For details of the changes, see the CHANGELOG files for the
aws-parallelcluster, aws-parallelcluster-cookbook, and aws-parallelcluster-node packages on GitHub. | October 3, 2022 |
AWS ParallelCluster version 3.2.0 released | AWS ParallelCluster version 3.2.0 released.
Changes:
-
Upgrade EFA installer to version 1.17.2.
-
EFA driver: efa-1.16.0-1
-
EFA configuration: efa-config-1.10-1
-
EFA profile: efa-profile-1.5-1
-
Libfabric: libfabric-aws-1.16.0~amzn2.0-1
-
RDMA core: rdma-core-41.0-2
-
Open MPI: openmpi40-aws-4.1.4-2
-
Upgrade NICE DCV to version 2022.0-12760.
-
Upgrade NVIDIA driver to version 470.129.06.
-
Upgrade NVIDIA Fabric Manager to version 470.129.06.
-
Change default EBS volume types from gp2 to gp3 in both the root and additional
volumes.
-
Changes to FSx for Lustre file systems created by AWS ParallelCluster:
-
Doesn't require PlacementGroup / Enabled to be set to true when passing an
existing PlacementGroup / Id.
-
Doesn't allow setting PlacementGroup / Id when
PlacementGroup / Enabled is explicitly set to false.
-
Add parallelcluster:cluster-name tag to all resources created by
AWS ParallelCluster.
-
Add lambda:ListTags and lambda:UntagResource to
ParallelClusterUserRole used by the AWS ParallelCluster API stack for
cluster update.
-
Restrict IPv6 access to IMDS to root and cluster admin users only,
when configuration parameter HeadNode / Imds /
Secured is enabled.
-
With a custom AMI, use the AMI root volume size instead of the ParallelCluster
default of 35 GiB. The value can be changed in the cluster configuration file.
-
Automatically disable the compute fleet when the configuration parameter
Scheduling / SlurmQueues / ComputeResources
/ SpotPrice is lower than the minimum required Spot request fulfillment
price.
-
Show requested_value and current_value values in the
change set when adding or removing a section during an update.
-
Disable aws-ubuntu-eni-helper service, available in Deep Learning
AMIs, to avoid conflicts with configure_nw_interface.sh when
configuring instances with multiple network cards.
-
Remove support for Python 3.6.
-
Set MTU to 9001 for all the network interfaces when configuring instances with
multiple network cards.
-
Remove the trailing dot when configuring the compute node FQDN.
-
Manage static nodes in POWERING_DOWN.
-
Don't replace dynamic nodes in POWER_DOWN, as jobs may still be
running.
-
Restart clustermgtd and slurmctld daemons at cluster
update time only when Scheduling parameters are updated in the cluster
configuration.
-
Update slurmctld and slurmd
systemd service files.
-
Set Slurm configuration AuthInfo=cred_expire=70 to reduce the time
requeued jobs must wait before starting again when nodes are not available.
-
Upgrade third-party cookbook dependencies:
-
apt-7.4.2 (from apt-7.4.0)
-
line-4.5.2 (from line-4.0.1)
-
openssh-2.10.3 (from openssh-2.9.1)
-
pyenv-3.5.1 (from pyenv-3.4.2)
-
selinux-6.0.4 (from selinux-3.1.1)
-
yum-7.4.0 (from yum-6.1.1)
-
yum-epel-4.5.0 (from yum-epel-4.1.2)
Bug fixes:
-
Fix the default behavior to skip the AWS ParallelCluster validation and test steps
when building a custom AMI.
-
Fix file handle leak in computemgtd.
-
Fix race condition that was sporadically causing launched instances to be
immediately terminated because they were not yet available in the EC2
DescribeInstances response.
-
Fix support for the DisableSimultaneousMultithreading parameter on
instance types with Arm processors.
-
Fix AWS ParallelCluster API stack update failure when upgrading from a previous
version. Add resource pattern used for the ListImagePipelineImages
action in the EcrImageDeletionLambdaRole.
-
Fix AWS ParallelCluster API by adding missing permissions needed to import or export
from Amazon S3 when creating an FSx for Lustre file system.
For details of the changes, see the CHANGELOG files for the
aws-parallelcluster, aws-parallelcluster-cookbook, and aws-parallelcluster-node packages on GitHub. | July 27, 2022 |
AWS ParallelCluster documentation-only updates this year to date | AWS ParallelCluster documentation-only updates.
| July 6, 2022 |
AWS ParallelCluster version 3.1.4 released | AWS ParallelCluster version 3.1.4 released.
Changes:
-
Upgrade Slurm to version 21.08.8-2.
-
Build Slurm with JWT support.
-
Doesn't require PlacementGroup / Enabled to be set to true when passing an
existing PlacementGroup / Id.
-
Add lambda:TagResource to ParallelClusterUserRole used
by ParallelCluster API stack for cluster creation and image creation.
Bug fixes:
-
Fix the ability to export a cluster's logs when using the
export-cluster-logs command with the --filters
option.
-
Fix AWS Batch Docker entry point to use /home shared directory to
coordinate Multi-node-Parallel job execution.
-
Reset the node address when setting an unhealthy Slurm static node to down, to avoid
treating a static node that failed with insufficient capacity as a bootstrap failure
node.
For details of the changes, see the CHANGELOG files for the
aws-parallelcluster, aws-parallelcluster-cookbook, and aws-parallelcluster-node packages on GitHub. | May 16, 2022 |
AWS ParallelCluster version 3.1.3 released | AWS ParallelCluster version 3.1.3 released.
Enhancements:
-
Execute SSH key creation alongside the creation of the HOME directory, for
example, during SSH login, when switching to another user, and when executing a
command as another user.
-
Add support for both FQDN and LDAP Distinguished Names in the configuration
parameter DirectoryService / DomainName.
The new validator now checks both syntaxes.
-
New update_directory_service_password.sh script deployed on the
head node supports the manual update of the Active Directory password in the SSSD
configuration. The password is retrieved from AWS Secrets Manager as specified in the cluster
configuration.
-
Add support to deploy API infrastructure in environments without a default
VPC.
Changes:
-
Disable deeper C-States in x86_64 official AMIs and AMIs created through
build-image command, to guarantee high performance and low
latency.
-
OS package updates and security fixes.
-
Change Amazon Linux 2 base images to use AMIs with Kernel 5.10.
Bug fixes:
-
Fix build-image stack ending up in DELETE_FAILED after the image is built
successfully, due to new EC2 Image Builder policies.
-
Fix the configuration parameter DirectoryService / DomainAddr
conversion to ldap_uri SSSD property when it contains multiple domain
addresses.
For details of the changes, see the CHANGELOG files for the
aws-parallelcluster and aws-parallelcluster-cookbook packages on GitHub. | April 20, 2022 |
AWS ParallelCluster version 3.1.2 released | AWS ParallelCluster version 3.1.2 released.
Bug fixes:
-
Fix the update of /etc/hosts file on compute nodes when a cluster
is deployed in subnets without internet access.
-
Fix compute nodes bootstrap to wait for ephemeral drives initialization before
joining the cluster.
For details of the changes, see the CHANGELOG files for the
aws-parallelcluster package on GitHub. | March 2, 2022 |
AWS ParallelCluster version 3.1.1 released | AWS ParallelCluster version 3.1.1 released.
Enhancements:
-
Add support for multiple user cluster environments by integrating with
Active Directory (AD) domains managed through AWS Directory Service.
-
Add support for UseEc2Hostnames in
the cluster configuration file. When set to true, use EC2 default hostnames (e.g.
ip-1-2-3-4) for compute nodes.
-
Add support for cluster creation in subnets with no internet access.
-
Add support for multiple compute instance types per queue.
-
Add support for GPU scheduling with Slurm on ARM instances with NVIDIA
cards.
-
Add abbreviated flags for cluster-name (-n),
region (-r), image-id (-i) and
cluster-configuration / image-configuration
(-c) to the AWS ParallelCluster CLI.
-
Add support for NEW_CHANGED_DELETED option for FSx for Lustre AutoImportPolicy parameter.
-
Add parallelcluster:compute-resource-name tag to EC2
LaunchTemplates resources used by compute nodes.
-
Improve security groups created within the cluster to allow inbound connections
from custom security groups when SecurityGroups parameters are
specified for some head node and/or queues.
-
Install NVIDIA drivers and CUDA library for ARM.
Changes:
-
Upgrade Slurm to version 21.08.5 (from 20.11.8).
-
Upgrade Slurm plugin to version 21.08 (from 20.11).
-
Upgrade NICE DCV to version 2021.3-11591 (from 2021.1-10851).
-
Upgrade NVIDIA driver to version 470.103.01 (from 470.57.02).
-
Upgrade NVIDIA Fabric Manager to version 470.103.01 (from 470.57.02).
-
Upgrade CUDA to version 11.4.4 (from 11.4.0).
-
Intel MPI updated to Version 2021 Update 4
(updated from Version 2019 Update 8). For more information, see Intel® MPI Library 2021 Update 4.
-
Upgrade PMIx to version 3.2.3 (from 3.1.5).
-
Remove dumping of failed compute nodes to /home/logs/compute.
Compute node log files are available in CloudWatch and in EC2 console logs.
-
Allow suppression of the SlurmQueues and
ComputeResources length validators.
-
Disable package update at instance launch time on Amazon Linux 2.
-
Disable EC2 ImageBuilder enhanced image metadata when building AWS ParallelCluster
custom images.
-
Explicitly set cloud-init datasource to be EC2. This saves boot
time for Ubuntu and CentOS platforms.
-
Use compute resource name rather than instance type in compute fleet launch
template name.
-
Redirect stderr and stdout to CLI log file to prevent unwanted text in the
pcluster CLI output.
-
Move the configure/install recipes to separate cookbooks that are called from
the main one. Existing entrypoints are maintained and backwards compatible.
-
Download dependencies of Intel HPC platform at AMI build time to avoid
contacting the internet at cluster creation time.
-
Do not strip - from compute resource name when configuring Slurm
nodes.
-
Do not configure GPUs in Slurm when NVIDIA driver is not installed.
Bug fixes:
-
Fix ecs:ListContainerInstances permission in BatchUserRole.
-
Fix exporting of cluster logs when no prefix is specified. Previously, logs were
exported to a None prefix.
-
Fix rollback not being performed in case of cluster update failure.
-
Fix RootVolume schema for the HeadNode by raising an
error if an unsupported KmsKeyId is specified.
-
Fix missing Amazon FSx metrics so that they are displayed in the CloudWatch dashboard.
-
Fix EfaSecurityGroupValidator. Previously, it could
produce false failures when custom security groups were provided and EFA was
enabled.
For details of the changes, see the CHANGELOG files for the
aws-parallelcluster, aws-parallelcluster-cookbook, and aws-parallelcluster-node packages on GitHub. | February 10, 2022 |
AWS ParallelCluster version 3.0.3 released | AWS ParallelCluster version 3.0.3 released.
For details of the changes, see the CHANGELOG files for the
aws-parallelcluster and aws-parallelcluster-cookbook packages on GitHub. | January 17, 2022 |
AWS ParallelCluster version 3.0.2 released | AWS ParallelCluster version 3.0.2 released.
Upgrade Elastic Fabric Adapter installer to 1.14.1
-
EFA config: efa-config-1.9-1 (from efa-config-1.9)
-
EFA profile: efa-profile-1.5-1 (from efa-profile-1.5)
-
EFA Kernel module: efa-1.14.2 (from efa-1.13.0)
-
RDMA core: rdma-core-37.0 (from rdma-core-35)
-
Libfabric: libfabric-1.13.2 (from libfabric-1.13.0)
-
Open MPI: openmpi40-aws-4.1.1-2 (no change)
GPUDirect RDMA is always enabled if supported by the instance type. The GdrSupport configuration option has no effect.
For details of the changes, see the CHANGELOG files for the
aws-parallelcluster, aws-parallelcluster-cookbook and aws-parallelcluster-node packages on GitHub. | November 5, 2021 |
AWS ParallelCluster version 3.0.1 released | AWS ParallelCluster version 3.0.1 released.
Cluster configuration migration tool
Default AWS Region read from ~/.aws/config file
-
For the pcluster command, if
the AWS Region is not specified in the configuration file, in the environment, or
on the command line, the default AWS Region specified in the region
setting in the [default] section of the
~/.aws/config file is used.
For details of the changes, see the CHANGELOG files for the
aws-parallelcluster, aws-parallelcluster-cookbook and aws-parallelcluster-node packages on GitHub. | October 27, 2021 |
AWS ParallelCluster version 3.0.0 released | AWS ParallelCluster version 3.0.0 released.
Support for cluster management via Amazon API Gateway
-
Customers can now manage and deploy clusters through HTTP endpoints with
Amazon API Gateway. This opens up new possibilities for scripted or event-driven
workflows.
The AWS ParallelCluster command line interface (CLI) has also been redesigned for
compatibility with this API and includes a new JSON output option. This new
functionality makes it possible for customers to implement similar building block
capabilities using the CLI as well.
Improved custom AMI creation
-
Customers now have access to a more robust process for creating and managing
custom AMIs using EC2 Image Builder. Custom AMIs can now be managed through a separate
AWS ParallelCluster configuration file, and can be created using the pcluster build-image
command in the AWS ParallelCluster command line interface.
For details of the changes, see the CHANGELOG files for the
aws-parallelcluster, aws-parallelcluster-cookbook and aws-parallelcluster-node packages on GitHub. | September 10, 2021 |
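
As a rough illustration of the JSON output option mentioned in the 3.0.0 entry, the following Python sketch parses a JSON document shaped like the output of `pcluster list-clusters` and selects clusters by status. The sample payload, the cluster names, and the helper `clusters_with_status` are hypothetical and not taken from a real run. The field names (`clusters`, `clusterName`, `clusterStatus`) are assumed to match the CLI response and should be checked against the installed version.

```python
import json

# Hypothetical sample shaped like `pcluster list-clusters` output.
# This is an assumption for illustration, not output captured from a real run.
SAMPLE_OUTPUT = """
{
  "clusters": [
    {"clusterName": "demo-a", "clusterStatus": "CREATE_COMPLETE",
     "region": "us-east-1", "version": "3.0.0"},
    {"clusterName": "demo-b", "clusterStatus": "CREATE_IN_PROGRESS",
     "region": "us-east-1", "version": "3.0.0"}
  ]
}
"""


def clusters_with_status(raw_json, status):
    """Return the names of clusters whose clusterStatus equals `status`."""
    payload = json.loads(raw_json)
    return [
        cluster["clusterName"]
        for cluster in payload.get("clusters", [])
        if cluster.get("clusterStatus") == status
    ]


if __name__ == "__main__":
    print(clusters_with_status(SAMPLE_OUTPUT, "CREATE_COMPLETE"))
```

In a real script, `raw_json` would likely come from the CLI itself, for example the `stdout` of `subprocess.run(["pcluster", "list-clusters"], capture_output=True, text=True)`, rather than from a hard-coded string.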