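Create a cluster with Terraform
When you use AWS ParallelCluster, you pay only for the AWS resources that are created when you create or update AWS ParallelCluster images and clusters. For more information, see AWS services used by AWS ParallelCluster.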
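Prerequisites
- Terraform v1.5.7+ is installed.
- AWS ParallelCluster API v3.8.0+ is deployed in your account. See Deploy the ParallelCluster API with Terraform.
- An IAM role with permissions to call the ParallelCluster API. See Required permissions.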
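Define a Terraform project
In this tutorial, you define a simple Terraform project to deploy a cluster.
Create a directory called my-clusters. All the files that you create are located in this directory.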
Create the file terraform.tf to import the ParallelCluster provider.
terraform {
  required_version = ">= 1.5.7"
  required_providers {
    aws-parallelcluster = {
      source  = "aws-tf/aws-parallelcluster"
      version = "1.0.0"
    }
  }
}
Create the file providers.tf to configure the ParallelCluster and AWS providers.
provider "aws" {
  region  = var.region
  profile = var.profile
}

provider "aws-parallelcluster" {
  region         = var.region
  profile        = var.profile
  api_stack_name = var.api_stack_name
  use_user_role  = true
}
Create the file main.tf to define the resources using the ParallelCluster module.
module "pcluster" {
  source              = "aws-tf/parallelcluster/aws"
  version             = "1.0.0"
  region              = var.region
  api_stack_name      = var.api_stack_name
  api_version         = var.api_version
  deploy_pcluster_api = false
  template_vars       = local.config_vars
  cluster_configs     = local.cluster_configs
  config_path         = "config/clusters.yaml"
}
Create the file clusters.tf to define multiple clusters as Terraform local variables.
Note
You can define multiple clusters within the cluster_configs element. For each cluster, you can either define the cluster properties explicitly within the local variables (see DemoCluster01) or reference an external file (see DemoCluster02). To review the cluster properties that you can set within the configuration element, see Cluster configuration files.
To review the options that you can set for cluster creation, see pcluster create-cluster.
locals {
  cluster_configs = {
    DemoCluster01 : {
      region : local.config_vars.region
      rollbackOnFailure : false
      validationFailureLevel : "WARNING"
      suppressValidators : [
        "type:KeyPairValidator"
      ]
      configuration : {
        Region : local.config_vars.region
        Image : {
          Os : "alinux2"
        }
        HeadNode : {
          InstanceType : "t3.small"
          Networking : {
            SubnetId : local.config_vars.subnet
          }
          Iam : {
            AdditionalIamPolicies : [
              { Policy : "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore" }
            ]
          }
        }
        Scheduling : {
          Scheduler : "slurm"
          SlurmQueues : [{
            Name : "queue1"
            CapacityType : "ONDEMAND"
            Networking : {
              SubnetIds : [local.config_vars.subnet]
            }
            Iam : {
              AdditionalIamPolicies : [
                { Policy : "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore" }
              ]
            }
            ComputeResources : [{
              Name : "compute"
              InstanceType : "t3.small"
              MinCount : "1"
              MaxCount : "4"
            }]
          }]
          SlurmSettings : {
            QueueUpdateStrategy : "TERMINATE"
          }
        }
      }
    }
    DemoCluster02 : {
      configuration : "config/cluster_config.yaml"
    }
  }
}
Create the file config/clusters.yaml to define multiple clusters as YAML configuration.
DemoCluster03:
  region: ${region}
  rollbackOnFailure: true
  validationFailureLevel: WARNING
  suppressValidators:
    - type:KeyPairValidator
  configuration: config/cluster_config.yaml
DemoCluster04:
  region: ${region}
  rollbackOnFailure: false
  configuration: config/cluster_config.yaml
Create the file config/cluster_config.yaml, which is a standard ParallelCluster configuration file into which Terraform variables can be injected. To review the cluster properties that you can set within the configuration element, see Cluster configuration files.
Region: ${region}
Image:
  Os: alinux2
HeadNode:
  InstanceType: t3.small
  Networking:
    SubnetId: ${subnet}
  Iam:
    AdditionalIamPolicies:
      - Policy: arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: queue1
      CapacityType: ONDEMAND
      Networking:
        SubnetIds:
          - ${subnet}
      Iam:
        AdditionalIamPolicies:
          - Policy: arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
      ComputeResources:
        - Name: compute
          InstanceType: t3.small
          MinCount: 1
          MaxCount: 5
  SlurmSettings:
    QueueUpdateStrategy: TERMINATE
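main.tf passes local.config_vars to the module as template_vars, and the module injects those values into this file for you. This guide does not describe how the module renders the file internally. If you want to inspect what the ${region} and ${subnet} placeholders resolve to before deploying, a minimal sketch is to render the file yourself with Terraform's built-in templatefile and yamldecode functions. It assumes the local.config_vars map defined in clusters_vars.tf. The file name debug_render.tf, the local names, and the output name are hypothetical.

```hcl
# debug_render.tf (hypothetical file name): renders the cluster template
# locally so you can inspect the values that ${region} and ${subnet} take.
locals {
  # templatefile replaces ${region} and ${subnet} with entries from the map
  # and fails if the template references a key that the map does not define.
  rendered_cluster_config = templatefile(
    "${path.module}/config/cluster_config.yaml",
    local.config_vars
  )

  # yamldecode parses the rendered text into a Terraform object.
  decoded_cluster_config = yamldecode(local.rendered_cluster_config)
}

# Hypothetical output that exposes the rendered head node subnet.
output "rendered_head_node_subnet" {
  value = local.decoded_cluster_config.HeadNode.Networking.SubnetId
}
```

When you run terraform plan, the rendered_head_node_subnet value should match subnet_id from terraform.tfvars. Because templatefile rejects unknown placeholders, this also catches a misspelled variable name in the YAML before the configuration reaches ParallelCluster.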
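Create the file clusters_vars.tf to define the variables that can be injected into cluster configurations. This file lets you define dynamic values, such as the region and subnet, for use in cluster configurations.
This example retrieves the values directly from the project variables, but you might need to use custom logic to determine them.
locals {
  config_vars = {
    subnet = var.subnet_id
    region = var.cluster_region
  }
}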
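As one hedged example of such custom logic, you could look up the subnet by tag instead of passing subnet_id directly. The sketch below assumes that a subnet tagged Name = my-cluster-subnet (a hypothetical tag value) exists in cluster_region. It uses the AWS provider's aws_subnets data source through a provider alias, because the default aws provider in providers.tf targets the API region, which can differ from the cluster region. It would replace the locals block in clusters_vars.tf, since Terraform does not allow config_vars to be defined twice.

```hcl
# Provider alias pointed at the region where the clusters are deployed.
provider "aws" {
  alias   = "cluster_region"
  region  = var.cluster_region
  profile = var.profile
}

# Finds subnets whose Name tag matches a hypothetical value.
data "aws_subnets" "cluster" {
  provider = aws.cluster_region

  filter {
    name   = "tag:Name"
    values = ["my-cluster-subnet"] # hypothetical tag value
  }
}

locals {
  config_vars = {
    # Takes the first matching subnet; fails at plan time if none match.
    subnet = data.aws_subnets.cluster.ids[0]
    region = var.cluster_region
  }
}
```

With this approach, the subnet_id value in terraform.tfvars is no longer read by the cluster configuration.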
Create the file variables.tf to define the variables that can be injected for this project.
variable "region" {
  description = "The region the ParallelCluster API is deployed in."
  type        = string
  default     = "us-east-1"
}

variable "cluster_region" {
  description = "The region the clusters will be deployed in."
  type        = string
  default     = "us-east-1"
}

variable "profile" {
  type        = string
  description = "The AWS profile used to deploy the clusters."
  default     = null
}

variable "subnet_id" {
  type        = string
  description = "The id of the subnet to be used for the ParallelCluster instances."
}

variable "api_stack_name" {
  type        = string
  description = "The name of the CloudFormation stack used to deploy the ParallelCluster API."
  default     = "ParallelCluster"
}

variable "api_version" {
  type        = string
  description = "The version of the ParallelCluster API."
}
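Because subnet_id has no default, a malformed value in terraform.tfvars might only surface later, when the cluster configuration is validated. A Terraform validation block can catch it at plan time instead. The following sketch shows how the subnet_id declaration could look with one; the regular expression assumes the usual subnet-<hexadecimal> ID format and is not taken from this guide.

```hcl
variable "subnet_id" {
  type        = string
  description = "The id of the subnet to be used for the ParallelCluster instances."

  validation {
    # Accepts IDs such as subnet-123456789 or subnet-0123456789abcdef0.
    condition     = can(regex("^subnet-[0-9a-f]+$", var.subnet_id))
    error_message = "The subnet_id value must be a subnet ID, such as \"subnet-0123456789abcdef0\"."
  }
}
```

The example value subnet-123456789 in terraform.tfvars passes this check.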
Create the file terraform.tfvars to set arbitrary values for the variables. The following file deploys the clusters in eu-west-1, within the subnet subnet-123456789, using an existing ParallelCluster API 3.10.0 that is already deployed in us-east-1 with the stack name MyParallelClusterAPI-310.
region         = "us-east-1"
api_stack_name = "MyParallelClusterAPI-310"
api_version    = "3.10.0"

cluster_region = "eu-west-1"
subnet_id      = "subnet-123456789"
Create the file outputs.tf to define the outputs returned by this project.
output "clusters" {
  value = module.pcluster.clusters
}
The project directory is:
my-clusters
├── config
│   ├── cluster_config.yaml - Cluster configuration, where Terraform variables can be injected.
│   └── clusters.yaml - File listing all the clusters to deploy.
├── clusters.tf - Clusters defined as Terraform local variables.
├── clusters_vars.tf - Variables that can be injected into cluster configurations.
├── main.tf - Terraform entrypoint where the ParallelCluster module is configured.
├── outputs.tf - Defines the clusters as a Terraform output.
├── providers.tf - Configures the providers: ParallelCluster and AWS.
├── terraform.tf - Imports the ParallelCluster provider.
├── terraform.tfvars - Defines values for variables, e.g. region, PCAPI stack name.
└── variables.tf - Defines the variables, e.g. region, PCAPI stack name.
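Deploy the cluster
To deploy the cluster, run the standard Terraform commands in order.
Note
This example assumes that you have already deployed the ParallelCluster API in your account.
Initialize the project:
terraform init
Define the deployment plan:
terraform plan -out tfplan
Deploy the plan:
terraform apply tfplan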
Deploy the ParallelCluster API with the clusters
If you haven't deployed the ParallelCluster API yet and want to deploy it together with the clusters, change the following files:
main.tf
module "pcluster" {
  source              = "aws-tf/parallelcluster/aws"
  version             = "1.0.0"
  region              = var.region
  api_stack_name      = var.api_stack_name
  api_version         = var.api_version
  deploy_pcluster_api = true
  template_vars       = local.config_vars
  cluster_configs     = local.cluster_configs
  config_path         = "config/clusters.yaml"
}
providers.tf
provider "aws-parallelcluster" {
  region   = var.region
  profile  = var.profile
  endpoint = module.pcluster.pcluster_api_stack_outputs.ParallelClusterApiInvokeUrl
  role_arn = module.pcluster.pcluster_api_stack_outputs.ParallelClusterApiUserRole
}
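Required permissions
You need the following permissions to deploy a cluster with Terraform:
- Assume the ParallelCluster API role, which is used to interact with the ParallelCluster API.
- Describe the AWS CloudFormation stack of the ParallelCluster API, to verify that it exists and to retrieve its parameters and outputs.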
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Resource": "arn:PARTITION:iam::ACCOUNT:role/PCAPIUserRole-*",
      "Effect": "Allow",
      "Sid": "AssumePCAPIUserRole"
    },
    {
      "Action": [
        "cloudformation:DescribeStacks"
      ],
      "Resource": "arn:PARTITION:cloudformation:REGION:ACCOUNT:stack/*",
      "Effect": "Allow",
      "Sid": "CloudFormation"
    }
  ]
}
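The policy above uses PARTITION, REGION, and ACCOUNT as placeholders. If you manage IAM with Terraform, one way to fill them in is with the AWS provider's aws_partition and aws_caller_identity data sources, as in the following sketch. It assumes that the ParallelCluster API stack lives in var.region; the resource label and policy name are hypothetical. Because the identity that runs my-clusters needs these permissions before it can deploy, you would typically apply this from a separate project with IAM administrator credentials and then attach the policy to that identity.

```hcl
# Resolve the partition (e.g. aws) and account ID of the current credentials.
data "aws_partition" "current" {}
data "aws_caller_identity" "current" {}

resource "aws_iam_policy" "pcluster_terraform" {
  name = "ParallelClusterTerraformDeployer" # hypothetical policy name

  # Same statements as the JSON policy, with the placeholders filled in.
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid      = "AssumePCAPIUserRole"
        Effect   = "Allow"
        Action   = "sts:AssumeRole"
        Resource = "arn:${data.aws_partition.current.partition}:iam::${data.aws_caller_identity.current.account_id}:role/PCAPIUserRole-*"
      },
      {
        Sid      = "CloudFormation"
        Effect   = "Allow"
        Action   = ["cloudformation:DescribeStacks"]
        # Assumes the API stack is in var.region.
        Resource = "arn:${data.aws_partition.current.partition}:cloudformation:${var.region}:${data.aws_caller_identity.current.account_id}:stack/*"
      }
    ]
  })
}
```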