AWS PCS scheduler logs

You can configure AWS PCS to send detailed logging data from your cluster scheduler to Amazon CloudWatch Logs, Amazon Simple Storage Service (Amazon S3), and Amazon Data Firehose. This can assist with monitoring and troubleshooting. You can set up AWS PCS scheduler logs using the AWS PCS console, as well as programmatically using the AWS CLI or SDK.

Prerequisites

The IAM principal used to manage the AWS PCS cluster must be allowed to perform the pcs:AllowVendedLogDeliveryForResource action. The following sample IAM policy grants that permission.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PcsAllowVendedLogsDelivery",
            "Effect": "Allow",
            "Action": ["pcs:AllowVendedLogDeliveryForResource"],
            "Resource": [
                "arn:aws:pcs:::cluster/*"
            ]
        }
    ]
}

Setting up scheduler logs using the AWS PCS console

To set up AWS PCS scheduler logs in the console, follow these steps:

  1. Open the AWS PCS console.

  2. Choose Clusters and navigate to the detail page for the AWS PCS cluster where you will enable logging.

  3. Choose Logs.

  4. Under Log deliveries – Scheduler Logs (optional):

    1. Add up to three log delivery destinations. Choices include CloudWatch Logs, Amazon S3, or Firehose.

    2. Choose Update log deliveries.

You can reconfigure, add, or remove log deliveries by revisiting this page.

Setting up scheduler logs using the AWS CLI

To set up scheduler logs with the AWS CLI, you need at least one delivery destination, one delivery source (the AWS PCS cluster), and one delivery, which is the relationship that connects a source to a destination.

Create a delivery destination

You need at least one delivery destination to receive scheduler logs from an AWS PCS cluster. For more information, see PutDeliveryDestination in the Amazon CloudWatch Logs API Reference.

To create a delivery destination using the AWS CLI
  • Create a destination with the command that follows. Before running the command, make the following replacements:

    • Replace region-code with the AWS Region where you will create your destination. This is generally the same Region where the AWS PCS cluster is deployed.

    • Replace pcs-logs-destination with your preferred name. It must be unique for all delivery destinations in your account.

    • Replace resource-arn with the ARN for an existing log group in CloudWatch Logs, an S3 bucket, or a delivery stream in Firehose. Examples include:

      • CloudWatch Logs group

        arn:aws:logs:region-code:account-id:log-group:/log-group-name:*
      • S3 bucket

        arn:aws:s3:::bucket-name
      • Firehose delivery stream

        arn:aws:firehose:region-code:account-id:deliverystream/stream-name

aws logs put-delivery-destination \
    --region region-code \
    --name pcs-logs-destination \
    --delivery-destination-configuration destinationResourceArn=resource-arn

Take note of the ARN for the new delivery destination, since you will need it to configure deliveries.
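
Before you run the command, it can help to confirm that resource-arn points at one of the three supported destination types. The following Python sketch classifies an ARN by its service field, based on the example ARN formats above. The function name destination_type and the DESTINATION_TYPES mapping are hypothetical helpers for illustration, not part of any AWS tool or SDK.

```python
# Hypothetical helper: map the ARN service field to the destination
# types named in this guide (CloudWatch Logs, Amazon S3, Firehose).
DESTINATION_TYPES = {
    "logs": "CloudWatch Logs",
    "s3": "Amazon S3",
    "firehose": "Firehose",
}


def destination_type(resource_arn: str) -> str:
    """Return the destination type for an ARN, or raise ValueError."""
    # General ARN layout: arn:partition:service:region:account-id:resource
    # The resource part may itself contain colons, so split at most 5 times.
    parts = resource_arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {resource_arn!r}")
    service = parts[2]
    if service not in DESTINATION_TYPES:
        raise ValueError(f"unsupported destination service: {service!r}")
    return DESTINATION_TYPES[service]
```

For example, destination_type("arn:aws:s3:::bucket-name") returns "Amazon S3". The check looks only at the ARN format. It does not verify that the resource exists.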

Enable the AWS PCS cluster as a delivery source

To collect scheduler logs from AWS PCS, configure the cluster as a delivery source. For more information, see PutDeliverySource in the Amazon CloudWatch Logs API Reference.

To configure a cluster as a delivery source using the AWS CLI
  • Enable logs delivery from your cluster with the command that follows. Before running the command, make the following replacements:

    • Replace region-code with the AWS Region where your cluster is deployed.

    • Replace cluster-logs-source-name with a name for this source. It must be unique for all delivery sources in your AWS account. Consider incorporating the name or ID of the AWS PCS cluster.

    • Replace cluster-arn with the ARN for your AWS PCS cluster.

aws logs put-delivery-source \
    --region region-code \
    --name cluster-logs-source-name \
    --resource-arn cluster-arn \
    --log-type PCS_SCHEDULER_LOGS

Connect the cluster delivery source to the delivery destination

For scheduler log data to flow from the cluster to the destination, you must configure a delivery that connects them. For more information, see CreateDelivery in the Amazon CloudWatch Logs API Reference.

To create a delivery using the AWS CLI
  • Create a delivery using the command that follows. Before running the command, make the following replacements:

    • Replace region-code with the AWS Region where your source and destination exist.

    • Replace cluster-logs-source-name with the name of your delivery source from above.

    • Replace destination-arn with the ARN of the delivery destination where you want logs to be delivered.

aws logs create-delivery \
    --region region-code \
    --delivery-source-name cluster-logs-source-name \
    --delivery-destination-arn destination-arn
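
If you prefer the SDK, the same three steps (destination, source, delivery) can be chained in one function. The following is a minimal sketch, not a definitive implementation. It assumes a CloudWatch Logs client from the AWS SDK for Python (Boto3) is passed in as logs_client. The function name set_up_scheduler_log_delivery and its parameter names are hypothetical.

```python
def set_up_scheduler_log_delivery(
    logs_client,
    *,
    destination_name: str,
    destination_resource_arn: str,
    source_name: str,
    cluster_arn: str,
) -> str:
    """Create a destination, a source, and a delivery; return the delivery ID.

    logs_client is assumed to be a Boto3 CloudWatch Logs client,
    for example boto3.client("logs", region_name="region-code").
    """
    # Step 1: create the delivery destination and keep its ARN.
    destination = logs_client.put_delivery_destination(
        name=destination_name,
        deliveryDestinationConfiguration={
            "destinationResourceArn": destination_resource_arn,
        },
    )
    destination_arn = destination["deliveryDestination"]["arn"]

    # Step 2: register the AWS PCS cluster as a delivery source.
    logs_client.put_delivery_source(
        name=source_name,
        resourceArn=cluster_arn,
        logType="PCS_SCHEDULER_LOGS",
    )

    # Step 3: connect the source to the destination.
    delivery = logs_client.create_delivery(
        deliverySourceName=source_name,
        deliveryDestinationArn=destination_arn,
    )
    return delivery["delivery"]["id"]
```

Passing the client as an argument keeps the function free of credential and Region handling, and lets you exercise it with a stub before you run it against a real account. The sketch does not handle errors such as a name that is already in use.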

Scheduler log stream paths and names

The path and name for AWS PCS scheduler logs depend on the destination type.

  • CloudWatch Logs

    • A CloudWatch Logs stream name follows this convention:

      AWSLogs/PCS/${cluster_id}/${log_name}_${scheduler_major_version}.log
      Example
      AWSLogs/PCS/abcdef0123/slurmctld_24.05.log
  • S3 bucket

    • An S3 bucket output path follows this naming convention:

      AWSLogs/${account-id}/PCS/${region}/${cluster_id}/${log_name}/${scheduler_major_version}/yyyy/MM/dd/HH/
      Example
      AWSLogs/111111111111/PCS/us-east-2/abcdef0123/slurmctld/24.05/2024/09/01/00/
    • An S3 object name follows this convention:

      PCS_${log_name}_${scheduler_major_version}_#{expr date 'event_timestamp', format: "yyyy-MM-dd-HH"}_${cluster_id}_${hash}.log
      Example
      PCS_slurmctld_24.05_2024-09-01-00_abcdef0123_0123abcdef.log
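
When you search for logs, you might want to build these names in code. The following Python sketch reproduces the CloudWatch Logs stream name and the S3 output path from the conventions above. The function names are hypothetical. The sketch also assumes that the date and hour in the S3 path are in UTC, which this page does not state.

```python
from datetime import datetime


def cloudwatch_stream_name(cluster_id: str, log_name: str, major_version: str) -> str:
    """Build the CloudWatch Logs stream name for a scheduler log."""
    return f"AWSLogs/PCS/{cluster_id}/{log_name}_{major_version}.log"


def s3_output_path(
    account_id: str,
    region: str,
    cluster_id: str,
    log_name: str,
    major_version: str,
    when: datetime,
) -> str:
    """Build the S3 output path for the hour that contains `when`.

    Assumption: `when` is expressed in the time zone the service uses
    for the yyyy/MM/dd/HH components (assumed here to be UTC).
    """
    return (
        f"AWSLogs/{account_id}/PCS/{region}/{cluster_id}/"
        f"{log_name}/{major_version}/{when:%Y/%m/%d/%H}/"
    )
```

You can pass the result of s3_output_path as a prefix when you list objects in the bucket. The object name itself ends in a hash, so it cannot be predicted and is not built here.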

Example AWS PCS scheduler log record

AWS PCS scheduler logs are structured. Each record includes fields such as the cluster identifier, the scheduler type, and the scheduler major and patch versions, along with the log message emitted by the Slurm controller process. Here is an example:

{
    "resource_id": "s3431v9rx2",
    "resource_type": "PCS_CLUSTER",
    "event_timestamp": 1721230979,
    "log_level": "info",
    "log_name": "slurmctld",
    "scheduler_type": "slurm",
    "scheduler_major_version": "23.11",
    "scheduler_patch_version": "8",
    "node_type": "controller_primary",
    "message": "[2024-07-17T15:42:58.614+00:00] Running as primary controller\n"
}
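
Because each record is a JSON object, you can process it with standard tooling. The following Python sketch parses one record and returns a few fields in a more convenient form. The function name summarize_record and the keys of its result are hypothetical. It assumes that event_timestamp is expressed in seconds since the Unix epoch, which is consistent with the example record but is not stated on this page.

```python
import json
from datetime import datetime, timezone


def summarize_record(raw: str) -> dict:
    """Parse one scheduler log record and return selected fields."""
    record = json.loads(raw)
    return {
        "cluster_id": record["resource_id"],
        # Assumption: event_timestamp is in epoch seconds.
        "time": datetime.fromtimestamp(record["event_timestamp"], tz=timezone.utc),
        # Join major and patch versions, for example "23.11" and "8".
        "scheduler": (
            f'{record["scheduler_type"]} '
            f'{record["scheduler_major_version"]}.{record["scheduler_patch_version"]}'
        ),
        "level": record["log_level"],
        # Drop the trailing newline that the scheduler message carries.
        "message": record["message"].rstrip("\n"),
    }
```

For the example record, the "scheduler" value is "slurm 23.11.8" and the "time" value is 2024-07-17 15:42:59 UTC, which agrees with the timestamp embedded in the message to within a second.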