Solution Components - Scale-Out Computing on AWS

Solution Components

User Interface

Scale-Out Computing on AWS deploys and sets up an example web user interface (UI) with a common set of APIs that administrators and users can use to interact with their Amazon Elastic Compute Cloud (Amazon EC2) cluster. The example UI allows users to submit jobs, manage and share their files, start and stop desktop cloud visualization (DCV) sessions, download private keys, and monitor the queue and job status in real time. Administrators can use the UI to manage LDAP users and groups, create application profiles (for web-based job submission), and manage job queues.

Pre- and Post-Processing in the Cloud

This solution leverages cloud-based workstations to enable users to easily access the cluster to perform any pre- and post-processing visualization actions (such as computer-aided design). User working files persist across workstation sessions and are stored in the user home directory in Amazon Elastic File System (Amazon EFS). Administrators can create custom Linux Amazon Machine Images (AMIs) with common user applications preinstalled in the cloud workstation.

Real-Time Analytics

Scheduler and application logs are ingested in real time and stored in the data lake for further processing. Node counts, job status, and metadata are automatically pushed to the Amazon Elasticsearch Service (Amazon ES) cluster.

Custom Code and Automation

This solution is deployed with a collection of scripts that are customizable and can be extended to help administrators and users collect data and execute common cluster tasks. These customizations can be found in /apps/soca/<your-soca-name> and perform the following tasks:

  • Automatic Error Handling: Dry run checks before provisioning Amazon EC2 capacity

  • Automatic Log Management: Collects cluster logs and backs them up to Amazon S3

  • Custom job status tool: Improves cluster status with AWS-specific information

  • Simplified LDAP user management: Scripts to perform typical LDAP actions

  • Application license resource: FlexLM-enabled script that calculates the number of licenses available for a given feature
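
The license calculation in the last item can be illustrated with a minimal sketch. It assumes the per-feature summary line that FlexLM's lmstat utility normally prints ("Users of <feature>: (Total of N licenses issued; Total of M licenses in use)"). The function name, the sample text, and the feature names solver_a and solver_b are hypothetical, and the script shipped with the solution may work differently.

```python
import re

# Assumed format of the per-feature summary line in lmstat output, e.g.
# "Users of solver_a:  (Total of 10 licenses issued;  Total of 3 licenses in use)"
_USAGE_LINE = re.compile(
    r"Users of (?P<feature>[^:]+):\s+"
    r"\(Total of (?P<issued>\d+) licenses? issued;\s+"
    r"Total of (?P<in_use>\d+) licenses? in use\)"
)


def available_licenses(lmstat_output: str, feature: str) -> int:
    """Return issued minus in-use licenses for one feature (hypothetical helper)."""
    for match in _USAGE_LINE.finditer(lmstat_output):
        if match.group("feature").strip() == feature:
            issued = int(match.group("issued"))
            in_use = int(match.group("in_use"))
            # Clamp at zero in case the server reports more in use than issued.
            return max(issued - in_use, 0)
    raise KeyError(f"feature not found in lmstat output: {feature}")


# Hypothetical sample text; a real script would capture the lmstat output instead.
SAMPLE_OUTPUT = """\
Users of solver_a:  (Total of 10 licenses issued;  Total of 3 licenses in use)
Users of solver_b:  (Total of 1 license issued;  Total of 1 license in use)
"""

if __name__ == "__main__":
    print(available_licenses(SAMPLE_OUTPUT, "solver_a"))
```

A scheduler could compare the returned count against the number of licenses a job requests before allowing the job to start.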

High-Performance Computing (HPC) Budgets

This solution helps users and administrators manage their HPC budgets. It uses resource tagging to generate detailed reports by user, software, team, queue, project, or application. This solution uses AWS Cost Explorer and AWS Budgets to help users manage their expenses and forecast their budgets based on historical data. Note that if resource tagging is not enabled, you must manually enable these tags for the Cost Explorer reporting platform through Cost Allocation Tags.
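
As an illustration of tag-based reporting, the sketch below builds the parameters for a Cost Explorer GetCostAndUsage query grouped by a cost allocation tag. The helper name and the tag key example:queue are hypothetical, and the solution's own reports may be produced differently. The tag must already be activated as a cost allocation tag for the query to return per-tag costs.

```python
from datetime import date


def cost_by_tag_request(tag_key: str, start: date, end: date) -> dict:
    """Build GetCostAndUsage parameters grouped by one tag (hypothetical helper)."""
    if end <= start:
        raise ValueError("end must be later than start")
    return {
        # Cost Explorer expects ISO dates; the end date is exclusive.
        "TimePeriod": {"Start": start.isoformat(), "End": end.isoformat()},
        "Granularity": "MONTHLY",
        "Metrics": ["UnblendedCost"],
        # Group results by the value of the given cost allocation tag.
        "GroupBy": [{"Type": "TAG", "Key": tag_key}],
    }


if __name__ == "__main__":
    # "example:queue" is a placeholder tag key, not one defined by the solution.
    params = cost_by_tag_request("example:queue", date(2024, 1, 1), date(2024, 2, 1))
    print(params)
    # With boto3 installed and credentials configured, the query could be sent as:
    #   import boto3
    #   response = boto3.client("ce").get_cost_and_usage(**params)
```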

Customizable

Users can customize this solution to fit their business needs. The business logic is configured using an AWS CloudFormation template and Amazon EC2 user data scripts. The solution's codebase is open source and available on GitHub. Customization examples can be found in the official documentation.

Persistent and Unlimited Storage

This solution deploys two unlimited Amazon Elastic File System (Amazon EFS) file systems (/apps and /data). You can also deploy high-speed Amazon EBS SSD-backed disks or Amazon FSx for Lustre to use as a scratch location on your compute nodes.

Centralized User Management

Customers can create unlimited LDAP users and groups. By default, this solution deploys an LDAP account and a Sudoers LDAP group, which manages sudo permissions on the cluster.

Scheduler Instance

This solution deploys an Amazon EC2 instance running the open source PBS Professional (PBSPRO) 18.1.4 job scheduling software. This solution has an AGPLv3 licensing component. For more information, see Notices.

Application Programming Interface (API)

This solution provides an HTTP REST API for administrators and users to interact with the cluster programmatically. Through the API, you can create users, groups, and queues; submit jobs; and view and change job status using either Bash or Python.
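
A programmatic call from Python might be prepared as in the sketch below. Every specific in it is an assumption made for illustration: the host soca.example.com, the path /api/jobs, the header names X-Example-User and X-Example-Token, and the payload fields are placeholders, not the solution's documented endpoints. Consult the API reference of your deployment for the real values.

```python
import json
import urllib.request


def build_job_request(base_url: str, path: str, user: str, token: str,
                      payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST request (hypothetical helper)."""
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    return urllib.request.Request(
        url=url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Placeholder authentication headers; the real names may differ.
            "X-Example-User": user,
            "X-Example-Token": token,
        },
    )


if __name__ == "__main__":
    request = build_job_request(
        base_url="https://soca.example.com",  # placeholder host
        path="/api/jobs",                     # placeholder endpoint
        user="alice",
        token="example-token",
        payload={"queue": "normal", "script": "#!/bin/bash\necho hello"},
    )
    print(request.get_method(), request.full_url)
    # To send the request: urllib.request.urlopen(request)
```

Building the request separately from sending it makes the URL, headers, and body easy to inspect before anything reaches the cluster.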