Hybrid Deployment - High Performance Computing Lens

Hybrid deployments are primarily considered by organizations that are invested in their on-premises infrastructure and also want to use AWS. This approach allows organizations to augment on-premises resources and creates an alternative path to AWS rather than an immediate full migration.

Hybrid scenarios vary from minimal coordination, such as workload separation, to tightly integrated approaches, such as scheduler-driven job placement. For example, an organization may separate its workloads and run all workloads of a certain type on AWS infrastructure. Alternatively, organizations with a large investment in their on-premises processes and infrastructure may want a more seamless experience for their end users, managing AWS resources with their job scheduling software and potentially a job submission portal. Several job schedulers, both commercial and open-source, can dynamically provision and deprovision AWS resources as necessary. The underlying resource management relies on native AWS integrations (for example, the AWS CLI or API) and can allow for a highly customized environment, depending on the scheduler. Although job schedulers help manage AWS resources, the scheduler is only one aspect of a successful deployment.
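The provision and deprovision decision described above can be sketched as a small sizing function. This is only an illustration under stated assumptions: the `ClusterState` fields, the `cores_per_node` value, and the `max_cloud_nodes` cap are hypothetical and do not come from any particular scheduler. A real scheduler integration would read these values from its own queue state and then call the AWS CLI or API to act on the result.

```python
from dataclasses import dataclass


@dataclass
class ClusterState:
    """Hypothetical snapshot of scheduler state (all fields are assumptions)."""
    pending_cores: int        # cores requested by queued jobs
    free_onprem_cores: int    # idle on-premises cores
    running_cloud_nodes: int  # AWS nodes currently provisioned
    idle_cloud_nodes: int     # provisioned AWS nodes with no work


def cloud_node_delta(state, cores_per_node=36, max_cloud_nodes=100):
    """Return how many AWS nodes to add (positive) or release (negative)."""
    # Demand that the on-premises cluster cannot absorb right now.
    overflow = max(0, state.pending_cores - state.free_onprem_cores)
    if overflow == 0:
        # Nothing is waiting on cloud capacity: release idle cloud nodes.
        return -state.idle_cloud_nodes

    # Round up to whole nodes, then reuse idle cloud nodes before adding more.
    needed = -(-overflow // cores_per_node)
    needed = max(0, needed - state.idle_cloud_nodes)

    # Never exceed the configured cap on cloud nodes.
    headroom = max(0, max_cloud_nodes - state.running_cloud_nodes)
    return min(needed, headroom)
```

Keeping the decision separate from the provisioning call makes the policy easy to test and tune without touching AWS resources.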

Data locality and data movement are critical factors in successfully operating a hybrid scenario. Some HPC workloads do not require or generate significant datasets, so data management is less of a concern. However, for jobs that require large input data or generate significant output data, moving that data between locations can become a bottleneck. Techniques to address data management vary by organization. For example, one organization may have its end users manage data transfer in their job submission scripts, another might run certain jobs only in the location where a dataset resides, a third might duplicate data in both locations, and yet another might combine several of these options.
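A rough estimate of transfer time helps show when data movement becomes the bottleneck. The sketch below is back-of-the-envelope arithmetic, not a measurement: the function names and the `efficiency` factor (the assumed fraction of nominal link bandwidth actually achieved) are illustrative assumptions.

```python
def transfer_hours(dataset_gb, link_gbps, efficiency=0.8):
    """Estimate hours needed to move dataset_gb gigabytes over a link.

    link_gbps is the nominal bandwidth in gigabits per second;
    efficiency is an assumed fraction of that bandwidth achieved in practice.
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_gbps must be > 0 and efficiency in (0, 1]")
    gigabits = dataset_gb * 8                      # bytes to bits
    seconds = gigabits / (link_gbps * efficiency)  # effective throughput
    return seconds / 3600


def worth_moving(dataset_gb, link_gbps, queue_wait_hours, efficiency=0.8):
    """True if moving the data takes less time than waiting in the local queue."""
    return transfer_hours(dataset_gb, link_gbps, efficiency) < queue_wait_hours
```

For example, under these assumptions a 1,000 GB dataset over a 1 Gbps link at 80% efficiency takes 10,000 seconds, or a little under 3 hours, so sending the job elsewhere only pays off if the local queue wait is longer than that.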

Depending on the data management approach, AWS provides several services to aid in a hybrid deployment. For example, AWS Direct Connect establishes a dedicated network connection between an on-premises environment and AWS, and AWS DataSync automatically moves data from on-premises storage to Amazon S3 or Amazon Elastic File System. Additional software options are available from third-party companies in the AWS Marketplace and the AWS Partner Network (APN).
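As one possible way to drive AWS DataSync from a job workflow, the sketch below starts an existing DataSync task and polls until it finishes. It assumes the task has already been created and that the client passed in is a boto3 DataSync client, for example `boto3.client("datasync")`. The function name, polling interval, and task ARN are hypothetical placeholders.

```python
import time

# Statuses after which a DataSync task execution will not change further.
TERMINAL_STATUSES = {"SUCCESS", "ERROR"}


def run_datasync_task(client, task_arn, poll_seconds=30, sleep=time.sleep):
    """Start a DataSync task and block until it reaches a terminal status.

    client is expected to behave like boto3.client("datasync");
    task_arn identifies a task that was configured beforehand.
    """
    response = client.start_task_execution(TaskArn=task_arn)
    execution_arn = response["TaskExecutionArn"]

    while True:
        detail = client.describe_task_execution(TaskExecutionArn=execution_arn)
        status = detail["Status"]
        if status in TERMINAL_STATUSES:
            return status
        # Still queued, launching, preparing, transferring, or verifying.
        sleep(poll_seconds)
```

Passing the client in as an argument keeps the function independent of credentials and region setup, and allows it to be exercised with a stub before it is pointed at a real task.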

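Hybrid-deployment architectures can be used for both loosely coupled and tightly coupled workloads. However, a single tightly coupled workload should reside entirely on-premises or entirely in AWS for best performance, because splitting it across locations puts its inter-node communication on the higher-latency link between them.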

Reference Architecture

Diagram showing on-premises data center connected to AWS Cloud VPC with various compute options.

Figure 3: Example hybrid, scheduler-based deployment

Workflow steps:

  1. The user submits the job to a scheduler (for example, Slurm) on an on-premises login node.

  2. The scheduler runs the job on either on-premises compute or AWS infrastructure, based on its configuration.

  3. The job accesses shared storage based on where it runs.
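The workflow steps above can be sketched as a simple routing table. The partition names, location labels, and storage mount points below are hypothetical and stand in for whatever the scheduler configuration actually defines. The sketch is not tied to Slurm or any other scheduler.

```python
# Hypothetical scheduler configuration: partition name -> run location.
PARTITION_LOCATION = {"local": "onprem", "cloud": "aws"}

# Hypothetical shared-storage mount point for each run location.
STORAGE_ROOT = {"onprem": "/shared/onprem", "aws": "/shared/aws"}


def route_job(job_name, partition):
    """Resolve where a submitted job runs and which shared storage it uses."""
    if partition not in PARTITION_LOCATION:
        raise ValueError(f"unknown partition: {partition}")
    location = PARTITION_LOCATION[partition]          # step 2: placement
    workdir = f"{STORAGE_ROOT[location]}/{job_name}"  # step 3: storage
    return {"job": job_name, "location": location, "workdir": workdir}


# Step 1: the user submits a job; here it targets the cloud partition.
print(route_job("cfd-run-01", "cloud"))
```

Because each job resolves to exactly one location, the job's compute and its storage stay together, which matches the guidance to keep a tightly coupled workload in a single location.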