Batch-Based Architecture

AWS Batch is a fully managed service for running large-scale batch computing workloads in the cloud without provisioning resources or managing schedulers. It enables developers, scientists, and engineers to efficiently run hundreds of thousands of batch computing jobs on AWS. AWS Batch dynamically provisions the optimal quantity and type of compute resources (for example, CPU- or memory-optimized instances) based on the volume and resource requirements of the submitted jobs. It plans, schedules, and runs your batch computing workloads across the full range of AWS compute services and features, such as Amazon EC2 and Spot Instances. Because there is no need to install and manage batch computing software or server clusters, you can focus on analyzing results and gaining new insights.
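As a rough illustration of how these resources fit together, the following Python (boto3) sketch creates a managed compute environment and a job queue. The resource names, subnet and security group IDs, and instance role shown here are placeholders; your account's networking and IAM setup will differ.

```python
import time
import boto3

batch = boto3.client("batch")  # assumes AWS credentials and Region are configured

# Managed compute environment: AWS Batch scales EC2/Spot capacity between
# minvCpus and maxvCpus based on the jobs waiting in the associated queue.
ce = batch.create_compute_environment(
    computeEnvironmentName="hpc-batch-ce",             # placeholder name
    type="MANAGED",
    computeResources={
        "type": "SPOT",                                 # or "EC2" for On-Demand only
        "allocationStrategy": "SPOT_CAPACITY_OPTIMIZED",
        "minvCpus": 0,
        "maxvCpus": 1024,
        "instanceTypes": ["optimal"],                   # let AWS Batch choose instance sizes
        "subnets": ["subnet-0123456789abcdef0"],        # placeholder subnet ID
        "securityGroupIds": ["sg-0123456789abcdef0"],   # placeholder security group ID
        "instanceRole": "ecsInstanceRole",              # placeholder instance profile
    },
)

# The compute environment must be VALID before a job queue can reference it.
while True:
    desc = batch.describe_compute_environments(computeEnvironments=["hpc-batch-ce"])
    if desc["computeEnvironments"][0]["status"] == "VALID":
        break
    time.sleep(5)

# Job queue mapped to the compute environment; jobs are submitted here.
batch.create_job_queue(
    jobQueueName="hpc-batch-queue",                     # placeholder name
    priority=1,
    computeEnvironmentOrder=[
        {"order": 1, "computeEnvironment": ce["computeEnvironmentArn"]},
    ],
)
```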

With AWS Batch, you package your application in a container, specify its dependencies, and submit your batch jobs using the AWS Management Console, the AWS CLI, or an SDK. You can specify execution parameters and job dependencies, and integrate with a broad range of popular batch computing workflow engines and languages (for example, Pegasus WMS, Luigi, and AWS Step Functions). AWS Batch provides default job queues and compute environment definitions so that you can get started quickly.
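For the SDK path, a minimal boto3 sketch might register a container job definition and submit two jobs, the second depending on the first. The image URI, queue name, job names, and S3 path below are placeholders.

```python
import boto3

batch = boto3.client("batch")

# Register a job definition that points at the application container image.
job_def = batch.register_job_definition(
    jobDefinitionName="preprocess-job",                 # placeholder name
    type="container",
    containerProperties={
        "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest",  # placeholder image
        "command": ["python", "preprocess.py", "Ref::input"],  # Ref::input filled from parameters
        "resourceRequirements": [
            {"type": "VCPU", "value": "4"},
            {"type": "MEMORY", "value": "8192"},         # MiB
        ],
    },
)

# Submit the first job to a job queue.
first = batch.submit_job(
    jobName="preprocess",
    jobQueue="hpc-batch-queue",                          # placeholder queue
    jobDefinition=job_def["jobDefinitionArn"],
    parameters={"input": "s3://my-bucket/input/"},       # placeholder S3 URI
)

# Submit a dependent job that runs only after the first job succeeds.
batch.submit_job(
    jobName="analyze",
    jobQueue="hpc-batch-queue",
    jobDefinition=job_def["jobDefinitionArn"],
    dependsOn=[{"jobId": first["jobId"]}],
    containerOverrides={"command": ["python", "analyze.py"]},
)
```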

An AWS Batch-based architecture can be used for both loosely and tightly coupled workloads. Tightly coupled workloads should use multi-node parallel jobs in AWS Batch.
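For tightly coupled workloads, a multi-node parallel job definition describes how many nodes to launch, which node is the main node, and which container runs on each node range. The boto3 sketch below is illustrative only; the job definition name, image URI, and MPI command are placeholders.

```python
import boto3

batch = boto3.client("batch")

# Multi-node parallel (MNP) job definition: one main node plus worker nodes,
# launched together so the ranks can communicate (for example, over MPI).
batch.register_job_definition(
    jobDefinitionName="mpi-simulation",                  # placeholder name
    type="multinode",
    nodeProperties={
        "numNodes": 4,
        "mainNode": 0,
        "nodeRangeProperties": [
            {
                "targetNodes": "0:3",                    # same container on all four nodes
                "container": {
                    "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/mpi-app:latest",  # placeholder
                    "command": ["mpirun", "/app/simulate"],   # placeholder command
                    "resourceRequirements": [
                        {"type": "VCPU", "value": "16"},
                        {"type": "MEMORY", "value": "65536"},
                    ],
                },
            },
        ],
    },
)
```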

Reference Architecture

Figure 2: Example AWS Batch architecture

Workflow steps:

  1. User creates a job container, uploads the container image to Amazon Elastic Container Registry (Amazon ECR) or another container registry (for example, Docker Hub), and registers a job definition with AWS Batch.

  2. User submits jobs to a job queue in AWS Batch.

  3. AWS Batch pulls the container image from the registry and processes the jobs in the queue.

  4. Input and output data for each job is stored in an Amazon S3 bucket, as sketched below.
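For step 4, the input and output handling inside the job container might look like the following boto3 sketch; the bucket name, object keys, and local paths are placeholders.

```python
import boto3

s3 = boto3.client("s3")

BUCKET = "my-hpc-bucket"             # placeholder bucket name
INPUT_KEY = "input/case-001.dat"     # placeholder object keys
OUTPUT_KEY = "output/case-001.out"

# Download the job's input data from Amazon S3 to local scratch space.
s3.download_file(BUCKET, INPUT_KEY, "/tmp/case-001.dat")

# ... run the computation on /tmp/case-001.dat, producing /tmp/case-001.out ...

# Upload the result back to Amazon S3 when the job finishes.
s3.upload_file("/tmp/case-001.out", BUCKET, OUTPUT_KEY)
```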