Scenario 3: Spot Instance running multi-node jobs is interrupted

The job fails with a state code of NODE_FAIL, and the job is requeued (unless --no-requeue was specified when the job was submitted). If the node is a static node, it's replaced. If the node is a dynamic node, the node is terminated and reset. Other nodes that were running the terminated jobs might be allocated to other pending jobs, or scaled down after the configured SlurmSettings / ScaledownIdletime time has passed.

Warning Javascript is disabled or is unavailable in your browser.

To use the Amazon Web Services Documentation, Javascript must be enabled. Please refer to your browser's Help pages for instructions.

Document Conventions

Scenario 2: Spot Instance running single node jobs is interrupted

Schedulers supported by AWS ParallelCluster