GENCOST02-BP02 Optimize resource consumption to minimize hosting costs
Hosting a foundation model for inference requires myriad choices, all of which affect cost. These cost dimensions can be optimized to reduce cost while meeting performance goals.
Desired outcome: When implemented, this best practice establishes a clear understanding of the relationship between cost and performance for self-hosted foundation models.
Benefits of establishing this best practice:
- Measure overall efficiency - It is helpful to understand the inference and hosting costs associated with the performance requirements of the foundation model.
- Stop spending money on undifferentiated heavy lifting - A managed or serverless hosting paradigm is often the better choice, because the total cost of ownership for foundation model hosting is difficult to determine.
Level of risk exposed if this best practice is not established: Medium
Implementation guidance
Optimize self-hosted model infrastructure based on the model used and the workload's usage pattern. Consider right-sizing the inference endpoint to the smallest instance available that allows you to meet performance goals. For workloads with predictable usage patterns, it may be appropriate to shut down the hosting instance outside relevant hours and restart it when demand returns. You may also consider purchasing Amazon EC2 Reserved Instances for steady, long-running usage.
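The trade-off between always-on hosting, scheduled hosting, and a discounted commitment can be made concrete with simple arithmetic. The following is a minimal sketch, not a pricing tool: the hourly rate, the discount, and the 12-hours-by-5-days schedule are hypothetical values chosen for illustration and do not reflect actual AWS prices.

```python
HOURS_PER_WEEK = 24 * 7

# Hypothetical values for illustration only; not real AWS prices or discounts.
ON_DEMAND_RATE = 2.00      # USD per instance-hour
RESERVED_DISCOUNT = 0.40   # fraction off the on-demand rate for a commitment


def weekly_cost(hourly_rate: float, hours_running: float) -> float:
    """Cost of running one instance for the given number of hours in a week."""
    return hourly_rate * hours_running


# Always on at the on-demand rate.
always_on = weekly_cost(ON_DEMAND_RATE, HOURS_PER_WEEK)

# Running only 12 hours a day, 5 days a week, at the on-demand rate.
scheduled = weekly_cost(ON_DEMAND_RATE, 12 * 5)

# Always on at the discounted committed rate.
reserved = weekly_cost(ON_DEMAND_RATE * (1 - RESERVED_DISCOUNT), HOURS_PER_WEEK)

for label, cost in [("always on", always_on),
                    ("scheduled", scheduled),
                    ("reserved", reserved)]:
    print(f"{label}: ${cost:.2f} per week")
```

Which option is cheapest depends on how many hours the workload actually needs to be available, so it is worth rerunning the comparison with your own rates and schedule.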
Implementation steps
- Identify the nature of the demand for this workload.
- Deploy the selected foundation model on acceptable infrastructure, even if it may be over-provisioned.
- Establish an inference or demand profile for the hosted workload.
- Optimize the hosting infrastructure in accordance with the workload's demands, and select the most cost-optimized infrastructure that meets performance requirements.
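The last two steps can be sketched in a few lines: build a demand profile from request timestamps, then pick the cheapest option whose measured capacity covers peak demand. This is an illustrative sketch under stated assumptions. The `InstanceOption` type, the option names, rates, capacities, and the 20% headroom factor are all hypothetical; in practice, capacity figures would come from load testing the model on each candidate instance.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class InstanceOption:
    name: str                   # hypothetical label, not a real instance type
    hourly_rate: float          # hypothetical USD per hour
    max_requests_per_hour: int  # capacity measured at acceptable latency


def hourly_demand(timestamps):
    """Count requests in each calendar hour (date plus hour of day)."""
    return Counter(
        ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps
    )


def peak_hourly_demand(timestamps):
    """Highest request count observed in any single calendar hour."""
    return max(hourly_demand(timestamps).values(), default=0)


def cheapest_sufficient(options, peak, headroom=1.2):
    """Cheapest option covering peak demand times headroom, or None."""
    required = peak * headroom
    candidates = [o for o in options if o.max_requests_per_hour >= required]
    return min(candidates, key=lambda o: o.hourly_rate, default=None)


# Hypothetical request log: 3 requests at 09:xx, 5 at 14:xx, 1 at 22:xx.
requests = (
    [datetime(2024, 1, 1, 9, m) for m in (5, 20, 40)]
    + [datetime(2024, 1, 1, 14, m) for m in (1, 10, 25, 30, 55)]
    + [datetime(2024, 1, 1, 22, 15)]
)

# Hypothetical candidates with made-up rates and capacities.
options = [
    InstanceOption("small", 1.00, 4),
    InstanceOption("medium", 2.00, 8),
    InstanceOption("large", 4.00, 20),
]

peak = peak_hourly_demand(requests)
choice = cheapest_sufficient(options, peak)
print(f"peak demand: {peak} requests/hour, selected: {choice.name}")
```

The headroom factor is a design choice: a larger value tolerates bursts beyond the observed peak at the price of a more expensive instance, while a value of 1.0 sizes exactly to the observed peak.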
Resources
Related practices:
Related guides, videos, and documentation:
Related examples: