Manage throughput quotas
GENREL01: How do you determine throughput quotas (or needs) for foundation models? |
---|
Foundation models perform complex tasks over detailed input, and they can service only a limited number of inference requests at a time. This is particularly true for managed and serverless model hosting paradigms.
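As a rough illustration of how throughput needs might be estimated, the sketch below compares a workload's peak demand against a pair of quotas. It assumes quotas are expressed as requests per minute and tokens per minute, which is a common convention but is not stated in this guidance. All names (`Workload`, `quota_utilization`) and all numbers are hypothetical; substitute measured traffic and the quotas that apply to your account and model.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical peak-traffic profile for one model endpoint."""
    peak_requests_per_minute: float
    avg_input_tokens: int
    avg_output_tokens: int


def required_tokens_per_minute(w: Workload) -> float:
    """Tokens processed per minute at peak (input plus output)."""
    return w.peak_requests_per_minute * (w.avg_input_tokens + w.avg_output_tokens)


def quota_utilization(w: Workload, rpm_quota: float, tpm_quota: float) -> dict:
    """Return the fraction of each quota consumed at peak.

    A value above 1.0 means the workload would exceed that quota
    and requests would likely be throttled.
    """
    rpm_used = w.peak_requests_per_minute / rpm_quota
    tpm_used = required_tokens_per_minute(w) / tpm_quota
    return {
        "rpm_utilization": rpm_used,
        "tpm_utilization": tpm_used,
        "exceeds_quota": rpm_used > 1.0 or tpm_used > 1.0,
    }


if __name__ == "__main__":
    # Assumed example values, not real quotas.
    workload = Workload(peak_requests_per_minute=120,
                        avg_input_tokens=1500,
                        avg_output_tokens=500)
    print(quota_utilization(workload, rpm_quota=200, tpm_quota=200_000))
```

In this example the request rate fits within the assumed request quota, but the token volume does not fit within the assumed token quota, so both dimensions need to be checked when sizing a workload.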