Manage throughput quotas - Generative AI Lens

Manage throughput quotas

GENREL01: How do you determine throughput quotas (or needs) for foundation models?

Foundation models perform complex tasks over detailed input, and they can service only a limited number of inference requests at a time. This is particularly true for managed and serverless model hosting paradigms.
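As a rough illustration of how throughput needs might be estimated, the sketch below converts an assumed peak workload into requests per minute and tokens per minute, then compares the result against quota values. Managed services often express quotas in these two units, but the `Workload` fields, the function names, and every number here are hypothetical placeholders, not values taken from any provider.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical peak workload profile; all fields are assumptions."""
    peak_requests_per_second: float  # assumed peak arrival rate
    avg_input_tokens: int            # assumed average prompt size
    avg_output_tokens: int           # assumed average completion size


def required_throughput(w: Workload) -> tuple[float, float]:
    """Return (requests per minute, tokens per minute) needed at peak."""
    rpm = w.peak_requests_per_second * 60
    tpm = rpm * (w.avg_input_tokens + w.avg_output_tokens)
    return rpm, tpm


def fits_quota(w: Workload, rpm_quota: float, tpm_quota: float) -> bool:
    """Check whether the peak workload stays within both quota limits."""
    rpm, tpm = required_throughput(w)
    return rpm <= rpm_quota and tpm <= tpm_quota


if __name__ == "__main__":
    # Illustrative numbers only: 2 requests/s, 1000 input and 500 output tokens.
    workload = Workload(peak_requests_per_second=2,
                        avg_input_tokens=1000,
                        avg_output_tokens=500)
    rpm, tpm = required_throughput(workload)
    print(f"Needed: {rpm:.0f} requests/min, {tpm:.0f} tokens/min")
    # Placeholder quotas; substitute the limits published for your model.
    print("Within quota:", fits_quota(workload, rpm_quota=200, tpm_quota=150_000))
```

In this example the request rate fits the placeholder quota but the token rate does not, which shows why both dimensions are worth checking before settling on a hosting option.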