Quotas
Service quotas, also referred to as limits, are the maximum number of service resources or operations for your AWS account.
Quotas for AWS services in this solution
Make sure you have sufficient quota for each of the services implemented in this solution. For more information, see AWS service quotas.
Select one of the following links to go to the page for that service. To view the service quotas for all AWS services in the documentation without switching pages, view the information in the Service endpoints and quotas page in the PDF instead.
AWS CloudFormation quotas
Your AWS account has AWS CloudFormation quotas that you should be aware of when launching the stack in this solution. Understanding these quotas helps you avoid limit errors that would prevent you from deploying this solution successfully. For more information, see AWS CloudFormation quotas in the AWS CloudFormation User Guide.
Amazon SageMaker endpoint quota
The provided LLM SageMaker API requires an ml.g5.12xlarge SageMaker instance type, which is not enabled in AWS accounts by default and must be requested on a per-Region basis. If you plan to deploy the default LLM SageMaker API model, you must request a quota increase before deploying the solution.
Sign in to the AWS Management Console, open Service Quotas, and choose Amazon SageMaker from the AWS services list. Then search for the quota named ml.g5.12xlarge for endpoint usage. You must request a quota increase to at least one; you can request more to accommodate high-volume production deployments.
Note
The ml.g5.12xlarge instance type is not available in the ap-southeast-1 Region.
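The quota check described above can also be scripted. The following is a minimal sketch, assuming the boto3 Service Quotas client, the service code sagemaker, and that the quota name matches the console text exactly. The helper names find_quota and ensure_endpoint_quota, and the Region in main, are illustrative and not part of the solution.

```python
from typing import Optional

# Quota name as it appears in the Service Quotas console (assumed exact match).
QUOTA_NAME = "ml.g5.12xlarge for endpoint usage"


def find_quota(client, service_code: str, quota_name: str) -> Optional[dict]:
    """Return the first quota whose name equals quota_name, or None."""
    paginator = client.get_paginator("list_service_quotas")
    for page in paginator.paginate(ServiceCode=service_code):
        for quota in page["Quotas"]:
            if quota["QuotaName"] == quota_name:
                return quota
    return None


def ensure_endpoint_quota(client, minimum: float = 1.0) -> str:
    """Request an increase only if the current value is below minimum."""
    quota = find_quota(client, "sagemaker", QUOTA_NAME)
    if quota is None:
        return "not-found"
    if quota["Value"] >= minimum:
        return "sufficient"
    client.request_service_quota_increase(
        ServiceCode="sagemaker",
        QuotaCode=quota["QuotaCode"],
        DesiredValue=minimum,
    )
    return "requested"


def main() -> None:
    # boto3 is imported here so the helpers above stay dependency-free.
    import boto3

    # Quotas are per Region; replace the Region with the one you deploy to.
    client = boto3.client("service-quotas", region_name="us-east-1")
    print(ensure_endpoint_quota(client))
```

Call main() with credentials that allow servicequotas:ListServiceQuotas and servicequotas:RequestServiceQuotaIncrease. Quota increase requests are reviewed by AWS and might not take effect immediately, so run the check well before deploying the stack.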
Amazon Lex quotas
Your AWS account has Amazon Lex quotas, which you can view by following these steps:
1. Sign in to the AWS Service Quotas console.
2. Choose AWS services from the left navigation menu.
3. Enter Amazon Lex in the Find services field.
4. Choose Amazon Lex.
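The console steps above can be approximated programmatically. This is a rough sketch, assuming the boto3 Service Quotas client and that the service is listed under the name Amazon Lex; the helper names find_service_code and list_quotas are hypothetical. It looks up the service code by name instead of hard-coding it.

```python
from typing import List, Optional, Tuple


def find_service_code(client, service_name: str) -> Optional[str]:
    """Return the service code whose display name matches, ignoring case."""
    paginator = client.get_paginator("list_services")
    for page in paginator.paginate():
        for service in page["Services"]:
            if service["ServiceName"].lower() == service_name.lower():
                return service["ServiceCode"]
    return None


def list_quotas(client, service_code: str) -> List[Tuple[str, float]]:
    """Return (quota name, value) pairs for one service."""
    paginator = client.get_paginator("list_service_quotas")
    return [
        (quota["QuotaName"], quota["Value"])
        for page in paginator.paginate(ServiceCode=service_code)
        for quota in page["Quotas"]
    ]
```

With a real client, created for example with boto3.client("service-quotas"), pass the result of find_service_code(client, "Amazon Lex") to list_quotas to print the same values that the console shows.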
Amazon Lex V2 limits the maximum output size of the fulfillment Lambda function to 50 KB. You cannot adjust this quota for your AWS account. You might reach it when you try to return very large responses, for example by increasing the number of words or the amount of context in the response. When you use RAG with Amazon Kendra or Knowledge Bases for Amazon Bedrock, you can limit the output size by customizing settings such as the prompt templates and the maximum number of retrieved results or documents.