Capacity, Limits, and Cost Optimization - Amazon Bedrock

Amazon Bedrock offers flexible capacity options to match your workload requirements and budget. Understanding the differences between on-demand tiers (Flex, Priority, Standard), reserved tier, batch processing, and cross-region inference helps you optimize both performance and cost.

Capacity Options

On-Demand: Flex
Use case: Sporadic, low-volume workloads
  • Lowest cost per token

  • Best-effort availability

  • May experience throttling

  • No SLA

On-Demand: Standard
Use case: Regular production workloads
  • Balanced cost and performance

  • Moderate throughput guarantees

  • Standard SLA

  • Most common choice

On-Demand: Priority
Use case: High-priority, latency-sensitive apps
  • Highest on-demand cost

  • Premium throughput allocation

  • Enhanced SLA

  • Reduced throttling risk

Reserved Tier
Use case: Consistent, high-volume workloads
  • Reserved model units

  • Guaranteed capacity

  • 1-month or 6-month commitments

  • Predictable performance

Batch
Use case: Large-scale, non-time-sensitive processing
  • 50% cost savings vs on-demand

  • 24-hour processing window

  • Ideal for bulk inference

Cross-Region Inference
Use case: High availability, traffic bursting
  • Automatic failover

  • Route to less-busy regions

  • Improved uptime

  • Uses on-demand pricing

Limits & Quotas

On-Demand Limits (by tier)

Tier       RPM Range     TPM Range     Throttling Risk
Flex       10-100        5K-50K        High
Standard   100-500       50K-150K      Medium
Priority   500-1000+     150K-300K+    Low
  • Burst capacity: Available across all tiers for short spikes

  • Soft limits: Increasable via service quota requests

  • Model-specific: Actual limits vary by foundation model
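
When a tier's RPM/TPM ceiling is hit, on-demand calls fail with a throttling error, and retrying with exponential backoff plus jitter is the standard mitigation. A minimal sketch; the `invoke` and `is_throttle` callables are placeholders for a real client call (with boto3 you would wrap `bedrock_runtime.invoke_model` and check `e.response["Error"]["Code"] == "ThrottlingException"`):

```python
import random
import time

def backoff_delays(max_retries, base=0.5, cap=30.0):
    """Exponential backoff schedule with full jitter (delays in seconds)."""
    for attempt in range(max_retries):
        yield random.uniform(0, min(cap, base * 2 ** attempt))

def call_with_retries(invoke, is_throttle, max_retries=5):
    """Run invoke(), retrying only on errors that is_throttle() classifies as throttling."""
    for delay in backoff_delays(max_retries):
        try:
            return invoke()
        except Exception as exc:
            if not is_throttle(exc):
                raise  # non-throttling errors propagate immediately
            time.sleep(delay)
    return invoke()  # final attempt; let any remaining error propagate
```

On Flex, where throttling risk is high, a larger `cap` and more retries are reasonable; on Priority the same wrapper mostly sits idle.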

Reserved Tier Limits

  • Minimum commitment: 1 model unit

  • Maximum units: Account and region-specific

  • Input/output token limits: Based on purchased units

  • No RPM throttling within purchased capacity
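
Since token limits scale with purchased units, sizing a reserved purchase reduces to dividing expected peak throughput by the per-unit throughput of the chosen model. Per-unit throughput is model-specific and published by AWS; the numbers in the example are hypothetical placeholders:

```python
import math

def units_needed(peak_tokens_per_minute, tokens_per_minute_per_unit):
    """Model units required to cover peak throughput (minimum commitment is 1 unit)."""
    return max(1, math.ceil(peak_tokens_per_minute / tokens_per_minute_per_unit))

# Hypothetical sizing: a 250K TPM peak against a model rated at 60K TPM per unit.
units = units_needed(250_000, 60_000)
```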

Batch Processing Limits

  • Job size: Up to 10,000 records per batch

  • File size: Maximum 200 MB input file

  • Processing time: 24-hour completion window

  • Concurrent jobs: Region-specific quotas
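
The job-size and file-size ceilings above can be checked client-side before submitting a job (submission itself goes through the boto3 `bedrock` client's `create_model_invocation_job`). A sketch using exactly the limits quoted in this section:

```python
MAX_RECORDS = 10_000                  # records per batch job (limit above)
MAX_INPUT_BYTES = 200 * 1024 * 1024   # 200 MB input file (limit above)

def validate_batch_input(record_count, input_file_bytes):
    """Return a list of limit violations; an empty list means the job can be submitted."""
    problems = []
    if record_count > MAX_RECORDS:
        problems.append(f"{record_count} records exceeds the {MAX_RECORDS}-record limit")
    if input_file_bytes > MAX_INPUT_BYTES:
        problems.append("input file exceeds the 200 MB limit")
    return problems
```

Oversized workloads are simply split into multiple jobs, subject to the region's concurrent-job quota.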

Cross-Region Inference

  • Inherits on-demand tier limits per region

  • No additional quota overhead

  • Automatic routing (no manual limit management)
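
Routing is enabled by invoking a cross-region inference profile ID instead of a plain model ID; system-defined profiles prefix the model ID with a geography code such as `us.` or `eu.`. A small helper assuming that naming convention (verify the exact profile ID in the Bedrock console for your region):

```python
def to_inference_profile(model_id, geo="us"):
    """Build a cross-region inference profile ID from a foundation model ID.

    Assumes the geo-prefix convention used by Bedrock's system-defined
    profiles; the result is passed as modelId to invoke_model/converse.
    """
    return f"{geo}.{model_id}"

profile = to_inference_profile("anthropic.claude-3-5-sonnet-20240620-v1:0")
```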

Cost Optimization

Decision Framework

  • Development/testing: Flex (lowest cost, acceptable for non-production)

  • Standard production: Standard (best cost-performance balance)

  • Critical user-facing apps: Priority (reliability and performance over cost)

  • Steady high-volume load: Reserved Tier (30-50% savings with commitment)

  • Bulk data processing: Batch (50% discount for non-urgent workloads)

  • Mission-critical uptime: Cross-Region Inference (availability outweighs cost)
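
The decision framework above reduces to a simple lookup, which can be useful for codifying a team policy in tooling. The scenario keys below are illustrative labels, not an AWS API:

```python
CAPACITY_POLICY = {
    "dev_test": "Flex",
    "standard_production": "Standard",
    "critical_user_facing": "Priority",
    "steady_high_volume": "Reserved Tier",
    "bulk_processing": "Batch",
    "mission_critical_uptime": "Cross-Region Inference",
}

def recommend_capacity(scenario):
    """Map a workload scenario to the recommended Bedrock capacity option."""
    # Standard is the safe default for unrecognized scenarios, per the table.
    return CAPACITY_POLICY.get(scenario, "Standard")
```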

Optimization Strategies

Choose the Right On-Demand Tier

  • Start with Standard for most workloads

  • Downgrade to Flex for dev/test environments

  • Upgrade to Priority only when throttling impacts users

  • Monitor CloudWatch throttle metrics to inform decisions

Transition to Reserved Tier

  • When consistent baseline load accounts for more than roughly 40% of your monthly on-demand spend

  • Calculate break-even: projected monthly on-demand cost vs. monthly reserved commitment cost

  • Use a 1-month commitment initially before committing to 6 months

  • Reserved tier can work alongside any on-demand tier
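
The break-even check above is a one-line comparison; a sketch with hypothetical prices (actual per-unit reserved pricing varies by model and region):

```python
def reserved_saves_money(monthly_on_demand_cost, units, monthly_cost_per_unit):
    """True if the reserved commitment is cheaper than projected on-demand spend."""
    reserved_cost = units * monthly_cost_per_unit
    return reserved_cost < monthly_on_demand_cost

# Hypothetical: $12,000/month on demand vs. 2 units at $5,000 per unit per month.
# 2 * 5,000 = $10,000 reserved, so the commitment pays off.
worth_it = reserved_saves_money(12_000, 2, 5_000)
```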

Leverage Batch for

  • Training data generation

  • Content moderation backlogs

  • Report generation

  • Data enrichment pipelines

Combine Approaches

  • Reserved tier for baseline traffic

  • Standard on-demand for moderate bursts

  • Priority on-demand for critical peak periods

  • Batch for offline processing

  • Cross-region for failover only

Cost Monitoring

  • Compare tier costs: Flex < Standard < Priority

  • Track tokens per request (optimize prompts)

  • Use CloudWatch metrics for utilization and throttling

  • Set billing alarms for unexpected spikes

  • Review reserved tier utilization monthly

  • Evaluate tier upgrades only when throttling occurs
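
Throttle rate is the CloudWatch signal that should drive tier decisions. In production the counts would come from `GetMetricStatistics` on the `AWS/Bedrock` namespace (the `Invocations` and `InvocationThrottles` metrics); here they are plain numbers for illustration, and the 1% threshold is an example policy, not an AWS recommendation:

```python
def throttle_rate(invocations, throttles):
    """Fraction of requests throttled over a window; 0.0 when there was no traffic."""
    total = invocations + throttles
    return throttles / total if total else 0.0

def should_upgrade_tier(invocations, throttles, threshold=0.01):
    """Flag a tier upgrade when more than `threshold` of calls are throttled."""
    return throttle_rate(invocations, throttles) > threshold
```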