Model execution strategies for AI workloads
At the core of any AI architecture is the model execution layer, the component that performs inference, powers predictions, or generates content. AWS offers two powerful, serverless-ready paths for executing AI workloads:
- Amazon Bedrock provides access to foundation models (FMs) for generative AI use cases.
- Amazon SageMaker Serverless Inference enables scalable deployment of custom-trained models for traditional machine learning (ML) workloads.
By understanding when and how to use each AWS service, enterprises can optimize for both business needs and operational efficiency.
Amazon Bedrock: Foundation models as a service
Amazon Bedrock is a fully managed service that provides serverless access to FMs from leading AI providers such as Anthropic (Claude), Meta (Llama), Mistral, and Cohere, as well as Amazon's own Titan and Nova model families. You can interact with these models through simple API calls, without needing to provision infrastructure, manage GPUs, or fine-tune models.
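For example, the following is a minimal sketch of calling a model through the Bedrock runtime with the AWS SDK for Python (Boto3) by using the Converse API. The Region, model ID, and prompt are illustrative placeholders; any Bedrock text model that your account has access to works the same way.

```python
import boto3

# Bedrock runtime client (Region is an assumption; use your own)
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Model ID is an example; substitute any Bedrock model you have access to
response = bedrock.converse(
    modelId="amazon.nova-lite-v1:0",
    messages=[
        {"role": "user", "content": [{"text": "Summarize serverless AI on AWS in two sentences."}]}
    ],
    inferenceConfig={"maxTokens": 256, "temperature": 0.5},
)

# The assistant's reply is nested under output.message.content
print(response["output"]["message"]["content"][0]["text"])
```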
Key capabilities of Amazon Bedrock include the following:
- Text generation – Summarization, rewriting, content creation, and Q&A.
- Code generation – Natural language to code.
- Classification and extraction – Labeling, parsing, and semantic tagging.
- RAG workflows – Integrate with knowledge bases for grounded responses (see the sketch after this list).
- Agents – Enable autonomous orchestration and tool use.
- Multimodal intelligence – Through Amazon Nova, understand and generate across text, image, and video.
- Fine-tuning and distillation support – Through Amazon Nova Premier, train task-specific models or create compact student models.
- Tiered performance and cost – Select from Amazon Nova Micro, Nova Lite, Nova Pro, and Nova Premier models to balance latency, accuracy, and price.
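To illustrate the RAG workflow capability, the following sketch uses the RetrieveAndGenerate API from the Bedrock agent runtime. The knowledge base ID and model ARN are placeholders that you would replace with your own resources.

```python
import boto3

# The agent runtime client exposes the RetrieveAndGenerate API
agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

# The knowledge base ID and model ARN below are placeholders
response = agent_runtime.retrieve_and_generate(
    input={"text": "What is our refund policy?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "YOUR_KB_ID",
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/amazon.nova-pro-v1:0",
        },
    },
)

# The response contains a grounded answer plus citations to retrieved sources
print(response["output"]["text"])
```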
Operational benefits of Amazon Bedrock include the following:
- Model management – No model hosting or versioning required.
- Secure data handling – Isolated tenant environment and no training on user data.
- Token-based billing – Provides predictable cost modeling.
- Multimodal API unification – Handles input/output across images, video, and text through the same Amazon Bedrock interface.
- Low-latency options – Amazon Nova Micro and Nova Lite are ideal for edge and user-facing generative AI apps.
- Enterprise grounding compatibility – All Amazon Nova models are compatible with Amazon Bedrock Knowledge Bases and Retrieval Augmented Generation (RAG) architectures.
Amazon Bedrock integrates with other AWS services and features in the following ways:
- Triggered from Lambda, Step Functions, or API Gateway (see the Lambda sketch after this list)
- Integrated with Amazon Bedrock Agents for goal-driven orchestration
- Works with Amazon Bedrock Knowledge Bases and RAG pipelines
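As a concrete illustration of the Lambda integration, the following is a minimal, hypothetical Lambda handler fronted by API Gateway. The event shape assumes a proxy integration, and the model ID is a placeholder.

```python
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

def lambda_handler(event, context):
    # An API Gateway proxy integration puts the request body in event["body"]
    body = json.loads(event.get("body") or "{}")
    prompt = body.get("prompt", "Hello")

    # Model ID is an example; swap in any Bedrock model you have access to
    response = bedrock.converse(
        modelId="amazon.nova-micro-v1:0",
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )

    return {
        "statusCode": 200,
        "body": json.dumps(
            {"completion": response["output"]["message"]["content"][0]["text"]}
        ),
    }
```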
Ideal use cases for Amazon Bedrock
Amazon Bedrock is well-suited for a variety of scenarios, such as the following:
- Generative AI tasks – Create marketing content and documentation, and power chatbots.
- Conversational assistants – Build support bots and internal copilots.
- Knowledge retrieval – Use for summarization and semantic search tasks.
- Dynamic planning – Power agent-based decision systems.
- Multimodal generation – Use Amazon Nova Canvas to generate images, and use Amazon Nova Reel to produce videos from prompts and structured context.
- Enterprise assistants – Use Amazon Nova Pro to enable goal-driven decision-making tools that are grounded in proprietary data.
- Real-time user experience feedback – Analyze and respond to customer actions with under 100 ms latency by using Amazon Nova Micro.
Amazon SageMaker Serverless Inference: Custom model hosting
Amazon SageMaker Serverless Inference is designed for developers and data scientists who have trained their own models (for example, XGBoost, PyTorch, Scikit-learn, and TensorFlow). By using SageMaker Serverless Inference, they can deploy their models in a scalable, serverless environment.
Unlike Amazon Bedrock, SageMaker Serverless Inference gives you control over the model architecture, training data, and logic.
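The following is a hedged sketch of deploying a trained model to a serverless endpoint with Boto3. The container image URI, model artifact path, role ARN, and resource names are placeholders, and the memory and concurrency values are illustrative; the ServerlessConfig block is what distinguishes a serverless endpoint from a provisioned one.

```python
import boto3

sm = boto3.client("sagemaker")

# All names, ARNs, and URIs below are placeholders
sm.create_model(
    ModelName="churn-xgb",
    PrimaryContainer={
        "Image": "<account>.dkr.ecr.us-east-1.amazonaws.com/xgboost-inference:latest",
        "ModelDataUrl": "s3://my-bucket/models/churn-xgb/model.tar.gz",
    },
    ExecutionRoleArn="arn:aws:iam::<account>:role/SageMakerExecutionRole",
)

# ServerlessConfig makes the endpoint serverless: you set the memory size
# and the maximum number of concurrent invocations instead of instance types
sm.create_endpoint_config(
    EndpointConfigName="churn-xgb-serverless",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "churn-xgb",
        "ServerlessConfig": {"MemorySizeInMB": 2048, "MaxConcurrency": 5},
    }],
)

sm.create_endpoint(
    EndpointName="churn-xgb-serverless",
    EndpointConfigName="churn-xgb-serverless",
)
```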
Key capabilities of SageMaker Serverless Inference include the following:
- Hosts traditional ML models for tasks such as classification, regression, natural language processing (NLP), and forecasting
- Supports multi-model endpoints
- Scales automatically, provisioning compute on demand and shutting it down when idle
- Runs inference on custom container images or prebuilt ML frameworks
Operational benefits of SageMaker Serverless Inference include the following:
- Pay-per-inference pricing with zero idle costs
- Fully managed endpoints with no server setup
- Integrates with training pipelines and notebooks
SageMaker Serverless Inference integrates with other AWS services and features in the following ways:
- Invoked by using AWS Lambda, Step Functions, or SDK and API calls (see the sketch after this list)
- Works with SageMaker Pipelines for end-to-end machine learning operations (MLOps)
- Logs and metrics integrated with Amazon CloudWatch
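To make the Lambda integration concrete, the following is a minimal, hypothetical invocation of the serverless endpoint from the earlier deployment sketch. The endpoint name and the CSV payload format are assumptions tied to that example; match them to your own model's input contract.

```python
import boto3

runtime = boto3.client("sagemaker-runtime")

def lambda_handler(event, context):
    # Endpoint name and CSV payload format come from the deployment
    # sketch above and are assumptions; adjust both for your model
    response = runtime.invoke_endpoint(
        EndpointName="churn-xgb-serverless",
        ContentType="text/csv",
        Body="42,0,1,3.5,120.0",
    )

    # XGBoost built-in containers return predictions as plain text
    prediction = response["Body"].read().decode("utf-8")
    return {"statusCode": 200, "body": prediction}
```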
Ideal use cases for SageMaker Serverless Inference
SageMaker Serverless Inference is a good choice for various machine learning applications:
- Predictive analytics – Use for sales forecasting and churn prediction models.
- Text classification – Supports tasks like spam detection and sentiment analysis.
- Image classification – Enables document optical character recognition (OCR) and medical imaging applications.
- Custom NLP – Handles entity recognition and document tagging tasks.
Choosing between Amazon Bedrock and SageMaker Serverless Inference
Both Amazon Bedrock and SageMaker Serverless Inference offer serverless paths to scalable, production-ready AI execution. Together, they form the core execution layer of modern, event-driven, serverless AI architectures on AWS. The following table compares these services across key dimensions.
| Dimension | Amazon Bedrock | SageMaker Serverless Inference |
|---|---|---|
| Model type | Foundation models (LLMs) | Custom-trained ML models |
| Setup effort | Minimal (no training or hosting) | Requires model training and packaging |
| Use case | Generative, conversational, and semantic | Predictive, numerical, and structured data |
| Scalability | Fully serverless and auto-scaled | Fully serverless and auto-scaled |
| Cost model | Pay per token | Pay per inference |
| Integration | API Gateway, Lambda, Amazon Bedrock Agents, and RAG | Lambda, Step Functions, and CI/CD pipelines |
| Tuning required | None (zero-shot or few-shot) | Full control (hyperparameters and retraining) |
Choosing the right service depends on the nature of your AI workload:
- Use Amazon Bedrock when you need semantic flexibility, goal-driven workflows, and rapid iteration with foundation models.
- Use SageMaker Serverless Inference when you have proprietary models, structured inputs, or need full control over training and deployment.
- Use SageMaker JumpStart to choose from hundreds of built-in algorithms with pretrained models from model hubs, including TensorFlow Hub, PyTorch Hub, Hugging Face, and MXNet GluonCV.