Building serverless architectures for agentic AI on AWS
Aaron Sempf, Amazon Web Services
July 2025
The convergence of AI and serverless computing is reshaping the landscape of modern enterprise architecture. As organizations strive to deliver intelligent capabilities at scale, they face increasing pressure to reduce operational overhead, accelerate innovation, and deploy applications that can adapt in real time to user behavior and system events.
Serverless AI on AWS represents a fundamental shift toward intelligent, adaptive, cloud-native systems. With the right strategy and tooling, organizations can unlock faster innovation cycles, lower costs, and greater scalability. This approach positions them at the forefront of the next generation of enterprise computing. AWS is enabling this shift through a combination of fully managed AI services and event-driven, serverless infrastructure.
This guide outlines the strategic and technical foundations for building AI-native, serverless architectures on AWS. These architectures are scalable, cost-effective, and capable of delivering real-time intelligence without the complexity of managing infrastructure.
Intended audience
This guide is for architects, developers, and technology leaders seeking to harness the power of AI-driven software agents within modern cloud-native applications.
Objectives
This guide helps you do the following:
- Understand the AWS native services available for agentic AI solution development
- Operationalize agentic AI with cloud-scale reliability
- Align AI execution with business outcomes and cost models
- Establish a framework for secure, governed AI adoption
About this content series
This guide is part of a set of publications that provide architectural blueprints and technical guidance for building AI-driven software agents on AWS. The AWS Prescriptive Guidance series includes the following:
- Building serverless architectures for agentic AI on AWS (this guide)
For more information about this content series, see Agentic AI.
The business case of serverless AI
Serverless computing provides an ideal foundation for modern AI workloads. AI applications often require intermittent, compute-intensive inference, especially in use cases such as fraud detection, recommendation engines, document summarization, and customer service automation. Traditional infrastructure models can be expensive and operationally complex when managing unpredictable or spiky workloads.
In contrast, serverless architectures offer significant advantages: they scale automatically, execute on demand, reduce operational overhead, and charge only for the resources used. These qualities make serverless architectures well suited for embedding AI into modern cloud-native applications. AWS offers a comprehensive portfolio of services that combine serverless and AI capabilities, including Amazon SageMaker Serverless Inference and Amazon Bedrock, which provides access to foundation models through a fully managed, API-based interface. Paired with AWS Lambda for compute and AWS Step Functions for orchestration, these services enable the development of agile, cost-aligned, and production-ready AI systems.
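As a sketch of that API-based interface, the following Python snippet uses the AWS SDK (boto3) to call a Bedrock-hosted foundation model through the model-agnostic Converse API. The model ID, prompt, and inference parameters are illustrative assumptions; any Bedrock model that your account has access to works the same way.

```python
# Hypothetical model ID; any model that your account can access through
# Amazon Bedrock works here, because the Converse API normalizes requests.
MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"


def build_messages(prompt):
    """Shape a plain-text prompt into the Converse API message format."""
    return [{"role": "user", "content": [{"text": prompt}]}]


def invoke_model(prompt):
    """Send one prompt to a Bedrock foundation model and return the reply text."""
    import boto3  # AWS SDK for Python; imported here so the helper above
    # can be unit-tested without the SDK or AWS credentials present.

    client = boto3.client("bedrock-runtime")
    response = client.converse(
        modelId=MODEL_ID,
        messages=build_messages(prompt),
        inferenceConfig={"maxTokens": 256, "temperature": 0.2},
    )
    return response["output"]["message"]["content"][0]["text"]


if __name__ == "__main__":
    print(invoke_model("Summarize serverless AI in one sentence."))
```

Because the request and response formats are normalized, swapping providers is a one-line change to `MODEL_ID` rather than a rewrite of the integration code.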
AI workloads, particularly inference, are often unpredictable and bursty. In traditional architectures, this leads to overprovisioned infrastructure, increased costs, and complexity in scaling. Serverless models solve these issues by offering:
- Elastic scalability – Resources scale automatically based on demand.
- Cost optimization – No charges for idle compute. Pay only for execution time.
- Reduced operational overhead – Less infrastructure to manage and fewer dependencies on other technologies, processes, or resources.
- Faster time to market – Developers can focus on business logic and model performance instead of managing servers.
- High availability and built-in resilience – AWS serverless offerings provide these capabilities by default.
These capabilities make serverless a natural fit for deploying AI models across a wide variety of use cases, from fraud detection and personalized recommendations to document analysis and conversational AI.
AWS services powering serverless AI
AWS provides a robust suite of managed services that help teams embed intelligence into applications, orchestrate workflows, and react to events without managing infrastructure:
- With AWS Lambda, you can run event-driven compute workloads at scale without provisioning servers. It's ideal for AI pre- and post-processing and lightweight inference logic.
- Use Amazon SageMaker Serverless Inference to deploy machine learning (ML) models for real-time predictions with automatic scaling and no idle charges.
- Amazon Bedrock provides access to foundation models from leading AI companies, such as AI21 Labs, Anthropic, Cohere, DeepSeek, Luma AI, Meta, Mistral AI, poolside (coming soon), Stability AI, TwelveLabs (coming soon), Writer, and Amazon, through a single API for generative AI workloads.
- With Amazon Bedrock Agents, you can build AI-driven workflows where models orchestrate function calls and reason through tasks by using natural language.
- Amazon EventBridge enables you to build loosely coupled, event-driven architectures that trigger AI workflows automatically.
- Use AWS Step Functions to orchestrate multi-step AI pipelines and connect AWS services using visual workflows.
- With AWS IoT Greengrass and Lambda@Edge, you can deploy models and logic at the edge for low-latency inference in IoT and global applications.
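To make the Lambda pattern above concrete, here is a minimal sketch of a handler that performs pre-processing (extracting text from an incoming EventBridge event), lightweight inference through Amazon Bedrock, and post-processing of the result. The event shape, field names, and model ID are assumptions for illustration, not a fixed AWS schema.

```python
import json

# Hypothetical model ID for illustration.
MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"


def extract_document_text(event):
    """Pre-processing: pull the document body out of the event's detail payload.

    The event shape assumed here is illustrative only.
    """
    return event["detail"]["document"]["text"]


def handler(event, context):
    """Lambda entry point: pre-process, run lightweight inference, post-process."""
    import boto3  # imported lazily so the pure helper can be tested offline

    text = extract_document_text(event)
    client = boto3.client("bedrock-runtime")
    response = client.converse(
        modelId=MODEL_ID,
        messages=[{"role": "user", "content": [{"text": "Summarize:\n" + text}]}],
    )
    summary = response["output"]["message"]["content"][0]["text"]
    # Post-processing: wrap the model output in a simple JSON response.
    return {"statusCode": 200, "body": json.dumps({"summary": summary})}
```

Wiring an EventBridge rule to this function means new documents trigger summarization automatically, with no servers to provision or scale.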
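Similarly, a multi-step AI pipeline can be expressed in Amazon States Language (ASL) and registered with Step Functions. This sketch chains two hypothetical Lambda tasks, pre-processing followed by inference; the ARNs and state machine name are placeholders for resources in your own account.

```python
import json

# Placeholder ARNs; substitute the Lambda functions and IAM role in your account.
PREPROCESS_ARN = "arn:aws:lambda:us-east-1:111122223333:function:preprocess"
SUMMARIZE_ARN = "arn:aws:lambda:us-east-1:111122223333:function:summarize"
ROLE_ARN = "arn:aws:iam::111122223333:role/StepFunctionsExecutionRole"


def build_pipeline_definition():
    """Amazon States Language for a two-step pipeline: pre-process, then infer."""
    return {
        "Comment": "Minimal serverless AI pipeline sketch",
        "StartAt": "Preprocess",
        "States": {
            "Preprocess": {
                "Type": "Task",
                "Resource": PREPROCESS_ARN,
                "Next": "Summarize",
            },
            "Summarize": {
                "Type": "Task",
                "Resource": SUMMARIZE_ARN,
                "End": True,
            },
        },
    }


if __name__ == "__main__":
    import boto3  # imported here so the definition builder is testable offline

    sfn = boto3.client("stepfunctions")
    sfn.create_state_machine(
        name="ai-document-pipeline",
        definition=json.dumps(build_pipeline_definition()),
        roleArn=ROLE_ARN,
    )
```

Because the workflow is declared as data, Step Functions handles retries, state transitions, and visual inspection of each run without any orchestration code of your own.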