The large model inference (LMI) container documentation

The Large Model Inference (LMI) container documentation is provided on the Deep Java Library documentation site.

The documentation is written for developers, data scientists, and machine learning engineers who need to deploy and optimize large language models (LLMs) on Amazon SageMaker AI. It shows you how to use LMI containers, which are specialized Docker containers for LLM inference provided by AWS, and it includes an overview, deployment guides, user guides for the supported inference libraries, and advanced tutorials.
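
As an illustration, a minimal deployment sketch with the SageMaker Python SDK might look like the following. The image URI and tag, the model ID, and the instance type are placeholders chosen for this example; the OPTION_* environment variables map to serving.properties settings as described in the LMI user guides, but verify the exact option names and the current image URI for your region and LMI release against the documentation.

```python
import sagemaker
from sagemaker.model import Model
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

session = sagemaker.Session()
role = sagemaker.get_execution_role()  # assumes a SageMaker execution role is available

# Placeholder LMI image URI: look up the current DJL/LMI image for your
# region and version in the LMI documentation before deploying.
image_uri = "763104351884.dkr.ecr.us-east-1.amazonaws.com/djl-inference:0.27.0-lmi"

model = Model(
    image_uri=image_uri,
    role=role,
    env={
        # Pull model weights directly from the Hugging Face Hub (example model ID).
        "HF_MODEL_ID": "mistralai/Mistral-7B-Instruct-v0.2",
        # OPTION_* environment variables map to serving.properties settings.
        "OPTION_TENSOR_PARALLEL_DEGREE": "1",
        "OPTION_ROLLING_BATCH": "vllm",
    },
    sagemaker_session=session,
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
)

# Invoke the endpoint with a simple text-generation request.
print(predictor.predict({
    "inputs": "What is Amazon SageMaker?",
    "parameters": {"max_new_tokens": 64},
}))
```

Supplying settings through environment variables is one option; the same settings can instead be packaged with the model in a serving.properties file, as shown later on this page.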

By using the LMI container documentation, you can:

  • Understand the components and architecture of LMI containers

  • Learn how to select the appropriate instance type and backend for your use case

  • Configure and deploy LLMs on SageMaker AI using LMI containers

  • Optimize performance by using features like quantization, tensor parallelism, and continuous batching (see the configuration sketch after this list)

  • Benchmark and tune your SageMaker AI endpoints for optimal throughput and latency
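
As an example of the optimization features mentioned above, the sketch below shows a hypothetical serving.properties file that enables tensor parallelism, continuous (rolling) batching, and quantization. The keys follow the conventions in the LMI documentation, but the values are illustrative assumptions; consult the user guide for your chosen backend to confirm which options it supports.

```properties
# serving.properties -- illustrative values only.
engine=Python
# Example model ID; replace with your own model.
option.model_id=mistralai/Mistral-7B-Instruct-v0.2
# Shard the model across four GPUs (tensor parallelism).
option.tensor_parallel_degree=4
# Enable continuous (rolling) batching with the vLLM backend.
option.rolling_batch=vllm
option.max_rolling_batch_size=32
# Load quantized weights (AWQ in this example).
option.quantize=awq
```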