The large model inference (LMI) container documentation

The Large Model Inference (LMI) container documentation is provided on the Deep Java Library documentation site.

The documentation is written for developers, data scientists, and machine learning engineers who need to deploy and optimize large language models (LLMs) on Amazon SageMaker AI. It explains how to use LMI containers, which are specialized Docker containers that AWS provides for LLM inference. It includes an overview, deployment guides, user guides for the supported inference libraries, and advanced tutorials.

By using the LMI container documentation, you can:

Understand the components and architecture of LMI containers
Learn how to select the appropriate instance type and backend for your use case
Configure and deploy LLMs on SageMaker AI using LMI containers
Optimize performance by using features like quantization, tensor parallelism, and continuous batching
Benchmark and tune your SageMaker AI endpoints for optimal throughput and latency
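
As a rough illustration of the configuration step, the following minimal Python sketch assembles container environment variables for features such as tensor parallelism, continuous (rolling) batching, and quantization. It assumes the LMI convention that a serving.properties key such as option.tensor_parallel_degree corresponds to the environment variable OPTION_TENSOR_PARALLEL_DEGREE, and that HF_MODEL_ID selects the model. The helper names, the model ID, and the option values are hypothetical; confirm the supported keys and values for your container version in the LMI container documentation.

```python
def to_env_name(key: str) -> str:
    """Map a serving.properties-style key to an environment variable name.

    Assumed convention: dots become underscores and the name is upper-cased.
    """
    return key.replace(".", "_").upper()


def build_lmi_env(model_id: str, options: dict) -> dict:
    """Build an environment-variable dict for an LMI container definition."""
    env = {"HF_MODEL_ID": model_id}
    for key, value in options.items():
        # Container environment values are passed as strings.
        env[to_env_name(key)] = str(value)
    return env


env = build_lmi_env(
    "my-org/my-model",  # hypothetical model ID
    {
        "option.tensor_parallel_degree": 4,  # illustrative: shard across 4 GPUs
        "option.rolling_batch": "auto",      # illustrative: continuous batching
        "option.quantize": "awq",            # illustrative: quantization method
    },
)

for name, value in env.items():
    print(f"{name}={value}")
```

A dictionary like this would typically be supplied as the container environment when you create the SageMaker AI model. That call is omitted here to keep the sketch free of SDK dependencies.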


