The large model inference (LMI) container documentation

The Large Model Inference (LMI) container documentation is provided on the Deep Java Library documentation site.

The documentation is written for developers, data scientists, and machine learning engineers who need to deploy and optimize large language models (LLMs) on Amazon SageMaker AI. It shows you how to use LMI containers, which are specialized Docker containers for LLM inference provided by AWS, and it includes an overview, deployment guides, user guides for the supported inference libraries, and advanced tutorials.
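
As an illustration, a minimal deployment sketch with the SageMaker Python SDK might look like the following. The image URI and tag, the model ID, and the instance type are placeholders chosen for this example; the OPTION_* environment variables map to serving.properties settings as described in the LMI user guides, but verify the exact option names and the current image URI for your region and LMI release against the documentation.

```python
import sagemaker
from sagemaker.model import Model
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

session = sagemaker.Session()
role = sagemaker.get_execution_role()  # assumes a SageMaker execution role is available

# Placeholder LMI image URI: look up the current DJL/LMI image for your
# region and version in the LMI documentation before deploying.
image_uri = "763104351884.dkr.ecr.us-east-1.amazonaws.com/djl-inference:0.27.0-lmi"

model = Model(
    image_uri=image_uri,
    role=role,
    env={
        # Pull model weights directly from the Hugging Face Hub (example model ID).
        "HF_MODEL_ID": "mistralai/Mistral-7B-Instruct-v0.2",
        # OPTION_* environment variables map to serving.properties settings.
        "OPTION_TENSOR_PARALLEL_DEGREE": "1",
        "OPTION_ROLLING_BATCH": "vllm",
    },
    sagemaker_session=session,
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
)

# Invoke the endpoint with a simple text-generation request.
print(predictor.predict({
    "inputs": "What is Amazon SageMaker?",
    "parameters": {"max_new_tokens": 64},
}))
```

Supplying settings through environment variables is one option; the same settings can instead be packaged with the model in a serving.properties file, as shown later on this page.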

By using the LMI container documentation, you can:

  • Understand the components and architecture of LMI containers

  • Learn how to select the appropriate instance type and backend for your use case

  • Configure and deploy LLMs on SageMaker AI using LMI containers

  • Optimize performance by using features like quantization, tensor parallelism, and continuous batching (see the configuration sketch after this list)

  • Benchmark and tune your SageMaker AI endpoints for optimal throughput and latency
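
As an example of the optimization features mentioned above, the sketch below shows a hypothetical serving.properties file that enables tensor parallelism, continuous (rolling) batching, and quantization. The keys follow the conventions in the LMI documentation, but the values are illustrative assumptions; consult the user guide for your chosen backend to confirm which options it supports.

```properties
# serving.properties -- illustrative values only.
engine=Python
# Example model ID; replace with your own model.
option.model_id=mistralai/Mistral-7B-Instruct-v0.2
# Shard the model across four GPUs (tensor parallelism).
option.tensor_parallel_degree=4
# Enable continuous (rolling) batching with the vLLM backend.
option.rolling_batch=vllm
option.max_rolling_batch_size=32
# Load quantized weights (AWQ in this example).
option.quantize=awq
```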