MLSUS-13: Optimize models for inference

Improve the efficiency of your models, and thus consume fewer resources for inference, by compiling the models into optimized forms.

Implementation plan

  • Use open-source model compilers - Libraries such as Treelite (for decision tree ensembles) improve prediction throughput by compiling trained models into native code that uses compute resources more efficiently; see the Treelite sketch after this list.

  • Use third-party tools - Solutions like Hugging Face Infinity allow you to accelerate transformer models and run inference not only on GPUs but also on CPUs.

  • Use Amazon SageMaker Neo - SageMaker Neo enables developers to optimize ML models for inference on SageMaker in the cloud and on supported devices at the edge. The SageMaker Neo runtime consumes as little as one-tenth of the footprint of a deep learning framework, while optimizing models to perform up to 25 times faster with no loss in accuracy; see the Neo sketch after this list.
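
For the Treelite path, a minimal sketch assuming Treelite 3.x (where the compiler ships as treelite and the prediction runtime as treelite_runtime) and a trained XGBoost model saved to disk; the file paths, the gcc toolchain, and the input matrix shape are placeholders:

```python
import numpy as np
import treelite
import treelite_runtime

# Load a trained XGBoost tree ensemble from disk (path is a placeholder)
model = treelite.Model.load("xgboost_model.bin", model_format="xgboost")

# Compile the ensemble into a native shared library; parallel_comp
# splits the trees into chunks that are compiled in parallel
model.export_lib(
    toolchain="gcc",
    libpath="./compiled_model.so",
    params={"parallel_comp": 4},
)

# Load the compiled library with the lightweight runtime and predict
predictor = treelite_runtime.Predictor("./compiled_model.so")
dmat = treelite_runtime.DMatrix(np.random.rand(100, 10).astype("float32"))
predictions = predictor.predict(dmat)
```

The compiled library is self-contained, so the inference host only needs the small treelite_runtime package rather than the full training framework.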
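
For the SageMaker Neo path, a sketch using the SageMaker Python SDK (v2). The S3 paths, IAM role ARN, input tensor name and shape, and framework versions below are placeholders; Neo supports a specific set of frameworks and versions, so check the current documentation for valid values:

```python
from sagemaker.pytorch import PyTorchModel

role = "arn:aws:iam::111122223333:role/SageMakerExecutionRole"  # placeholder

# A trained PyTorch model artifact stored in S3 (placeholder path)
model = PyTorchModel(
    model_data="s3://my-bucket/models/model.tar.gz",
    role=role,
    entry_point="inference.py",
    framework_version="1.13",
    py_version="py39",
)

# Start a Neo compilation job targeting a specific instance family;
# input_shape must match the tensor the model expects
compiled_model = model.compile(
    target_instance_family="ml_c5",
    input_shape={"input0": [1, 3, 224, 224]},
    output_path="s3://my-bucket/compiled/",
    role=role,
    framework="pytorch",
    framework_version="1.13",
)

# Deploy the compiled artifact to a real-time endpoint in the same family
predictor = compiled_model.deploy(
    initial_instance_count=1,
    instance_type="ml.c5.xlarge",
)
```

compile() blocks until the compilation job finishes and returns a model configured to use the Neo inference container for the chosen target, so the deployed endpoint serves the optimized artifact directly.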
