This whitepaper is for historical reference only. Some content might be outdated and some links might not be available.

Cost optimization

Cost management is a primary concern for public sector organizations' projects, to ensure the best use of public funds while enabling agency missions. AWS provides several mechanisms to manage costs in each phase of the ML lifecycle (Prepare; Build; Train and Tune; and Deploy and Manage), as described in this section.

Prepare

This step of the ML lifecycle includes storing the data, labeling the data, and processing the data. Cost control in this phase can be accomplished using the following techniques:

  • Data Storage. ML requires extensive data exploration and transformation, and multiple redundant copies of data are quickly generated, which can lead to rapid growth in storage costs. It is therefore essential to establish a cost control strategy at the storage level. Processes can be established to regularly analyze source data and either remove duplicate data or archive it to lower-cost storage based on compliance policies. For example, for data stored in Amazon S3, S3 storage class analysis can be enabled on any group of objects (based on prefix or object tagging) to automatically analyze storage access patterns. This makes it possible to identify rarely accessed data and transition it to Amazon S3 Glacier, lowering costs. S3 Intelligent-Tiering can also be used to lower the cost of data with unpredictable usage patterns. It works by monitoring access and moving objects between a tier optimized for frequent access and a lower-cost tier optimized for infrequent access.
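
Such transitions can also be automated with an S3 lifecycle rule. The following is a minimal sketch using boto3; the bucket name, "raw/" prefix, and 90-day threshold are hypothetical placeholders:

    import boto3

    # A minimal sketch: transition objects under a hypothetical "raw/" prefix
    # to S3 Glacier 90 days after creation. Bucket and prefix are placeholders.
    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket="my-ml-data-bucket",
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "archive-raw-training-data",
                    "Filter": {"Prefix": "raw/"},
                    "Status": "Enabled",
                    "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
                }
            ]
        },
    )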

  • Data Labeling. Data labeling is the process of identifying raw data (such as images, text files, and videos) and adding one or more meaningful, informative labels to provide context so that an ML model can learn from it. This process can be very time-consuming and can quickly increase the cost of a project.

Amazon SageMaker Ground Truth can be used to reduce these costs. Ground Truth's automated data labeling uses an active learning technique to reduce the number of human-provided labels required for models, thereby lowering costs. Ground Truth also provides additional mechanisms, such as crowdsourcing through Amazon Mechanical Turk or third-party vendor companies, that can be chosen to lower labeling costs.

  • Data Wrangling. In ML, a lot of time is spent identifying, converting, transforming, and validating raw source data into features that can be used to train models and make predictions. Amazon SageMaker AI Data Wrangler can be used to reduce the time spent on these tasks, lowering the cost of the project. With Data Wrangler, data can be imported from various data sources and transformed without writing code. Once data is prepared, fully automated ML workflows can be built with Amazon SageMaker AI Pipelines, and the prepared features can be saved for reuse in the Amazon SageMaker AI Feature Store, eliminating the cost of preparing the same data again.

Build

This step of the ML lifecycle involves building ML models. Cost control in this phase can be accomplished using the following techniques:

  • Notebook Utilization. An Amazon SageMaker AI notebook instance is an ML compute instance running the Jupyter Notebook application. It helps prepare and process data, write code to train models, deploy models to SageMaker hosting, and test or validate models. Costs can be reduced significantly by optimizing notebook utilization. One way is to stop the notebook instance when it is not being used and start it up only when needed. Another option is to use a lifecycle configuration script that automatically shuts down the instance when it is not being worked on. (See Right-sizing resources and avoiding unnecessary costs in Amazon SageMaker AI for details.)
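
As an illustration, the same idea can be applied in bulk with a scheduled script (for example, run periodically from AWS Lambda). The following is a minimal sketch using boto3; the "auto-stop" tag is a hypothetical convention:

    import boto3

    # A minimal sketch: stop every in-service notebook instance that carries
    # a hypothetical "auto-stop=true" tag. Pagination is omitted for brevity.
    sm = boto3.client("sagemaker")
    for nb in sm.list_notebook_instances(StatusEquals="InService")["NotebookInstances"]:
        tags = sm.list_tags(ResourceArn=nb["NotebookInstanceArn"])["Tags"]
        if any(t["Key"] == "auto-stop" and t["Value"] == "true" for t in tags):
            sm.stop_notebook_instance(NotebookInstanceName=nb["NotebookInstanceName"])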

  • Test code locally. The SageMaker Python SDK supports local mode, which allows creation of estimators and deployment to the local environment. Running the fit function in local mode before submitting a training job provides early feedback prior to running in SageMaker's managed training or hosting environments. Issues with code and data can be resolved early, reducing the costs incurred by failed training jobs. This also saves the time spent initializing the training cluster.
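
The following is a minimal sketch of local mode with the SageMaker Python SDK; the training script, role ARN, and data path are hypothetical placeholders:

    from sagemaker.sklearn.estimator import SKLearn

    # A minimal sketch: instance_type="local" runs training in a container on
    # the local machine. "train.py" and the role ARN are placeholders.
    estimator = SKLearn(
        entry_point="train.py",
        role="arn:aws:iam::111122223333:role/SageMakerExecutionRole",
        instance_count=1,
        instance_type="local",
        framework_version="1.2-1",
    )
    estimator.fit({"train": "file://./data/train"})  # local files, no S3 round trip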

  • Use Pipe mode (where applicable) to reduce training time. Certain algorithms in Amazon SageMaker AI, such as BlazingText, work on a large corpus of data. When these jobs are launched in the default File mode, significant time goes into downloading the data from Amazon S3 onto the Amazon EBS volumes of the training instances. Training jobs don't start until this download finishes.

These algorithms can take advantage of Pipe mode, in which training data is streamed directly from Amazon S3 to the training instances, allowing training jobs to start immediately without waiting for the full dataset to download.
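
The following is a minimal sketch of enabling Pipe mode through the SageMaker Python SDK; the image URI, role, and S3 path are hypothetical placeholders:

    from sagemaker.estimator import Estimator

    # A minimal sketch: input_mode="Pipe" streams training data from S3
    # instead of downloading it first. All identifiers are placeholders.
    estimator = Estimator(
        image_uri="<blazingtext-image-uri>",
        role="<execution-role-arn>",
        instance_count=1,
        instance_type="ml.c5.2xlarge",
        input_mode="Pipe",
    )
    estimator.fit({"train": "s3://my-bucket/training-corpus/"})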

  • Find the right balance: Performance vs. accuracy. 32-bit (single precision or FP32) and even 64-bit (double precision or FP64) floating point variables are popular for applications that require high precision, such as engineering simulations that model real-world behavior and need the mathematical model to be as exact as possible. In many cases, however, moving to half or mixed precision (16-bit or FP16) reduces training time and consequently cost, and is worth the minor trade-off in accuracy. See Accelerating GPU computation through mixed-precision methods for details. A similar trade-off applies when deciding on the number of layers in a neural network for classification algorithms, such as image classification. The throughput of 16-bit and 32-bit floating point calculations needs to be compared to determine an appropriate approach for the model in question.
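
The following is a minimal sketch of mixed-precision training using PyTorch automatic mixed precision (AMP), assuming a CUDA-capable GPU; the toy model and random data are placeholders:

    import torch
    from torch import nn
    from torch.cuda.amp import GradScaler, autocast

    device = "cuda"  # assumes a CUDA-capable GPU is available
    model = nn.Linear(128, 10).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()
    scaler = GradScaler()  # scales the loss to avoid FP16 gradient underflow

    for _ in range(100):
        x = torch.randn(64, 128, device=device)          # placeholder batch
        y = torch.randint(0, 10, (64,), device=device)   # placeholder labels
        optimizer.zero_grad()
        with autocast():  # forward pass runs in FP16 where it is safe to do so
            loss = loss_fn(model(x), y)
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()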

  • JumpStart. Developers who are new to ML often find that importing an ML model from a third-party source and getting an API endpoint up and running to serve the model can be time-consuming. The end-to-end process of building a solution, including building, training, and deploying a model and assembling the different components, can take months for users new to ML. SageMaker JumpStart accelerates time-to-deployment for over 150 open-source models and provides pre-built solutions, preconfigured with all the AWS services required to launch the solution into production, including CloudFormation templates and reference architectures.
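
The following is a minimal sketch of deploying a pre-trained JumpStart model with the SageMaker Python SDK; the model ID shown is an illustrative placeholder:

    from sagemaker.jumpstart.model import JumpStartModel

    # A minimal sketch: deploy a pre-trained JumpStart model to a real-time
    # endpoint. The model_id is an illustrative placeholder.
    model = JumpStartModel(model_id="huggingface-text2text-flan-t5-base")
    predictor = model.deploy()
    # Remember to delete the endpoint when finished:
    # predictor.delete_endpoint()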

  • AWS Marketplace. AWS Marketplace is a digital catalog with listings from independent software vendors, making it easy to find, test, buy, and deploy software that runs on AWS. AWS Marketplace provides many pre-trained, deployable ML models for SageMaker. Using pre-trained models enables delivery of ML-powered features faster and at lower cost.
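
A subscribed Marketplace model can be deployed directly from its model package. The following is a minimal sketch; the model package ARN and role are hypothetical placeholders:

    from sagemaker import ModelPackage

    # A minimal sketch: deploy a pre-trained model subscribed to in
    # AWS Marketplace. The ARN and role are placeholders.
    model = ModelPackage(
        role="<execution-role-arn>",
        model_package_arn="arn:aws:sagemaker:us-east-1:111122223333:model-package/example",
    )
    predictor = model.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")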

Train and Tune

This step of the ML lifecycle involves providing the algorithm selected in the build phase with the training data to learn from, and setting the model parameters to optimize the training process. Cost control in this phase can be accomplished using the following techniques:

  • Use Spot Instances. If the training job can be interrupted, Amazon SageMaker AI Managed Spot Training can be used to reduce the cost of training models by up to 90% compared to On-Demand Instances. Training jobs can be configured to use Spot Instances, and a stopping condition can be used to specify how long Amazon SageMaker AI waits for a job to run on EC2 Spot Instances. See Managed Spot Training: Save Up to 90% On Your Amazon SageMaker AI Training Jobs for details.
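
The following is a minimal sketch of Managed Spot Training with the SageMaker Python SDK; the image URI, role, and checkpoint path are hypothetical placeholders:

    from sagemaker.estimator import Estimator

    # A minimal sketch of Managed Spot Training; max_wait is the stopping
    # condition. All identifiers are placeholders.
    estimator = Estimator(
        image_uri="<training-image-uri>",
        role="<execution-role-arn>",
        instance_count=1,
        instance_type="ml.m5.xlarge",
        use_spot_instances=True,
        max_run=3600,   # maximum training time, in seconds
        max_wait=7200,  # total wait including interruptions; must be >= max_run
        checkpoint_s3_uri="s3://my-bucket/checkpoints/",  # resume after interruption
    )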

  • Hyperparameter optimization (HPO). Amazon SageMaker AI's built-in HPO automatically evaluates hundreds of different combinations of hyperparameter values to quickly arrive at the best solution for your ML problem. When combined with high-performance algorithms, distributed computing, and managed infrastructure, built-in HPO drastically decreases the training time and overall cost of building production-grade systems. Built-in HPO works best with a reduced search space.
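
The following is a minimal sketch of built-in HPO over a deliberately reduced search space; the image URI, role, data paths, and metric name are hypothetical placeholders (metric names vary by algorithm):

    from sagemaker.estimator import Estimator
    from sagemaker.tuner import ContinuousParameter, HyperparameterTuner

    # A minimal sketch: tune a single hyperparameter over a narrow range.
    # "validation:auc" and "eta" are illustrative and algorithm-dependent.
    estimator = Estimator(
        image_uri="<xgboost-image-uri>",
        role="<execution-role-arn>",
        instance_count=1,
        instance_type="ml.m5.xlarge",
    )
    tuner = HyperparameterTuner(
        estimator=estimator,
        objective_metric_name="validation:auc",
        hyperparameter_ranges={"eta": ContinuousParameter(0.01, 0.3)},
        max_jobs=20,
        max_parallel_jobs=4,
    )
    tuner.fit({"train": "s3://my-bucket/train/", "validation": "s3://my-bucket/val/"})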

  • CPU vs. GPU. CPUs are best at handling single, more complex calculations sequentially, whereas GPUs are better at handling multiple simple calculations in parallel. GPUs provide a great price/performance ratio if used effectively. However, GPUs also cost more and should be chosen only when genuinely needed. For many use cases, a standard current-generation instance type from an instance family such as ml.m* provides enough computing power, memory, and network performance for many Jupyter notebooks to perform well. A best practice is to start with the minimum requirement in terms of ML instance specification and work up to identify the best instance type and family for the model in question.

  • Distributed Training. When using massive datasets for training, the process can be sped up by distributing training across multiple machines or processes in a cluster, as described earlier. Another option is to use a small subset of data for development, and run the full-dataset training job distributed across optimized instances such as P2 or P3 GPU instances, or instances with powerful CPUs, such as C5.

  • Monitor the performance of your training jobs to identify waste. Amazon SageMaker AI is integrated with CloudWatch out of the box and publishes instance metrics of the training cluster to CloudWatch. These metrics enable adjustments to the cluster, such as CPUs, memory, and number of instances. In addition, Amazon SageMaker AI Debugger provides full visibility into model training by monitoring, recording, analyzing, and visualizing tensors from the training process. Debugger can reduce the time, resources, and cost needed to train models.
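
The following is a minimal sketch of attaching a built-in Debugger rule so that a stalled, wasteful training job can be flagged early; the image URI and role are hypothetical placeholders:

    from sagemaker.debugger import Rule, rule_configs
    from sagemaker.estimator import Estimator

    # A minimal sketch: the built-in loss_not_decreasing rule flags a job
    # whose loss has stopped improving. Identifiers are placeholders.
    estimator = Estimator(
        image_uri="<training-image-uri>",
        role="<execution-role-arn>",
        instance_count=1,
        instance_type="ml.m5.xlarge",
        rules=[Rule.sagemaker(rule_configs.loss_not_decreasing())],
    )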

Deploy and Manage

This step of the ML lifecycle involves deployment of the model to get predictions, and managing the model to ensure it meets functional and non-functional requirements of the application. Cost control in this phase can be accomplished using the following techniques:

  • Endpoint deployment. Amazon SageMaker AI enables testing of new models using A/B testing. Endpoints should be deleted when testing is complete to reduce costs; they can be recreated from the model artifacts in S3 if and when needed. Endpoints that have not been deleted can be detected automatically by using EventBridge / CloudWatch Events and Lambda functions. For example, you can detect endpoints that have been idle (with no invocations over a certain period, such as 24 hours), and send an email or text message with the list of detected idle endpoints using SNS. See Right-sizing resources and avoiding unnecessary costs in Amazon SageMaker AI for details.
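
The following is a minimal sketch of such an idle-endpoint check using boto3 (SNS notification omitted); the 24-hour window and the default "AllTraffic" variant name are assumptions:

    import datetime
    import boto3

    sm = boto3.client("sagemaker")
    cw = boto3.client("cloudwatch")
    now = datetime.datetime.utcnow()

    # A minimal sketch: flag in-service endpoints with zero invocations in
    # the last 24 hours. Assumes the default "AllTraffic" variant name.
    for ep in sm.list_endpoints(StatusEquals="InService")["Endpoints"]:
        stats = cw.get_metric_statistics(
            Namespace="AWS/SageMaker",
            MetricName="Invocations",
            Dimensions=[
                {"Name": "EndpointName", "Value": ep["EndpointName"]},
                {"Name": "VariantName", "Value": "AllTraffic"},
            ],
            StartTime=now - datetime.timedelta(hours=24),
            EndTime=now,
            Period=86400,
            Statistics=["Sum"],
        )
        if sum(dp["Sum"] for dp in stats["Datapoints"]) == 0:
            print(f"Idle endpoint: {ep['EndpointName']}")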

  • Multi-model endpoints. SageMaker endpoints provide the capability to host multiple models behind a single endpoint. Multi-model endpoints reduce hosting costs by improving endpoint utilization, and provide a scalable and cost-effective solution for deploying a large number of models. Multi-model endpoints enable time-sharing of memory resources across models. They also reduce deployment overhead, because Amazon SageMaker AI loads models into memory on demand and scales them based on traffic patterns.
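
Individual models on a multi-model endpoint are addressed at invocation time. The following is a minimal sketch using boto3; the endpoint name, model artifact name, and payload are hypothetical placeholders:

    import boto3

    # A minimal sketch: TargetModel selects one of the many model artifacts
    # hosted behind the endpoint; it is loaded on demand and cached in memory.
    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName="my-multi-model-endpoint",
        TargetModel="model-42.tar.gz",
        ContentType="text/csv",
        Body=b"1.0,2.0,3.0",
    )
    print(response["Body"].read())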

  • Auto Scaling. Amazon SageMaker AI Auto Scaling optimizes the cost of model endpoints. Auto Scaling automatically increases the number of instances to handle increases in load (scale out) and decreases the number of instances when they are not needed (scale in), thereby reducing operational costs. The endpoint can be monitored to adjust the scaling policy based on CloudWatch metrics. (See Load test and optimize an Amazon SageMaker AI endpoint using automatic scaling for details.)
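
The following is a minimal sketch of configuring target-tracking auto scaling for an endpoint variant through the Application Auto Scaling API; the endpoint name, capacity limits, and target value are hypothetical placeholders:

    import boto3

    aas = boto3.client("application-autoscaling")
    resource_id = "endpoint/my-endpoint/variant/AllTraffic"  # placeholder names

    # Register the production variant as a scalable target (1 to 4 instances).
    aas.register_scalable_target(
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        MinCapacity=1,
        MaxCapacity=4,
    )

    # Track a target of 100 invocations per instance per minute.
    aas.put_scaling_policy(
        PolicyName="invocations-target-tracking",
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": 100.0,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
        },
    )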

  • Amazon Elastic Inference for deep learning. For inference, a deep learning application may not fully utilize the capacity offered by a GPU. Amazon Elastic Inference allows low-cost GPU-powered acceleration to be attached to Amazon EC2 and Amazon SageMaker AI instances, reducing the cost of running deep learning inference by up to 75%.
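
The following is a minimal sketch of attaching an Elastic Inference accelerator at deployment time with the SageMaker Python SDK; the image URI, model artifact, role, and accelerator size are hypothetical placeholders:

    from sagemaker.model import Model

    # A minimal sketch: accelerator_type attaches an Elastic Inference
    # accelerator to a CPU instance instead of using a full GPU instance.
    model = Model(
        image_uri="<inference-image-uri>",
        model_data="s3://my-bucket/model.tar.gz",
        role="<execution-role-arn>",
    )
    predictor = model.deploy(
        initial_instance_count=1,
        instance_type="ml.m5.xlarge",
        accelerator_type="ml.eia2.medium",
    )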

  • Analyzing costs with Cost Explorer. Cost Explorer is a tool that enables viewing and analyzing AWS service-related costs and usage, including for SageMaker. Cost allocation tags can be used to get views of costs aggregated by a specific dimension, such as a project. To accomplish this, all Amazon SageMaker AI project-related resources, including notebook instances and hosting endpoints, can be tagged with user-defined tags. For example, tags can identify the project, business unit, or environment (such as development, testing, or production). After user-defined tags have been created and applied, they need to be activated in the Billing and Cost Management console for cost allocation tracking. These tags can then be used to get different views of costs using Cost Explorer as well as Cost and Usage Reports. (Cost allocation tags appear on the console after Cost Explorer, Budgets, and AWS Cost and Usage Reports have been enabled.)
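
Tagged costs can also be queried programmatically. The following is a minimal sketch using the Cost Explorer API via boto3; the time period and the "project" tag key are hypothetical placeholders:

    import boto3

    # A minimal sketch: monthly SageMaker spend grouped by a user-defined
    # "project" cost allocation tag. Dates and tag key are placeholders.
    ce = boto3.client("ce")
    report = ce.get_cost_and_usage(
        TimePeriod={"Start": "2023-01-01", "End": "2023-02-01"},
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        Filter={"Dimensions": {"Key": "SERVICE", "Values": ["Amazon SageMaker"]}},
        GroupBy=[{"Type": "TAG", "Key": "project"}],
    )
    for group in report["ResultsByTime"][0]["Groups"]:
        print(group["Keys"], group["Metrics"]["UnblendedCost"]["Amount"])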

  • AWS Budgets. AWS Budgets helps you manage Amazon SageMaker AI costs, including development, training, and hosting, by setting alerts and notifications for when cost or usage exceeds (or is forecasted to exceed) the budgeted amount. After a budget is created, progress can be tracked on the AWS Budgets console. AWS Service Catalog can be integrated with AWS Budgets to create and associate budgets with portfolios and products, and to keep developers informed of resource costs so they run cost-aware workloads. See Cost Control Blog Series #2: Automate Cost Control using Service Catalog and AWS Budgets for details.
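
Budgets can also be created programmatically. The following is a minimal sketch using boto3; the account ID, budget amount, and email address are hypothetical placeholders:

    import boto3

    # A minimal sketch: a monthly cost budget with an email alert when actual
    # spend passes 80% of the limit. All identifiers are placeholders.
    budgets = boto3.client("budgets")
    budgets.create_budget(
        AccountId="111122223333",
        Budget={
            "BudgetName": "ml-project-monthly",
            "BudgetLimit": {"Amount": "1000", "Unit": "USD"},
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
        NotificationsWithSubscribers=[
            {
                "Notification": {
                    "NotificationType": "ACTUAL",
                    "ComparisonOperator": "GREATER_THAN",
                    "Threshold": 80.0,
                    "ThresholdType": "PERCENTAGE",
                },
                "Subscribers": [
                    {"SubscriptionType": "EMAIL", "Address": "team@example.com"}
                ],
            }
        ],
    )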