Challenges for public sector - Machine Learning Best Practices for Public Sector Organizations

This whitepaper is for historical reference only. Some content might be outdated and some links might not be available.


Government, education, and nonprofit organizations face several challenges in implementing ML programs to accomplish their mission objectives. This section outlines those challenges across seven critical areas of an ML implementation:

  1. Data Ingestion and Preparation. Identifying, collecting, and transforming data is the foundation for ML. The ability to extract data from different types of data sources (ranging from flat files to databases, structured and unstructured, real-time and batch) can be challenging given the range of technologies found in public sector organizations. Once the data is extracted, it needs to be cataloged and organized so that it is available for consumption, with the necessary approvals, in compliance with public sector guidelines.
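The cataloging step above can be sketched in a few lines. This is a minimal, illustrative in-memory catalog (the class and field names are assumptions, not part of the whitepaper); in practice a managed metadata service would fill this role. The key idea shown is that a dataset is only consumable once it carries the required approval.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class CatalogEntry:
    """Metadata recorded for each extracted dataset (illustrative fields)."""
    name: str
    source_type: str          # e.g. "flat_file", "rdbms", "stream"
    location: str
    approved: bool = False    # compliance approval gate before consumption
    registered_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

class DataCatalog:
    def __init__(self):
        self._entries = {}

    def register(self, entry: CatalogEntry):
        self._entries[entry.name] = entry

    def consumable(self):
        """Only approved datasets are visible to downstream ML consumers."""
        return [e.name for e in self._entries.values() if e.approved]

catalog = DataCatalog()
catalog.register(CatalogEntry("census_extract", "flat_file", "s3://bucket/census.csv"))
catalog.register(CatalogEntry("permits", "rdbms", "jdbc:postgresql://db/permits",
                              approved=True))
print(catalog.consumable())  # only the approved dataset appears
```

Keeping approval status in the catalog, rather than in each pipeline, gives a single place to audit what data is cleared for ML use.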

  2. Model Training and Tuning. There are hundreds of algorithms available for ML model training and tuning that solve various types of problems. One of the major challenges facing public sector organizations is the ability to create a common platform that provides these algorithms and the structure required for visibility and maintenance. Challenges also exist in optimizing model training performance with minimal resources without compromising the quality of ML models.
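The tuning trade-off above (training quality versus compute spent) can be illustrated with the simplest form of hyperparameter search. This is a sketch under stated assumptions: a one-parameter linear model on synthetic data, with a grid search over learning rate and epoch count scored on held-out data. Real platforms use more sample-efficient strategies (e.g. Bayesian search), but the structure is the same.

```python
import itertools
import random

random.seed(0)
# Synthetic data: y = 3x + noise, split into training and validation sets.
xs = [random.uniform(-1, 1) for _ in range(200)]
ys = [3 * x + random.gauss(0, 0.1) for x in xs]
x_tr, y_tr, x_va, y_va = xs[:150], ys[:150], xs[150:], ys[150:]

def train(lr, epochs):
    """One-parameter linear model fit by gradient descent on MSE."""
    w = 0.0
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in zip(x_tr, y_tr)) / len(x_tr)
        w -= lr * grad
    return w

def val_mse(w):
    """Held-out error: the score used to compare hyperparameter choices."""
    return sum((w * x - y) ** 2 for x, y in zip(x_va, y_va)) / len(x_va)

# Grid search: evaluate every combination, keep the best validation score.
grid = {"lr": [0.01, 0.1, 0.5], "epochs": [10, 100]}
best = min(
    (dict(zip(grid, combo)) for combo in itertools.product(*grid.values())),
    key=lambda p: val_mse(train(**p)),
)
print("best hyperparameters:", best)
```

The cost pressure mentioned above shows up directly here: each grid point is a full training run, so the search budget must be balanced against model quality.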

  3. ML Operations (MLOps). Integrating ML into business operations, referred to as MLOps, requires significant planning and preparation. One of the major hurdles facing government organizations is the ability to create a repeatable process for deployment that is consistent with their organizational best practices. Mechanisms need to be put in place to ensure scalability and availability, as well as recovery of the models in case of disasters. Another challenge is to effectively monitor the model in production to ensure that ML models do not lose their effectiveness due to the introduction of new variables, changes in the source data, or issues with data quality.
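The production-monitoring challenge above is commonly addressed by drift detection: comparing live input data against a training-time baseline. A minimal sketch, assuming a single numeric feature and a standardized-mean-shift score (the threshold value is an assumption; production monitors use richer statistics per feature):

```python
import statistics

def drift_score(baseline, live):
    """Standardized mean shift between training-time and production data.

    A score above the threshold suggests the input distribution has changed
    and the model's predictions may no longer be trustworthy.
    """
    base_mean = statistics.mean(baseline)
    base_std = statistics.stdev(baseline) or 1.0
    return abs(statistics.mean(live) - base_mean) / base_std

THRESHOLD = 2.0  # illustrative; tuned per feature in practice

baseline = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2]   # captured at training time
stable   = [10.0, 10.1, 9.9]                     # production: no alert
shifted  = [14.8, 15.2, 15.0]                    # e.g. a unit change upstream

print(drift_score(baseline, stable) > THRESHOLD)   # False
print(drift_score(baseline, shifted) > THRESHOLD)  # True: flag for review
```

An alert from such a monitor typically triggers investigation of the upstream data source and, if the change is legitimate, retraining on fresh data.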

  4. Management & Governance. Public sector organizations face increased scrutiny to ensure that public funds are being properly utilized to serve mission needs. As such, they need to provide increased visibility into the monitoring and auditing of ML workloads. Changes need to be tracked in several places, including data sources, data models, data transfer and transformation mechanisms, deployments, and inference endpoints. A clear separation needs to be put in place between development and production workloads while enforcing separation of duties with appropriate approval mechanisms. In addition, any underlying infrastructure, software, and licenses need to be maintained and managed.

  5. Security & Compliance. Security and compliance of ML workloads is one of the biggest challenges facing public sector organizations. The sensitive nature of the work done by these organizations results in increased security requirements at all levels of an ML platform. This can be very challenging because data is spread across a large number of data sources, is constantly evolving, and is constantly sent across the network between data storage and compute platforms. Data is also transmitted between compute instances in the case of distributed learning. Last but not least is alignment with the principle of least privilege and the application of a consistent user authentication and authorization mechanism.
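The least-privilege principle mentioned above can be made concrete with a deny-by-default authorization check: a role may perform only the actions on the resources it was explicitly granted. This is an illustrative sketch (the role and resource names are assumptions); real platforms express the same idea in policy documents evaluated by an identity service.

```python
# Deny-by-default grants: anything not listed here is refused.
ROLE_GRANTS = {
    "data-scientist": {("read", "feature-store"), ("write", "experiments")},
    "ml-engineer":    {("read", "feature-store"), ("deploy", "inference-endpoint")},
}

def is_allowed(role, action, resource):
    """Return True only if the (action, resource) pair was explicitly granted."""
    return (action, resource) in ROLE_GRANTS.get(role, set())

print(is_allowed("data-scientist", "read", "feature-store"))        # True
print(is_allowed("data-scientist", "deploy", "inference-endpoint")) # False
```

Starting from an empty grant set and adding permissions as they are justified, rather than subtracting from broad access, is what keeps the model aligned with least privilege as the platform grows.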

  6. Cost Optimization. Given the complexity of ML projects and the amount of data, compute, and other software required to successfully manage a project, costs can quickly spiral out of control. The challenge facing public sector agencies is the need to account for the resources used, and to monitor that usage against specified cost centers and task orders. Not only do they need to track resource usage, but they also need to be able to manage the resulting costs effectively.
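Attributing usage to cost centers and task orders, as described above, amounts to tagging every resource and aggregating spend per tag against a budget. A minimal sketch, with hypothetical records and budget figures:

```python
from collections import defaultdict

# Usage records tagged with a cost center, as a metering system might emit.
usage = [
    {"resource": "training-cluster",   "cost_center": "CC-101", "cost": 420.0},
    {"resource": "inference-endpoint", "cost_center": "CC-101", "cost": 75.5},
    {"resource": "data-lake-storage",  "cost_center": "CC-202", "cost": 130.0},
]

BUDGETS = {"CC-101": 450.0, "CC-202": 300.0}  # per cost center / task order

# Aggregate spend per cost center.
totals = defaultdict(float)
for record in usage:
    totals[record["cost_center"]] += record["cost"]

# Compare against budgets and flag overruns.
for center, spent in sorted(totals.items()):
    status = "OVER BUDGET" if spent > BUDGETS[center] else "ok"
    print(f"{center}: {spent:.2f} / {BUDGETS[center]:.2f} [{status}]")
```

The discipline that makes this work is upstream of the code: every provisioned resource must carry a cost-center tag, or the aggregation silently undercounts.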

  7. Bias & Explainability. Given the impact of public sector organizations on citizens, the ability to understand why an ML model makes a specific prediction becomes paramount – this is also known as ML explainability. Organizations are under pressure from policymakers and regulators to ensure that ML and data-driven systems do not violate ethics and policies, and do not result in potentially discriminatory behavior. In January 2020, the U.S. government published draft rules for the regulation of Artificial Intelligence (AI) in the United States. These rules state that any government regulation of public sector AI must encourage “reliable, robust, and trustworthy AI”, and these standards should be the overarching guiding theme. Demonstrating explainability is a significant challenge because complex ML models are hard to understand and even harder to interpret and debug. Public sector organizations need to invest significant time with appropriate tools, techniques, and mechanisms to demonstrate explainability and lack of bias in their ML models, which could be a deterrent to adoption.
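One widely used explainability technique that fits the need above is permutation importance: shuffle one feature at a time and measure how much the model's predictions change. A feature with near-zero importance is one the model ignores, which helps demonstrate that a sensitive attribute is not driving decisions. A minimal sketch with a hand-written stand-in model (the feature names and weights are assumptions for illustration):

```python
import random

random.seed(1)

def model(features):
    """Stand-in scoring model: weights income heavily, ignores zip_digit."""
    return 0.9 * features["income"] + 0.1 * features["age"]

# Synthetic evaluation dataset with three candidate features.
data = [{"income": random.random(), "age": random.random(),
         "zip_digit": random.random()} for _ in range(300)]

def permutation_importance(feature):
    """Mean absolute change in predictions when one feature is shuffled.

    Near-zero importance means the model does not rely on that feature.
    """
    baseline = [model(row) for row in data]
    shuffled_vals = [row[feature] for row in data]
    random.shuffle(shuffled_vals)
    perturbed = [model({**row, feature: v})
                 for row, v in zip(data, shuffled_vals)]
    return sum(abs(a - b) for a, b in zip(baseline, perturbed)) / len(data)

for feat in ("income", "age", "zip_digit"):
    print(feat, round(permutation_importance(feat), 3))
```

Because it treats the model as a black box, this technique applies to complex models that are otherwise hard to interpret, which is exactly the situation the paragraph above describes.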