MLSEC-07: Keep only relevant data - Machine Learning Lens

Preserve data across computing environments (such as development and staging), and store only data that is relevant to the use case to reduce data exposure risks. Implement mechanisms that enforce a lifecycle management process for that data, and decide when stale data should be removed automatically.

Implementation plan

  • Establish a data lifecycle plan - Understand usage patterns and the data requirements for debugging and operational tasks. Use them to define how long each dataset is kept and when it is removed, so that data sprawl is reduced over time.

  • Design for privacy - Remove sensitive elements that are not needed for the ML workflow. Detect and redact personally identifiable information (PII) while maintaining data usability. Determine which features are required to solve the business problem and which are valuable for future iterations.
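
The two steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed implementation: the field names, the `REQUIRED_FEATURES` allow-list, the 365-day `RETENTION` window, and the helper functions are all hypothetical, and the regular expressions catch only simple email and phone formats. A production workflow would normally rely on a dedicated PII detection service and on storage-level lifecycle rules.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical allow-list: the only fields this ML use case needs.
REQUIRED_FEATURES = {"user_segment", "purchase_amount", "review_text", "created_at"}

# Hypothetical retention window; take the real value from your lifecycle plan.
RETENTION = timedelta(days=365)

# Deliberately simple patterns (assumption): they match common email and
# US-style phone formats only, and will miss many other kinds of PII.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def redact_pii(text: str) -> str:
    """Replace matched emails and phone numbers with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def minimize(record: dict) -> dict:
    """Drop fields outside the allow-list, then redact PII in string values."""
    kept = {k: v for k, v in record.items() if k in REQUIRED_FEATURES}
    return {k: redact_pii(v) if isinstance(v, str) else v for k, v in kept.items()}


def is_stale(record: dict, now: datetime) -> bool:
    """A record is stale once it is older than the retention window."""
    return now - record["created_at"] > RETENTION


def prepare(records: list, now: datetime) -> list:
    """Keep only fresh records, reduced to relevant and redacted fields."""
    return [minimize(r) for r in records if not is_stale(r, now)]


if __name__ == "__main__":
    now = datetime(2024, 6, 1, tzinfo=timezone.utc)
    sample = [{
        "user_segment": "premium",
        "purchase_amount": 42.5,
        "review_text": "Contact jane.doe@example.com or 555-123-4567 for a refund.",
        "created_at": datetime(2024, 5, 1, tzinfo=timezone.utc),
        "full_name": "Jane Doe",  # not needed by the model, so it is dropped
    }]
    print(prepare(sample, now))
```

Using an allow-list rather than a block-list means a newly added sensitive column is excluded by default until someone decides the model needs it.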
