MLSEC-05: Protect sensitive data privacy
Protect sensitive data used in training against unintended disclosure. Identify and classify the sensitive data, then handle it using strategies such as removing, masking, tokenizing, and principal component analysis (PCA). Document governance best practices for future reuse and reference.
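The removing, masking, and tokenizing strategies named above can be illustrated with a minimal Python sketch that uses only the standard library. It assumes each record arrives as a dict. The field names (`full_name`, `customer_id`, `card_number`, `notes`), the helper functions, and the hard-coded `TOKEN_KEY` are all hypothetical. Tokenization here is a keyed HMAC-SHA256, which is one possible choice rather than a prescribed method. PCA is not shown.

```python
import hashlib
import hmac
import re

# Hypothetical key for illustration only; in practice, load it from a secrets store.
TOKEN_KEY = b"example-secret-key"

# Deliberately simple email pattern; real PII detection needs broader rules.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def remove_fields(record: dict, fields: set[str]) -> dict:
    """Removing: drop fields the model does not need."""
    return {k: v for k, v in record.items() if k not in fields}


def mask_value(value: str, visible: int = 4, mask_char: str = "*") -> str:
    """Masking: hide all but the last `visible` characters."""
    if len(value) <= visible:
        return mask_char * len(value)
    return mask_char * (len(value) - visible) + value[-visible:]


def tokenize_value(value: str, key: bytes = TOKEN_KEY) -> str:
    """Tokenizing: deterministic keyed token, so equal inputs map to equal tokens."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "tok_" + digest[:16]


def redact_emails(text: str) -> str:
    """Replace email addresses found in free text with a placeholder."""
    return EMAIL_RE.sub("[EMAIL]", text)


def deidentify(record: dict) -> dict:
    """Apply each strategy to the hypothetical fields of one record."""
    out = remove_fields(record, {"full_name"})
    out["card_number"] = mask_value(out["card_number"])
    out["customer_id"] = tokenize_value(out["customer_id"])
    out["notes"] = redact_emails(out["notes"])
    return out


if __name__ == "__main__":
    raw = {
        "full_name": "Jane Doe",
        "customer_id": "C-1001",
        "card_number": "4111111111111111",
        "notes": "Contact jane@example.com for follow-up",
        "purchase_amount": 42.5,
    }
    print(deidentify(raw))
```

Because the token is deterministic for a given key, the same customer maps to the same token across datasets, which keeps joins possible. The trade-off is that anyone holding the key can confirm a guessed value, so the key needs the same protection as the raw data.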
Implementation plan
- Use automated mechanisms to classify data where possible - Use automated sensitive data discovery in Amazon Macie, which provides continual, cost-efficient, organization-wide visibility into where sensitive data resides across your Amazon S3 environment. Macie automatically inspects your S3 buckets for sensitive data such as personally identifiable information (PII), financial data, and AWS credentials. It then builds and continuously maintains an interactive data map of the locations in Amazon S3 where your sensitive data resides, and provides a sensitivity score for each bucket.
- Use tagging - Tag resources and models derived from sensitive data so that you can quickly distinguish resources that require protection from those that do not.
- Encrypt sensitive data - Encrypt sensitive data using services and tools such as AWS Key Management Service (AWS KMS), the AWS Encryption SDK, or client-side encryption.
- Reduce data sensitivity - Evaluate and identify data that can be anonymized or de-identified to reduce its sensitivity.
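One common way to check whether de-identification has reduced sensitivity enough is k-anonymity: every combination of quasi-identifier values must be shared by at least k records. The guidance above does not prescribe this metric. The Python sketch below is an illustration under stated assumptions, with hypothetical fields (`age`, `zip`) and bucket sizes chosen only for the example. It generalizes exact values into coarser ranges and then measures k.

```python
from collections import Counter


def generalize_age(age: int, width: int = 10) -> str:
    """Replace an exact age with a range bucket, e.g. 37 -> '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize_zip(zip_code: str, keep: int = 3) -> str:
    """Keep only a prefix of the ZIP code, e.g. '98109' -> '981**'."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


def generalize(record: dict) -> dict:
    """Return a copy of the record with the hypothetical quasi-identifiers coarsened."""
    out = dict(record)
    out["age"] = generalize_age(record["age"])
    out["zip"] = generalize_zip(record["zip"])
    return out


def k_anonymity(records: list[dict], quasi_identifiers: list[str]) -> int:
    """Size of the smallest group of records sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values()) if groups else 0


if __name__ == "__main__":
    # Hypothetical example data; real quasi-identifiers depend on your dataset.
    rows = [
        {"age": 34, "zip": "98109", "label": 1},
        {"age": 37, "zip": "98105", "label": 0},
        {"age": 52, "zip": "98052", "label": 1},
        {"age": 58, "zip": "98053", "label": 0},
    ]
    qi = ["age", "zip"]
    print("raw k =", k_anonymity(rows, qi))  # -> raw k = 1
    print("generalized k =", k_anonymity([generalize(r) for r in rows], qi))  # -> generalized k = 2
```

k-anonymity alone does not prevent every re-identification attack (for example, when all records in a group share the same sensitive value), so treat it as one check among several rather than a guarantee.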