SEC07-BP04 Define scalable data lifecycle management
Understand your data lifecycle requirements as they relate to your different levels of data classification and handling. This can include how data is handled when it first enters your environment, how data is transformed, and the rules for its destruction. Consider factors such as retention periods, access, auditing, and tracking provenance.
Desired outcome: You classify data as close as possible to the point and time of ingestion. When data classification requires masking, tokenization, or other processes that reduce the sensitivity level, you perform these actions as close as possible to the point and time of ingestion.
You delete data in accordance with your policy when it is no longer appropriate to keep, based on its classification.
Common anti-patterns:
- Implementing a one-size-fits-all approach to data lifecycle management, without considering varying sensitivity levels and access requirements.
- Considering lifecycle management only from the perspective of either data that is usable, or data that is backed up, but not both.
- Assuming that data that has entered your workload is valid, without establishing its value or provenance.
- Relying on data durability as a substitute for data backups and protection.
- Retaining data beyond its usefulness and required retention period.
Benefits of establishing this best practice: A well-defined and scalable data lifecycle management strategy helps maintain regulatory compliance, improves data security, optimizes storage costs, and enables efficient data access and sharing while maintaining appropriate controls.
Level of risk exposed if this best practice is not established: High
Implementation guidance
Data within a workload is often dynamic. The form it takes when
entering your workload environment can be different from when it
is stored or used in business logic, reporting, analytics, or
machine learning. In addition, the value of data can change over
time. Some data is temporal in nature and loses value as it gets
older. Consider how these changes to your data impact evaluation
under your data classification scheme and associated controls.
Where possible, use an automated lifecycle mechanism, such as
Amazon S3 lifecycle policies and Amazon Data Lifecycle Manager,
to automate data retention, archiving, and expiration.
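As a minimal sketch of an automated lifecycle mechanism, the following Python builds one Amazon S3 lifecycle rule per classification level and applies the rules with the boto3 put_bucket_lifecycle_configuration call. The classification names, the day counts, the "classification" object tag, and the helper function names are assumptions made for illustration; they are not part of this guidance.

```python
# Hypothetical retention schedule keyed by classification level.
# Level names and day counts are illustrative, not recommendations.
RETENTION_BY_CLASSIFICATION = {
    "public": {"archive_after_days": 90, "expire_after_days": 365},
    "internal": {"archive_after_days": 30, "expire_after_days": 730},
    "confidential": {"archive_after_days": None, "expire_after_days": 2555},
}


def build_lifecycle_rules(schedule, tag_key="classification"):
    """Build one S3 lifecycle rule per classification level.

    Assumes objects carry a tag (tag_key) whose value is the level name.
    """
    rules = []
    for level, policy in sorted(schedule.items()):
        rule = {
            "ID": f"retention-{level}",
            "Status": "Enabled",
            "Filter": {"Tag": {"Key": tag_key, "Value": level}},
            "Expiration": {"Days": policy["expire_after_days"]},
        }
        # Only add an archive transition when the policy asks for one.
        if policy["archive_after_days"] is not None:
            rule["Transitions"] = [
                {"Days": policy["archive_after_days"], "StorageClass": "GLACIER"}
            ]
        rules.append(rule)
    return rules


def apply_lifecycle_rules(bucket_name, rules):
    """Apply the rules to a bucket (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the rule builder runs without the SDK

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket_name,
        LifecycleConfiguration={"Rules": rules},
    )
```

Note that put_bucket_lifecycle_configuration replaces the bucket's existing lifecycle configuration, so a sketch like this should build the complete rule set before applying it.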
Distinguish between data that is available for use, and data that
is stored as a backup. Consider using AWS Backup to automate
backups of data across AWS services.
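To illustrate treating backups as a lifecycle of their own, the following sketch builds an AWS Backup plan whose rule moves recovery points to cold storage and later deletes them. The plan name, vault name, schedule, and day counts are hypothetical values you would replace with ones derived from your retention policy; the helper names are also assumptions.

```python
def build_backup_plan(plan_name, vault_name, cold_after_days, delete_after_days):
    """Build a backup plan document with a lifecycle for recovery points."""
    # AWS Backup keeps cold-storage recovery points for at least 90 days,
    # so deletion must come at least 90 days after the cold transition.
    if delete_after_days < cold_after_days + 90:
        raise ValueError(
            "delete_after_days must be at least 90 days greater than cold_after_days"
        )
    return {
        "BackupPlanName": plan_name,
        "Rules": [
            {
                "RuleName": f"{plan_name}-daily",
                "TargetBackupVaultName": vault_name,
                # Illustrative schedule: every day at 05:00 UTC.
                "ScheduleExpression": "cron(0 5 ? * * *)",
                "Lifecycle": {
                    "MoveToColdStorageAfterDays": cold_after_days,
                    "DeleteAfterDays": delete_after_days,
                },
            }
        ],
    }


def create_backup_plan(plan):
    """Create the plan in AWS Backup (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the plan builder runs without the SDK

    return boto3.client("backup").create_backup_plan(BackupPlan=plan)
```

For example, build_backup_plan("confidential-data", "example-vault", 30, 365) describes daily backups that move to cold storage after 30 days and are deleted after one year.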
Another aspect of lifecycle management is recording the history of
data as it progresses through your workload, called data
provenance tracking. This can give confidence that you
know where the data came from, any transformations performed, what
owner or process made those changes, and when. Having this
history helps with troubleshooting issues and investigations
during potential security events. For example, you can log
metadata about transformations in an Amazon DynamoDB table.
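One possible shape for such a provenance record is sketched below. It captures what was transformed, by whom, and when, along with content hashes of the input and output so that later investigations can verify which data a step actually touched. The attribute names, the choice of dataset_id and occurred_at as keys, and the table name passed to the writer are all assumptions for illustration.

```python
import hashlib
from datetime import datetime, timezone


def build_provenance_record(dataset_id, transformation, actor,
                            input_bytes, output_bytes, occurred_at=None):
    """Describe one transformation step as a flat, string-valued item."""
    occurred_at = occurred_at or datetime.now(timezone.utc)
    return {
        "dataset_id": dataset_id,                # hypothetical partition key
        "occurred_at": occurred_at.isoformat(),  # hypothetical sort key
        "transformation": transformation,        # what was done
        "actor": actor,                          # owner or process that did it
        "input_sha256": hashlib.sha256(input_bytes).hexdigest(),
        "output_sha256": hashlib.sha256(output_bytes).hexdigest(),
    }


def write_provenance_record(table_name, record):
    """Store the record in DynamoDB (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the record builder runs without the SDK

    table = boto3.resource("dynamodb").Table(table_name)
    table.put_item(Item=record)
```

Storing hashes rather than the data itself keeps sensitive content out of the provenance log while still letting you confirm that a stored object matches a recorded step.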
Implementation steps
- Analyze the workload's data types, sensitivity levels, and access requirements to classify the data and define appropriate lifecycle management strategies.
- Design and implement data retention policies and automated destruction processes that align with legal, regulatory, and organizational requirements.
- Establish processes and automation for continuous monitoring, auditing, and adjustment of data lifecycle management strategies, controls, and policies as workload requirements and regulations evolve.
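The auditing part of these steps can be sketched as a periodic check that compares each record's age against the retention period for its classification. The retention periods, the record fields (id, classification, created), and the function name below are hypothetical. Records with an unknown classification are reported for review instead of being treated as expired.

```python
from datetime import date, timedelta

# Hypothetical retention periods in days per classification level.
RETENTION_DAYS = {"public": 365, "internal": 730, "confidential": 2555}


def audit_retention(records, today, retention_days=RETENTION_DAYS):
    """Return (expired_ids, needs_review_ids) for the given records.

    A record is expired once today is later than its creation date
    plus the retention period for its classification.
    """
    expired, needs_review = [], []
    for record in records:
        limit = retention_days.get(record["classification"])
        if limit is None:
            # Unknown classification: flag for a human decision.
            needs_review.append(record["id"])
        elif today > record["created"] + timedelta(days=limit):
            expired.append(record["id"])
    return expired, needs_review


if __name__ == "__main__":
    sample = [
        {"id": "a", "classification": "public", "created": date(2023, 1, 1)},
        {"id": "b", "classification": "internal", "created": date(2023, 1, 1)},
        {"id": "c", "classification": "unlabeled", "created": date(2023, 1, 1)},
    ]
    print(audit_retention(sample, today=date(2024, 6, 1)))
```

The output of a check like this can feed an automated destruction process or a report, depending on how much human review your policy requires before deletion.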