Design Principles - Analytics Lens

Design Principles

In the cloud, a number of principles help you increase reliability. In particular, the following are emphasized for analytics workloads. For more information, refer to the design principles in the AWS Well-Architected Framework whitepaper.

  • Manage the lifecycle of data assets, including transition and expiration: Apply a governance process to how datasets and assets are maintained. Establish a review cycle for the relevance and freshness of each dataset, and ensure that operational maintenance cycles on managed datasets are followed. Such governance can cover how data is updated or refreshed, how the institution tracks the value and use of its data, and the criteria and process for data expiration, including the use of tiered storage to manage the cost of data.

  • Enforce data hygiene: When managing datasets for institutional use, apply mechanisms to ensure that data model standards are defined and enforced. Every dataset used within an institutional workload should have a governance model applied to it, with a repository that records its control, access, cleanliness standard, usage lineage, and overall management.

  • Preserve data lineage: Derived datasets are mutable; original data is not. The process of ingesting data into a data lake begins with the unmodified original “raw” data from which all downstream processes and manipulations derive. This raw dataset may undergo multiple transformations before it can be used by downstream analytical applications. Capture data lineage so that data attributes remain traceable as the data moves through each layer of the analytical system. A metadata repository that maps and tracks schema changes can be used to capture this lineage information.
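
To make the lineage principle concrete, here is a minimal sketch of a metadata repository that records which dataset each derived dataset came from and how its schema changed along the way. It is an in-memory illustration only, not an AWS API. The names DatasetVersion, LineageRepository, and the example datasets (orders_raw, orders_clean, orders_daily) are hypothetical and were chosen for this example. A real deployment would more likely use a managed catalog service.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DatasetVersion:
    """One entry in the (hypothetical) metadata repository."""
    name: str                     # dataset identifier, e.g. "orders_raw"
    schema: Dict[str, str]        # column name -> column type
    parent: Optional[str] = None  # dataset it was derived from; None for raw data
    transformation: str = ""      # free-text description of the deriving step


class LineageRepository:
    """In-memory store that maps each dataset to its parent and schema."""

    def __init__(self) -> None:
        self._entries: Dict[str, DatasetVersion] = {}

    def register(self, entry: DatasetVersion) -> None:
        # A derived dataset must point at a dataset that is already recorded,
        # so every chain can be traced back to unmodified raw data.
        if entry.parent is not None and entry.parent not in self._entries:
            raise KeyError(f"unknown parent dataset: {entry.parent}")
        if entry.name in self._entries:
            raise ValueError(f"dataset already registered: {entry.name}")
        self._entries[entry.name] = entry

    def lineage(self, name: str) -> List[str]:
        """Return dataset names from the raw source down to `name`."""
        chain: List[str] = []
        current: Optional[str] = name
        while current is not None:
            entry = self._entries[current]
            chain.append(entry.name)
            current = entry.parent
        return list(reversed(chain))

    def schema_changes(self, name: str) -> Dict[str, List[str]]:
        """Compare the schema of `name` with the schema of its parent."""
        entry = self._entries[name]
        if entry.parent is None:
            return {"added": [], "removed": [], "retyped": []}
        parent = self._entries[entry.parent]
        shared = entry.schema.keys() & parent.schema.keys()
        return {
            "added": sorted(entry.schema.keys() - parent.schema.keys()),
            "removed": sorted(parent.schema.keys() - entry.schema.keys()),
            "retyped": sorted(
                c for c in shared if entry.schema[c] != parent.schema[c]
            ),
        }


# Example: a raw dataset and two derived layers (all names are illustrative).
repo = LineageRepository()
repo.register(DatasetVersion(
    "orders_raw", {"id": "string", "amount": "string", "ts": "string"}))
repo.register(DatasetVersion(
    "orders_clean",
    {"id": "string", "amount": "double", "order_date": "date"},
    parent="orders_raw",
    transformation="cast amount to double, derive order_date from ts"))
repo.register(DatasetVersion(
    "orders_daily",
    {"order_date": "date", "total_amount": "double"},
    parent="orders_clean",
    transformation="aggregate amount by order_date"))

print(repo.lineage("orders_daily"))
print(repo.schema_changes("orders_clean"))
```

Each entry is frozen, so a recorded version cannot be reassigned after registration. This loosely mirrors the idea that original data is immutable while derived datasets are added as new entries. Rejecting unknown parents at registration time keeps every derived dataset traceable to a raw source.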