AWS modern data architecture - AWS Prescriptive Guidance

This guide doesn’t describe how to implement a data strategy framework on AWS. That is an extensive topic that is covered in AWS documentation, blog posts, and other guides (see the Resources section). However, the following diagram provides a high-level overview. It illustrates the main components of a modern data architecture on AWS and covers most of the services that might be on your roadmap.

[Diagram: AWS data services]

The main components of this architecture support the technical tenets for a modern data strategy that were discussed earlier:

  1. Use an integrated, cost-effective, and scalable storage layer, so every data producer and consumer has the technical capabilities to interact with data.

    Amazon Simple Storage Service (Amazon S3) is an object storage service that provides integration, scalability, data availability, security, and performance at a low cost.

  2. Security is mandatory. Apply data privacy rules, provide data protection with encryption, enable auditing, and provide automated compliance.

    To apply data privacy, protection, and compliance in an automated manner, and to enable auditing, you can use AWS Key Management Service (AWS KMS), AWS Identity and Access Management (IAM), AWS Secrets Manager, AWS Audit Manager, and Amazon Macie.

  3. Govern the data to share it across the company. Provide a unique data catalog and a business glossary so users can find and use the data they need.

    AWS Lake Formation helps you govern data and share it across the company. In addition, you can create a unique data catalog on AWS Glue and a business glossary by using Amazon DataZone (in preview) to enable your employees to find the data they need.

  4. Select the right service for the right job. Consider functionality, scalability, data latency, the effort required to run the service, resilience, integration, and automation when you choose a component.

    You can consider Amazon Athena, Amazon EMR, AWS Glue, Amazon OpenSearch Service, Amazon Kinesis, Amazon Redshift, Amazon Managed Streaming for Apache Kafka (Amazon MSK), and Amazon QuickSight to manage your tasks. For example, you can perform real-time streaming with Kinesis or Amazon MSK, data processing with Amazon EMR or AWS Glue, search with OpenSearch Service, ad-hoc queries with Athena, and data warehousing with Amazon Redshift.

  5. Use artificial intelligence (AI) and machine learning (ML).

    You can apply artificial intelligence by using AWS AI services, and build and run machine learning models by using Amazon SageMaker.

  6. Provide data literacy and tools with abstractions for business people.

    Processes for providing data literacy, tools, and abstractions aren’t part of the architecture, but you can use Amazon DataZone (in preview), AWS Lake Formation, and Amazon QuickSight as data abstraction tools.

  7. Test the hypotheses of your data initiatives and measure their results.

    You can use OpenSearch Dashboards (included with Amazon OpenSearch Service) or Amazon QuickSight to track business outcome metrics and test results, and to validate your hypotheses.
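
The service examples given under tenet 4 can be restated as a small lookup table. The following Python sketch is illustrative only: the workload labels, the `SERVICES_BY_WORKLOAD` name, and the `candidate_services` function are hypothetical and are not part of any AWS SDK. It lists only the pairings that this guide mentions, and it doesn't weigh the other selection criteria (scalability, latency, operational effort, resilience, integration, and automation).

```python
from __future__ import annotations

# Hypothetical lookup table that restates the examples from tenet 4.
# The workload labels are illustrative names, not AWS terminology.
SERVICES_BY_WORKLOAD = {
    "real-time streaming": ["Amazon Kinesis", "Amazon MSK"],
    "data processing": ["Amazon EMR", "AWS Glue"],
    "search": ["Amazon OpenSearch Service"],
    "ad-hoc queries": ["Amazon Athena"],
    "data warehousing": ["Amazon Redshift"],
}


def candidate_services(workload: str) -> list[str]:
    """Return the services this guide names for a workload, or an empty list."""
    key = workload.strip().lower()
    # Return a copy so that callers can't modify the shared table.
    return list(SERVICES_BY_WORKLOAD.get(key, []))


if __name__ == "__main__":
    for workload in ("ad-hoc queries", "Real-time streaming", "graph analytics"):
        services = candidate_services(workload)
        print(workload, "->", services or "no example in this guide")
```

A table like this is only a starting point for discussion. When a workload maps to more than one service, the remaining criteria from tenet 4 determine the choice.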

For examples of sample architectures for different use cases, see the reference architecture diagrams in the AWS Architecture Center. Your technical team should use these diagrams for reference only and customize them based on your own requirements, environments, and projects.