Platform architecture - AWS Prescriptive Guidance

Platform architecture

Establish and maintain guidelines, principles, patterns, and guardrails for your cloud environment.

A well-architected cloud environment helps you accelerate implementation, reduce risk, and drive cloud adoption. The platform architecture capability builds consensus within your organization on the enterprise standards that drive cloud adoption. You define best-practice blueprints and guardrails for authentication, security, networking, and logging and monitoring. You also plan for workloads that you might need to retain on premises because of latency, data processing, or data residency requirements, and you evaluate hybrid cloud use cases such as cloud bursting, backup and disaster recovery to the cloud, distributed data processing, and edge computing.

Start

Define a multi-account strategy

A good multi-account strategy accounts for scale and operational efficiency. In practice, this means isolating your workloads into accounts by following a logical pattern that best meets your operational needs. We suggest that you start with a foundational set of accounts that accommodates both centralized and decentralized services in your enterprise. You can centralize security, financial, and operational functions to effectively manage and govern your distributed, autonomous teams and their accounts. Align across your organization on how the platform and your workloads will be segmented and managed. Understanding this structure helps you put security principles for authentication and authorization in place while aligning to the evolving acceptable use policies for the platform.

Define preventive controls

Plan for a secure, multi-account environment with an embedded set of default controls (guardrails). Learn and use a mechanism such as service control policies (SCPs) to manage service use across your organization, including which AWS Regions are available for consumption within your cloud platform. SCPs provide a centralized way to set the maximum permissions available to the accounts in your organization and to ensure that those accounts adhere to your organization's access control guidelines.
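To make the Region restriction concrete, the following minimal sketch builds an SCP document that denies requests to any Region outside an approved list by using the `aws:RequestedRegion` condition key. The approved Regions and the list of exempted global-service actions are hypothetical placeholders; a production policy typically needs a longer exemption list, so treat this as an illustration of the policy shape rather than a ready-to-attach policy.

```python
import json

# Hypothetical allowlist: replace with the Regions your platform approves.
APPROVED_REGIONS = ["us-east-1", "eu-west-1"]

# Illustrative, NOT exhaustive: actions of global services that may be served
# from a Region outside the allowlist, so they are exempted from the deny.
EXEMPT_GLOBAL_ACTIONS = ["iam:*", "organizations:*", "sts:*", "support:*"]


def region_deny_scp(regions, exempt_actions):
    """Build an SCP document that denies requests to Regions outside `regions`."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnapprovedRegions",
                "Effect": "Deny",
                # NotAction: the deny applies to every action EXCEPT these.
                "NotAction": sorted(exempt_actions),
                "Resource": "*",
                "Condition": {
                    # aws:RequestedRegion is the Region that the API call targets.
                    "StringNotEquals": {"aws:RequestedRegion": sorted(regions)}
                },
            }
        ],
    }


policy = region_deny_scp(APPROVED_REGIONS, EXEMPT_GLOBAL_ACTIONS)
print(json.dumps(policy, indent=2))
```

Because SCPs only limit permissions and never grant them, a deny statement like this one can be layered on top of the default allow without widening access in any account.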

Define organizational unit structure

Organizational units (OUs) provide a practical way to manage and categorize accounts based on regulatory requirements and software development lifecycle (SDLC) environments. By using OUs, organizations streamline the process of applying appropriate policies and permissions across their cloud infrastructure. Workload OUs are designed specifically for accounts that host application infrastructure resources, and they help ensure that the right policies are enforced on those accounts. Using OUs together with SCPs helps strengthen the security and compliance of your cloud infrastructure while keeping your applications and services running smoothly, which makes your cloud adoption process more efficient and robust.
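The way OUs and SCPs combine can be sketched as a small tree model. In this simplified sketch, permissions are reduced to service names (real SCPs evaluate individual actions and conditions): a service is usable in an account only if the SCPs at every level from the root down to the account allow it, and an explicit deny at any level wins. The OU and account names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class OrgNode:
    """A root, OU, or account, with the SCP effects attached directly to it."""
    name: str
    allow: Set[str]                               # services allowed by attached SCPs
    deny: Set[str] = field(default_factory=set)   # services explicitly denied
    parent: Optional["OrgNode"] = None


def path_from_root(node):
    """Return the nodes from the organization root down to `node`."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path[::-1]


def effective_services(account):
    """Allowed at every level (intersection) minus denied at any level (union)."""
    path = path_from_root(account)
    allowed = set.intersection(*(n.allow for n in path))
    denied = set().union(*(n.deny for n in path))
    return allowed - denied


# Hypothetical organization layout.
ALL = {"ec2", "iam", "lambda", "rds", "s3"}
root = OrgNode("Root", allow=ALL)
sandbox_ou = OrgNode("Sandbox", allow={"ec2", "lambda", "s3"}, parent=root)
workloads_ou = OrgNode("Workloads", allow=ALL, parent=root)
prod_ou = OrgNode("Prod", allow=ALL, parent=workloads_ou)
suspended_ou = OrgNode("Suspended", allow=ALL, deny=ALL, parent=root)

accounts = {
    "dev-sandbox-01": OrgNode("dev-sandbox-01", allow=ALL, parent=sandbox_ou),
    "payments-prod": OrgNode("payments-prod", allow=ALL, parent=prod_ou),
    "retired-app": OrgNode("retired-app", allow=ALL, parent=suspended_ou),
}
for name, acct in accounts.items():
    print(name, sorted(effective_services(acct)))
```

Moving an account between OUs changes its effective permissions without touching the account itself, which is why the OU layout deserves as much design attention as the policies.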

Define network connectivity

Network connectivity is a crucial aspect of any cloud infrastructure: it lets you create secure, scalable, and highly available networks for your applications and workloads. A well-designed network delivers consistent performance and reliable operations across different environments.

When you design your network architecture, consider whether you have workloads that you want to retain on premises because of latency, data processing, or data residency requirements. By evaluating hybrid cloud use cases such as cloud bursting, backup and disaster recovery to the cloud, distributed data processing, and edge computing, you can identify the key requirements for the following aspects:

  • Connectivity to and from the internet. This aspect involves providing secure and reliable connections between your applications or workloads and the internet. This connectivity is essential for facilitating access to web-based resources, enabling communications between users and applications, and ensuring that your services are accessible to the public when needed.

  • Connectivity across your cloud environments. This area focuses on establishing robust connections among the components and services within your cloud infrastructure, so that data and resources can be shared and accessed across different cloud services. A key consideration here is your use of virtual private clouds (VPCs). To keep things simple, create standards for how VPCs are created and tracked, consider defining these standards programmatically, and plan to use an IP address management (IPAM) solution. Allocate enough IP address space to allow for growth, and design subnet structures that are easy to troubleshoot when you use multiple Availability Zones. Follow security best practices for VPCs when you design and implement network connectivity.

  • Connectivity between your on-premises network and your cloud environments. This aspect deals with the integration of your on-premises infrastructure with your cloud-based environment. By creating secure and reliable connections between the two, organizations benefit from the advantages of hybrid architectures. For example, you can use on-premises resources and cloud services simultaneously for improved performance, scalability, and cost optimization.
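The IP planning advice above can be illustrated with Python's standard `ipaddress` module. This minimal sketch, which is not an IPAM solution, first checks that a candidate VPC range does not overlap on-premises ranges (overlaps complicate hybrid routing), then carves the VPC into equally sized subnets with one contiguous range per tier, so an address's position tells you its tier and Availability Zone. All CIDR blocks, AZ labels, and tier names are hypothetical.

```python
import ipaddress


def overlapping(candidate, existing):
    """Return the ranges in `existing` that overlap the `candidate` CIDR."""
    cand = ipaddress.ip_network(candidate)
    return [str(n) for n in map(ipaddress.ip_network, existing) if cand.overlaps(n)]


def plan_subnets(vpc_cidr, azs, tiers, new_prefix):
    """Assign one /new_prefix subnet per (tier, AZ), tier by tier, in address order."""
    vpc = ipaddress.ip_network(vpc_cidr)
    blocks = list(vpc.subnets(new_prefix=new_prefix))
    needed = len(tiers) * len(azs)
    if needed > len(blocks):
        raise ValueError(f"{vpc} holds only {len(blocks)} /{new_prefix} subnets; {needed} needed")
    pairs = [(tier, az) for tier in tiers for az in azs]
    return dict(zip(pairs, blocks))  # unused blocks remain free for growth


# Hypothetical address plan.
ON_PREM = ["10.0.0.0/16", "192.168.0.0/16"]
VPC_CIDR = "10.1.0.0/16"

if overlapping(VPC_CIDR, ON_PREM):
    raise SystemExit(f"{VPC_CIDR} collides with on-premises address space")

plan = plan_subnets(VPC_CIDR, ["az-a", "az-b", "az-c"], ["public", "private", "data"], new_prefix=20)
for (tier, az), net in plan.items():
    # AWS reserves 5 addresses in every subnet (the first four and the last one).
    print(f"{tier:8} {az}  {net}  usable={net.num_addresses - 5}")
```

Here nine /20 subnets use 9 of the 16 available blocks, leaving room to add a tier or an Availability Zone later without renumbering.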

By addressing these three key areas of network connectivity, you can build a robust cloud infrastructure that effectively supports your applications and workloads. Document your networking requirements, and create a simple design that can scale in accordance with your multi-account strategy.

Define DNS strategy

A well-planned DNS strategy helps you avoid complications as your cloud environments grow. If you maintain on-premises DNS capabilities, we recommend a hybrid DNS architecture that continues to use your on-premises DNS infrastructure and uses cloud DNS for cloud-based DNS requirements. Integrate cloud DNS resolution with your on-premises DNS environment by using resolver endpoints and forwarding rules. Use private hosted zones to define how cloud DNS responds to queries for a domain and its subdomains within one or more VPCs.
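How forwarding rules interact can be modeled in a few lines. This is a simplified sketch of the selection behavior of Amazon Route 53 Resolver rules: when several rules match a query, the most specific domain name wins, and a system rule on a subdomain keeps that subdomain (for example, one served from a private hosted zone) in the cloud even though its parent domain is forwarded on premises. The domains and DNS server addresses are hypothetical.

```python
# Hypothetical Resolver rules keyed by domain name.
# FORWARD: send the query to on-premises DNS servers through an outbound endpoint.
# SYSTEM: let Resolver answer locally, overriding a forward rule on a parent domain.
RULES = {
    "corp.example.com": ("FORWARD", ["10.0.0.10", "10.0.0.11"]),
    "aws.corp.example.com": ("SYSTEM", []),
}


def _labels(name):
    """Split a domain name into lowercase labels, ignoring a trailing dot."""
    return name.rstrip(".").lower().split(".")


def route_query(qname, rules=RULES):
    """Return (action, targets) of the most specific rule matching `qname`."""
    q = _labels(qname)
    best = None
    for domain, rule in rules.items():
        d = _labels(domain)
        # A rule matches its own domain and every subdomain (whole labels only).
        if q[-len(d):] == d and (best is None or len(d) > best[0]):
            best = (len(d), rule)
    # No rule matched: Resolver's default behavior (private hosted zones,
    # VPC-provided names, then public DNS).
    return best[1] if best else ("DEFAULT", [])


for name in ["db01.corp.example.com", "api.aws.corp.example.com", "www.example.org"]:
    print(name, route_query(name))
```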

Define tagging standards

Tagging resources is an essential practice for managing costs effectively and identifying resource ownership. Consider how your organization will permit further consumption of cloud resources, including the use of specific services within the platform. Define a tagging strategy that tracks which teams deploy which resources. Use inputs from the AWS CAF Operations perspective, and use tags to automate tasks for your deployed infrastructure.
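A tagging strategy is easiest to enforce when it is written down as data that tools can check. The following minimal sketch validates a resource's tags against a hypothetical standard with three required keys, one of which has a restricted set of values. Because AWS tag keys and values are case-sensitive, the standard fixes the exact spelling, and a value such as `Production` is flagged when only `prod` is allowed.

```python
# Hypothetical tagging standard: key -> allowed values (None = any non-empty value).
REQUIRED_TAGS = {
    "Owner": None,
    "CostCenter": None,
    "Environment": {"dev", "test", "prod"},
}


def tag_violations(tags, standard=REQUIRED_TAGS):
    """List the ways in which a resource's tags break the standard."""
    problems = []
    for key, allowed in standard.items():
        value = tags.get(key, "").strip()
        if not value:
            problems.append(f"missing {key}")
        elif allowed is not None and value not in allowed:
            problems.append(f"invalid {key}={value!r}")
    return problems


# Hypothetical resources and their tags.
resources = {
    "i-0abc": {"Owner": "team-payments", "CostCenter": "cc-100", "Environment": "prod"},
    "vol-0def": {"Owner": "team-data", "Environment": "Production"},
}
for rid, tags in resources.items():
    print(rid, tag_violations(tags) or "compliant")
```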

Additionally, by tagging resources with relevant metadata, you can group and track your spending based on your organizational requirements dictated in the Cloud Financial Management (CFM) capability in the AWS CAF Governance perspective. Identify a mechanism for reporting that supports your accounting and financial practices, including actions to be taken when financial policies are violated.
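Grouping spend by tag is essentially an aggregation over cost line items. This sketch sums hypothetical line items by a tag key and reports spend that carries no value for that key, since the untagged share is a useful indicator of how well the tagging standard is followed. In practice, the line items would come from a billing export; the field names used here are assumptions for illustration. `Decimal` avoids floating-point rounding in currency sums.

```python
from collections import defaultdict
from decimal import Decimal


def spend_by_tag(line_items, key):
    """Total cost per value of tag `key`; items without the tag go to '(untagged)'."""
    totals = defaultdict(Decimal)
    for item in line_items:
        group = item["tags"].get(key, "(untagged)")
        totals[group] += Decimal(item["cost"])
    return dict(totals)


def untagged_share(totals):
    """Fraction of total spend that could not be attributed by the tag."""
    total = sum(totals.values())
    return totals.get("(untagged)", Decimal("0")) / total if total else Decimal("0")


# Hypothetical line items (cost as strings to keep exact decimal values).
items = [
    {"resource": "i-web-1", "cost": "120.00", "tags": {"CostCenter": "cc-100"}},
    {"resource": "db-orders", "cost": "310.50", "tags": {"CostCenter": "cc-200"}},
    {"resource": "i-batch-7", "cost": "45.25", "tags": {"CostCenter": "cc-100"}},
    {"resource": "vol-orphan", "cost": "12.00", "tags": {}},
]
totals = spend_by_tag(items, "CostCenter")
print(totals, f"untagged: {untagged_share(totals):.1%}")
```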

Define an observability strategy

Establishing an observability strategy is a critical step toward optimizing and securing your cloud architecture. This strategy centers on turning the metrics and logs that your cloud services produce into actionable insights for strategic decision-making. Prioritize monitoring key performance indicators, and set up alerts so that you can address potential issues before they escalate. To prevent tool proliferation, optimize costs, and focus on what matters most to your organization, apply this observability strategy across both your platform and your applications. For further guidance, see our presentation on Developing an observability strategy (AWS re:Invent 2022).
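Alerts that fire on a single noisy data point erode trust in monitoring. The following sketch shows an "N out of M" evaluation, similar in spirit to the evaluation periods and datapoints-to-alarm settings of Amazon CloudWatch alarms but simplified (for example, it ignores missing-data handling): the alert fires only when at least N of the last M data points breach the threshold. The metric, threshold, and window sizes are hypothetical.

```python
def alarm_state(datapoints, threshold, evaluation_periods, datapoints_to_alarm):
    """Return 'ALARM', 'OK', or 'INSUFFICIENT_DATA' for the most recent window."""
    recent = datapoints[-evaluation_periods:]
    if len(recent) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    breaching = sum(1 for value in recent if value > threshold)
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"


# Hypothetical p99 latency samples in milliseconds; alert if 2 of the last 3 exceed 300 ms.
print(alarm_state([220, 340, 310, 180, 360], threshold=300,
                  evaluation_periods=3, datapoints_to_alarm=2))
```

Requiring two breaches out of three suppresses one-off spikes while still catching a sustained degradation within a few periods.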

Advance

Define proactive and detective controls

To advance, your organization must identify the need for proactive and detective controls (guardrails) within the environment. Create policies that define the guardrails or limits that apply to roles and users in the accounts within an organizational unit (OU). Review any default detective guardrails for the platform, and choose which guardrails to apply. Create additional preventive and detective controls as required, and group them by OU to align them with your multi-account strategy. Consider which organizational tools and mechanisms you need to investigate the non-compliant resources that detective controls identify.
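Proactive and detective controls can share the same rule definitions and differ only in when they run. In this minimal sketch, a proactive gate evaluates resources planned in a deployment template and blocks the deployment on any finding, while a detective report evaluates deployed inventory and groups findings by OU for follow-up. The control IDs, resource shapes, and OU names are hypothetical and do not correspond to any managed control set.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Control:
    control_id: str                      # hypothetical identifier
    resource_type: str
    description: str
    is_compliant: Callable[[dict], bool]


CONTROLS = [
    Control("HYP-S3-01", "s3_bucket", "S3 Block Public Access is enabled",
            lambda r: r.get("block_public_access") is True),
    Control("HYP-EBS-01", "ebs_volume", "EBS volume is encrypted",
            lambda r: r.get("encrypted") is True),
]


def evaluate(resources, controls=CONTROLS):
    """Return (control_id, resource_id) for every failed check."""
    return [(c.control_id, r["id"])
            for r in resources for c in controls
            if c.resource_type == r["type"] and not c.is_compliant(r)]


def proactive_gate(planned):
    """Before provisioning: allow the deployment only if nothing fails."""
    findings = evaluate(planned)
    return not findings, findings


def detective_report(inventory):
    """After provisioning: group findings by the OU of the owning account."""
    ou_of = {r["id"]: r["ou"] for r in inventory}
    by_ou = defaultdict(list)
    for control_id, rid in evaluate(inventory):
        by_ou[ou_of[rid]].append((control_id, rid))
    return dict(by_ou)


planned = [{"type": "s3_bucket", "id": "app-logs", "block_public_access": False}]
inventory = [
    {"type": "ebs_volume", "id": "vol-1", "encrypted": False, "ou": "Workloads/Prod"},
    {"type": "ebs_volume", "id": "vol-2", "encrypted": True, "ou": "Workloads/Prod"},
    {"type": "s3_bucket", "id": "sandbox-data", "block_public_access": True, "ou": "Sandbox"},
]
print(proactive_gate(planned))
print(detective_report(inventory))
```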

Define standards for service onboarding

Create standards for acceptable use of the platform, for the patterns associated with service consumption, and for how that consumption will be governed. Decide which services are initially allowed for use. Document these standards, and publish them to the users and operators of the platform. Make sure that the standards adapt over time to meet the changing objectives of your organization and the evolving capabilities of cloud computing.
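One way to connect an onboarding standard to enforcement is to generate an allowlist SCP from a registry of service onboarding decisions, so that only approved services (plus a foundational baseline) can be used. The registry, its statuses, and the baseline service list below are hypothetical; a real baseline usually needs more services than shown, because a `Deny` with `NotAction` blocks everything that is not listed.

```python
import json

# Hypothetical onboarding registry: service prefix -> review status.
SERVICE_REGISTRY = {
    "ec2": "approved",
    "s3": "approved",
    "lambda": "approved",
    "sagemaker": "in_review",
    "gamelift": "denied",
}

# Hypothetical, incomplete baseline that every account needs to operate.
BASELINE = {"iam", "sts", "organizations", "cloudwatch", "logs"}


def allowlist_scp(registry, baseline=BASELINE):
    """Deny every action except those of approved and baseline services."""
    approved = {svc for svc, status in registry.items() if status == "approved"}
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyServicesNotOnboarded",
            "Effect": "Deny",
            "NotAction": sorted(f"{svc}:*" for svc in approved | baseline),
            "Resource": "*",
        }],
    }


print(json.dumps(allowlist_scp(SERVICE_REGISTRY), indent=2))
```

When a service moves from `in_review` to `approved`, regenerating the policy keeps the published standard and the enforced guardrail in sync.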

Define patterns and principles

Using input from application owners, decide which architectural patterns your organization will allow, and begin to define blueprints for standardization. Standardization enables stronger governance and a lower administrative burden as you scale in the cloud. Define patterns that use infrastructure as code (IaC), and plan for a simplified deployment model by using a service catalog that's integrated into your change control processes and IT service management (ITSM) systems. Define how these blueprints will be used and the circumstances for allowing exceptions. Plan for those exceptions and their governance, with considerations for authentication, security monitoring, and guardrails.
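Exception governance is easier when every exception is a record with an owner and an expiry date. This sketch checks whether a deployment is allowed because it uses an approved blueprint or because the account holds an unexpired exception for the requested pattern. The blueprint names, account IDs, and approver are hypothetical; passing `today` explicitly keeps the check deterministic and testable.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical catalog of approved blueprints.
APPROVED_BLUEPRINTS = {"three-tier-web", "serverless-api", "batch-analytics"}


@dataclass(frozen=True)
class BlueprintException:
    account_id: str
    pattern: str
    approved_by: str
    expires: date


def deployment_allowed(account_id, pattern, exceptions, today):
    """Return (allowed, reason) for deploying `pattern` into `account_id`."""
    if pattern in APPROVED_BLUEPRINTS:
        return True, "approved blueprint"
    for exc in exceptions:
        if exc.account_id == account_id and exc.pattern == pattern and today <= exc.expires:
            return True, f"exception approved by {exc.approved_by} until {exc.expires.isoformat()}"
    return False, "no approved blueprint or active exception"


exceptions = [BlueprintException("111122223333", "event-driven-etl",
                                 "architecture-review-board", date(2024, 9, 30))]
print(deployment_allowed("111122223333", "event-driven-etl", exceptions, date(2024, 6, 1)))
```

Because exceptions expire, each one either returns for review or is retired, which keeps the set of non-standard patterns visible and bounded.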

Excel

Define remediation patterns

Consider how to annotate and prioritize your detective guardrail findings so that they can be remediated in accordance with your security and compliance frameworks. Plan to use automation to detect out-of-policy provisioning of resources, including resources that violate budgetary and tagging policies. Identify the capabilities needed to set and measure service-level objectives as you update your runbooks and playbooks. Schedule periodic reviews of these practices, and establish a feedback mechanism to capture data related to platform evolution. Define mechanisms to create and update runbooks and playbooks accordingly.
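Prioritization and service-level objectives for remediation can be expressed as a severity-based deadline. In this sketch, each finding gets a due date from a hypothetical remediation window per severity; the queue lists overdue findings first, then orders by severity and due date. A second function measures the objective as the fraction of closed findings that were fixed within their window. The windows and finding fields are assumptions for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical remediation windows per severity.
REMEDIATION_SLA = {"CRITICAL": timedelta(days=1), "HIGH": timedelta(days=7),
                   "MEDIUM": timedelta(days=30), "LOW": timedelta(days=90)}
SEVERITY_RANK = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3}


def triage(findings, now):
    """Annotate findings with due dates; overdue first, then severity, then due date."""
    queue = []
    for f in findings:
        due = f["detected_at"] + REMEDIATION_SLA[f["severity"]]
        queue.append({**f, "due": due, "overdue": now > due})
    queue.sort(key=lambda f: (not f["overdue"], SEVERITY_RANK[f["severity"]], f["due"]))
    return queue


def remediation_slo(closed_findings):
    """Fraction of closed findings remediated within their SLA window."""
    if not closed_findings:
        return 1.0
    on_time = sum(1 for f in closed_findings
                  if f["closed_at"] <= f["detected_at"] + REMEDIATION_SLA[f["severity"]])
    return on_time / len(closed_findings)


now = datetime(2024, 5, 10)
open_findings = [
    {"id": "f1", "severity": "HIGH", "detected_at": datetime(2024, 5, 1)},
    {"id": "f2", "severity": "CRITICAL", "detected_at": datetime(2024, 5, 9, 12)},
    {"id": "f3", "severity": "LOW", "detected_at": datetime(2024, 4, 1)},
    {"id": "f4", "severity": "MEDIUM", "detected_at": datetime(2024, 3, 1)},
]
print([(f["id"], f["overdue"]) for f in triage(open_findings, now)])
```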

Communicate and refine policies

Maintain all platform documentation in a centralized content management system, and make it available to the users and operators of the platform. Create a mechanism to capture feedback on proposed policy changes for future consideration.

Understand financial management capabilities

Organizations thrive when they maintain a transparent and comprehensive understanding of their budget. A clear view of the budget supports informed decision-making, efficient resource allocation, cost control, performance measurement, accountability, and compliance, which helps organizations accomplish their strategic objectives and remain financially stable.

When you have a successful tagging strategy, you can use cost filters in AWS Budgets to filter expenses based on resource tags. This lets you create budgets that are tailored to specific projects, departments, environments, or other criteria. You can also use cost allocation tags, and AWS Cost Categories rules that are based on tags, to drive financial insights and transparency when you report on cost.
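The logic behind a tag-filtered budget with percentage alerts can be sketched in a few lines. AWS Budgets performs this evaluation for you; the sketch only illustrates the arithmetic: sum the spend that carries a given tag value, compare it to the budget limit, and report which alert thresholds have been crossed. The `Project` tag, line items, limit, and thresholds are hypothetical.

```python
from decimal import Decimal


def tagged_spend(line_items, key, value):
    """Sum the cost of line items whose tag `key` equals `value`."""
    return sum((Decimal(i["cost"]) for i in line_items if i["tags"].get(key) == value),
               Decimal("0"))


def crossed_thresholds(actual, limit, thresholds_pct=(50, 80, 100)):
    """Return the alert thresholds (percent of limit) that actual spend has reached."""
    used_pct = actual / limit * 100
    return [t for t in sorted(thresholds_pct) if used_pct >= t]


# Hypothetical month-to-date line items and a 1,000 budget for Project=alpha.
items = [
    {"cost": "400.00", "tags": {"Project": "alpha"}},
    {"cost": "450.00", "tags": {"Project": "alpha", "Environment": "prod"}},
    {"cost": "300.00", "tags": {"Project": "beta"}},
]
alpha = tagged_spend(items, "Project", "alpha")
print(alpha, crossed_thresholds(alpha, Decimal("1000")))
```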