Platform engineering - AWS Prescriptive Guidance

Platform engineering

Build a secure, compliant multi-account cloud environment with packaged, reusable cloud products.

To support innovation by enabling development teams, the platform needs to adapt rapidly to keep up with the demands of the business. (See the AWS CAF Business perspective.) It must be flexible enough to adapt to product management demands, rigid enough to adhere to security constraints, and fast enough to meet operational needs. Meeting these goals requires building a compliant multi-account cloud environment with enhanced security features, and packaged, reusable cloud products.

An effective cloud environment allows your teams to easily provision new accounts while ensuring that those accounts conform to organizational policies. A curated set of cloud products enables you to codify best practices, helps you with governance, and helps increase the speed and consistency of your cloud deployments. Deploy your best-practice blueprints along with detective and preventative guardrails. Integrate your cloud environment with your existing landscape to enable the hybrid cloud use cases you need.

Automate the account provisioning workflow and use multiple accounts to support your security and governance goals. Set up connectivity between your on-premises and cloud environments as well as between different cloud accounts. Implement federation between your existing identity provider (IdP) and your cloud environment so that users can authenticate by using their existing login credentials. Centralize logging, establish cross-account security audits, create inbound and outbound DNS resolvers, and get dashboard visibility into your accounts and guardrails.

Evaluate and certify cloud services for consumption in alignment with corporate standards and configuration management. Package and continuously improve enterprise standards as self-service deployable products and consumable services. Use infrastructure as code (IaC) to define configurations declaratively. Create enablement teams to evangelize the platform to developers and business users and to help them build integrations that accelerate adoption across your organization.

Completing the tasks discussed in the following sections requires you to build capabilities and teams that evolve your organization toward modern platform engineering. For technical details, see the Establishing your Cloud Foundation on AWS whitepaper.

Start

Build a landing zone and deploy guardrails

As you start your journey to mature platform engineering, you must first deploy your landing zone with detective and preventative guardrails as defined in the platform architecture capability. Guardrails ensure that organizational standards aren't violated as application owners consume cloud resources. This mechanism also lets you automate the account provisioning workflow and use multiple accounts that support your security and governance goals.
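A preventative guardrail is commonly expressed as a service control policy (SCP) attached to an organizational unit. The following is a minimal sketch, assuming AWS Organizations SCPs are your guardrail mechanism; the approved Region list and statement IDs are hypothetical. Production Region-deny policies usually exempt more global services than the three shown in `NotAction`.

```python
import json

# Hypothetical list of Regions the organization has approved; adjust to your own policy.
APPROVED_REGIONS = ["us-east-1", "eu-west-1"]


def build_guardrail_scp(approved_regions):
    """Return an SCP document (as a dict) with two preventative guardrails."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Stop member accounts from leaving the organization or disabling CloudTrail.
                "Sid": "ProtectGovernanceBaseline",
                "Effect": "Deny",
                "Action": [
                    "organizations:LeaveOrganization",
                    "cloudtrail:StopLogging",
                    "cloudtrail:DeleteTrail",
                ],
                "Resource": "*",
            },
            {
                # Deny requests to Regions outside the approved list (global services exempted).
                "Sid": "DenyUnapprovedRegions",
                "Effect": "Deny",
                "NotAction": ["iam:*", "organizations:*", "sts:*"],
                "Resource": "*",
                "Condition": {
                    "StringNotEquals": {"aws:RequestedRegion": approved_regions}
                },
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_guardrail_scp(APPROVED_REGIONS), indent=2))
```

Generating the policy from code, rather than editing JSON by hand, lets the same guardrail be versioned and deployed consistently to every organizational unit.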

Establish authentication

Implement identity management and access control across all environments, systems, workloads, and processes in accordance with standards dictated in the AWS CAF Security perspective. For workforce identities, restrict the use of AWS Identity and Access Management (IAM) users and instead rely on an identity provider that enables you to manage identities in a centralized place. This makes it easier to manage access across multiple applications and services, because you create, manage, and revoke access from a single location. Extend your existing processes for creating, updating, and removing access so that they also cover your AWS environments.
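With SAML federation, users authenticate against the existing IdP and then assume an IAM role whose trust policy names that IdP. The sketch below builds such a trust policy; the account ID and provider name are hypothetical placeholders. If you use a managed service such as AWS IAM Identity Center, it creates the equivalent roles for you.

```python
import json


def saml_trust_policy(account_id: str, provider_name: str) -> dict:
    """Trust policy that lets users federated through a SAML IdP assume a role."""
    provider_arn = f"arn:aws:iam::{account_id}:saml-provider/{provider_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": provider_arn},
                "Action": "sts:AssumeRoleWithSAML",
                # Only accept assertions intended for the AWS sign-in endpoint.
                "Condition": {
                    "StringEquals": {"SAML:aud": "https://signin.aws.amazon.com/saml"}
                },
            }
        ],
    }


# Hypothetical account and IdP names, for illustration only.
print(json.dumps(saml_trust_policy("111122223333", "CorporateIdP"), indent=2))
```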

Deploy your network

In accordance with your platform architecture designs, create a centralized network account to control inbound and outbound traffic to and from your environment. We recommend that you design your networks for rapidly provisioned connectivity between your on-premises network and your AWS environments, to and from the internet, and across your AWS environments. Centralizing network management lets you use preventive and reactive controls to isolate networks and control connectivity across your environment.
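Routing between on-premises and AWS networks only works if their address ranges don't overlap, so a central network team usually validates the address plan before connecting anything. This sketch uses Python's standard `ipaddress` module; the network names and CIDR blocks are a hypothetical plan.

```python
import ipaddress
from itertools import combinations


def find_overlaps(networks: dict[str, str]) -> list[tuple[str, str]]:
    """Return pairs of named networks whose CIDR ranges overlap."""
    parsed = {name: ipaddress.ip_network(cidr) for name, cidr in networks.items()}
    return [
        (a, b)
        for a, b in combinations(parsed, 2)
        if parsed[a].overlaps(parsed[b])
    ]


# Hypothetical address plan: on-premises range plus VPCs in three accounts.
plan = {
    "on-premises": "10.0.0.0/16",
    "network-account-vpc": "10.1.0.0/20",
    "workload-a-vpc": "10.1.8.0/22",  # overlaps the network account VPC
    "workload-b-vpc": "10.2.0.0/22",
}

if __name__ == "__main__":
    for a, b in find_overlaps(plan):
        print(f"Conflict: {a} overlaps {b}")
```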

Collect, aggregate, and protect event and log data

Use Amazon CloudWatch cross-account observability, which provides a unified interface to search, visualize, and analyze metrics, logs, and traces across your linked accounts without switching between them.
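Cross-account observability links source accounts to a sink in a monitoring account, and a sink policy controls which accounts may link and which telemetry they may share. The following sketch is modeled on the sink policy examples in the CloudWatch documentation; verify the condition keys against the current documentation before use. The organization ID is a placeholder, and you would attach the resulting JSON to the sink (for example, through the PutSinkPolicy API).

```python
import json

# Telemetry types that source accounts are allowed to share with the monitoring account.
SHARED_TYPES = ["AWS::CloudWatch::Metric", "AWS::Logs::LogGroup", "AWS::XRay::Trace"]


def sink_policy(org_id: str, resource_types=SHARED_TYPES) -> str:
    """Build a sink policy that lets accounts in one organization link to the sink."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": "*",
                "Resource": "*",
                "Action": ["oam:CreateLink", "oam:UpdateLink"],
                "Condition": {
                    # Restrict callers to principals in your organization.
                    "StringEquals": {"aws:PrincipalOrgID": org_id},
                    # Source accounts may share only these telemetry types.
                    "ForAllValues:StringEquals": {"oam:ResourceTypes": list(resource_types)},
                },
            }
        ],
    }
    return json.dumps(policy)
```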

If your organization has specific compliance requirements for centralized log control and security, consider setting up a dedicated log archive account. This offers a centralized, encrypted repository specifically for log data. Enhance the security of this archive by regularly rotating encryption keys.
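A log archive is typically an Amazon S3 bucket whose policy blocks unencrypted transport and deletion of log objects. Key rotation is configured on the AWS KMS key that encrypts the bucket (for example, by enabling automatic rotation), not in the bucket policy. This sketch assumes an S3-based archive, and the bucket name is a placeholder. Principals who can edit the bucket policy could still remove these statements, so pair it with a guardrail that protects the policy itself.

```python
import json


def log_archive_bucket_policy(bucket: str) -> dict:
    """Bucket policy sketch for a central log archive bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Reject any request that is not sent over TLS.
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
            {
                # Block deletion of individual log objects and object versions.
                "Sid": "DenyLogDeletion",
                "Effect": "Deny",
                "Principal": "*",
                "Action": ["s3:DeleteObject", "s3:DeleteObjectVersion"],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


print(json.dumps(log_archive_bucket_policy("example-log-archive"), indent=2))
```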

Implement robust policies for protecting sensitive log data, and apply masking techniques where necessary. Use log aggregation for compliance, security, and audit logs, and use strict guardrails and identity constructs to prevent unauthorized changes to log configurations.
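Masking can happen before log lines leave the workload. The sketch below applies a few regular-expression rules to each line; the rules are hypothetical examples, are deliberately simple, and will produce both false positives and false negatives, so tune them to your data classification scheme.

```python
import re

# Hypothetical masking rules: (pattern, replacement).
MASKING_RULES = [
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"), "<email>"),
    # 13-16 digits, optionally separated by single spaces or hyphens.
    (re.compile(r"\b\d(?:[ -]?\d){12,15}\b"), "<card-number>"),
    # Keep the key name, hide the value.
    (re.compile(r"(?i)\b(password|secret|token)=\S+"), r"\1=****"),
]


def mask(line: str) -> str:
    """Apply every masking rule to a single log line."""
    for pattern, replacement in MASKING_RULES:
        line = pattern.sub(replacement, line)
    return line


print(mask("user=jane@example.com password=hunter2"))
```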

Establish controls

In accordance with the definitions from the AWS CAF Security perspective, deploy foundational security capabilities that meet your business requirements. Deploy additional preventative and detective controls, and provision those programmatically and consistently across all your accounts where required. Integrate detective controls into operational tooling as defined by the platform architecture capability so that non-compliant resources can be reviewed by operational mechanisms.
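Integrating detective controls with operational tooling usually means turning each non-compliant evaluation into a work item. The sketch below uses a simplified evaluation record whose `COMPLIANT`/`NON_COMPLIANT` values mirror AWS Config compliance types; the record fields, rule name, and ticket queue are hypothetical, and a real integration would call your ITSM system's API instead of returning dictionaries.

```python
from dataclasses import dataclass


@dataclass
class Evaluation:
    # Simplified view of a detective-control result; field names are illustrative.
    account_id: str
    resource_id: str
    rule_name: str
    compliance: str  # "COMPLIANT" or "NON_COMPLIANT"


def to_tickets(evaluations):
    """Create one ticket payload per non-compliant resource for the operations queue."""
    return [
        {
            "title": f"{e.rule_name}: {e.resource_id}",
            "account": e.account_id,
            "queue": "platform-operations",  # hypothetical ITSM queue name
        }
        for e in evaluations
        if e.compliance == "NON_COMPLIANT"
    ]
```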

Implement cloud financial management

In accordance with the AWS CAF Governance perspective, implement cost allocation tags and AWS Cost Categories that align your organization's tagging strategy with financial accountability for cloud consumption. With AWS Cost Categories, you can charge back or show back cloud costs to internal cost centers by using tools such as AWS Cost Explorer and billing data published in AWS Cost and Usage Report.
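Showback depends on consistent tagging: untagged spend can't be attributed to a cost center. This sketch checks resources against a hypothetical tagging standard and sums costs per `CostCenter` tag; the tag keys and line-item shape are assumptions, not the Cost and Usage Report schema.

```python
from collections import defaultdict

# Hypothetical tagging standard: every resource must carry these keys.
REQUIRED_TAGS = {"CostCenter", "Owner", "Environment"}


def missing_tags(resource_tags: dict) -> set:
    """Return the required tag keys that a resource lacks."""
    return REQUIRED_TAGS - set(resource_tags)


def showback(line_items):
    """Sum cost per CostCenter tag; untagged spend is grouped separately."""
    totals = defaultdict(float)
    for item in line_items:
        totals[item["tags"].get("CostCenter", "untagged")] += item["cost"]
    return dict(totals)
```

Reporting the `untagged` bucket explicitly makes gaps in tag compliance visible instead of silently dropping that spend.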

Advance

Build infrastructure automation

Before you proceed, evaluate and certify cloud services for consumption in alignment with your platform architecture. Then package and continuously improve enterprise standards as deployable products and consumable services, and use infrastructure as code (IaC) to define configurations declaratively. Infrastructure automation mirrors software development cycles, and it allows access to specific services in each account through role-based access control (RBAC) or attribute-based access control (ABAC).

Deploy a method to rapidly provision new accounts, either through APIs or through self-service capabilities, and align those accounts with your service and incident management capabilities. Automate network integration and IP allocation as accounts are created to ensure compliance and network security. Integrate new accounts with your IT service management (ITSM) solution by using native connectors that are configured to operate with AWS. Update your playbooks and runbooks as appropriate.
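Automated IP allocation hands each new account a non-overlapping block from a central pool, and re-running the pipeline returns the same block. The following toy allocator illustrates that behavior with the standard `ipaddress` module; the pool and prefix size are hypothetical, and a managed option such as Amazon VPC IP Address Manager (IPAM) would normally persist the assignments for you.

```python
import ipaddress


class CidrAllocator:
    """Hand out non-overlapping VPC CIDRs from a pool (a toy stand-in for an IPAM service)."""

    def __init__(self, pool: str, prefix: int):
        # Lazily enumerate every subnet of the requested size within the pool.
        self._free = ipaddress.ip_network(pool).subnets(new_prefix=prefix)
        self.assignments: dict[str, str] = {}

    def allocate(self, account_id: str) -> str:
        # Idempotent: re-running account provisioning returns the same block.
        if account_id in self.assignments:
            return self.assignments[account_id]
        try:
            block = next(self._free)
        except StopIteration:
            raise RuntimeError("address pool exhausted") from None
        self.assignments[account_id] = str(block)
        return str(block)


allocator = CidrAllocator("10.64.0.0/16", 20)
print(allocator.allocate("111111111111"))
```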

Provide centralized observability services

To achieve effective cloud observability, your platform should support real-time search and analysis of both local and centralized log data. As your operations scale, your platform's ability to index, visualize, and interpret logs, metrics, and traces is key to turning raw data into actionable insights.

By correlating logs, metrics, and traces, you can draw conclusions and develop targeted, informed responses. Establish rules that allow proactive responses to security events or patterns that are identified in your logs, metrics, or traces. As your AWS solutions expand, ensure that your monitoring strategy scales in tandem to maintain and enhance your observability capabilities.
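A detection rule typically looks for a pattern within a time window. The sketch below raises an alert when one user accumulates a threshold of failed logins inside a sliding window; the event schema (`user`, `time`, `outcome`) is a hypothetical normalized format, not a specific AWS log format, and the threshold and window are illustrative defaults.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def failed_login_alerts(events, threshold=5, window=timedelta(minutes=10)):
    """Yield (user, time) whenever a user has `threshold` failed logins within `window`.

    `events` must be sorted by time.
    """
    recent = defaultdict(deque)
    for event in events:
        if event["outcome"] != "failure":
            continue
        times = recent[event["user"]]
        times.append(event["time"])
        # Drop failures that fell out of the sliding window.
        while event["time"] - times[0] > window:
            times.popleft()
        if len(times) >= threshold:
            yield event["user"], event["time"]
            times.clear()  # reset so one burst raises one alert
```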

Implement systems management and AMI governance

Organizations that use Amazon Elastic Compute Cloud (Amazon EC2) instances extensively require operational tooling to manage instances at scale. Software asset management, endpoint detection and response, inventory management, vulnerability management, and access management are foundational capabilities for many organizations. These capabilities are often delivered through software agents that are installed on instances. Develop a capability to package agents and other custom configurations into Amazon Machine Images (AMIs), and make these AMIs available to consumers of the cloud platform. Use preventative and detective controls that govern the use of these AMIs. AMIs should contain tooling that enables management of long-running EC2 instances at scale, particularly for mutable Amazon EC2 workloads that don't consume new AMIs on a regular basis. You can use AWS Systems Manager at scale to automate agent upgrades, collect system inventory, access EC2 instances remotely, and patch operating system vulnerabilities.
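A detective control for AMI governance compares running instances with the approved catalog and with the set of instances that Systems Manager reports as managed. The dictionaries below use the `InstanceId` and `ImageId` field names returned by the EC2 DescribeInstances API; the approved AMI IDs and managed-instance set are hypothetical inputs you would gather from your own catalog and inventory. AWS Config also offers managed rules for approved AMIs if you prefer not to maintain this logic.

```python
def governance_report(instances, approved_amis, ssm_managed_ids):
    """Report instances on unapproved AMIs and instances not managed by Systems Manager."""
    return {
        "unapproved_ami": sorted(
            i["InstanceId"] for i in instances if i["ImageId"] not in approved_amis
        ),
        "not_managed": sorted(
            i["InstanceId"] for i in instances if i["InstanceId"] not in ssm_managed_ids
        ),
    }
```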

Manage credential use

In accordance with the AWS CAF Security perspective, implement roles and temporary credentials. Use tooling that manages remote access to instances or on-premises systems through a pre-installed agent, so that you don't need to store secrets. Reduce reliance on long-term credentials, and scan for hardcoded credentials in your IaC templates. If you can't use temporary credentials, use programmatic tools to automate the rotation and management of credentials such as application tokens and database passwords. Codify users, groups, and roles with IaC by following the principle of least privilege, and use guardrails to prevent the manual creation of identity accounts.
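Scanning IaC templates for hardcoded credentials can start with pattern matching in the deployment pipeline. Long-term IAM user access key IDs begin with `AKIA`, which makes them easy to detect; the second pattern is a hypothetical heuristic for inline secrets. Dedicated secret scanners cover far more formats, so treat this as a sketch of the idea.

```python
import re

# Patterns for common hardcoded secrets.
PATTERNS = {
    # Long-term IAM user access key IDs: "AKIA" followed by 16 uppercase letters or digits.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # Heuristic: a secret-like key assigned a literal value of 8+ characters.
    "inline_secret": re.compile(
        r"(?i)\b(password|secret_key|secretaccesskey)\s*[:=]\s*['\"]?[^\s'\"]{8,}"
    ),
}


def scan_template(text: str):
    """Return (line_number, finding_type) for each suspicious line in an IaC template."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings
```

Running the scan as a pipeline gate, and failing the build on any finding, keeps credentials out of the repository rather than detecting them after deployment.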

Establish security tooling

Security monitoring tools should support granular security monitoring across infrastructure, applications, and workloads and provide aggregated views for pattern analysis. As with your other security management tools, configure your extended detection and response (XDR) tools to cover AWS so that they can assess, detect, respond to, and remediate security issues in your applications, resources, and environments on AWS, in accordance with requirements defined in the AWS CAF Security perspective.
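An aggregated view for pattern analysis can be as simple as counting findings by severity and type, most severe first. The severity labels below follow the scale used by AWS Security Hub findings; the finding dictionaries and type names are hypothetical.

```python
from collections import Counter

SEVERITY_ORDER = ["CRITICAL", "HIGH", "MEDIUM", "LOW", "INFORMATIONAL"]


def aggregate(findings):
    """Count findings per (severity, type); most severe first, then most frequent."""
    counts = Counter((f["severity"], f["type"]) for f in findings)
    return sorted(
        counts.items(),
        key=lambda kv: (SEVERITY_ORDER.index(kv[0][0]), -kv[1]),
    )
```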

Excel

Source and distribute identity constructs with automation

Codify and version identity constructs such as roles, policies, and templates with IaC tools. Use policy validation tools to check your IAM policies for security warnings, errors, general warnings, suggested changes, and other findings. Where appropriate, deploy and remove identity constructs that provide temporary access to the environment in an automated manner, and prohibit individuals from deploying them manually through the console.
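IAM Access Analyzer policy validation returns findings in the categories listed above and is the managed option. A lightweight local check can still catch obvious problems before a pipeline calls it. This hypothetical linter flags only wildcard actions and unconditioned wildcard resources in `Allow` statements; it is a sketch and not a substitute for full policy validation.

```python
def lint_policy(policy: dict) -> list[str]:
    """Flag overly broad Allow statements in an IAM policy document."""
    warnings = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be given without a list
        statements = [statements]
    for index, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        resources = stmt.get("Resource", [])
        actions = [actions] if isinstance(actions, str) else actions
        resources = [resources] if isinstance(resources, str) else resources
        if "*" in actions:
            warnings.append(f"Statement {index}: allows all actions")
        elif any(a.endswith(":*") for a in actions):
            warnings.append(f"Statement {index}: allows every action of a service")
        if "*" in resources and "Condition" not in stmt:
            warnings.append(f"Statement {index}: applies to all resources without a condition")
    return warnings
```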

Add detection and alerts for anomalous patterns across environments

Proactively assess environments for known vulnerabilities and add detection for unusual event and activity patterns. Review findings and make recommendations to platform architecture teams for changes that drive further efficiency and innovation. 
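Detecting unusual activity starts with a baseline. The following sketch flags a daily count, such as API calls from one role, that sits well above its historical mean; the z-score threshold is a hypothetical default, and this simple statistical baseline is only an illustration, not how any managed detection service works.

```python
from statistics import mean, stdev


def is_anomalous(history: list[int], today: int, threshold: float = 3.0) -> bool:
    """Flag today's count if it is more than `threshold` standard deviations above the baseline."""
    if len(history) < 2:
        return False  # not enough data to form a baseline
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return today > mu  # flat baseline: any increase is unusual
    return (today - mu) / sigma > threshold
```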

Analyze and model for threats

Implement continuous monitoring and measurement against industry and security benchmarks in accordance with the requirements from the AWS CAF Security perspective. When you implement your instrumentation approach, determine which types of event data and information will best inform your security management functions. This monitoring encompasses several attack vectors, including service usage. Your security foundations should include a comprehensive capability for secure logging and analytics across your multi-account environments that includes the ability to correlate events from multiple sources. Prevent changes to this configuration with specific controls and guardrails. 
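Alongside the guardrails that prevent changes, a detective check can alert on any event that weakens the logging configuration. Each record below mirrors the `eventSource` and `eventName` fields of a CloudTrail event; the event list is illustrative rather than exhaustive.

```python
# (eventSource, eventName) pairs that weaken or disable logging (illustrative, not exhaustive).
TAMPER_EVENTS = {
    ("cloudtrail.amazonaws.com", "StopLogging"),
    ("cloudtrail.amazonaws.com", "DeleteTrail"),
    ("cloudtrail.amazonaws.com", "UpdateTrail"),
    ("cloudtrail.amazonaws.com", "PutEventSelectors"),
    ("logs.amazonaws.com", "DeleteLogGroup"),
    ("ec2.amazonaws.com", "DeleteFlowLogs"),
}


def tamper_alerts(records):
    """Return the records that change the logging configuration."""
    return [
        r for r in records
        if (r.get("eventSource"), r.get("eventName")) in TAMPER_EVENTS
    ]
```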

Continuously collect, review, and refine permissions

Record changes to identity roles and permissions and implement alerts when detective guardrails detect deviations from your expected configuration state. Use aggregation and pattern identification tools to review your centralized collection of events and refine permissions as required.
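Two comparisons drive this refinement loop: deployed permissions versus the version in source control (drift), and granted permissions versus permissions actually used. In practice, the observed set might come from CloudTrail events or IAM last-accessed information. This sketch assumes exact action names with no wildcards, which real policies often contain.

```python
def unused_actions(granted: set[str], observed: set[str]) -> set[str]:
    """Actions granted to a role but not seen in its activity over the review period."""
    return granted - observed


def drift(expected: set[str], actual: set[str]) -> dict:
    """Compare a role's deployed permissions with the version in source control."""
    return {"added": actual - expected, "removed": expected - actual}
```

Any non-empty `added` set is a candidate for an alert, and a consistently unused action is a candidate for removal in the next IaC change.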

Select, measure, and continuously improve your platform metrics

To enable successful platform operations, establish and routinely review comprehensive metrics. Ensure that they align with organizational goals and stakeholder needs. Track both platform performance and improvement metrics, and combine operational parameters, such as patch, backup, and compliance status, with team enablement and tool adoption indicators.
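Combining operational and adoption indicators is easier when each is expressed as a coverage percentage and compared with a target. The counts and targets below are hypothetical values for illustration.

```python
def coverage(total: int, compliant: int) -> float:
    """Percentage of items meeting a standard, rounded to one decimal place.

    An empty population is treated as fully covered.
    """
    return round(100 * compliant / total, 1) if total else 100.0


# Hypothetical monthly snapshot combining operational and adoption indicators.
snapshot = {
    "patch_compliance_pct": coverage(total=400, compliant=368),
    "backup_coverage_pct": coverage(total=120, compliant=114),
    "teams_onboarded_pct": coverage(total=25, compliant=18),
}

# Hypothetical targets agreed with stakeholders.
TARGETS = {"patch_compliance_pct": 95.0, "backup_coverage_pct": 95.0, "teams_onboarded_pct": 80.0}


def below_target(snapshot: dict, targets: dict) -> list[str]:
    """Names of metrics that fall short of their target."""
    return sorted(k for k, v in snapshot.items() if v < targets.get(k, 0))


print(below_target(snapshot, TARGETS))
```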

Use CloudWatch cross-account observability for efficient metric management. This service streamlines data aggregation and visualization to enable informed decisions and targeted enhancements. Use these metrics as indicators of success and drivers of change to foster an environment of continuous improvement.