Architecture overview - Centralized Logging with OpenSearch

Architecture overview

Deploying this solution with the default parameters builds the following environment in the AWS Cloud.

Architecture diagram

Centralized Logging with OpenSearch architecture overview

This solution deploys the AWS CloudFormation template in your AWS account and sets up the following components.

  1. Amazon CloudFront distributes the frontend web UI assets hosted in an Amazon S3 bucket.

  2. An Amazon Cognito user pool or OpenID Connect (OIDC) provider can be used for authentication.

  3. AWS AppSync provides the backend GraphQL APIs.

  4. Amazon DynamoDB stores solution-related information as the backend database.

  5. AWS Lambda interacts with other AWS services to run the core logic for managing log pipelines and log agents, and obtains information updated in DynamoDB tables.

  6. AWS Step Functions orchestrates on-demand AWS CloudFormation deployment of a set of predefined stacks for log pipeline management. The log pipeline stacks deploy separate AWS resources and are used to collect and process logs and ingest them into Amazon OpenSearch Service for further analysis and visualization.

  7. A Service Log Pipeline or an Application Log Pipeline is provisioned on demand through the Centralized Logging with OpenSearch console.

  8. AWS Systems Manager and Amazon EventBridge manage log agents for collecting logs from application servers, such as installing log agents (Fluent Bit) for application servers and monitoring the health status of the agents.

  9. Fluent Bit agents installed on Amazon EC2 instances or Amazon EKS clusters upload log data to the application log pipeline.

  10. Application log pipelines read, parse, and process application logs, and ingest them into Amazon OpenSearch Service domains or Light Engine.

  11. Service log pipelines read, parse, and process AWS service logs, and ingest them into Amazon OpenSearch Service domains or Light Engine.

After deploying the solution, you can use AWS WAF to protect CloudFront or AppSync. Moreover, you can follow this guide to configure your WAF settings to prevent GraphQL schema introspection.

This solution supports two types of log pipelines: Service Log Analytics Pipeline and Application Log Analytics Pipeline.

Service log analytics pipeline

Centralized Logging with OpenSearch supports log analysis for AWS services, such as Amazon S3 access logs and Application Load Balancer access logs. For a complete list of supported AWS services, refer to Supported AWS Services.

This solution ingests different AWS service logs using different workflows.

Note

Centralized Logging with OpenSearch supports cross-account log ingestion. If you want to ingest the logs from another AWS account, the resources in the Sources group in the architecture diagram will be in another account.

Logs through Amazon S3

This section is applicable to Amazon S3 access logs, CloudFront standard logs, CloudTrail logs (S3), Application Load Balancer access logs, WAF logs, VPC Flow logs (S3), AWS Config logs, Amazon RDS/Aurora logs, and AWS Lambda logs.

The workflow supports the following scenarios:

  • Logs to Amazon S3 directly (Amazon OpenSearch for log analytics)

    In this scenario, the service directly sends logs to Amazon S3.

    Amazon S3 based service log pipeline architecture
  • Logs to Amazon S3 via Firehose (Amazon OpenSearch for log analytics)

    In this scenario, the service cannot send its logs directly to Amazon S3. The logs are sent to Amazon CloudWatch, and Firehose subscribes to the CloudWatch log group and then delivers the logs to Amazon S3.

    Amazon S3 (via Kinesis Data Firehose) based service log pipeline architecture

The log pipeline runs the following workflow:

  1. AWS service logs are stored in an Amazon S3 bucket (Log Bucket).

  2. An event notification is sent to Amazon SQS using S3 Event Notifications when a new log file is created.

  3. Amazon SQS initiates the Log Processor Lambda to run.

  4. The log processor reads and processes the log files.

  5. The log processor ingests the logs into the Amazon OpenSearch Service.

  6. Logs that fail to be processed are exported to an Amazon S3 bucket (Backup Bucket).

For cross-account ingestion, the AWS services store logs in an Amazon S3 bucket in the member account, and the other resources remain in the central logging account.
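Steps 2 to 4 of the workflow above can be sketched as follows. This is a minimal illustration, not the solution's actual Log Processor: the function name `extract_s3_objects` is hypothetical, and the sketch only assumes the standard shape of an SQS-triggered Lambda event whose message bodies carry S3 event notifications.

```python
import json
from urllib.parse import unquote_plus


def extract_s3_objects(sqs_event):
    """Return (bucket, key) pairs for every S3 object referenced in an SQS-triggered Lambda event.

    Hypothetical helper for illustration; the solution's own processor is not shown in this guide.
    """
    objects = []
    for message in sqs_event.get("Records", []):
        # Each SQS message body carries one S3 event notification as a JSON string.
        notification = json.loads(message["body"])
        # S3 test events have no "Records" key, so default to an empty list.
        for record in notification.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            # Object keys are URL-encoded in S3 notifications (spaces arrive as "+").
            key = unquote_plus(record["s3"]["object"]["key"])
            objects.append((bucket, key))
    return objects
```

A processor built this way would then read each returned object from the Log Bucket, parse it, and ingest the result, which is where the remaining workflow steps take over.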

  • Logs to Amazon S3 directly (Light Engine for log analytics)

    In this scenario, the service directly sends logs to Amazon S3.

    Amazon S3 (via Kinesis Data Firehose) based service log pipeline architecture

The log pipeline runs the following workflow:

  1. AWS service logs are stored in an Amazon S3 bucket (Log Bucket).

  2. An event notification is sent to Amazon SQS using S3 Event Notifications when a new log file is created.

  3. Amazon SQS initiates AWS Lambda.

  4. AWS Lambda copies objects from the log bucket to the staging bucket.

  5. The log processor (AWS Step Functions) processes raw log files stored in the staging bucket in batches, converts them to Apache Parquet format, and automatically partitions all incoming data based on criteria including time and Region.
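To make the partitioning in step 5 concrete, the sketch below derives a partition prefix from an event timestamp and a Region. The Hive-style `key=value` layout, the partition key names, and the function name `partition_prefix` are all assumptions made for illustration; this guide does not describe the exact partition scheme that Light Engine uses.

```python
from datetime import datetime, timezone


def partition_prefix(event_time, region):
    """Build a Hive-style partition prefix from a timezone-aware timestamp and an AWS Region.

    The layout (region/year/month/day/hour) is a hypothetical example, not the solution's scheme.
    """
    # Normalize to UTC so the same event always maps to the same partition.
    utc = event_time.astimezone(timezone.utc)
    return f"region={region}/year={utc:%Y}/month={utc:%m}/day={utc:%d}/hour={utc:%H}/"


if __name__ == "__main__":
    ts = datetime(2024, 3, 5, 23, 30, tzinfo=timezone.utc)
    print(partition_prefix(ts, "us-east-1"))
```

Partitioning on time and Region lets a query engine skip files outside the requested time window or Region instead of scanning every Parquet object.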

Logs through Amazon Kinesis Data Streams

This section is applicable to CloudFront real-time logs, CloudTrail logs (CloudWatch), and VPC Flow logs (CloudWatch).

The workflow supports two scenarios:

  • Logs to KDS directly

    In this scenario, the service directly streams logs to Amazon Kinesis Data Streams.

    Amazon Kinesis Data Streams based service log pipeline architecture

  • Logs to KDS via subscription

    In this scenario, the service delivers the logs to a CloudWatch log group, and CloudWatch Logs then streams the logs in real time to KDS as the subscription destination.

    Amazon Kinesis Data Streams (via subscription) based service log pipeline architecture

The log pipeline runs the following workflow:

  1. AWS service logs are streamed to a Kinesis data stream.

  2. KDS initiates the Log Processor Lambda to run.

  3. The log processor processes and ingests the logs into the Amazon OpenSearch Service.

  4. Logs that fail to be processed are exported to an Amazon S3 bucket (Backup Bucket).

For cross-account ingestion, the AWS services store logs in an Amazon CloudWatch log group in the member account, and the other resources remain in the central logging account.
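The decoding part of this workflow can be sketched as below. It is a simplified illustration rather than the solution's Log Processor: `decode_kinesis_records` is a hypothetical name, records streamed directly to KDS are assumed to be UTF-8 text, and records delivered through a CloudWatch Logs subscription are assumed to be gzip-compressed JSON with a `logEvents` list. Records that cannot be decoded are collected separately, mirroring the step that exports failed logs to the Backup Bucket.

```python
import base64
import gzip
import json


def decode_kinesis_records(event):
    """Split a Kinesis-triggered Lambda event into decoded log messages and failed records."""
    messages, failed = [], []
    for record in event.get("Records", []):
        try:
            # Kinesis payloads arrive base64-encoded in the Lambda event.
            raw = base64.b64decode(record["kinesis"]["data"])
            if raw[:2] == b"\x1f\x8b":  # gzip magic number
                # CloudWatch Logs subscription payloads are gzip-compressed JSON.
                payload = json.loads(gzip.decompress(raw))
                if payload.get("messageType") != "DATA_MESSAGE":
                    continue  # skip CloudWatch Logs control messages
                decoded = [log_event["message"] for log_event in payload["logEvents"]]
            else:
                # Assumption: directly streamed records are plain UTF-8 text.
                decoded = [raw.decode("utf-8")]
            messages.extend(decoded)
        except Exception as exc:  # sketch: any decoding failure routes the record to backup
            failed.append({"record": record, "error": str(exc)})
    return messages, failed
```

In a real processor, the `failed` list would be written to the Backup Bucket and the `messages` list would be parsed and ingested into Amazon OpenSearch Service.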

Warning

This solution does not support cross-account ingestion for CloudFront real-time logs.

Application log analytics pipeline

Centralized Logging with OpenSearch supports log analysis for application logs, such as Nginx/Apache HTTP Server logs or custom application logs.

Note

Centralized Logging with OpenSearch supports cross-account log ingestion. If you want to ingest logs from the same account, the resources in the Sources group will be in the same account as your Centralized Logging with OpenSearch account. Otherwise, they will be in another AWS account.

Logs from Amazon EC2/Amazon EKS

  • Logs from Amazon EC2/Amazon EKS (Amazon OpenSearch for log analytics)

    Application log pipeline architecture for EC2/EKS

    The log pipeline runs the following workflow:

    1. Fluent Bit works as the underlying log agent to collect logs from application servers and send them to an optional Log Buffer, or ingest them into the OpenSearch domain directly.

    2. The Log Buffer triggers the Lambda (Log Processor) to run.

    3. The log processor reads and processes the log records and ingests the logs into the OpenSearch domain.

    4. Logs that fail to be processed are exported to an Amazon S3 bucket (Backup Bucket).

  • Logs from Amazon EC2/Amazon EKS (Light Engine for log analytics)

    Application log pipeline architecture for EC2/EKS

    The log pipeline runs the following workflow:

    1. Fluent Bit works as the underlying log agent to collect logs from application servers and send them to an optional Log Buffer.

    2. The Log Buffer initiates the Lambda function to copy objects from the log bucket to the staging bucket.

    3. The log processor (AWS Step Functions) processes raw log files stored in the staging bucket in batches, transforms them to Apache Parquet, and automatically partitions all incoming data based on criteria that include time and Region.
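In the Amazon OpenSearch variant of the EC2/EKS pipeline above, the log processor ingests records and exports the ones that fail to the Backup Bucket. The sketch below shows one way to separate the two groups, assuming the processor submits records through the OpenSearch bulk API and receives a response whose `items` list is in the same order as the submitted records. The function name `split_bulk_result` is hypothetical, and the solution may detect failures differently.

```python
def split_bulk_result(records, bulk_response):
    """Pair each submitted record with its bulk item result and return (ingested, failed)."""
    ingested, failed = [], []
    for record, item in zip(records, bulk_response["items"]):
        # Each item has a single key naming the action, such as "index" or "create".
        result = next(iter(item.values()))
        if result.get("status", 500) < 300:
            ingested.append(record)
        else:
            # Keep the error detail so the backup copy explains why ingestion failed.
            failed.append({"record": record, "error": result.get("error")})
    return ingested, failed
```

Only the `failed` list would need to be written to the Backup Bucket, which keeps the backup small and makes later replay straightforward.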

Logs from Syslog Client

Important

1. Make sure your Syslog generator/sender's subnet is connected to the two private subnets of Centralized Logging with OpenSearch. You need to use a VPC peering connection or Transit Gateway to connect these VPCs.

2. The NLB and the ECS containers in the architecture diagram are provisioned only when you create a Syslog ingestion, and are automatically deleted when no Syslog ingestion remains.

Application log pipeline architecture for Syslog

  1. A Syslog client (such as Rsyslog) sends logs to a Network Load Balancer (NLB) in the private subnets of Centralized Logging with OpenSearch, and the NLB routes them to the ECS containers running Syslog servers.

  2. Fluent Bit works as the underlying log agent in the ECS service to parse logs and send them to an optional Log Buffer, or ingest them into the OpenSearch domain directly.

  3. The Log Buffer triggers the Lambda (Log Processor) to run.

  4. The log processor reads and processes the log records and ingests the logs into the OpenSearch domain.

  5. Logs that fail to be processed are exported to an Amazon S3 bucket (Backup Bucket).
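As a rough picture of the parsing that happens in step 2, the sketch below splits one RFC 3164-style Syslog line into structured fields. It is not Fluent Bit's parser and does not reflect the parser configuration this solution ships; the regular expression, the field names, and the function name `parse_syslog_line` are assumptions for illustration only.

```python
import re

# RFC 3164-style line: "<PRI>TIMESTAMP HOST MESSAGE",
# for example "<14>Mar  5 21:30:00 web-1 app: started".
_SYSLOG_RE = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<time>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<message>.*)$"
)


def parse_syslog_line(line):
    """Parse one RFC 3164-style Syslog line into a dict, or return None if it does not match."""
    match = _SYSLOG_RE.match(line)
    if match is None:
        return None
    pri = int(match["pri"])
    return {
        "facility": pri // 8,  # PRI = facility * 8 + severity
        "severity": pri % 8,
        "time": match["time"],
        "host": match["host"],
        "message": match["message"],
    }
```

Lines that return `None` here correspond to records that a pipeline would treat as unparsable, which is the kind of input that ends up in the Backup Bucket.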