Service log analytics pipeline

Centralized Logging with OpenSearch supports log analysis for AWS services, such as Amazon S3 Access Logs and Application Load Balancer access logs. For a complete list of supported AWS services, refer to Supported AWS Services.

This solution ingests different AWS service logs using different workflows.

Note

Centralized Logging with OpenSearch supports cross-account log ingestion. If you ingest logs from the same account, the resources in the Sources group reside in the same AWS account as your Centralized Logging with OpenSearch account. Otherwise, they reside in a different AWS account.

Logs through Amazon S3

Many AWS services support delivering logs to Amazon S3 directly, or through other services. The workflow supports three scenarios:

Scenario 1: Logs to Amazon S3 directly (OpenSearch Engine)

In this scenario, the service directly delivers logs to Amazon S3. This architecture is applicable to the following log sources:

  • AWS CloudTrail logs (delivers to Amazon S3)

  • Application Load Balancer access logs

  • AWS WAF logs

  • Amazon CloudFront standard logs

  • Amazon S3 Access Logs

  • AWS Config logs

  • VPC Flow Logs (delivers to Amazon S3)

Amazon S3 based service log pipeline architecture

The log pipeline runs the following workflow:

  1. AWS services are configured to deliver logs to an Amazon S3 bucket (Log Bucket).

  2. An event notification is sent to Amazon SQS using S3 Event Notifications when a new log file is created, as shown in the sketch after this list.

  3. Amazon SQS initiates the Log Processor Lambda function to run.

  4. The Log Processor reads and processes the log files.

  5. The Log Processor ingests the logs into Amazon OpenSearch Service.

  6. Logs that fail to be processed are exported to an Amazon S3 bucket (Backup Bucket).
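
For reference, the following is a minimal boto3 sketch of the S3-to-SQS wiring in step 2. The solution provisions these resources automatically; the bucket name, queue ARN, and key prefix below are hypothetical placeholders.

    import boto3

    s3 = boto3.client("s3")

    # Hypothetical resources; the solution creates and wires these for you.
    LOG_BUCKET = "my-log-bucket"
    QUEUE_ARN = "arn:aws:sqs:us-east-1:111122223333:log-pipeline-queue"

    # Emit an SQS message whenever a new log object lands in the Log Bucket.
    s3.put_bucket_notification_configuration(
        Bucket=LOG_BUCKET,
        NotificationConfiguration={
            "QueueConfigurations": [
                {
                    "QueueArn": QUEUE_ARN,
                    "Events": ["s3:ObjectCreated:*"],
                    # Match only objects under the service's log prefix.
                    "Filter": {
                        "Key": {
                            "FilterRules": [
                                {"Name": "prefix", "Value": "AWSLogs/"}
                            ]
                        }
                    },
                }
            ]
        },
    )

Note that the queue's resource policy must also allow s3.amazonaws.com to send messages for the notification to be accepted.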

Scenario 2: Logs to Amazon S3 via Firehose (OpenSearch Engine)

In this scenario, the service cannot deliver its logs to Amazon S3 directly. Instead, the logs are sent to Amazon CloudWatch Logs, and Amazon Data Firehose subscribes to the CloudWatch log group and redelivers the logs to Amazon S3. This architecture is applicable to the following log sources:

  • Amazon RDS/Aurora logs

  • AWS Lambda logs

Amazon S3 (via Firehose) based service log pipeline architecture

The log pipeline runs the following workflow:

  1. AWS service logs are configured to be delivered to Amazon CloudWatch Logs, and Amazon Data Firehose is used to subscribe to the log group and store the logs in an Amazon S3 bucket (Log Bucket), as shown in the sketch after this list.

  2. An event notification is sent to Amazon SQS using S3 Event Notifications when a new log file is created.

  3. Amazon SQS initiates the Log Processor Lambda to run.

  4. The Log Processor reads and processes the log files.

  5. The Log Processor ingests the logs into the Amazon OpenSearch Service.

  6. Logs that fail to be processed are exported to Amazon S3 bucket (Backup Bucket).
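
To illustrate step 1, this boto3 sketch subscribes a CloudWatch log group to a Firehose delivery stream. The log group name, delivery stream ARN, and IAM role ARN are hypothetical; the solution creates the real resources for you.

    import boto3

    logs = boto3.client("logs")

    # Hypothetical ARNs; the solution provisions the delivery stream and role.
    FIREHOSE_ARN = "arn:aws:firehose:us-east-1:111122223333:deliverystream/log-pipeline"
    ROLE_ARN = "arn:aws:iam::111122223333:role/CWLtoFirehoseRole"

    # Subscribe the log group to Firehose, which buffers the events and
    # delivers them to the Log Bucket in Amazon S3.
    logs.put_subscription_filter(
        logGroupName="/aws/rds/instance/mydb/error",  # placeholder log group
        filterName="to-firehose",
        filterPattern="",  # an empty pattern forwards every log event
        destinationArn=FIREHOSE_ARN,
        roleArn=ROLE_ARN,
    )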

Scenario 3: Logs to Amazon S3 directly (Light Engine)

In this scenario, the service directly sends logs to Amazon S3. This architecture is applicable to the following log sources:

  • Amazon CloudFront standard logs

  • AWS CloudTrail logs (delivers to Amazon S3)

  • Application Load Balancer access logs

  • AWS WAF logs

  • VPC Flow Logs (delivers to Amazon S3)

Amazon S3 based service log pipeline architecture

The log pipeline runs the following workflow:

  1. AWS services are configured to deliver logs to the Amazon S3 bucket (Log Bucket).

  2. An event notification is sent to Amazon SQS using S3 Event Notifications when a new log file is created.

  3. Amazon SQS initiates AWS Lambda to run.

  4. AWS Lambda loads the log file from the Log Bucket.

  5. AWS Lambda puts the log file into the Staging Bucket.

  6. The Log Processor, an AWS Step Functions workflow, processes raw log files stored in the Staging Bucket in batches.

  7. The Log Processor converts the raw log files to Apache Parquet format, automatically partitions all incoming data by criteria including time and Region, calculates metrics, and stores the results in the Centralized Bucket, as sketched after this list.
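
The partitioning in step 7 can be pictured as a Hive-style dataset layout. The sketch below uses pyarrow; the column names, records, and output path are hypothetical stand-ins, and the actual Light Engine schema is defined by the solution.

    import pyarrow as pa
    import pyarrow.parquet as pq

    # Illustrative records, as if parsed from a raw log file in the Staging Bucket.
    table = pa.Table.from_pydict({
        "event_date": ["2024-05-01", "2024-05-01"],  # time partition key
        "region": ["us-east-1", "eu-west-1"],        # Region partition key
        "status": [200, 403],
    })

    # Hive-style partitions (event_date=2024-05-01/region=us-east-1/...) let
    # downstream queries prune by time and Region instead of scanning all data.
    pq.write_to_dataset(
        table,
        root_path="/tmp/centralized-bucket/alb-logs",  # stands in for the Centralized Bucket
        partition_cols=["event_date", "region"],
    )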

Logs through Amazon Kinesis Data Streams

Some AWS services support delivering logs to Amazon Kinesis Data Streams. The workflow supports two scenarios:

Scenario 1: Logs to Kinesis Data Streams directly (OpenSearch Engine)

In this scenario, the service directly delivers logs to Amazon Kinesis Data Streams. This architecture is applicable to the following log sources:

  • Amazon CloudFront real-time logs

Amazon Kinesis Data Streams based service log pipeline architecture

Warning

This solution does not support cross-account ingestion for CloudFront real-time logs.

The log pipeline runs the following workflow:

  1. AWS services are configured to deliver logs to Amazon Kinesis Data Streams.

  2. Amazon Kinesis Data Streams initiates AWS Lambda (Log Processor) to run.

  3. AWS Lambda reads, parses, and processes the logs from Kinesis Data Streams and ingests them into Amazon OpenSearch Service, along the lines of the handler sketch after this list.

  4. Logs that fail to be processed are exported to an Amazon S3 bucket (Backup Bucket).
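
A minimal sketch of what the step 3 handler could look like for CloudFront real-time logs, which arrive as base64-encoded, tab-separated records. The selected fields and the return value are hypothetical simplifications.

    import base64

    # Field order is set by the fields chosen in the CloudFront real-time log
    # configuration; these three are illustrative only.
    FIELDS = ["timestamp", "c-ip", "cs-uri-stem"]

    def handler(event, context):
        docs = []
        for record in event["Records"]:
            # Kinesis payloads are base64-encoded; each CloudFront real-time
            # log record is a tab-separated line.
            line = base64.b64decode(record["kinesis"]["data"]).decode("utf-8")
            docs.append(dict(zip(FIELDS, line.rstrip("\n").split("\t"))))
        # The real Log Processor bulk-indexes the documents into Amazon
        # OpenSearch Service and exports failures to the Backup Bucket.
        return {"processed": len(docs)}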

Scenario 2: Logs to Kinesis Data Streams via CloudWatch Logs (OpenSearch Engine)

In this scenario, the service delivers the logs to CloudWatch Logs, and CloudWatch Logs then redelivers the logs in real time to Kinesis Data Streams through a subscription filter. This architecture is applicable to the following log sources:

  • AWS CloudTrail logs (delivers to CloudWatch Logs)

  • VPC Flow Logs (delivers to CloudWatch Logs)

Amazon Kinesis Data Streams (via CloudWatch Logs) based service log pipeline architecture

The log pipeline runs the following workflow:

  1. AWS service logs are configured to be written to Amazon CloudWatch Logs.

  2. Logs are redelivered to Kinesis Data Streams through a CloudWatch Logs subscription filter, as shown in the sketch after this list.

  3. Kinesis Data Streams initiates the AWS Lambda (Log Processor) to run.

  4. The Log Processor processes and ingests the logs into Amazon OpenSearch Service.

  5. Logs that fail to be processed are exported to an Amazon S3 bucket (Backup Bucket).
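
The subscription in step 2 could be created as follows with boto3; the log group name, stream ARN, and role ARN are placeholders for resources the solution provisions.

    import boto3

    logs = boto3.client("logs")

    # Hypothetical ARNs; the solution provisions the stream and IAM role.
    STREAM_ARN = "arn:aws:kinesis:us-east-1:111122223333:stream/log-pipeline"
    ROLE_ARN = "arn:aws:iam::111122223333:role/CWLtoKinesisRole"

    logs.put_subscription_filter(
        logGroupName="/my-vpc/flow-logs",  # placeholder log group
        filterName="to-kinesis",
        filterPattern="",  # an empty pattern forwards every log event
        destinationArn=STREAM_ARN,
        roleArn=ROLE_ARN,
    )

CloudWatch Logs delivers these events to the stream as gzip-compressed JSON batches, so the Log Processor decompresses each Kinesis record before parsing it.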

For cross-account ingestion, the AWS services store logs in an Amazon CloudWatch log group in the member account, and the other resources remain in the main account.