Create a flow log that publishes to Amazon S3

Flow logs can publish flow log data to Amazon S3.

When publishing to Amazon S3, flow log data is published to an existing Amazon S3 bucket that you specify. Flow log records for all of the monitored transit gateways are published to a series of log file objects that are stored in the bucket.

Data ingestion and archival charges are applied by Amazon CloudWatch for vended logs when you publish flow logs to Amazon S3. For more information on CloudWatch pricing for vended logs, open Amazon CloudWatch Pricing, choose Logs, and then find Vended Logs.

To create an Amazon S3 bucket for use with flow logs, see Create a bucket in the Amazon Simple Storage Service User Guide.
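
If you need a new bucket, you could create one with the AWS CLI. The bucket name and Region below are placeholders; Regions other than us-east-1 also require a location constraint.

aws s3api create-bucket --bucket my-flow-log-bucket --region us-east-1

# For a Region other than us-east-1, add a location constraint, for example:
# aws s3api create-bucket --bucket my-flow-log-bucket --region eu-west-1 \
#     --create-bucket-configuration LocationConstraint=eu-west-1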

For more information about multiple account logging, see Central Logging in the AWS Solutions Library.

For more information about CloudWatch Logs, see Logs sent to Amazon S3 in the Amazon CloudWatch Logs User Guide.

Flow log files

VPC Flow Logs is a feature that collects flow log records, consolidates them into log files, and then publishes the log files to the Amazon S3 bucket at 5-minute intervals. Each log file contains flow log records for the IP traffic recorded in the previous five minutes.

The maximum file size for a log file is 75 MB. If the log file reaches the file size limit within the 5-minute period, the flow log stops adding flow log records to it. Then it publishes the flow log to the Amazon S3 bucket, and creates a new log file.

In Amazon S3, the Last modified field for the flow log file indicates the date and time when the file was uploaded to the Amazon S3 bucket. This is later than the timestamp in the file name, and differs by the amount of time taken to upload the file to the Amazon S3 bucket.

Log file format

You can specify one of the following formats for the log files. Each file is compressed into a single Gzip file.

  • Text – Plain text. This is the default format.

  • Parquet – Apache Parquet is a columnar data format. Queries on data in Parquet format are 10 to 100 times faster compared to queries on data in plain text. Data in Parquet format with Gzip compression takes 20 percent less storage space than plain text with Gzip compression.

Log file options

You can optionally enable the following log file options. A command line sketch showing how to set them follows this list.

  • Hive-compatible S3 prefixes – Enable Hive-compatible prefixes instead of importing partitions into your Hive-compatible tools. Before you run queries, use the MSCK REPAIR TABLE command.

  • Hourly partitions – If you have a large volume of logs and typically target queries to a specific hour, you can get faster results and save on query costs by partitioning logs on an hourly basis.
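
If you create the flow log with the AWS CLI, these options (and the Parquet file format) map to the --destination-options parameter of create-flow-logs. The transit gateway ID and bucket name in the following sketch are placeholders:

aws ec2 create-flow-logs \
    --resource-type TransitGateway \
    --resource-ids tgw-00112233344556677 \
    --log-destination-type s3 \
    --log-destination arn:aws:s3:::flow-log-bucket/my-flow-logs/ \
    --destination-options FileFormat=parquet,HiveCompatiblePartitions=true,PerHourPartition=true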

Log file S3 bucket structure

Log files are saved to the specified Amazon S3 bucket using a folder structure that is based on the flow log's ID, Region, creation date, and destination options.

By default, the files are delivered to the following location.

bucket-and-optional-prefix/AWSLogs/account_id/vpcflowlogs/region/year/month/day/

If you enable Hive-compatible S3 prefixes, the files are delivered to the following location.

bucket-and-optional-prefix/AWSLogs/aws-account-id=account_id/service=vpcflowlogs/aws-region=region/year=year/month=month/day=day/

If you enable hourly partitions, the files are delivered to the following location.

bucket-and-optional-prefix/AWSLogs/account_id/vpcflowlogs/region/year/month/day/hour/

If you enable Hive-compatible partitions and partition the flow log per hour, the files are delivered to the following location.

bucket-and-optional-prefix/AWSLogs/aws-account-id=account_id/service=vpcflowlogs/aws-region=region/year=year/month=month/day=day/hour=hour/
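
For example, with AWS account 123456789012 and the us-east-1 Region, and with both Hive-compatible prefixes and hourly partitions enabled, a log file written on June 20, 2018 during the 16:00 hour would be delivered under a prefix like the following (the bucket name is illustrative):

my-flow-log-bucket/AWSLogs/aws-account-id=123456789012/service=vpcflowlogs/aws-region=us-east-1/year=2018/month=06/day=20/hour=16/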

Log file names

The file name of a log file is based on the flow log ID, Region, and creation date and time. File names use the following format.

aws_account_id_vpcflowlogs_region_flow_log_id_YYYYMMDDTHHmmZ_hash.log.gz

The following is an example of a log file for a flow log created by AWS account 123456789012, for a resource in the us-east-1 Region, on June 20, 2018 at 16:20 UTC. The file contains the flow log records with an end time between 16:20:00 and 16:24:59.

123456789012_vpcflowlogs_us-east-1_fl-1234abcd_20180620T1620Z_fe123456.log.gz

IAM policy for IAM principals that publish flow logs to Amazon S3

The IAM principal that creates the flow log must have the following permissions, which are required to publish flow logs to the destination Amazon S3 bucket.

{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "logs:CreateLogDelivery", "logs:DeleteLogDelivery" ], "Resource": "*" } ] }

Amazon S3 bucket permissions for flow logs

By default, Amazon S3 buckets and the objects they contain are private. Only the bucket owner can access the bucket and the objects stored in it. However, the bucket owner can grant access to other resources and users by writing an access policy.

If the user creating the flow log owns the bucket and has PutBucketPolicy and GetBucketPolicy permissions for the bucket, we automatically attach the following policy to the bucket. This policy overwrites any existing policy attached to the bucket.

Otherwise, the bucket owner must add this policy to the bucket, specifying the AWS account ID of the flow log creator, or flow log creation fails. For more information, see Using bucket policies in the Amazon Simple Storage Service User Guide.

{ "Version": "2012-10-17", "Statement": [ { "Sid": "AWSLogDeliveryWrite", "Effect": "Allow", "Principal": {"Service": "delivery.logs.amazonaws.com"}, "Action": "s3:PutObject", "Resource": "my-s3-arn", "Condition": { "StringEquals": { "s3:x-amz-acl": "bucket-owner-full-control", "aws:SourceAccount": account_id }, "ArnLike": { "aws:SourceArn": "arn:aws:logs:region:account_id:*" } } }, { "Sid": "AWSLogDeliveryCheck", "Effect": "Allow", "Principal": {"Service": "delivery.logs.amazonaws.com"}, "Action": ["s3:GetBucketAcl", "s3:ListBucket"], "Resource": "arn:aws:s3:::bucket_name", "Condition": { "StringEquals": { "aws:SourceAccount": account_id }, "ArnLike": { "aws:SourceArn": "arn:aws:logs:region:account_id:*" } } } ] }

The ARN that you specify for my-s3-arn depends on whether you use Hive-compatible S3 prefixes.

  • Default prefixes

    arn:aws:s3:::bucket_name/optional_folder/AWSLogs/account_id/*
  • Hive-compatible S3 prefixes

    arn:aws:s3:::bucket_name/optional_folder/AWSLogs/aws-account-id=account_id/*

As a best practice, we recommend that you grant these permissions to the log delivery service principal instead of individual AWS account ARNs. It is also a best practice to use the aws:SourceAccount and aws:SourceArn condition keys to protect against the confused deputy problem. The source account is the owner of the flow log and the source ARN is the wildcard (*) ARN of the logs service.
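
If the policy is not attached automatically (for example, because the bucket is owned by a different account), the bucket owner can attach it with the AWS CLI. The bucket name and file name below are placeholders, and the policy file must have the placeholders above (my-s3-arn, bucket_name, region, and account_id) replaced with real values:

aws s3api put-bucket-policy \
    --bucket flow-log-bucket \
    --policy file://flow-log-bucket-policy.json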

Required key policy for use with SSE-KMS

You can protect the data in your Amazon S3 bucket by enabling either Server-Side Encryption with Amazon S3-Managed Keys (SSE-S3) or Server-Side Encryption with KMS Keys (SSE-KMS). For more information, see Protecting data using server-side encryption in the Amazon S3 User Guide.

With SSE-KMS, you can use either an AWS managed key or a customer managed key. With an AWS managed key, you can't use cross-account delivery. Flow logs are delivered from the log delivery account, so you must grant access for cross-account delivery. To grant cross-account access to your S3 bucket, use a customer managed key and specify the Amazon Resource Name (ARN) of the customer managed key when you enable bucket encryption. For more information, see Specifying server-side encryption with AWS KMS in the Amazon S3 User Guide.
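
For illustration, the following sketch enables default bucket encryption with a customer managed key; the bucket name and key ARN are placeholders:

aws s3api put-bucket-encryption \
    --bucket flow-log-bucket \
    --server-side-encryption-configuration '{
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": "arn:aws:kms:us-east-1:123456789012:key/1234abcd-12ab-34cd-56ef-1234567890ab"
                }
            }
        ]
    }'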

When you use SSE-KMS with a customer managed key, you must add the following to the key policy for your key (not the bucket policy for your S3 bucket), so that VPC Flow Logs can write to your S3 bucket.

{ "Sid": "Allow Transit Gateway Flow Logs to use the key", "Effect": "Allow", "Principal": { "Service": [ "delivery.logs.amazonaws.com" ] }, "Action": [ "kms:Encrypt", "kms:Decrypt", "kms:ReEncrypt*", "kms:GenerateDataKey*", "kms:DescribeKey" ], "Resource": "*" }

Amazon S3 log file permissions

In addition to the required bucket policies, Amazon S3 uses access control lists (ACLs) to manage access to the log files created by a flow log. By default, the bucket owner has FULL_CONTROL permissions on each log file. The log delivery owner, if different from the bucket owner, has no permissions. The log delivery account has READ and WRITE permissions. For more information, see Access Control List (ACL) Overview in the Amazon Simple Storage Service User Guide.
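
If you want to confirm these grants, you could inspect the ACL of a delivered log file; the bucket name and object key below are placeholders based on the file naming scheme described earlier:

aws s3api get-object-acl \
    --bucket flow-log-bucket \
    --key AWSLogs/123456789012/vpcflowlogs/us-east-1/2018/06/20/123456789012_vpcflowlogs_us-east-1_fl-1234abcd_20180620T1620Z_fe123456.log.gz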

Create a flow log that publishes to Amazon S3

After you have created and configured your Amazon S3 bucket, you can create flow logs for transit gateways.

To create a transit gateway flow log that publishes to Amazon S3 using the console
  1. Open the Amazon VPC console at https://console.aws.amazon.com/vpc/.

  2. In the navigation pane, choose Transit gateways or Transit gateway attachments.

  3. Select the checkboxes for one or more transit gateways or transit gateway attachments.

  4. Choose Actions, Create flow log.

  5. Configure the flow log settings. For more information, see To configure flow log settings.

To configure flow log settings using the console
  1. For Destination, choose Send to an S3 bucket.

  2. For S3 bucket ARN, specify the Amazon Resource Name (ARN) of an existing Amazon S3 bucket. You can optionally include a subfolder. For example, to specify a subfolder named my-logs in a bucket named my-bucket, use the following ARN:

    arn:aws:s3:::my-bucket/my-logs/

    The bucket cannot use AWSLogs as a subfolder name, as this is a reserved term.

    If you own the bucket, we automatically create a resource policy and attach it to the bucket. For more information, see Amazon S3 bucket permissions for flow logs.

  3. For Log record format, specify the format for the flow log record.

    • To use the default flow log record format, choose AWS default format.

    • To create a custom format, choose Custom format. For Log format, choose the fields to include in the flow log record.

  4. For Log file format, specify the format for the log file.

    • Text – Plain text. This is the default format.

    • Parquet – Apache Parquet is a columnar data format. Queries on data in Parquet format are 10 to 100 times faster compared to queries on data in plain text. Data in Parquet format with Gzip compression takes 20 percent less storage space than plain text with Gzip compression.

  5. (Optional) To use Hive-compatible S3 prefixes, choose Hive-compatible S3 prefix, Enable.

  6. (Optional) To partition your flow logs per hour, choose Every 1 hour (60 mins).

  7. (Optional) To add a tag to the flow log, choose Add new tag and specify the tag key and value.

  8. Choose Create flow log.

To create a flow log that publishes to Amazon S3 using a command line tool

Use the create-flow-logs command.

The following AWS CLI example creates a flow log that captures all traffic for transit gateway tgw-00112233344556677 and delivers the flow logs to an Amazon S3 bucket named flow-log-bucket, under the subfolder my-custom-flow-logs. To use a custom format for the flow log records, add the --log-format parameter.

aws ec2 create-flow-logs --resource-type TransitGateway --resource-ids tgw-00112233344556677 --log-destination-type s3 --log-destination arn:aws:s3:::flow-log-bucket/my-custom-flow-logs/
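
After the command succeeds, you can confirm the flow log and its destination with describe-flow-logs. The filter value below is the same placeholder transit gateway ID:

aws ec2 describe-flow-logs --filter Name=resource-id,Values=tgw-00112233344556677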

Process flow log records in Amazon S3

The log files are compressed. If you open the log files using the Amazon S3 console, they are decompressed and the flow log records are displayed. If you download the files, you must decompress them to view the flow log records.
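
For example, you could download a log file and view its records from the command line; the bucket, prefix, and file name below are placeholders based on the naming scheme described earlier:

# Download a log file from the bucket.
aws s3 cp s3://flow-log-bucket/AWSLogs/123456789012/vpcflowlogs/us-east-1/2018/06/20/123456789012_vpcflowlogs_us-east-1_fl-1234abcd_20180620T1620Z_fe123456.log.gz .

# Decompress and print the flow log records.
gunzip -c 123456789012_vpcflowlogs_us-east-1_fl-1234abcd_20180620T1620Z_fe123456.log.gz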