Analyze bucket access logs with Amazon Athena in Lightsail
In this guide, we show you how to identify requests to a bucket using access logs. For more information, see Bucket access logs.
Query access logs for requests using Amazon Athena
You can use Amazon Athena to query and identify requests to a bucket in access logs.
Lightsail stores access logs as objects in a Lightsail bucket. It is often easier to use a tool that can analyze the logs. Athena supports analysis of objects and can be used to query access logs.
Example
The following example shows how you can query bucket server access logs in Amazon Athena.
Note
To specify a bucket location in an Athena query, format the target bucket name and target prefix where your logs are delivered as an S3 URI, as follows:
s3://amzn-s3-demo-bucket1-logs/prefix/
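The URI format described in the note can be sketched with a small helper. This is only an illustration: the function name `log_location_uri` and its parameters are hypothetical and are not part of any AWS tool.

```python
def log_location_uri(target_bucket: str, target_prefix: str = "") -> str:
    """Build an s3:// URI from a target bucket name and target prefix.

    Hypothetical helper for illustration only.
    """
    bucket = target_bucket.strip().strip("/")
    if not bucket:
        raise ValueError("target_bucket must not be empty")
    # Drop leading and trailing slashes so the pieces join cleanly.
    prefix = target_prefix.strip("/")
    # Always end with a trailing slash, as in the example URI in the note.
    return f"s3://{bucket}/{prefix}/" if prefix else f"s3://{bucket}/"


print(log_location_uri("amzn-s3-demo-bucket1-logs", "prefix/"))
```

The printed value is the string that you would paste into the LOCATION clause of the table definition.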
1. Open the Athena console at https://console.aws.amazon.com/athena/.
2. In the Query Editor, run a command similar to the following.
   create database s3_access_logs_db
   Note
   It's a best practice to create the database in the same AWS Region as your S3 bucket.
3. In the Query Editor, run a command similar to the following to create a table schema in the database that you created in step 2. The STRING and BIGINT data type values are the access log properties. You can query these properties in Athena. For LOCATION, enter the bucket and prefix path as noted earlier.
   CREATE EXTERNAL TABLE `s3_access_logs_db.amzn-s3-demo-bucket_logs`(
     `bucketowner` STRING,
     `bucket_name` STRING,
     `requestdatetime` STRING,
     `remoteip` STRING,
     `requester` STRING,
     `requestid` STRING,
     `operation` STRING,
     `key` STRING,
     `request_uri` STRING,
     `httpstatus` STRING,
     `errorcode` STRING,
     `bytessent` BIGINT,
     `objectsize` BIGINT,
     `totaltime` STRING,
     `turnaroundtime` STRING,
     `referrer` STRING,
     `useragent` STRING,
     `versionid` STRING,
     `hostid` STRING,
     `sigv` STRING,
     `ciphersuite` STRING,
     `authtype` STRING,
     `endpoint` STRING,
     `tlsversion` STRING)
   ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
   WITH SERDEPROPERTIES (
     'input.regex'='([^ ]*) ([^ ]*) \\[(.*?)\\] ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) (\"[^\"]*\"|-) (-|[0-9]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) (\"[^\"]*\"|-) ([^ ]*)(?: ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*))?.*$')
   STORED AS
     INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
     OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
   LOCATION
     's3://amzn-s3-demo-bucket1-logs/prefix/'
4. In the navigation pane, under Database, choose your database.
5. Under Tables, choose Preview table next to your table name.
   In the Results pane, you should see data from the server access logs, such as bucketowner, bucket_name, requestdatetime, and so on. This means that you successfully created the Athena table. You can now query the bucket server access logs.
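To see how the input.regex in the table definition splits a log record into columns, you can try the same pattern locally. The following is a minimal sketch, not how Athena itself processes the logs. The sample line is made up for illustration, `parse_log_line` is a hypothetical helper, and converting non-numeric values to None only approximates how the BIGINT columns are handled.

```python
import re

# Column names in the same order as the CREATE EXTERNAL TABLE statement.
COLUMNS = [
    "bucketowner", "bucket_name", "requestdatetime", "remoteip", "requester",
    "requestid", "operation", "key", "request_uri", "httpstatus", "errorcode",
    "bytessent", "objectsize", "totaltime", "turnaroundtime", "referrer",
    "useragent", "versionid", "hostid", "sigv", "ciphersuite", "authtype",
    "endpoint", "tlsversion",
]

# Same pattern as 'input.regex', with the SQL string escaping removed.
LOG_PATTERN = re.compile(
    r'([^ ]*) ([^ ]*) \[(.*?)\] ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) '
    r'("[^"]*"|-) (-|[0-9]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) '
    r'("[^"]*"|-) ([^ ]*)'
    r'(?: ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*))?.*$'
)


def parse_log_line(line):
    """Return a dict keyed by column name, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = dict(zip(COLUMNS, match.groups()))
    # Convert the two BIGINT columns; non-numeric values such as "-" become None.
    for name in ("bytessent", "objectsize"):
        value = record[name]
        record[name] = int(value) if value and value.isdigit() else None
    return record


# Made-up sample record (assumption), shaped like a server access log line.
SAMPLE = (
    'EXAMPLEOWNERID amzn-s3-demo-bucket1 [06/Feb/2019:00:00:38 +0000] 192.0.2.3 '
    'arn:aws:iam::123456789123:user/user_name 3E57427F3EXAMPLE REST.GET.OBJECT '
    'images/picture.jpg "GET /images/picture.jpg HTTP/1.1" 200 - 1024 1024 7 5 '
    '"-" "curl/7.68.0" - EXAMPLEHOSTID= SigV4 ECDHE-RSA-AES128-GCM-SHA256 '
    'AuthHeader amzn-s3-demo-bucket1.s3.us-west-2.amazonaws.com TLSv1.2'
)

record = parse_log_line(SAMPLE)
print(record["operation"], record["key"], record["bytessent"])
```

Each key in the returned dictionary corresponds to a column that you can reference in the queries that follow.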
Example — Show who deleted an object and when (timestamp, IP address, and IAM user)
SELECT RequestDateTime, RemoteIP, Requester, Key FROM s3_access_logs_db.amzn-s3-demo-bucket_logs WHERE key = 'images/picture.jpg' AND operation like '%DELETE%';
Example — Show all operations that were performed by an IAM user
SELECT * FROM s3_access_logs_db.amzn-s3-demo-bucket_logs WHERE requester='arn:aws:iam::123456789123:user/user_name';
Example — Show all operations that were performed on an object in a specific time period
SELECT * FROM s3_access_logs_db.amzn-s3-demo-bucket_logs WHERE Key='prefix/images/picture.jpg' AND parse_datetime(RequestDateTime,'dd/MMM/yyyy:HH:mm:ss Z') BETWEEN parse_datetime('2017-02-18:07:00:00','yyyy-MM-dd:HH:mm:ss') AND parse_datetime('2017-02-18:08:00:00','yyyy-MM-dd:HH:mm:ss');
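The time filter in this query parses the log timestamp with the pattern dd/MMM/yyyy:HH:mm:ss Z and compares it against two bounds. The following sketch shows the same comparison in Python, under two assumptions: the bounds are treated as UTC, and the process locale uses English month abbreviations (which `%b` depends on). The `in_window` helper is hypothetical.

```python
from datetime import datetime, timezone

# Python equivalent of the 'dd/MMM/yyyy:HH:mm:ss Z' pattern used in the query.
LOG_TS_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def in_window(request_datetime, start, end):
    """Return True if the log timestamp falls between start and end, inclusive."""
    ts = datetime.strptime(request_datetime, LOG_TS_FORMAT)
    return start <= ts <= end


# Same one-hour window as the query above, assumed here to be in UTC.
start = datetime(2017, 2, 18, 7, 0, 0, tzinfo=timezone.utc)
end = datetime(2017, 2, 18, 8, 0, 0, tzinfo=timezone.utc)

print(in_window("18/Feb/2017:07:30:00 +0000", start, end))
```

Because the parsed timestamp carries its UTC offset, a record logged with a different offset is still compared at the correct instant.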
Example — Show how much data was transferred by a specific IP address in a specific time period
SELECT SUM(bytessent) AS bytesSentTotal, SUM(objectsize) AS objectSizeTotal, SUM(bytessent + objectsize) AS Total FROM s3_access_logs_db.amzn-s3-demo-bucket_logs WHERE RemoteIP='1.2.3.4' AND parse_datetime(RequestDateTime,'dd/MMM/yyyy:HH:mm:ss Z') BETWEEN parse_datetime('2017-06-01','yyyy-MM-dd') AND parse_datetime('2017-07-01','yyyy-MM-dd');
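The aggregation in this query can be illustrated on a few in-memory records. This is a rough sketch with made-up data. The `transfer_totals` helper is hypothetical, the time filter is omitted for brevity, and, unlike SQL SUM, the sketch returns 0 rather than NULL when no rows match.

```python
def transfer_totals(records, remote_ip):
    """Sum bytessent, objectsize, and their per-row total for one IP address.

    Missing values (None) are skipped, similar to how SUM ignores NULLs.
    """
    bytes_sent = object_size = combined = 0
    for row in records:
        if row["remoteip"] != remote_ip:
            continue
        sent, size = row["bytessent"], row["objectsize"]
        if sent is not None:
            bytes_sent += sent
        if size is not None:
            object_size += size
        # A row contributes to the combined total only if both values exist.
        if sent is not None and size is not None:
            combined += sent + size
    return bytes_sent, object_size, combined


# Made-up records for illustration.
records = [
    {"remoteip": "1.2.3.4", "bytessent": 100, "objectsize": 100},
    {"remoteip": "1.2.3.4", "bytessent": None, "objectsize": 50},
    {"remoteip": "5.6.7.8", "bytessent": 10, "objectsize": 10},
]

print(transfer_totals(records, "1.2.3.4"))
```

The second record shows why the combined total can be smaller than the sum of the two individual totals: a row with a missing value is left out of the per-row addition.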
Identify object access requests using Amazon S3 access logs
You can use queries on access logs to identify object access requests, for operations such as GET, PUT, and DELETE, and discover further information about those requests.
The following Amazon Athena query example shows how to get all PUT object requests for a bucket from the server access log.
Example — Show all requesters that are sending PUT object requests in a certain period
SELECT bucket_name, Requester, RemoteIP, Key, HTTPStatus, ErrorCode, RequestDateTime FROM s3_access_logs_db.amzn-s3-demo-bucket_logs WHERE Operation='REST.PUT.OBJECT' AND parse_datetime(RequestDateTime,'dd/MMM/yyyy:HH:mm:ss Z') BETWEEN parse_datetime('2019-07-01:00:42:42','yyyy-MM-dd:HH:mm:ss') AND parse_datetime('2019-07-02:00:42:42','yyyy-MM-dd:HH:mm:ss')
The following Amazon Athena query example shows how to get all GET object requests for Amazon S3 from the server access log.
Example — Show all requesters that are sending GET object requests in a certain period
SELECT bucket_name, Requester, RemoteIP, Key, HTTPStatus, ErrorCode, RequestDateTime FROM s3_access_logs_db.amzn-s3-demo-bucket_logs WHERE Operation='REST.GET.OBJECT' AND parse_datetime(RequestDateTime,'dd/MMM/yyyy:HH:mm:ss Z') BETWEEN parse_datetime('2019-07-01:00:42:42','yyyy-MM-dd:HH:mm:ss') AND parse_datetime('2019-07-02:00:42:42','yyyy-MM-dd:HH:mm:ss')
The following Amazon Athena query example shows how to get all anonymous requests to your S3 buckets from the server access log.
Example — Show all anonymous requesters that are making requests to a bucket in a certain period
SELECT bucket_name, Requester, RemoteIP, Key, HTTPStatus, ErrorCode, RequestDateTime FROM s3_access_logs_db.amzn-s3-demo-bucket_logs WHERE Requester IS NULL AND parse_datetime(RequestDateTime,'dd/MMM/yyyy:HH:mm:ss Z') BETWEEN parse_datetime('2019-07-01:00:42:42','yyyy-MM-dd:HH:mm:ss') AND parse_datetime('2019-07-02:00:42:42','yyyy-MM-dd:HH:mm:ss')
Note
- You can modify the date range to suit your needs.
- These query examples might also be useful for security monitoring. You can review the results for PutObject or GetObject calls from unexpected or unauthorized IP addresses/requesters and for identifying any anonymous requests to your buckets.
- This query only retrieves information from the time at which logging was enabled.
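Because the PUT and GET examples differ only in the operation name and the date range, you might generate the query text instead of editing it by hand. The following is a minimal sketch that only builds the SQL string. It does not call Athena, the `build_operation_query` helper is hypothetical, and the table name is the placeholder used throughout this guide.

```python
from datetime import datetime

# Placeholder table name from this guide (assumption: replace with your own).
TABLE = "s3_access_logs_db.amzn-s3-demo-bucket_logs"

# Restrict the operation to known values so that arbitrary text is never
# interpolated into the SQL string.
ALLOWED_OPERATIONS = {"REST.PUT.OBJECT", "REST.GET.OBJECT"}


def build_operation_query(operation, start, end, table=TABLE):
    """Return the query text for one operation within a date range."""
    if operation not in ALLOWED_OPERATIONS:
        raise ValueError(f"unsupported operation: {operation}")
    # Matches the 'yyyy-MM-dd:HH:mm:ss' pattern used for the range bounds.
    bound_format = "%Y-%m-%d:%H:%M:%S"
    return (
        "SELECT bucket_name, requester, remoteip, key, httpstatus, errorcode, "
        "requestdatetime "
        f"FROM {table} "
        f"WHERE operation='{operation}' "
        "AND parse_datetime(requestdatetime,'dd/MMM/yyyy:HH:mm:ss Z') "
        f"BETWEEN parse_datetime('{start.strftime(bound_format)}','yyyy-MM-dd:HH:mm:ss') "
        f"AND parse_datetime('{end.strftime(bound_format)}','yyyy-MM-dd:HH:mm:ss')"
    )


query = build_operation_query(
    "REST.PUT.OBJECT",
    datetime(2019, 7, 1, 0, 42, 42),
    datetime(2019, 7, 2, 0, 42, 42),
)
print(query)
```

You can paste the printed text into the Query Editor, changing the two datetime values to cover the period that you want to review.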