AWS Glue
Developer Guide

Logging AWS Glue Operations Using AWS CloudTrail

AWS Glue is integrated with AWS CloudTrail, a service that provides a record of actions taken by a user, a role, or an AWS service in AWS Glue. If you create a trail, you can enable continuous delivery of CloudTrail events to an Amazon S3 bucket, Amazon CloudWatch Logs, and Amazon CloudWatch Events. Using the information collected by CloudTrail, you can determine the request that was made to AWS Glue, the IP address from which the request was made, who made the request, when it was made, and additional details.

To learn more about CloudTrail, including how to configure and enable it, see the AWS CloudTrail User Guide.

AWS Glue Information in CloudTrail

Every event or log entry contains information about who generated the request. The identity information helps you determine the following:

  • Whether the request was made with root or IAM user credentials.

  • Whether the request was made with temporary security credentials for a role or federated user.

  • Whether the request was made by another AWS service.

For more information, see the CloudTrail userIdentity Element.

You can also create a trail and store your log files in one of your Amazon S3 buckets for as long as you want. A trail is a configuration that enables delivery of events as log files to an Amazon S3 bucket that you specify. You can then define Amazon S3 lifecycle rules to archive or delete log files automatically. By default, your log files are encrypted with Amazon S3 server-side encryption (SSE).

An event in a CloudTrail log file represents a single request from any source. It includes information about the requested action, the date and time of the action, request parameters, and so on. These events don't necessarily appear in the order the requests were made, or in any particular order at all.

All AWS Glue actions are logged by CloudTrail. For example, calls to CreateDatabase, CreateTable, and CreateScript all generate entries in the CloudTrail log files.

However, CloudTrail doesn't log all information regarding calls. For example, it doesn't log certain sensitive information, such as the ConnectionProperties used in connection requests, and it logs a null instead of the responses returned by the following APIs:

BatchGetPartition GetCrawlers GetJobs GetTable CreateScript GetCrawlerMetrics GetJobRun GetTables GetCatalogImportStatus GetDatabase GetJobRuns GetTableVersions GetClassifier GetDatabases GetMapping GetTrigger GetClassifiers GetDataflowGraph GetObjects GetTriggers GetConnection GetDevEndpoint GetPartition GetUserDefinedFunction GetConnections GetDevEndpoints GetPartitions GetUserDefinedFunctions GetCrawler GetJob GetPlan

To be notified of log file delivery, configure CloudTrail to publish Amazon SNS notifications when new log files are delivered. For more information, see Configuring Amazon SNS Notifications for CloudTrail.

You can also aggregate AWS Glue log files from multiple AWS Regions and multiple AWS accounts into a single Amazon S3 bucket. For more information, see Receiving CloudTrail Log Files from Multiple Regions and Receiving CloudTrail Log Files from Multiple Accounts.

CloudTrail Log File Entries for AWS Glue

The following example shows the kind of CloudTrail log entry that a DeleteCrawler call generates:

{ "eventVersion": "1.05", "userIdentity": { "type": "IAMUser", "principalId": "AKIAIOSFODNN7EXAMPLE", "arn": "arn:aws:iam::123456789012:user/johndoe", "accountId": "123456789012", "accessKeyId": "AKIAIOSFODNN7EXAMPLE", "userName": "johndoe" }, "eventTime": "2017-10-11T22:29:49Z", "eventSource": "", "eventName": "DeleteCrawler", "awsRegion": "us-east-1", "sourceIPAddress": "", "userAgent": "aws-cli/1.11.148 Python/3.6.1 Darwin/16.7.0 botocore/1.7.6", "requestParameters": { "name": "tes-alpha" }, "responseElements": null, "requestID": "b16f4050-aed3-11e7-b0b3-75564a46954f", "eventID": "e73dd117-cfd1-47d1-9e2f-d1271cad838c", "eventType": "AwsApiCall", "recipientAccountId": "123456789012" }

This example shows the kind of CloudTrail log entry that a CreateConnection call generates:

{ "eventVersion": "1.05", "userIdentity": { "type": "IAMUser", "principalId": "AKIAIOSFODNN7EXAMPLE", "arn": "arn:aws:iam::123456789012:user/johndoe", "accountId": "123456789012", "accessKeyId": "AKIAIOSFODNN7EXAMPLE", "userName": "johndoe" }, "eventTime": "2017-10-13T00:19:19Z", "eventSource": "", "eventName": "CreateConnection", "awsRegion": "us-east-1", "sourceIPAddress": "", "userAgent": "aws-cli/1.11.148 Python/3.6.1 Darwin/16.7.0 botocore/1.7.6", "requestParameters": { "connectionInput": { "name": "test-connection-alpha", "connectionType": "JDBC", "physicalConnectionRequirements": { "subnetId": "subnet-323232", "availabilityZone": "us-east-1a", "securityGroupIdList": [ "sg-12121212" ] } } }, "responseElements": null, "requestID": "27136ebc-afac-11e7-a7d6-ab217e5c3f19", "eventID": "e8b3baeb-c511-4597-880f-c16210c60a4a", "eventType": "AwsApiCall", "recipientAccountId": "123456789012" }