Monitoring jobs
You can use Amazon CloudWatch Events to track the activity and health of the jobs that you run on an Amazon EMR on EKS virtual cluster. The topics that follow show you how to configure monitoring to maintain the health of your resources.
Monitor jobs with Amazon CloudWatch Events
Amazon EMR on EKS emits events when the state of a job run changes. Each event provides information, such as the date and time when the event occurred, along with further details about the event, such as the virtual cluster ID and the ID of the job run that was affected.
You can use events to track the activity and health of the jobs that you run on a virtual cluster. You can also use Amazon CloudWatch Events to define an action to take when a job run generates an event that matches a pattern that you specify. Events are useful for monitoring a specific occurrence during the lifecycle of a job run. For example, you can monitor when a job run changes state from SUBMITTED to RUNNING. For more information about CloudWatch Events, see the Amazon EventBridge User Guide.
The following table lists Amazon EMR on EKS events along with the state or state change that the event indicates, the severity of the event, and event messages. Each event is represented as a JSON object that is sent automatically to an event stream. The JSON object includes further details about the event. The JSON object is particularly important when you set up rules for event processing using CloudWatch Events because rules seek to match patterns in the JSON object. For more information, see Amazon EventBridge event patterns and Amazon EMR on EKS Events in the Amazon EventBridge User Guide.
Job run state change events

State | Severity | Message
---|---|---
SUBMITTED | INFO | Job Run JobRunId (JobRunName) was successfully submitted to virtual cluster VirtualClusterId at Time UTC.
RUNNING | INFO | Job Run JobRunId (JobRunName) in virtual cluster VirtualClusterId started running at Time.
COMPLETED | INFO | Job Run JobRunId (JobRunName) in virtual cluster VirtualClusterId completed at Time. The Job Run started running at Time and took Num minutes to complete.
CANCELLED | WARN | Cancellation request has succeeded for Job Run JobRunId (JobRunName) in virtual cluster VirtualClusterId at Time and the Job Run is now cancelled.
FAILED | ERROR | Job Run JobRunId (JobRunName) in virtual cluster VirtualClusterId failed at Time.
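The severity column can be used to decide which state changes deserve attention. The following is a minimal sketch that copies the state-to-severity mapping from the table; the function name `needs_attention` and the default threshold of WARN are illustrative choices, not part of Amazon EMR on EKS.

```python
# Severity for each job run state, copied from the table above.
STATE_SEVERITY = {
    "SUBMITTED": "INFO",
    "RUNNING": "INFO",
    "COMPLETED": "INFO",
    "CANCELLED": "WARN",
    "FAILED": "ERROR",
}

# Severities ordered from least to most urgent.
SEVERITY_ORDER = ["INFO", "WARN", "ERROR"]


def needs_attention(state, threshold="WARN"):
    """Return True if the state's severity is at or above the threshold."""
    severity = STATE_SEVERITY[state]
    return SEVERITY_ORDER.index(severity) >= SEVERITY_ORDER.index(threshold)


if __name__ == "__main__":
    for state in STATE_SEVERITY:
        print(state, needs_attention(state))
```

With the default threshold, only CANCELLED and FAILED are flagged.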
Automate Amazon EMR on EKS with CloudWatch Events
You can use Amazon CloudWatch Events to automate your AWS services to respond to system events such as application availability issues or resource changes. Events from AWS services are delivered to CloudWatch Events in near real time. You can write simple rules to indicate which events are of interest to you and what automated actions to take when an event matches a rule. The actions that can be automatically triggered include the following:
- Invoking an AWS Lambda function
- Invoking Amazon EC2 Run Command
- Relaying the event to Amazon Kinesis Data Streams
- Activating an AWS Step Functions state machine
- Notifying an Amazon Simple Notification Service (SNS) topic or an Amazon Simple Queue Service (SQS) queue
Some examples of using CloudWatch Events with Amazon EMR on EKS include the following:
- Activating a Lambda function when a job run succeeds
- Notifying an Amazon SNS topic when a job run fails
Amazon EMR on EKS generates CloudWatch Events with "detail-type": "EMR Job Run State Change" for SUBMITTED, RUNNING, CANCELLED, FAILED, and COMPLETED state changes.
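A rule often needs to match only some of these state changes, for example failures. The following sketch builds an event pattern string that could be passed to a rule. The "detail-type" value comes from this guide; the assumption that the state appears under `detail.state` in the event JSON should be checked against a real event. The helper name `build_event_pattern` is hypothetical.

```python
import json

# States for which this guide says "EMR Job Run State Change" events are generated.
JOB_RUN_STATES = {"SUBMITTED", "RUNNING", "CANCELLED", "FAILED", "COMPLETED"}


def build_event_pattern(states):
    """Return a JSON event pattern string that matches only the given states."""
    unknown = set(states) - JOB_RUN_STATES
    if unknown:
        raise ValueError(f"unknown job run states: {sorted(unknown)}")
    pattern = {
        "detail-type": ["EMR Job Run State Change"],
        # Assumption: the job run state is exposed as detail.state.
        "detail": {"state": sorted(states)},
    }
    return json.dumps(pattern)


if __name__ == "__main__":
    print(build_event_pattern(["FAILED", "CANCELLED"]))
```

The printed string can be supplied as the value of `--event-pattern` when you create a rule.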
Example: Set up a rule that invokes Lambda
Use the following steps to set up a CloudWatch Events rule that invokes Lambda when there is an "EMR Job Run State Change" event.
aws events put-rule \
  --name cwe-test \
  --event-pattern '{"detail-type": ["EMR Job Run State Change"]}'
Add the Lambda function that you own as a new target, and give CloudWatch Events permission to invoke the Lambda function, as follows. Replace 123456789012 with your account ID.
aws events put-targets \
  --rule cwe-test \
  --targets Id=1,Arn=arn:aws:lambda:us-east-1:123456789012:function:MyFunction

aws lambda add-permission \
  --function-name MyFunction \
  --statement-id MyId \
  --action 'lambda:InvokeFunction' \
  --principal events.amazonaws.com
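The guide does not show the body of MyFunction. As a minimal sketch, a Python handler for this target might log a one-line summary of each event. The top-level `detail-type`, `time`, and `detail` fields are part of the standard event envelope; the keys read from `detail` (`state`, `id`, `name`, `virtualClusterId`) are assumptions inferred from the message templates in this guide, so verify them against an event from your own account.

```python
import json
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def lambda_handler(event, context):
    """Log a short summary of an EMR on EKS job run state change event."""
    if event.get("detail-type") != "EMR Job Run State Change":
        # Ignore anything the rule was not meant to deliver.
        return {"handled": False}

    detail = event.get("detail", {})
    summary = {
        "time": event.get("time"),
        # Assumed keys inside "detail"; missing keys yield None.
        "state": detail.get("state"),
        "jobRunId": detail.get("id"),
        "jobRunName": detail.get("name"),
        "virtualClusterId": detail.get("virtualClusterId"),
    }
    logger.info(json.dumps(summary))
    return {"handled": True, **summary}
```

Because the handler uses `dict.get` throughout, an event with missing fields produces `None` values rather than an exception.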
Note
Don't write programs that depend on the order or existence of notification events, because events can arrive out of sequence or be missing. Events are emitted on a best-effort basis.
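One way to tolerate out-of-order or missing events is to rank the states and ignore any event that would move a job run backward. The sketch below assumes that a job run progresses from SUBMITTED to RUNNING to one of the final states (COMPLETED, FAILED, or CANCELLED); the class `JobRunTracker` is hypothetical and keeps its state in memory only.

```python
# Assumed lifecycle order: SUBMITTED, then RUNNING, then one final state.
STATE_RANK = {
    "SUBMITTED": 0,
    "RUNNING": 1,
    "COMPLETED": 2,
    "FAILED": 2,
    "CANCELLED": 2,
}


class JobRunTracker:
    """Track the furthest-along state seen for each job run."""

    def __init__(self):
        self._states = {}  # job run ID -> latest accepted state

    def apply(self, job_run_id, state):
        """Record the state unless it is stale. Return True if it was recorded."""
        if state not in STATE_RANK:
            raise ValueError(f"unknown state: {state}")
        current = self._states.get(job_run_id)
        if current is not None and STATE_RANK[state] <= STATE_RANK[current]:
            return False  # late or duplicate event; keep the current state
        self._states[job_run_id] = state
        return True

    def state_of(self, job_run_id):
        """Return the latest accepted state, or None if no event was seen."""
        return self._states.get(job_run_id)
```

With this approach, a RUNNING event that arrives after COMPLETED is ignored, and a job run whose RUNNING event never arrives still moves directly from SUBMITTED to its final state.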
Monitor a job's driver pod with a retry policy using Amazon CloudWatch Events
With CloudWatch Events, you can monitor the driver pods that are created for jobs that have retry policies. For more information, see Monitoring a job with a retry policy in this guide.