View Amazon EMR log files

Amazon EMR and Hadoop both produce log files that report status on the cluster. By default, these are written to the primary node in the /mnt/var/log/ directory. Depending on how you configured your cluster when you launched it, these logs may also be archived to Amazon S3 and may be viewable through the graphical debugging tool.

Many types of logs are written to the primary node. Amazon EMR writes step, bootstrap action, and instance state logs. Apache Hadoop writes logs that report the processing of jobs, tasks, and task attempts, and it also records logs for its daemons. For more information about the logs written by Hadoop, see http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html.

View log files on the primary node

The following table lists some of the log files you'll find on the primary node.

Location Description

/emr/instance-controller/log/bootstrap-actions

Logs written during the processing of the bootstrap actions.

/mnt/var/log/hadoop-state-pusher

Logs written by the Hadoop state pusher process.

/emr/instance-controller/log

Instance controller logs.

/emr/instance-state

Instance state logs. These contain information about the CPU, memory state, and garbage collector threads of the node.

/emr/service-nanny

Logs written by the service nanny process.

/mnt/var/log/application

Logs specific to an application such as Hadoop, Spark, or Hive.

/mnt/var/log/hadoop/steps/N

Step logs that contain information about the processing of the step. The value of N is the step ID assigned by Amazon EMR. For example, if a cluster has two steps, s-1234ABCDEFGH and s-5678IJKLMNOP, the logs for the first step are in /mnt/var/log/hadoop/steps/s-1234ABCDEFGH/ and the logs for the second step are in /mnt/var/log/hadoop/steps/s-5678IJKLMNOP/.

The step logs written by Amazon EMR are as follows.

  • controller — Information about the processing of the step. If your step fails while loading, you can find the stack trace in this log.

  • syslog — Describes the execution of Hadoop jobs in the step.

  • stderr — The standard error channel of Hadoop while it processes the step.

  • stdout — The standard output channel of Hadoop while it processes the step.

To view log files on the primary node using SSH
  1. Use SSH to connect to the primary node as described in Connect to the Amazon EMR cluster primary node using SSH.

  2. Navigate to the directory that contains the log files you want to view. The preceding table lists the types of log files that are available and where to find them. The following example shows the command for navigating to the logs of the step with the ID s-1234ABCDEFGH.

    cd /mnt/var/log/hadoop/steps/s-1234ABCDEFGH/
  3. Use a file viewer of your choice to view the log file. The following example uses the Linux less command to view the controller log file.

    less controller
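
The procedure above opens one log at a time. As a rough sketch, the same check can be scripted so that the tail of every step log for one step is printed in a single pass. The names LOG_ROOT and show_step_logs are hypothetical helpers, not part of Amazon EMR, and the script assumes the step logs are plain text files with the names listed earlier.

```shell
#!/bin/sh
# Sketch: print the tail of each step log for one step on the primary node.
# LOG_ROOT and show_step_logs are hypothetical names, not part of Amazon EMR.
LOG_ROOT="${LOG_ROOT:-/mnt/var/log/hadoop/steps}"

show_step_logs() {
    # $1 = step ID, $2 = number of lines to show per log (default 20)
    step_dir="$LOG_ROOT/$1"
    if [ ! -d "$step_dir" ]; then
        echo "No step log directory at $step_dir" >&2
        return 1
    fi
    # The four step logs described in the list above; missing ones are skipped.
    for log_name in controller syslog stderr stdout; do
        if [ -f "$step_dir/$log_name" ]; then
            echo "== $log_name =="
            tail -n "${2:-20}" "$step_dir/$log_name"
        fi
    done
}
```

For example, after connecting over SSH, running show_step_logs s-1234ABCDEFGH 50 would print up to 50 trailing lines of each log that exists for that step.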

View log files archived to Amazon S3

By default, Amazon EMR clusters launched using the console automatically archive log files to Amazon S3. You can specify your own log path, or you can allow the console to automatically generate a log path for you. For clusters launched using the CLI or API, you must configure Amazon S3 log archiving manually.
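
With the AWS CLI, the log location is passed through the --log-uri option of aws emr create-cluster. The sketch below only prints such a command instead of running it, so it can be reviewed first. The function name print_create_cluster_cmd is hypothetical, and the cluster name, release label, application, and instance settings are placeholder assumptions to replace with your own values.

```shell
#!/bin/sh
# Sketch: build (but do not run) a create-cluster command that enables
# log archiving to Amazon S3. Everything except --log-uri is a placeholder
# assumption; adjust it to match your own cluster.
print_create_cluster_cmd() {
    log_uri="$1"   # for example s3://amzn-s3-demo-bucket/emr-logs/
    # Backslash-newline inside double quotes joins the lines into one.
    printf '%s\n' "aws emr create-cluster \
--name log-archive-example \
--release-label emr-6.15.0 \
--applications Name=Spark \
--instance-type m5.xlarge \
--instance-count 3 \
--use-default-roles \
--log-uri $log_uri"
}
```

Calling print_create_cluster_cmd s3://amzn-s3-demo-bucket/emr-logs/ prints a single-line command ending in the --log-uri option, which you can inspect and then run yourself.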

When Amazon EMR is configured to archive log files to Amazon S3, it stores the files in the S3 location you specified, in the /cluster-id/ folder, where cluster-id is the cluster ID.

The following table lists some of the log files you'll find on Amazon S3.

Location Description

/cluster-id/node/

Node logs, including bootstrap action, instance state, and application logs for the node. The logs for each node are stored in a folder labeled with the identifier of the EC2 instance of that node.

/cluster-id/node/instance-id/application

The logs created by each application or daemon associated with an application. For example, the Hive server log is located at /cluster-id/node/instance-id/hive/hive-server.log.

/cluster-id/steps/step-id/

Step logs that contain information about the processing of the step. The value of step-id is the step ID assigned by Amazon EMR. For example, if a cluster has two steps, s-1234ABCDEFGH and s-5678IJKLMNOP, the logs for the first step are in /cluster-id/steps/s-1234ABCDEFGH/ and the logs for the second step are in /cluster-id/steps/s-5678IJKLMNOP/.

The step logs written by Amazon EMR are as follows.

  • controller — Information about the processing of the step. If your step fails while loading, you can find the stack trace in this log.

  • syslog — Describes the execution of Hadoop jobs in the step.

  • stderr — The standard error channel of Hadoop while it processes the step.

  • stdout — The standard output channel of Hadoop while it processes the step.

/cluster-id/containers

Application container logs. The logs for each YARN application are stored in these locations.

/cluster-id/hadoop-mapreduce/

The logs that contain information about configuration details and job history of MapReduce jobs.
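
The locations in the table above are relative to the log path configured for the cluster. As an illustration, the following sketch builds full S3 URIs from that layout. The function names, the bucket amzn-s3-demo-bucket, and the cluster ID j-EXAMPLE are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch: build S3 URIs for the locations listed in the table above.
# All function names here are hypothetical helpers.
cluster_log_uri() {
    # $1 = log path configured for the cluster, $2 = cluster ID
    # ${1%/} drops one trailing slash so the result has no "//".
    printf '%s/%s\n' "${1%/}" "$2"
}

step_log_uri() {
    # $3 = step ID
    printf '%s/steps/%s/\n' "$(cluster_log_uri "$1" "$2")" "$3"
}

node_log_uri() {
    # $3 = EC2 instance ID of the node
    printf '%s/node/%s/\n' "$(cluster_log_uri "$1" "$2")" "$3"
}
```

A URI built this way can be passed to the AWS CLI, for example aws s3 ls "$(step_log_uri s3://amzn-s3-demo-bucket/emr-logs j-EXAMPLE s-1234ABCDEFGH)" to list the archived logs for one step.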

To view log files archived to Amazon S3 with the Amazon S3 console
  1. Sign in to the AWS Management Console and open the Amazon S3 console at https://console.aws.amazon.com/s3/.

  2. Open the S3 bucket specified when you configured the cluster to archive log files in Amazon S3.

  3. Navigate to the log file containing the information to display. The preceding table gives a list of the types of log files that are available and where you will find them.

  4. Download the log file object to view it. For instructions, see Downloading an object.
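
As an alternative to the console, a log object can be copied with the AWS CLI and read locally. The sketch below covers only the reading step. It assumes that an archived log may be stored gzip-compressed (for example as stderr.gz), which you should confirm against the objects in your own bucket. The helper name view_log and the S3 path in the comment are hypothetical.

```shell
#!/bin/sh
# Sketch: print a downloaded log file, decompressing it when it is gzipped.
# view_log is a hypothetical helper, not an AWS tool.
#
# Possible usage, with a placeholder bucket, cluster ID, and step ID:
#   aws s3 cp s3://amzn-s3-demo-bucket/emr-logs/j-EXAMPLE/steps/s-1234ABCDEFGH/stderr.gz .
#   view_log stderr.gz | less
view_log() {
    case "$1" in
        *.gz) gzip -dc "$1" ;;   # compressed log: decompress to stdout
        *)    cat "$1" ;;        # plain log: print as-is
    esac
}
```

Choosing by file extension keeps the helper simple. If your objects are compressed without a .gz suffix, the check would need to inspect the file contents instead.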