View log files

Amazon EMR and Hadoop both produce log files that report status on the cluster. By default, these are written to the primary node in the /mnt/var/log/ directory. Depending on how you configured your cluster when you launched it, these logs may also be archived to Amazon S3 and may be viewable through the graphical debugging tool.

There are many types of logs written to the primary node. Amazon EMR writes step, bootstrap action, and instance state logs. Apache Hadoop writes logs to report the processing of jobs, tasks, and task attempts. Hadoop also records logs of its daemons. For more information about the logs written by Hadoop, go to http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html.
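If you're already connected to the primary node over SSH (see the procedure later in this section), a quick directory listing shows which of these logs are present. For example:

    # List the top-level log directories on the primary node
    ls /mnt/var/log/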

View log files on the primary node

The following list describes some of the log files you'll find on the primary node.

  • /emr/instance-controller/log/bootstrap-actions
    Logs written during the processing of the bootstrap actions.

  • /mnt/var/log/hadoop-state-pusher
    Logs written by the Hadoop state pusher process.

  • /emr/instance-controller/log
    Instance controller logs.

  • /emr/instance-state
    Instance state logs. These contain information about the CPU, memory state, and garbage collector threads of the node.

  • /emr/service-nanny
    Logs written by the service nanny process.

  • /mnt/var/log/application
    Logs specific to an application such as Hadoop, Spark, or Hive.

  • /mnt/var/log/hadoop/steps/N
    Step logs that contain information about the processing of the step. The value of N is the step ID assigned by Amazon EMR. For example, if a cluster has two steps, s-1234ABCDEFGH and s-5678IJKLMNOP, the logs for the first step are in /mnt/var/log/hadoop/steps/s-1234ABCDEFGH/ and the logs for the second step are in /mnt/var/log/hadoop/steps/s-5678IJKLMNOP/.

The step logs written by Amazon EMR are as follows.

  • controller — Information about the processing of the step. If your step fails while loading, you can find the stack trace in this log.

  • syslog — Describes the execution of Hadoop jobs in the step.

  • stderr — The standard error channel of Hadoop while it processes the step.

  • stdout — The standard output channel of Hadoop while it processes the step.

To view log files on the primary node using SSH
  1. Use SSH to connect to the primary node as described in Connect to the primary node using SSH.

  2. Navigate to the directory that contains the log files you want to view. The preceding list shows the types of log files that are available and where to find them. The following example shows the command for navigating to the log directory for a step with the ID s-1234ABCDEFGH.

    cd /mnt/var/log/hadoop/steps/s-1234ABCDEFGH/
  3. Use a file viewer of your choice to view the log file. The following example uses the Linux less command to view the controller log file.

    less controller
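You can also search or follow a log without opening it in a viewer. The following sketch uses standard Linux tools with the same example step ID as above:

    # Show the last 50 lines of the step's standard error log
    tail -n 50 /mnt/var/log/hadoop/steps/s-1234ABCDEFGH/stderr

    # Search the step's syslog for errors, ignoring case
    grep -i "error" /mnt/var/log/hadoop/steps/s-1234ABCDEFGH/syslog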

View log files archived to Amazon S3

By default, Amazon EMR clusters launched using the console automatically archive log files to Amazon S3. You can specify your own log path, or you can allow the console to automatically generate a log path for you. For clusters launched using the CLI or API, you must configure Amazon S3 log archiving manually.
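For example, with the AWS CLI you can set the Amazon S3 log destination at launch using the --log-uri option of aws emr create-cluster. The following is a minimal sketch; the bucket name, release label, and instance settings are placeholders that you should adjust for your environment:

    # Launch a cluster that archives its log files to Amazon S3
    # (bucket name and cluster settings below are placeholders)
    aws emr create-cluster \
        --name "my-cluster" \
        --release-label emr-6.15.0 \
        --applications Name=Hadoop \
        --instance-type m5.xlarge \
        --instance-count 3 \
        --use-default-roles \
        --log-uri s3://amzn-s3-demo-bucket/logs/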

When Amazon EMR is configured to archive log files to Amazon S3, it stores the files in the S3 location you specified, in the /cluster-id/ folder, where cluster-id is the cluster ID.
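For example, if you specified s3://amzn-s3-demo-bucket/logs/ as the log path (the bucket name here is a placeholder) and Amazon EMR assigned your cluster the ID j-1234ABCDEFGH, the cluster's logs are written under s3://amzn-s3-demo-bucket/logs/j-1234ABCDEFGH/.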

The following list describes some of the log files you'll find on Amazon S3.

  • /cluster-id/node/
    Node logs, including bootstrap action, instance state, and application logs for the node. The logs for each node are stored in a folder labeled with the identifier of that node's EC2 instance.

  • /cluster-id/node/instance-id/application
    The logs created by each application or daemon associated with an application. For example, the Hive server log is located at cluster-id/node/instance-id/hive/hive-server.log.

  • /cluster-id/steps/step-id/
    Step logs that contain information about the processing of the step. The value of step-id is the step ID assigned by Amazon EMR. For example, if a cluster has two steps, s-1234ABCDEFGH and s-5678IJKLMNOP, the logs for the first step are in /cluster-id/steps/s-1234ABCDEFGH/ and the logs for the second step are in /cluster-id/steps/s-5678IJKLMNOP/. The step logs written by Amazon EMR are as follows.

      • controller — Information about the processing of the step. If your step fails while loading, you can find the stack trace in this log.

      • syslog — Describes the execution of Hadoop jobs in the step.

      • stderr — The standard error channel of Hadoop while it processes the step.

      • stdout — The standard output channel of Hadoop while it processes the step.

  • /cluster-id/containers
    Application container logs. The logs for each YARN application are stored under this location. You can also retrieve these logs on a running cluster, as shown in the example after this list.

  • /cluster-id/hadoop-mapreduce/
    Logs that contain information about configuration details and job history of MapReduce jobs.
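On a running cluster, you can also retrieve a YARN application's container logs directly with the yarn logs command, assuming YARN log aggregation is enabled on your cluster. The application ID below is a placeholder:

    # Fetch the aggregated container logs for one YARN application
    yarn logs -applicationId application_1452887017304_0001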

To view log files archived to Amazon S3 with the Amazon S3 console
  1. Sign in to the AWS Management Console and open the Amazon S3 console at https://console.aws.amazon.com/s3/.

  2. Open the S3 bucket specified when you configured the cluster to archive log files in Amazon S3.

  3. Navigate to the log file containing the information to display. The preceding list shows the types of log files that are available and where to find them.

  4. Download the log file object to view it. For instructions, see Downloading an object.
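As an alternative to the console, you can list and download archived log files with the AWS CLI. The bucket name and cluster ID below are placeholders:

    # List the archived step logs for a cluster
    aws s3 ls s3://amzn-s3-demo-bucket/logs/j-1234ABCDEFGH/steps/ --recursive

    # Download all of a cluster's archived logs for local inspection
    aws s3 cp s3://amzn-s3-demo-bucket/logs/j-1234ABCDEFGH/ ./emr-logs/ --recursive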

View log files in the debugging tool

Amazon EMR doesn't enable the debugging tool automatically; you must enable it when you launch the cluster. Note that the new Amazon EMR console doesn't offer the debugging tool.
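For example, when you launch a cluster with the AWS CLI, you can turn on the debugging tool with the --enable-debugging option, which also requires a log URI. This is a minimal sketch with placeholder values:

    # Launch a cluster with the debugging tool enabled
    # (bucket name and cluster settings below are placeholders)
    aws emr create-cluster \
        --name "my-cluster" \
        --release-label emr-6.15.0 \
        --applications Name=Hadoop \
        --instance-type m5.xlarge \
        --instance-count 3 \
        --use-default-roles \
        --enable-debugging \
        --log-uri s3://amzn-s3-demo-bucket/logs/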

To view cluster logs with the old console
  1. Navigate to the new Amazon EMR console and select Switch to the old console from the side navigation. For more information on what to expect when you switch to the old console, see Using the old console.

  2. From the Cluster List page, choose the details icon next to the cluster you want to view.

    This brings up the Cluster Details page. In the Steps section, the links to the right of each step display the various types of logs available for the step. These logs are generated by Amazon EMR.

  3. To view a list of the Hadoop jobs associated with a given step, choose the View Jobs link to the right of the step.

  4. To view a list of the Hadoop tasks associated with a given job, choose the View Tasks link to the right of the job.

  5. To view a list of the attempts that a given task made while trying to complete, choose the View Attempts link to the right of the task.

  6. To view the logs generated by a task attempt, choose the stderr, stdout, and syslog links to the right of the task attempt.

The debugging tool displays links to the log files after Amazon EMR uploads the log files to your bucket on Amazon S3. Because log files are uploaded to Amazon S3 every 5 minutes, it can take a few minutes for the log file uploads to complete after the step completes.

Amazon EMR periodically updates the status of Hadoop jobs, tasks, and task attempts in the debugging tool. You can choose Refresh List in the debugging panes to get the most up-to-date status of these items.