Amazon Elastic MapReduce
Developer Guide (API Version 2009-03-31)

View Log Files

Amazon EMR and Hadoop both produce log files that report status on the cluster. By default, these are written to the master node in the /mnt/var/log/ directory. Depending on how you configured your cluster when you launched it, these logs may also be archived to Amazon S3 and may be viewable through the graphical debugging tool. For more information, see Configure Logging and Debugging (Optional).

There are many types of logs written to the master node. Amazon EMR writes step, bootstrap action, and instance state logs. Apache Hadoop writes logs to report the processing of jobs, tasks, and task attempts. Hadoop also records logs of its daemons. For more information about the logs written by Hadoop, go to http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html.

View Log Files on the Master Node

The following list describes some of the log files you'll find on the master node.

/mnt/var/log/bootstrap-actions

Logs written during the processing of the bootstrap actions.

/mnt/var/log/hadoop-state-pusher

Logs written by the Hadoop state pusher process.

/mnt/var/log/instance-controller

Instance controller logs.

/mnt/var/log/instance-state

Instance state logs. These contain information about the CPU, memory state, and garbage collector threads of the node.

/mnt/var/log/service-nanny

Logs written by the service nanny process.

/mnt/var/log/hadoop

Hadoop logs, such as those written by the jobtracker and namenode processes.

/mnt/var/log/hadoop/steps/N

Step logs that contain information about the processing of the step. The value of N is the step ID that Amazon EMR assigns. For example, if a cluster has two steps, s-1234ABCDEFGH and s-5678IJKLMNOP, the logs for the first step are in /mnt/var/log/hadoop/steps/s-1234ABCDEFGH/ and the logs for the second step are in /mnt/var/log/hadoop/steps/s-5678IJKLMNOP/.

The step logs written by Amazon EMR are as follows.

  • controller — Information about the processing of the step. If your step fails while loading, you can find the stack trace in this log.

  • syslog — Describes the execution of Hadoop jobs in the step.

  • stderr — The standard error channel of Hadoop while it processes the step.

  • stdout — The standard output channel of Hadoop while it processes the step.
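Once you are connected to the master node (see the following procedure), a quick way to confirm which of these files a step produced is to list the step's log directory. This is a minimal sketch using the placeholder step ID from the example above; the directory may also contain additional metadata files.

    ls /mnt/var/log/hadoop/steps/s-1234ABCDEFGH/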

To view log files on the master node

  1. Use SSH to connect to the master node as described in Connect to the Master Node Using SSH.

  2. Navigate to the directory that contains the log files you want to view. The preceding list shows which log files are available and where to find them. The following example shows the command for navigating to the logs for the step with ID s-1234ABCDEFGH.

    cd /mnt/var/log/hadoop/steps/s-1234ABCDEFGH/
  3. Use a text editor installed on the master node to view the contents of the log file. Several editors are available, including vi, nano, and emacs. The following example shows how to open the controller step log in nano. To follow a log that a running step is still writing, see the tail example after this procedure.

    nano controller
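
If a step is still running, you can follow a log as Hadoop appends to it by using the standard tail utility instead of a text editor. For example, to follow the syslog of the step from the earlier example:

    tail -f /mnt/var/log/hadoop/steps/s-1234ABCDEFGH/syslog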

View Log Files Archived to Amazon S3

Amazon EMR does not automatically archive log files to Amazon S3. You must configure this when you launch the cluster. For more information, see Configure Logging and Debugging (Optional).

When Amazon EMR is configured to archive log files to Amazon S3, it stores the files in the S3 location you specified, in the /JobFlowId/ folder, where JobFlowId is the cluster identifier.
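
If you have the AWS CLI installed, you can also list the archived log files from the command line. The following is a sketch that assumes a hypothetical bucket named mybucket, a log prefix of logs/, and a cluster identifier of j-1234ABCDEFGH; substitute the log location you configured for your cluster.

    aws s3 ls s3://mybucket/logs/j-1234ABCDEFGH/ --recursive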

The following list describes some of the log files you'll find on Amazon S3.

/JobFlowId/daemons/

Logs written by Hadoop daemons, such as datanode and tasktracker. The logs for each node are stored in a folder labeled with the identifier of the EC2 instance of that node.

/JobFlowId/jobs/

Job logs and the configuration XML file for each Hadoop job.

/JobFlowId/node/

Node logs, including bootstrap action logs for the node. The logs for each node are stored in a folder labeled with the identifier of the EC2 instance of that node.

/JobFlowId/steps/N/

Step logs that contain information about the processing of the step. The value of N is the step ID that Amazon EMR assigns. For example, if a cluster has two steps, s-1234ABCDEFGH and s-5678IJKLMNOP, the logs for the first step are in /JobFlowId/steps/s-1234ABCDEFGH/ and the logs for the second step are in /JobFlowId/steps/s-5678IJKLMNOP/.

The step logs written by Amazon EMR are as follows.

  • controller — Information about the processing of the step. If your step fails while loading, you can find the stack trace in this log.

  • syslog — Describes the execution of Hadoop jobs in the step.

  • stderr — The standard error channel of Hadoop while it processes the step.

  • stdout — The standard output channel of Hadoop while it processes the step.

/JobFlowId/task-attempts/

Task attempt logs. The logs for each task attempt are stored in a folder labeled with the identifier of the corresponding job.

To view log files archived to Amazon S3 using the console

  1. Sign in to the AWS Management Console and open the Amazon S3 console at https://console.aws.amazon.com/s3/.

  2. Open the S3 bucket you specified when you configured the cluster to archive log files in Amazon S3.

  3. Navigate to the log file that contains the information you want to view. The preceding list shows which log files are available and where to find them.

  4. Double-click on a log file to view it in the browser.

If you don't want to view the log files in the Amazon S3 console, you can download the files from Amazon S3 to your local machine using a tool such as the Amazon S3 Organizer plug-in for the Firefox web browser, or by writing an application to retrieve the objects from Amazon S3. For more information, see Getting Objects in the Amazon Simple Storage Service Developer Guide.
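
For example, if you have the AWS CLI installed, you can copy all of a cluster's step logs to your local machine with a single command. This sketch reuses the hypothetical bucket, prefix, and cluster identifier from the earlier example.

    aws s3 cp s3://mybucket/logs/j-1234ABCDEFGH/steps/ ./step-logs/ --recursive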

View Log Files in the Debugging Tool

Amazon EMR does not automatically enable the debugging tool. You must configure this when you launch the cluster. For more information, see Configure Logging and Debugging (Optional).

To view cluster logs using the console

  1. Open the Amazon Elastic MapReduce console at https://console.aws.amazon.com/elasticmapreduce/.

  2. From the Cluster List page, click the details icon next to the cluster you want to view.

    This brings up the Cluster Details page. In the Steps section, the links to the right of each step display the various types of logs available for the step. These logs are generated by Amazon EMR.

  3. To view a list of the Hadoop jobs associated with a given step, click the View Jobs link to the right of the step.

  4. To view a list of the Hadoop tasks associated with a given job, click the View Tasks link to the right of the job.

  5. To view a list of the attempts a given task has run while trying to complete, click the View Attempts link to the right of the task.

  6. To view the logs generated by a task attempt, click the stderr, stdout, and syslog links to the right of the task attempt.

The debugging tool displays links to the log files after Amazon EMR uploads them to your bucket on Amazon S3. Because log files are uploaded to Amazon S3 every five minutes, it can take a few minutes after a step completes before its logs appear.

Amazon EMR periodically updates the status of Hadoop jobs, tasks, and task attempts in the debugging tool. You can click Refresh List in the debugging panes to get the most up-to-date status of these items.