File could only be replicated to 0 nodes instead of 1 - Amazon EMR

When a file is written to HDFS, it is replicated to multiple core nodes. This error means that the NameNode daemon has no available DataNode instances to write data to in HDFS, so block replication is not taking place. Several issues can cause this error:

  • The HDFS filesystem may have run out of space. This is the most likely cause.

  • DataNode instances may not have been available when the job was run.

  • DataNode instances may have been blocked from communication with the master node.

  • Instances in the core instance group might not be available.

  • Permissions may be missing. For example, the JobTracker daemon may not have permissions to create job tracker information.

  • The reserved space setting for a DataNode instance may be insufficient. To verify, inspect the dfs.datanode.du.reserved configuration setting.
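To make the last check concrete, the following is a minimal sketch of reading dfs.datanode.du.reserved from the text of an hdfs-site.xml file. The helper name get_hdfs_property and the sample XML are hypothetical and for illustration only. The location of the file on a cluster node (commonly /etc/hadoop/conf/hdfs-site.xml) is an assumption that you should confirm for your release.

```python
import xml.etree.ElementTree as ET


def get_hdfs_property(hdfs_site_xml, name):
    """Return the value of a named property from hdfs-site.xml text, or None if absent."""
    root = ET.fromstring(hdfs_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


# Hypothetical sample content; on a real node, read the file from disk instead.
SAMPLE_HDFS_SITE = """<configuration>
  <property>
    <name>dfs.datanode.du.reserved</name>
    <value>1073741824</value>
  </property>
</configuration>
"""

# The setting is expressed in bytes reserved per volume for non-HDFS use.
reserved_bytes = int(get_hdfs_property(SAMPLE_HDFS_SITE, "dfs.datanode.du.reserved"))
print(f"Reserved per volume: {reserved_bytes / 1024**3:.1f} GiB")
```

If the property is not present in the file, the helper returns None and Hadoop falls back to its built-in default for the setting.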

To check whether HDFS has run out of disk space, look at the HDFSUtilization metric in CloudWatch, which reports the percentage of HDFS storage currently in use. If this value is close to 100 percent, you can add core nodes to the cluster. If you think a cluster might run out of HDFS disk space, you can set an alarm in CloudWatch to alert you when the value of HDFSUtilization rises above a level that you choose. For more information, see Manually resizing a running cluster and Monitoring Amazon EMR metrics with CloudWatch.
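As an illustration of the alarm described above, here is a hedged sketch that builds the parameters for a CloudWatch alarm on HDFSUtilization and passes them to the boto3 put_metric_alarm call. The function names, the alarm name, the 80 percent threshold, the five-minute period, and the cluster ID are assumptions chosen for the example, not recommended values. The sketch assumes boto3 is installed and AWS credentials are configured.

```python
def build_hdfs_alarm_params(cluster_id, threshold_percent=80.0, sns_topic_arn=None):
    """Build keyword arguments for CloudWatch put_metric_alarm (illustrative values)."""
    params = {
        "AlarmName": f"emr-{cluster_id}-hdfs-utilization",  # hypothetical naming scheme
        "Namespace": "AWS/ElasticMapReduce",
        "MetricName": "HDFSUtilization",
        "Dimensions": [{"Name": "JobFlowId", "Value": cluster_id}],
        "Statistic": "Average",
        "Period": 300,              # seconds per evaluation window (assumed)
        "EvaluationPeriods": 1,
        "Threshold": threshold_percent,
        "ComparisonOperator": "GreaterThanThreshold",
    }
    if sns_topic_arn is not None:
        # Without an action the alarm changes state but notifies no one.
        params["AlarmActions"] = [sns_topic_arn]
    return params


def create_hdfs_alarm(cluster_id, threshold_percent=80.0, sns_topic_arn=None):
    """Create or update the alarm; requires boto3 and valid AWS credentials."""
    import boto3  # imported here so the parameter builder works without boto3

    cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_alarm(
        **build_hdfs_alarm_params(cluster_id, threshold_percent, sns_topic_arn)
    )


# Example (placeholder cluster ID; not executed here):
# create_hdfs_alarm("j-EXAMPLE", threshold_percent=80.0)
```

Keeping the parameter construction separate from the API call makes it possible to review or log the alarm definition before anything is created in the account.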

If HDFS did not run out of space, check the DataNode logs, the NameNode logs, and network connectivity for other issues that could have prevented HDFS from replicating data. For more information, see View log files.
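For the network connectivity check, one simple approach is to test from a core node whether a TCP connection to the NameNode can be opened. The sketch below is a generic reachability test, not an Amazon EMR tool. The function name can_connect is hypothetical, and the host name and port 8020 in the usage comment are assumptions to replace with the NameNode address configured for your cluster.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Example, run from a core node (placeholder host; port 8020 is an assumption):
# print(can_connect("master-node-private-dns", 8020))
```

A False result points to a security group, firewall, or routing problem, or to a NameNode that is not listening. A True result only shows that the port is reachable, so the DataNode and NameNode logs are still worth reviewing.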