Hadoop and Spark metrics in Ganglia
Note
Amazon EMR 6.15.0 is the last release to include Ganglia. Releases later than 6.15.0 include the Amazon CloudWatch agent for monitoring your cluster.
Ganglia reports Hadoop metrics for each instance. The various types of metrics are prefixed by category: distributed file system (dfs.*), Java virtual machine (jvm.*), MapReduce (mapred.*), and remote procedure calls (rpc.*).
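As a minimal sketch of how the category prefixes above might be used, the following Python snippet maps a metric name to its category by looking at the text before the first dot. The function name `categorize_metric` and the dictionary name are hypothetical, and the metric names passed to it would come from your own Ganglia data; nothing here is part of Ganglia or Amazon EMR.

```python
# Hypothetical helper: maps the prefixes listed above to their categories.
HADOOP_METRIC_CATEGORIES = {
    "dfs": "distributed file system",
    "jvm": "Java virtual machine",
    "mapred": "MapReduce",
    "rpc": "remote procedure calls",
}


def categorize_metric(name):
    """Return the category for a metric name, or None if the prefix is unknown."""
    # Split on the first dot only; the remainder is the metric-specific part.
    prefix, separator, _rest = name.partition(".")
    if not separator:
        # A name with no dot has no category prefix.
        return None
    return HADOOP_METRIC_CATEGORIES.get(prefix)
```

A name with an unrecognized prefix, or with no dot at all, returns `None`, so callers can separate Hadoop metrics from everything else Ganglia reports.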
YARN-based Ganglia metrics, such as those for Spark and Hadoop, are not available in Amazon EMR release versions 4.4.0 and 4.5.0. To use these metrics, use a later release.
Ganglia metrics for Spark generally carry a prefix that identifies either the Spark DAGScheduler or the YARN application ID. The prefixes take the following forms:
- DAGScheduler.*
- application_xxxxxxxxxx_xxxx.driver.*
- application_xxxxxxxxxx_xxxx.executor.*
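The prefix forms above can be recognized with a small parser. The following Python sketch is illustrative only: the function name `parse_spark_metric` is hypothetical, and it assumes that each `x` placeholder in the application ID stands for a digit.

```python
import re

# Assumption: the "x" placeholders in application_xxxxxxxxxx_xxxx are digits.
APP_METRIC = re.compile(r"^(application_\d+_\d+)\.(driver|executor)\.(.+)$")

DAG_PREFIX = "DAGScheduler."


def parse_spark_metric(name):
    """Split a Spark metric name into source, application ID, and metric.

    Returns None when the name matches none of the three prefix forms.
    """
    if name.startswith(DAG_PREFIX) and len(name) > len(DAG_PREFIX):
        # DAGScheduler metrics carry no application ID in their prefix.
        return {
            "source": "DAGScheduler",
            "app_id": None,
            "metric": name[len(DAG_PREFIX):],
        }
    match = APP_METRIC.match(name)
    if match is None:
        return None
    app_id, role, metric = match.groups()
    return {"source": role, "app_id": app_id, "metric": metric}
```

Grouping the parsed results by `app_id` is one way to separate the driver and executor metrics of different Spark applications that run on the same cluster.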