Amazon EMR Release Guide

What's New?

This topic covers new features and resolved issues in the current release of Amazon EMR. These release notes are also available on the Release 5.14.0 tab, along with the application versions, component versions, and available configuration classifications for this release.

For release notes for earlier versions, back to release 4.2.0, see Amazon EMR What's New History.

Release 5.14.0 (Latest)

The following release notes include information for Amazon EMR release version 5.14.0. Changes are relative to 5.13.0.

Initial release date: June 4, 2018

Upgrades

  • Upgraded Apache Flink to 1.4.2

  • Upgraded Apache MXNet to 1.1.0

  • Upgraded Apache Sqoop to 1.4.7

New Features

  • Added JupyterHub support. For more information, see JupyterHub.

Changes, Enhancements, and Resolved Issues

  • EMRFS

    • The userAgent string in requests to Amazon S3 has been updated to contain the user and group information of the invoking principal. This can be used with AWS CloudTrail logs for more comprehensive request tracking.

  • HBase

    • Included HBASE-20447, which addresses a bug that could cause caching problems, especially with split regions.

  • MXNet

    • Added OpenCV libraries.

  • Spark

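    • When Spark writes Parquet files to an Amazon S3 location using EMRFS, it now uses version 2 of the FileOutputCommitter algorithm instead of version 1. Version 1 renames each task's output twice, once when the task commits and again when the job commits; version 2 renames it once, directly into the final output location when the task commits. Because a rename on Amazon S3 is a copy followed by a delete, fewer renames improve application performance. This change does not affect: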

      • Applications other than Spark.

      • Applications that write to other file systems, such as HDFS (which still use version 1 of FileOutputCommitter).

      • Applications that use other output formats, such as text or CSV, which already use EMRFS direct write.
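The difference in rename counts between the two algorithm versions can be illustrated with a toy model in plain shell. This is a sketch under stated assumptions, not Hadoop or EMRFS code: local directories stand in for S3 prefixes, each task writes exactly one part file, and each `mv` stands in for one object rename. The directory names loosely mimic Hadoop's `_temporary` layout, and the function names `rename_file` and `simulate_commit` are hypothetical.

```shell
#!/bin/sh
# Toy model (not Hadoop code) of FileOutputCommitter algorithm versions 1 and 2.
# Local directories stand in for S3 prefixes; every `mv` of a file stands in
# for one object rename (on Amazon S3, a copy followed by a delete).

# rename_file SRC DST: move one file and count the move as one rename.
rename_file() {
  mkdir -p "$(dirname "$2")"
  mv "$1" "$2"
  RENAMES=$((RENAMES + 1))
}

# simulate_commit VERSION NUM_TASKS: commit NUM_TASKS tasks that each write one
# part file, then print how many renames the given algorithm version needed.
# The body runs in a subshell so the counter and shell options stay local.
simulate_commit() (
  set -eu
  version=$1
  tasks=$2
  out=$(mktemp -d)
  RENAMES=0
  t=0
  while [ "$t" -lt "$tasks" ]; do
    # The task attempt writes its output under a temporary attempt directory.
    attempt="$out/_temporary/0/_temporary/attempt_$t"
    mkdir -p "$attempt"
    echo "data $t" > "$attempt/part-$t"
    if [ "$version" -eq 1 ]; then
      # v1 task commit: move output to a committed-task directory (still temporary).
      rename_file "$attempt/part-$t" "$out/_temporary/0/task_$t/part-$t"
    else
      # v2 task commit: move output straight to the final output directory.
      rename_file "$attempt/part-$t" "$out/part-$t"
    fi
    t=$((t + 1))
  done
  if [ "$version" -eq 1 ]; then
    # v1 job commit: move every committed task's output to the final directory.
    t=0
    while [ "$t" -lt "$tasks" ]; do
      rename_file "$out/_temporary/0/task_$t/part-$t" "$out/part-$t"
      t=$((t + 1))
    done
  fi
  rm -rf "$out"
  echo "$RENAMES"
)
```

In this model, `simulate_commit 1 4` prints 8 and `simulate_commit 2 4` prints 4, because version 1 renames each file at task commit and again at job commit. If a particular Spark application needs the previous behavior, the Hadoop property `mapreduce.fileoutputcommitter.algorithm.version` can typically be passed through Spark's `spark.hadoop.` prefix, for example `spark-submit --conf spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=1 your_app.py` (where `your_app.py` is a placeholder); verify that the override takes effect on your cluster before relying on it.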

Known Issues

  • JupyterHub

    • Using configuration classifications to set up JupyterHub and individual Jupyter notebooks when you create a cluster is not supported. Instead, manually edit the jupyterhub_config.py file and the jupyter_notebook_config.py file for each user. For more information, see Configuring JupyterHub.

    • JupyterHub fails to start on clusters in a private subnet, with the error message Error: ENOENT: no such file or directory, open '/etc/jupyter/conf/server.crt'. This is caused by an error in the script that generates self-signed certificates. Use the following workaround to generate self-signed certificates. Run all commands while connected to the master node.

      1. Copy the certificate generation script from the container to the master node:

        sudo docker cp jupyterhub:/tmp/gen_self_signed_cert.sh ./
      2. Use a text editor to change line 23 of the script so that it retrieves the local hostname instead of the public hostname, as shown below:

        local hostname=$(curl -s $EC2_METADATA_SERVICE_URI/local-hostname)
      3. Run the script to generate self-signed certificates:

        sudo bash ./gen_self_signed_cert.sh
      4. Move the certificate files that the script generates to the /etc/jupyter/conf/ directory:

        sudo mv /tmp/server.crt /tmp/server.key /etc/jupyter/conf/

      You can tail the jupyter.log file to verify that JupyterHub restarted and is returning a 200 response code. For example:

      tail -f /var/log/jupyter/jupyter.log

      This should return a response similar to the following:

      # [I 2018-06-14 18:56:51.356 JupyterHub app:1581] JupyterHub is now running at https://:9443/
      # 19:01:51.359 - info: [ConfigProxy] 200 GET /api/routes
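The four workaround steps and the log check can be combined into one script. This is a minimal sketch under several assumptions: it runs on the master node as a user with sudo rights, the container is named jupyterhub (as in step 1), GNU sed is available, and line 23 of gen_self_signed_cert.sh fetches public-hostname from the metadata service (inferred from step 2, not confirmed). The helper names `patch_hostname_line` and `run_workaround` are hypothetical. Instead of a text editor, the script changes only line 23 and then checks the result, so it stops rather than generating a certificate for the wrong hostname. It prints the last log lines with `tail -n` instead of `tail -f` so that it returns.

```shell
#!/bin/sh
# Sketch of the private-subnet certificate workaround (hypothetical helper names).

# patch_hostname_line FILE: on line 23 only, switch the metadata lookup from
# public-hostname to local-hostname (GNU sed -i), then confirm the edit took effect.
# Assumes line 23 is the hostname lookup, as step 2 describes.
patch_hostname_line() {
  sed -i '23s|public-hostname|local-hostname|' "$1" || return 1
  case "$(sed -n '23p' "$1")" in
    *local-hostname*) return 0 ;;
    *) echo "line 23 of $1 does not use local-hostname; edit it by hand" >&2
       return 1 ;;
  esac
}

# run_workaround: steps 1-4, then a look at the JupyterHub log.
# The body runs in a subshell so `set -e` does not leak into an interactive shell.
run_workaround() (
  set -e
  # 1. Copy the generator script out of the container, then take ownership
  #    of the copy in case docker cp created it as root.
  sudo docker cp jupyterhub:/tmp/gen_self_signed_cert.sh ./
  sudo chown "$(id -u):$(id -g)" ./gen_self_signed_cert.sh
  # 2. Use the local hostname instead of the public hostname; stop if the edit fails.
  patch_hostname_line ./gen_self_signed_cert.sh
  # 3. Generate the self-signed certificate and key (written to /tmp).
  sudo bash ./gen_self_signed_cert.sh
  # 4. Move them to the directory JupyterHub reads them from.
  sudo mv /tmp/server.crt /tmp/server.key /etc/jupyter/conf/
  # Print recent log lines; look for "JupyterHub is now running" and 200 responses.
  tail -n 20 /var/log/jupyter/jupyter.log
)
```

For example, save the block on the master node as fix_jupyterhub_cert.sh (a placeholder name) and run `. ./fix_jupyterhub_cert.sh && run_workaround`. If the line 23 check fails, open the copied script and make the change by hand as described in step 2.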
