EMR Notebooks - Amazon EMR

EMR Notebooks

You can use Amazon EMR Notebooks with Amazon EMR clusters running Apache Spark to create and open Jupyter Notebook and JupyterLab interfaces within the Amazon EMR console. An EMR notebook is a "serverless" notebook that you can use to run queries and code. Unlike a traditional notebook, the contents of an EMR notebook (the equations, queries, models, code, and narrative text within notebook cells) run in a client, while the commands are executed using a kernel on the EMR cluster. Notebook contents are also saved to Amazon S3 separately from cluster data for durability and flexible reuse.

You can start a cluster, attach an EMR notebook for analysis, and then terminate the cluster. You can also close a notebook attached to one running cluster and switch to another. Multiple users can attach notebooks to the same cluster simultaneously and share notebook files in Amazon S3 with each other. These features let you run clusters on demand to save cost and reduce the time spent reconfiguring notebooks for different clusters and datasets.

You can also execute an EMR notebook programmatically using the EMR API, without interacting with the EMR console ("headless execution"). To do so, include a cell in the EMR notebook that has a parameters tag. That cell allows a script to pass new input values to the notebook. Parameterized notebooks can be reused with different sets of input values, so there's no need to make copies of the same notebook to edit and execute with new input values. EMR creates and saves an output notebook in Amazon S3 for each run of the parameterized notebook. For EMR notebook API code samples, see Sample commands to execute EMR Notebooks programmatically.
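As a rough illustration of headless execution, the following Python sketch assembles a request for the boto3 EMR client's `start_notebook_execution` call and passes input values through `NotebookParams`. The editor ID, cluster ID, notebook path, role name, and parameter names shown here are hypothetical placeholders, and the helper functions are illustrative rather than part of any AWS library. Submitting the request assumes that boto3 is installed and AWS credentials are configured.

```python
import json


def build_notebook_execution_request(editor_id, relative_path, cluster_id,
                                     service_role, params):
    """Assemble keyword arguments for the EMR StartNotebookExecution call."""
    return {
        "EditorId": editor_id,                    # ID of the EMR notebook
        "RelativePath": relative_path,            # notebook file to run
        "ExecutionEngine": {"Id": cluster_id, "Type": "EMR"},
        "ServiceRole": service_role,
        # Input values for the cell tagged "parameters", as a JSON string.
        "NotebookParams": json.dumps(params),
    }


def start_headless_run(request, region_name="us-east-1"):
    """Submit the request. Needs boto3 and valid AWS credentials."""
    import boto3  # imported here so the builder above works without boto3

    emr = boto3.client("emr", region_name=region_name)
    response = emr.start_notebook_execution(**request)
    return response["NotebookExecutionId"]


# All identifiers below are placeholders, not real resources.
request = build_notebook_execution_request(
    editor_id="e-EXAMPLEEDITORID",
    relative_path="my_notebook.ipynb",
    cluster_id="j-EXAMPLECLUSTERID",
    service_role="EMR_Notebooks_DefaultRole",
    params={"input_date": "2021-01-01", "sample_size": 100},
)

# Uncomment to submit the run against a real account:
# execution_id = start_headless_run(request)
```

Calling the builder again with a different `params` dictionary reuses the same notebook with new input values, which is the reuse pattern described above.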

Important

EMR Notebooks is supported with clusters created using Amazon EMR 5.18.0 and later. We strongly recommend that you use EMR Notebooks with clusters created using the latest version of Amazon EMR, particularly Amazon EMR release versions 5.30.0, 5.32.0 and later, or 6.2.0 and later. With these versions, Jupyter kernels run on the attached cluster rather than on a Jupyter instance. This change helps improve performance and enhances your ability to customize kernels and libraries. For more information, see Differences in capabilities by cluster release version.
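The version guidance above can be expressed as a small check. The following Python sketch reads the recommendation literally: exactly 5.30.0, 5.32.0 and later, or 6.2.0 and later. The function names are hypothetical, and the sketch assumes release labels of the form `emr-X.Y.Z`. It also assumes that patch releases not named in the text, such as a 5.30.x release other than 5.30.0, do not qualify.

```python
def parse_release(release_label):
    """Turn a label such as 'emr-5.32.0' (or '5.32.0') into (5, 32, 0)."""
    if release_label.startswith("emr-"):
        release_label = release_label[len("emr-"):]
    return tuple(int(part) for part in release_label.split("."))


def supports_emr_notebooks(release_label):
    """EMR Notebooks is supported with Amazon EMR 5.18.0 and later."""
    return parse_release(release_label) >= (5, 18, 0)


def kernels_run_on_cluster(release_label):
    """Literal reading: 5.30.0, 5.32.0 and later, or 6.2.0 and later."""
    version = parse_release(release_label)
    if version[0] == 5:
        return version == (5, 30, 0) or version >= (5, 32, 0)
    return version >= (6, 2, 0)


for label in ["emr-5.18.0", "emr-5.30.0", "emr-5.31.0", "emr-6.2.0"]:
    print(label, supports_emr_notebooks(label), kernels_run_on_cluster(label))
```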

Standard charges for Amazon S3 storage and for Amazon EMR clusters apply.