Prepare data using Amazon EMR
Amazon SageMaker Studio Classic comes with built-in integration with Amazon EMR, which lets data
scientists and data engineers perform petabyte-scale interactive data preparation and
machine learning (ML) directly from their Studio Classic notebooks. Within a notebook, they can
discover and connect to existing Amazon EMR clusters, then interactively explore, visualize, and
prepare large-scale data for machine learning using Apache Spark.
Administrators can use the AWS Service Catalog to define AWS CloudFormation templates of Amazon EMR clusters accessible to Studio Classic users. Data scientists can then choose a predefined template to self-provision an Amazon EMR cluster directly from Amazon SageMaker Studio Classic notebooks. Administrators can further parameterize the templates to let users choose aspects of the cluster to match their workloads within predefined values. For example, a data scientist or data engineer may want to specify the number of core nodes of the cluster up to a predetermined maximum value, or select the instance type of a node from a dropdown menu.
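As an illustration of the parameterization described above, the following is a minimal sketch of how an administrator might express such constraints in the Parameters section of an AWS CloudFormation template. `Type`, `Default`, `MinValue`, `MaxValue`, and `AllowedValues` are standard CloudFormation parameter properties. The parameter names (`CoreInstanceCount`, `CoreInstanceType`), the limit of 10 nodes, and the instance types are hypothetical and chosen only for the example. The `validate_choice` helper is also hypothetical; it only mimics locally the kind of check CloudFormation applies when a stack is created.

```python
import json

# Hypothetical limits chosen by an administrator (not taken from the documentation).
MAX_CORE_NODES = 10
ALLOWED_INSTANCE_TYPES = ["m5.xlarge", "m5.2xlarge", "r5.xlarge"]


def build_template_parameters(max_core_nodes, allowed_instance_types):
    """Return a CloudFormation "Parameters" section as a plain dict."""
    return {
        # Hypothetical parameter name: a number bounded by a maximum value.
        "CoreInstanceCount": {
            "Type": "Number",
            "Default": min(2, max_core_nodes),
            "MinValue": 1,
            "MaxValue": max_core_nodes,
            "Description": "Number of core nodes in the cluster.",
        },
        # Hypothetical parameter name: a string restricted to a fixed list.
        "CoreInstanceType": {
            "Type": "String",
            "Default": allowed_instance_types[0],
            "AllowedValues": list(allowed_instance_types),
            "Description": "Instance type of the core nodes.",
        },
    }


def validate_choice(parameters, name, value):
    """Check a user's choice against the declared constraints (local sketch only)."""
    spec = parameters[name]
    if "AllowedValues" in spec and value not in spec["AllowedValues"]:
        return False
    if spec["Type"] == "Number":
        if "MinValue" in spec and value < spec["MinValue"]:
            return False
        if "MaxValue" in spec and value > spec["MaxValue"]:
            return False
    return True


if __name__ == "__main__":
    params = build_template_parameters(MAX_CORE_NODES, ALLOWED_INSTANCE_TYPES)
    print(json.dumps({"Parameters": params}, indent=2))
```

A complete template would also need a Resources section that defines the cluster and references these parameters; that part is omitted here.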
- If you are an administrator, make sure that you have enabled communication between Amazon SageMaker Studio Classic notebooks and Amazon EMR clusters. For instructions, see the Configure networking (for administrators) section. Once this communication is enabled, you have the option to:
  - Define cluster templates in AWS Service Catalog and ensure the availability of these templates through Studio Classic's notebooks: Configure Amazon EMR templates in AWS Service Catalog (for administrators).
  - Configure the discoverability of existing Amazon EMR clusters directly from Studio Classic's notebooks: Configure the discoverability of Amazon EMR clusters (for administrators).
- If you are a data scientist or data engineer looking to self-provision an Amazon EMR cluster, see Launch an Amazon EMR cluster from Studio Classic.
- If you are a data scientist or data engineer looking to discover and connect to existing Amazon EMR clusters from Studio Classic, see Use Amazon EMR clusters from Studio Classic notebooks.
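Before connecting from a notebook, it can help to check which clusters are currently active in the account and Region. The following is a minimal sketch that uses the Amazon EMR `list_clusters` API through boto3. It is not how Studio Classic implements discovery, and it assumes that boto3 is installed and that the caller has credentials with permission to list clusters. The helper name `list_active_clusters` is hypothetical.

```python
# Cluster states in which a cluster can accept work.
ACTIVE_STATES = ["RUNNING", "WAITING"]


def list_active_clusters(emr_client):
    """Return (id, name, state) tuples for active clusters, following pagination."""
    clusters = []
    kwargs = {"ClusterStates": ACTIVE_STATES}
    while True:
        page = emr_client.list_clusters(**kwargs)
        for cluster in page.get("Clusters", []):
            clusters.append(
                (cluster["Id"], cluster["Name"], cluster["Status"]["State"])
            )
        # The response includes a Marker only when more pages remain.
        marker = page.get("Marker")
        if not marker:
            return clusters
        kwargs["Marker"] = marker


if __name__ == "__main__":
    import boto3  # third-party AWS SDK; needs credentials and a configured Region

    for cluster_id, name, state in list_active_clusters(boto3.client("emr")):
        print(cluster_id, name, state)
```

Passing the client in as an argument keeps the helper easy to exercise with a stub object, without calling AWS.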
List of topics
- Configure networking (for administrators)
- Create an Amazon EMR cluster from Studio Classic notebooks
- Use Amazon EMR clusters from Studio Classic notebooks
- Access Spark UI from Studio Classic
- Walkthroughs and whitepapers
- Additional configuration for cross-account use cases (for administrators)
- Troubleshooting