Create an EMR Studio
You can create an EMR Studio for your team with the Amazon EMR console or the AWS CLI.
Creating a Studio instance is part of setting up Amazon EMR Studio.
Prerequisites
Before you create a Studio, make sure you've completed the previous tasks in
Set up an EMR Studio.
To create a Studio using the AWS CLI, you should have the latest version
installed. For more information, see Installing or updating the latest
version of the AWS CLI.
Deactivate proxy management tools such as FoxyProxy or SwitchyOmega in the browser
before you create a Studio. Active proxies can result in a Network Failure
error message when you choose Create Studio.
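To confirm the AWS CLI prerequisite from a terminal, a check along the following lines may help. This is a minimal sketch: the function name `check_aws_cli` is illustrative, and it only reports whether an `aws` executable is on your PATH and which version it reports, not whether that version is the latest.

```sh
# Report whether the AWS CLI is on PATH and, if so, which version it reports.
# check_aws_cli is a hypothetical helper name, not part of the AWS CLI.
check_aws_cli() {
    if command -v aws >/dev/null 2>&1; then
        # Older CLI releases print the version to stderr, so merge both streams.
        echo "found: $(aws --version 2>&1)"
    else
        echo "missing: install the AWS CLI before running create-studio"
        return 1
    fi
}

check_aws_cli || true
```

Compare the reported version with the current release listed in the AWS CLI installation guide to decide whether an update is needed.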
Amazon EMR provides a simple console experience for creating a Studio, so you can quickly start running interactive workloads or batch jobs with the default settings. Creating an EMR Studio also creates an
EMR Serverless application that is ready for your interactive jobs.
If you want full control
over your Studio's settings, you can choose Custom, which lets you configure
all of the additional settings.
- Interactive workloads
To create an EMR Studio for interactive workloads
- Open the Amazon EMR console at https://console.aws.amazon.com/emr.
- Under EMR Studio on the left navigation, choose Getting started. You can also create a new Studio from the Studios page.
- Amazon EMR provides default settings when you create an EMR Studio for interactive workloads, but you can edit these settings. Configurable settings include the EMR Studio's name, the S3 location for your Workspace, the service role to use, the Workspace(s) you want to use, the EMR Serverless application name, and the associated runtime role.
- Choose Create Studio and launch Workspace to finish and navigate to the Studios page. Your new Studio appears in the list with details such as Studio name, Creation date, and Studio access URL. Your Workspace opens in a new tab in your browser.
- Batch jobs
To create an EMR Studio for batch jobs
- Open the Amazon EMR console at https://console.aws.amazon.com/emr.
- Under EMR Studio on the left navigation, choose Getting started. You can also create a new Studio from the Studios page.
- Amazon EMR provides default settings when you create an EMR Studio for batch jobs, but you can edit these settings. Configurable settings include the EMR Studio's name, the EMR Serverless application name, and the associated runtime role.
- Choose Create Studio and launch Workspace to finish and navigate to the Studios page. Your new Studio appears in the list with details such as Studio name, Creation date, and Studio access URL. Your EMR Studio opens in a new tab in your browser.
- Custom settings
To create an EMR Studio with custom settings
- Open the Amazon EMR console at https://console.aws.amazon.com/emr.
- Under EMR Studio on the left navigation, choose Getting started. You can also create a new Studio from the Studios page.
- Choose Create a Studio to open the Create a Studio page.
- Enter a Studio name.
- Choose to create a new S3 bucket or use an existing location.
- Choose the Workspace to add to the Studio. You can add up to 3 Workspaces.
- Under Authentication, choose an authentication mode for the Studio and provide information according to the following table. To learn more about authentication for EMR Studio, see Choose an authentication mode for Amazon EMR Studio.
If you use... | Do this...
IAM authentication or federation | The default authentication method is AWS Identity and Access Management (IAM). At the bottom of the screen, you can also add tags to give specific users access to the Studio as described in Assign a user or group to an EMR Studio. If you want federated users to log in using the Studio URL and credentials for your identity provider (IdP), select your IdP from the dropdown list, and enter your Identity provider (IdP) login URL and RelayState parameter name. For a list of IdP authentication URLs and RelayState names, see Identity provider RelayState parameters and authentication URLs.
IAM Identity Center authentication | Select your EMR Studio Service Role and User Role. For more information, see Create an EMR Studio service role and Create an EMR Studio user role for IAM Identity Center authentication mode. When you use IAM Identity Center (formerly AWS Single Sign-On) authentication for the Studio, you can choose to streamline the sign-on experience for users with the Enable trusted identity propagation option. With trusted identity propagation, users can log in with their Identity Center credentials and have their identities propagated to downstream AWS services when they use the Studio. In the Application access section, you can also specify whether all users and groups in your Identity Center should have access to the Studio, or only the assigned users and groups that you choose. For more information, see Integrate Amazon EMR with AWS IAM Identity Center, and also Trusted identity propagation across applications in the AWS IAM Identity Center User Guide.
- For VPC, choose an Amazon Virtual Private Cloud (VPC) for the Studio from the dropdown list.
- Under Subnets, select a maximum of five subnets in your VPC to associate with the Studio. You have the option to add more subnets after you create the Studio.
- For Security groups, choose either the default security groups or custom security groups. For more information, see Define security groups to control EMR Studio network traffic.
If you choose... | Do this...
The default EMR Studio security groups | To enable Git-based repository linking for the Studio, choose Enable clusters/endpoints and Git repository. Otherwise, choose Enable clusters/endpoints.
Custom security groups for your Studio | Under Cluster/endpoint security group, select the engine security group that you configured from the dropdown list. Your Studio uses this security group to allow inbound access from attached Workspaces. Under Workspace security group, select the Workspace security group that you configured from the dropdown list. Your Studio uses this security group with Workspaces to provide outbound access to attached Amazon EMR clusters and publicly hosted Git repositories.
- Add tags to your Studio and other resources. For more information about tags, see Tag clusters.
- Choose Create Studio and launch Workspace to finish and navigate to the Studios page. Your new Studio appears in the list with details such as Studio name, Creation date, and Studio access URL.
After you create a Studio, follow the instructions in Assign a user or group to an EMR Studio.
- CLI
Linux line continuation characters (\) are included for readability. They can be removed or used in Linux commands. For Windows, remove them or replace with a caret (^).
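If you keep the multi-line form on Windows, the trailing backslashes can be rewritten mechanically. The following is a small sketch of that substitution using `sed`; the function name `to_windows_continuations` is illustrative, and it assumes the backslash is the last non-blank character on each continued line.

```sh
# Rewrite trailing Linux continuation characters (\) as Windows carets (^).
# to_windows_continuations is a hypothetical helper; it reads a command on
# stdin and writes the converted command to stdout.
to_windows_continuations() {
    sed 's/\\[[:space:]]*$/^/'
}

# Example: convert a two-line command.
printf 'aws emr create-studio \\\n--name demo\n' | to_windows_continuations
```

Lines that do not end in a backslash pass through unchanged.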
Example – Create an EMR Studio that uses IAM for authentication
The following example AWS CLI command creates an EMR Studio with IAM
authentication mode. When you use IAM authentication or federation for the
Studio, you don't specify a --user-role.
To let federated users log in using the Studio URL and credentials for
your identity provider (IdP), specify your --idp-auth-url and
--idp-relay-state-parameter-name. For a list of IdP
authentication URLs and RelayState names, see Identity provider RelayState parameters
and authentication URLs.
aws emr create-studio \
--name <example-studio-name> \
--auth-mode IAM \
--vpc-id <example-vpc-id> \
--subnet-ids <subnet-id-1> <subnet-id-2>... <subnet-id-5> \
--service-role <example-studio-service-role-name> \
--workspace-security-group-id <example-workspace-sg-id> \
--engine-security-group-id <example-engine-sg-id> \
--default-s3-location <example-s3-location> \
--idp-auth-url <https://EXAMPLE/login/> \
--idp-relay-state-parameter-name <example-RelayState>
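When the command is assembled in a script, it can help to check the inputs before calling the AWS CLI. The following is a dry-run sketch, not an official tool: the function name `build_iam_studio_cmd`, its argument order, and the sample resource IDs are all assumptions made for illustration. It prints the command instead of running it and rejects more than five subnet IDs, mirroring the limit described in the console procedure.

```sh
# Print (rather than run) an `aws emr create-studio` command for IAM auth mode.
# Usage: build_iam_studio_cmd NAME VPC_ID SERVICE_ROLE WORKSPACE_SG ENGINE_SG S3_LOCATION SUBNET_ID...
# The helper name and argument order are hypothetical.
build_iam_studio_cmd() {
    if [ "$#" -lt 7 ]; then
        echo "error: expected six settings followed by at least one subnet ID" >&2
        return 2
    fi
    name=$1 vpc=$2 role=$3 ws_sg=$4 eng_sg=$5 s3=$6
    shift 6
    # The remaining arguments are subnet IDs; allow at most five.
    if [ "$#" -gt 5 ]; then
        echo "error: at most five subnet IDs are allowed, got $#" >&2
        return 1
    fi
    echo "aws emr create-studio --name $name --auth-mode IAM --vpc-id $vpc --subnet-ids $* --service-role $role --workspace-security-group-id $ws_sg --engine-security-group-id $eng_sg --default-s3-location $s3"
}

# Sample values below are placeholders, not real resources.
build_iam_studio_cmd my-studio vpc-0example studio-service-role \
    sg-0workspace sg-0engine s3://amzn-s3-demo-bucket/studio \
    subnet-0aaa subnet-0bbb
```

The printed line is meant for inspection only; values that contain spaces would need quoting before the command is actually run.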
Example – Create an EMR Studio that uses Identity Center for authentication
The following AWS CLI example command creates an EMR Studio that uses IAM Identity Center
authentication mode. When you use IAM Identity Center authentication, you must specify a
--user-role.
For more information about IAM Identity Center authentication mode, see Set up IAM Identity Center authentication mode for
Amazon EMR Studio.
aws emr create-studio \
--name <example-studio-name> \
--auth-mode SSO \
--vpc-id <example-vpc-id> \
--subnet-ids <subnet-id-1> <subnet-id-2>... <subnet-id-5> \
--service-role <example-studio-service-role-name> \
--user-role <example-studio-user-role-name> \
--workspace-security-group-id <example-workspace-sg-id> \
--engine-security-group-id <example-engine-sg-id> \
--default-s3-location <example-s3-location> \
--trusted-identity-propagation-enabled \
--idc-user-assignment OPTIONAL \
--idc-instance-arn <iam-identity-center-instance-arn>
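The two examples differ in how they treat --user-role: it is required for SSO (IAM Identity Center) mode and not specified for IAM mode. A script that supports both modes could encode that rule as a pre-flight check. The sketch below is illustrative only; `check_user_role` is a hypothetical helper, and the AWS CLI performs its own validation regardless.

```sh
# Check the --user-role rule before calling create-studio.
# Usage: check_user_role AUTH_MODE [USER_ROLE]
# check_user_role is a hypothetical helper name.
check_user_role() {
    mode=$1
    user_role=${2:-}
    case "$mode" in
        SSO)
            # IAM Identity Center mode: a user role must be supplied.
            [ -n "$user_role" ] || { echo "SSO requires --user-role"; return 1; }
            ;;
        IAM)
            # IAM mode: a user role is not specified.
            [ -z "$user_role" ] || { echo "IAM does not take --user-role"; return 1; }
            ;;
        *)
            echo "unknown auth mode: $mode"
            return 1
            ;;
    esac
    echo "ok"
}

check_user_role SSO studio-user-role
check_user_role IAM
```

Both sample calls satisfy the rule; calling `check_user_role SSO` with no role, or `check_user_role IAM some-role`, returns a non-zero status with a short message.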
Example – CLI output for aws emr create-studio
The following is an example of the output that appears after you create a
Studio.
{
    "StudioId": "es-123XXXXXXXXX",
    "Url": "https://es-123XXXXXXXXX.emrstudio-prod.us-east-1.amazonaws.com"
}
For more information about the create-studio command, see AWS CLI Command Reference.
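Scripts often need the Studio ID and access URL from this output. The following sketch extracts both fields from a saved copy of the response. It assumes the response is flat JSON with double-quoted keys, and the `sed`-based `json_field` helper is a deliberately simple, hypothetical parser rather than a general JSON reader.

```sh
# Pull a top-level string field out of create-studio's JSON output.
# json_field is a hypothetical helper; it only handles flat output like the
# sample below. Use a real JSON tool for anything more complex.
json_field() {
    # $1 = field name; the JSON document is read from stdin.
    sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p"
}

# Sample response, using the placeholder values from the example output.
sample='{
    "StudioId": "es-123XXXXXXXXX",
    "Url": "https://es-123XXXXXXXXX.emrstudio-prod.us-east-1.amazonaws.com"
}'

studio_id=$(printf '%s\n' "$sample" | json_field StudioId)
studio_url=$(printf '%s\n' "$sample" | json_field Url)
echo "Studio $studio_id is available at $studio_url"
```

When you call the AWS CLI directly, its global `--query` and `--output text` options can return a single field such as `StudioId` without any post-processing.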
Identity provider RelayState parameters and authentication URLs
When you use IAM federation, and you want users to log in using your Studio
URL and credentials for your identity provider (IdP), you can specify your
Identity provider (IdP) login URL and
RelayState parameter name when you Create an EMR Studio.
The following table shows the standard authentication URL and RelayState parameter
name for some popular identity providers.
Identity provider | Parameter | Authentication URL
Auth0 | RelayState | https://<sub_domain>.auth0.com/samlp/<app_id>
Google accounts | RelayState | https://accounts.google.com/o/saml2/initsso?idpid=<idp_id>&spid=<sp_id>&forceauthn=false
Microsoft Azure | RelayState | https://myapps.microsoft.com/signin/<app_name>/<app_id>?tenantId=<tenant_id>
Okta | RelayState | https://<sub_domain>.okta.com/app/<app_name>/<app_id>/sso/saml
PingFederate | TargetResource | https://<host>/idp/<idp_id>/startSSO.ping?PartnerSpId=<sp_id>
PingOne | TargetResource | https://sso.connect.pingidentity.com/sso/sp/initsso?saasid=<app_id>&idpid=<idp_id>
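To turn a row of the table into values for --idp-auth-url and --idp-relay-state-parameter-name, you substitute your own identifiers into the URL template. The sketch below does this for two of the providers and looks up the parameter name for all six. The function names, the short provider keys, and the sample identifiers are assumptions made for illustration.

```sh
# Fill in the Okta and Auth0 URL templates from the table.
# All function names here are hypothetical helpers.
okta_auth_url() {
    # $1 = sub_domain, $2 = app_name, $3 = app_id
    echo "https://$1.okta.com/app/$2/$3/sso/saml"
}

auth0_auth_url() {
    # $1 = sub_domain, $2 = app_id
    echo "https://$1.auth0.com/samlp/$2"
}

# Map a provider key to the parameter name listed in the table.
relay_state_name() {
    case "$1" in
        Auth0|Google|Azure|Okta) echo "RelayState" ;;
        PingFederate|PingOne)    echo "TargetResource" ;;
        *) echo "unknown provider: $1" >&2; return 1 ;;
    esac
}

# Sample identifiers below are placeholders.
echo "--idp-auth-url $(okta_auth_url example my-app 0oa123)"
echo "--idp-relay-state-parameter-name $(relay_state_name Okta)"
```

The remaining providers follow the same pattern with their own templates from the table.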