Establish access and permissions for Git-based repositories
EMR Studio supports the following Git-based services:
To let EMR Studio users associate a Git repository with a Workspace, set up the following access and permissions requirements. You can also configure Git-based repositories that you host in a private network by following the instructions in Configure a privately hosted Git repository for EMR Studio.
- Cluster internet access
Both Amazon EMR clusters running on Amazon EC2 and Amazon EMR on EKS clusters attached to Studio Workspaces must be in a private subnet that uses a network address translation (NAT) gateway, or they must be able to access the internet through a virtual private gateway. For more information, see Amazon VPC options when you launch a cluster.
The security groups that you use with EMR Studio must also include an outbound rule that allows Workspaces to route traffic to the internet from an attached EMR cluster. For more information, see Define security groups to control EMR Studio network traffic.
Important
If the network interface is in a public subnet, it won't be able to communicate with the internet through an internet gateway (IGW).
- Permissions for AWS Secrets Manager
To let EMR Studio users access Git repositories with secrets stored in AWS Secrets Manager, add a permissions policy to the service role for EMR Studio that allows the secretsmanager:GetSecretValue operation.
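As a minimal sketch of the permissions policy described above, the following Python builds an IAM policy document that allows only the secretsmanager:GetSecretValue operation. The function name, the statement ID, and the example secret ARN are hypothetical placeholders, and scoping Resource to specific secrets is an assumption about how you might want to limit access. It is not a policy prescribed by EMR Studio.

```python
import json


def build_secrets_policy(secret_arns):
    """Return an IAM policy document (as a dict) that allows
    secretsmanager:GetSecretValue on the given secret ARNs."""
    secret_arns = list(secret_arns)
    if not secret_arns:
        raise ValueError("at least one secret ARN is required")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Hypothetical statement ID, for illustration only.
                "Sid": "AllowReadGitCredentials",
                "Effect": "Allow",
                "Action": "secretsmanager:GetSecretValue",
                "Resource": secret_arns,
            }
        ],
    }


# Hypothetical ARN; replace with the ARN of the secret that holds
# your Git credentials.
EXAMPLE_SECRET_ARN = (
    "arn:aws:secretsmanager:us-east-1:111122223333:secret:example-git-secret"
)

policy = build_secrets_policy([EXAMPLE_SECRET_ARN])
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

You could then attach the printed JSON document to the EMR Studio service role using whichever IAM workflow you already use.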
For information about how to link Git-based repositories to Workspaces, see Link Git-based repositories to an EMR Studio Workspace.
Configure a privately hosted Git repository for EMR Studio
Use the following instructions to configure privately hosted repositories for Amazon EMR Studio. Provide a configuration file with information about your DNS and Git servers. EMR Studio uses this information to configure Workspaces that can route traffic to your self-managed repositories.
Note
If you configure DnsServerIpV4, EMR Studio uses your DNS server to resolve both your GitServerDnsName and your Amazon EMR endpoint, such as elasticmapreduce.us-east-1.amazonaws.com. To set up an endpoint for Amazon EMR, connect to your endpoint through the VPC that you're using with your Studio. This ensures that the Amazon EMR endpoint resolves to a private IP. For more information, see Connect to Amazon EMR using an interface VPC endpoint.
Prerequisites
Before you configure a privately hosted Git repository for EMR Studio, you need an Amazon S3 storage location where EMR Studio can back up the Workspaces and notebook files in the Studio. Use the same S3 bucket that you specify when you create a Studio.
To configure one or more privately hosted Git repositories for EMR Studio
- Create a configuration file using the following template. Include the following values for each Git server that you want to specify in your configuration:
  - DnsServerIpV4: The IPv4 address of your DNS server. If you provide values for both DnsServerIpV4 and GitServerIpV4List, the value for DnsServerIpV4 takes precedence and EMR Studio uses DnsServerIpV4 to resolve your GitServerDnsName.
    Note
    To use privately hosted Git repositories, your DNS server must allow inbound access from EMR Studio. We urge you to secure your DNS server against other, unauthorized access.
  - GitServerDnsName: The DNS name of your Git server. For example, "git.example.com".
  - GitServerIpV4List: A list of IPv4 addresses that belong to your Git servers.
  [
    {
      "Type": "PrivatelyHostedGitConfig",
      "Value": [
        {
          "DnsServerIpV4": "<10.24.34.xxx>",
          "GitServerDnsName": "<enterprise.git.com>",
          "GitServerIpV4List": [
            "<xxx.xxx.xxx.xxx>",
            "<xxx.xxx.xxx.xxx>"
          ]
        },
        {
          "DnsServerIpV4": "<10.24.34.xxx>",
          "GitServerDnsName": "<git.example.com>",
          "GitServerIpV4List": [
            "<xxx.xxx.xxx.xxx>",
            "<xxx.xxx.xxx.xxx>"
          ]
        }
      ]
    }
  ]
- Save your configuration file as configuration.json.
- Upload the configuration file into your Amazon S3 storage location in a folder called life-cycle-configuration. For example, if your default S3 location is s3://amzn-s3-demo-bucket/studios, your configuration file would be in s3://amzn-s3-demo-bucket/studios/life-cycle-configuration/configuration.json.
  Important
  We urge you to restrict access to your life-cycle-configuration folder to Studio administrators and to your EMR Studio service role, and to secure configuration.json against unauthorized access. For instructions, see Controlling access to a bucket with user policies or Security Best Practices for Amazon S3.
  For upload instructions, see Creating a folder and Uploading objects in the Amazon Simple Storage Service User Guide. To apply your configuration to an existing Workspace, close and restart the Workspace after you upload your configuration file to Amazon S3.
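The steps above can be sketched in Python as follows: build the configuration structure from the template, check that each address parses as IPv4, write configuration.json, and derive the S3 location where the file belongs. The helper names and the example IP addresses are hypothetical. Treating DnsServerIpV4 and GitServerIpV4List as individually optional is an assumption drawn from the precedence rule above, not confirmed behavior. The upload itself is left to whichever S3 tool you use.

```python
import ipaddress
import json
import tempfile
from pathlib import Path


def git_server_entry(git_dns_name, dns_server_ip=None, git_server_ips=()):
    """Build one entry of the "Value" list in the template.

    Assumption: at least one of dns_server_ip or git_server_ips is given.
    ipaddress.IPv4Address raises ValueError for malformed addresses.
    """
    entry = {}
    if dns_server_ip is not None:
        entry["DnsServerIpV4"] = str(ipaddress.IPv4Address(dns_server_ip))
    entry["GitServerDnsName"] = git_dns_name
    if git_server_ips:
        entry["GitServerIpV4List"] = [
            str(ipaddress.IPv4Address(ip)) for ip in git_server_ips
        ]
    if len(entry) == 1:
        raise ValueError("provide dns_server_ip or git_server_ips")
    return entry


def build_config(entries):
    """Wrap the entries in the top-level structure shown in the template."""
    return [{"Type": "PrivatelyHostedGitConfig", "Value": list(entries)}]


def config_s3_uri(studio_s3_location):
    """Return the S3 URI where configuration.json should be uploaded."""
    base = studio_s3_location.rstrip("/")
    return f"{base}/life-cycle-configuration/configuration.json"


def write_config(config, directory):
    """Write the configuration to configuration.json in the given directory."""
    path = Path(directory) / "configuration.json"
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return path


# Hypothetical addresses, for illustration only.
config = build_config([
    git_server_entry("git.example.com", "10.24.34.10", ["10.24.34.20"]),
])

with tempfile.TemporaryDirectory() as tmp:
    saved = write_config(config, tmp)
    reloaded = json.loads(saved.read_text(encoding="utf-8"))

target_uri = config_s3_uri("s3://amzn-s3-demo-bucket/studios")
print(target_uri)
```

Validating the addresses before upload catches leftover placeholders such as 10.24.34.xxx, which would otherwise only surface when a Workspace fails to reach the repository.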