Understanding in-transit encryption
You can configure an EMR cluster to run open-source frameworks such as Apache Spark.
If in-transit encryption is enabled on an EMR cluster, different network endpoints use different encryption mechanisms. See the following sections to learn more about the specific open-source framework network endpoints supported with in-transit encryption, the related encryption mechanisms, and which Amazon EMR release added the support. Each open-source application might also have different best practices and open-source framework configurations that you can change.
For the most in-transit encryption coverage, we recommend that you enable both in-transit encryption and Kerberos. If you only enable in-transit encryption, then in-transit encryption will be available only for the network endpoints that support TLS. Kerberos is necessary because some open-source framework network endpoints use Simple Authentication and Security Layer (SASL) for in-transit encryption.
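As a minimal sketch of the recommendation above, the following Python builds a security configuration document that turns on both in-transit encryption and Kerberos. The field names follow the JSON layout we understand Amazon EMR security configurations to use, so check them against the current reference before relying on them. The S3 path, the PEM provider type, the cluster-dedicated KDC, and the ticket lifetime are illustrative assumptions, not values taken from this page.

```python
import json


def build_security_configuration(cert_s3_object: str, ticket_lifetime_hours: int = 24) -> dict:
    """Return a security configuration dict that enables in-transit
    encryption (TLS endpoints) and Kerberos (SASL endpoints)."""
    return {
        "EncryptionConfiguration": {
            "EnableInTransitEncryption": True,
            # At-rest encryption is out of scope for this sketch.
            "EnableAtRestEncryption": False,
            "InTransitEncryptionConfiguration": {
                "TLSCertificateConfiguration": {
                    "CertificateProviderType": "PEM",  # assumption
                    "S3Object": cert_s3_object,
                }
            },
        },
        "AuthenticationConfiguration": {
            "KerberosConfiguration": {
                "Provider": "ClusterDedicatedKdc",  # assumption
                "ClusterDedicatedKdcConfiguration": {
                    "TicketLifetimeInHours": ticket_lifetime_hours
                },
            }
        },
    }


# Hypothetical bucket and object name, used only for illustration.
security_configuration = build_security_configuration(
    "s3://amzn-s3-demo-bucket/my-certs.zip"
)
security_configuration_json = json.dumps(security_configuration, indent=2)
print(security_configuration_json)
```

The resulting JSON string is the kind of document you would supply when you create a security configuration, for example through the console, the AWS CLI, or an SDK.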
Note that any open-source frameworks not supported in Amazon EMR 7.x.x releases are not included.
Spark
When you enable in-transit encryption in security configurations, Amazon EMR automatically sets spark.authenticate to true, and Spark uses AES-based encryption for RPC connections.
Starting with Amazon EMR 7.3.0, if you use in-transit encryption and Kerberos authentication, you can't use Spark applications that depend on the Hive metastore. Hive 3 fixes this issue in HIVE-16340. Set hive.metastore.use.SSL to false to work around this issue. For more information, see Configure applications.
For more information, see Spark security.
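The workaround above can be expressed as a configuration classification that you supply when you create the cluster. This is a sketch only: the spark-hive-site classification name is our assumption about where Spark reads its Hive metastore settings, so confirm it for your release.

```python
import json


def spark_hive_metastore_workaround() -> list:
    """Return a configuration list that sets hive.metastore.use.SSL to false."""
    return [
        {
            "Classification": "spark-hive-site",  # assumed classification name
            "Properties": {"hive.metastore.use.SSL": "false"},
        }
    ]


configurations = spark_hive_metastore_workaround()
print(json.dumps(configurations, indent=2))
```

Note that this turns off TLS between Spark and the Hive metastore, so apply it only when you need the workaround.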
Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release |
---|---|---|---|---|
Spark History Server | spark.ssl.history.port | 18480 | TLS | emr-5.3.0+, emr-6.0.0+, emr-7.0.0+ |
Spark UI | spark.ui.port | 4440 | TLS | emr-5.3.0+, emr-6.0.0+, emr-7.0.0+ |
Spark Driver | spark.driver.port | Dynamic | Spark AES-based encryption | emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+ |
Spark Executor | Executor Port (no named config) | Dynamic | Spark AES-based encryption | emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+ |
YARN NodeManager | spark.shuffle.service.port ¹ | 7337 | Spark AES-based encryption | emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+ |

¹ spark.shuffle.service.port is hosted on YARN NodeManager but is only used by Apache Spark.
Hadoop YARN
Secure Hadoop RPC is set to privacy and uses SASL-based in-transit encryption. This requires that Kerberos authentication is enabled in the security configuration. If you don't want in-transit encryption for Hadoop RPC, configure hadoop.rpc.protection = authentication. We recommend that you use the default configuration for maximum security.
If your TLS certificates can't meet TLS hostname verification requirements, you can configure hadoop.ssl.hostname.verifier = ALLOW_ALL. We recommend that you use the default configuration of hadoop.ssl.hostname.verifier = DEFAULT, which enforces TLS hostname verification.
To disable HTTPS for the YARN web application endpoints, configure yarn.http.policy = HTTP_ONLY. Traffic to these endpoints then stays unencrypted. We recommend that you use the default configuration for maximum security.
For more information, see Hadoop in secure mode.
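The three overrides described above each weaken in-transit protection, so it can help to flag them before you apply a set of custom properties. The following is a small sketch of such a check. The function name and the sample overrides are hypothetical, and it covers only the values named in this section.

```python
# Property values that this section describes as relaxing the defaults.
WEAKENING_VALUES = {
    "hadoop.rpc.protection": {"authentication"},
    "hadoop.ssl.hostname.verifier": {"ALLOW_ALL"},
    "yarn.http.policy": {"HTTP_ONLY"},
}


def find_weakened_settings(overrides: dict) -> list:
    """Return the names of overrides that relax the recommended defaults."""
    return sorted(
        name
        for name, value in overrides.items()
        if value in WEAKENING_VALUES.get(name, set())
    )


# Hypothetical set of overrides a cluster owner might be about to apply.
overrides = {
    "hadoop.rpc.protection": "authentication",
    "hadoop.ssl.hostname.verifier": "DEFAULT",
    "yarn.http.policy": "HTTP_ONLY",
}
weakened = find_weakened_settings(overrides)
print(weakened)  # prints ['hadoop.rpc.protection', 'yarn.http.policy']
```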
Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release |
---|---|---|---|---|
ResourceManager | yarn.resourcemanager.webapp.address | 8088 | TLS | emr-7.3.0+ |
ResourceManager | yarn.resourcemanager.resource-tracker.address | 8025 | SASL + Kerberos | emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+ |
ResourceManager | yarn.resourcemanager.scheduler.address | 8030 | SASL + Kerberos | emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+ |
ResourceManager | yarn.resourcemanager.address | 8032 | SASL + Kerberos | emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+ |
ResourceManager | yarn.resourcemanager.admin.address | 8033 | SASL + Kerberos | emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+ |
TimelineServer | yarn.timeline-service.address | 10200 | SASL + Kerberos | emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+ |
TimelineServer | yarn.timeline-service.webapp.address | 8188 | TLS | emr-7.3.0+ |
WebApplicationProxy | yarn.web-proxy.address | 20888 | SASL + Kerberos | emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+ |
NodeManager | yarn.nodemanager.address | 8041 | SASL + Kerberos | emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+ |
NodeManager | yarn.nodemanager.localizer.address | 8040 | SASL + Kerberos | emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+ |
NodeManager | yarn.nodemanager.webapp.address | 8044 | TLS | emr-7.3.0+ |
NodeManager | mapreduce.shuffle.port ¹ | 13562 | TLS | emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+ |
NodeManager | spark.shuffle.service.port ² | 7337 | Spark AES-based encryption | emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+ |

¹ mapreduce.shuffle.port is hosted on YARN NodeManager but is only used by Hadoop MapReduce.
² spark.shuffle.service.port is hosted on YARN NodeManager but is only used by Apache Spark.
Hadoop HDFS
The Hadoop name node, data node, and journal node all support TLS by default if in-transit encryption is enabled in EMR clusters.
Secure Hadoop RPC is set to privacy and uses SASL-based in-transit encryption. This requires that Kerberos authentication is enabled in the security configuration.
We recommend that you don't change the default ports used for HTTPS endpoints.
Data encryption on HDFS block transfer uses
For more information, see Hadoop in secure mode.
Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release |
---|---|---|---|---|
Namenode | dfs.namenode.https-address | 9871 | TLS | emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+ |
Namenode | dfs.namenode.rpc-address | 8020 | SASL + Kerberos | emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+ |
Datanode | dfs.datanode.https.address | 9865 | TLS | emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+ |
Datanode | dfs.datanode.address | 9866 | SASL + Kerberos | emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+ |
Journal Node | dfs.journalnode.https-address | 8481 | TLS | emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+ |
Journal Node | dfs.journalnode.rpc-address | 8485 | SASL + Kerberos | emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+ |
DFSZKFailoverController | dfs.ha.zkfc.port | 8019 | None | TLS for ZKFC is only supported in Hadoop 3.4.0. See HADOOP-18919. |
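
The table shows why Kerberos matters for coverage: only the TLS rows are protected when you enable in-transit encryption alone. The following sketch encodes the HDFS rows above and lists the ports that you can expect to carry encrypted traffic with and without Kerberos. The function name is hypothetical, and the ports are the defaults from the table.

```python
# (component, endpoint, port, mechanism) rows copied from the HDFS table.
HDFS_ENDPOINTS = [
    ("Namenode", "dfs.namenode.https-address", 9871, "TLS"),
    ("Namenode", "dfs.namenode.rpc-address", 8020, "SASL + Kerberos"),
    ("Datanode", "dfs.datanode.https.address", 9865, "TLS"),
    ("Datanode", "dfs.datanode.address", 9866, "SASL + Kerberos"),
    ("Journal Node", "dfs.journalnode.https-address", 8481, "TLS"),
    ("Journal Node", "dfs.journalnode.rpc-address", 8485, "SASL + Kerberos"),
    ("DFSZKFailoverController", "dfs.ha.zkfc.port", 8019, "None"),
]


def encrypted_ports(endpoints, kerberos_enabled: bool) -> list:
    """Return the ports expected to be encrypted when in-transit
    encryption is enabled, with or without Kerberos."""
    ports = []
    for _component, _endpoint, port, mechanism in endpoints:
        if mechanism == "TLS":
            ports.append(port)
        elif mechanism == "SASL + Kerberos" and kerberos_enabled:
            ports.append(port)
    return sorted(ports)


print(encrypted_ports(HDFS_ENDPOINTS, kerberos_enabled=False))
# prints [8481, 9865, 9871]
print(encrypted_ports(HDFS_ENDPOINTS, kerberos_enabled=True))
# prints [8020, 8481, 8485, 9865, 9866, 9871]
```

Port 8019 never appears in the result because the table lists no encryption mechanism for that endpoint.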
Hadoop MapReduce
Hadoop MapReduce, job history server, and MapReduce shuffle all support TLS by default when in-transit encryption is enabled in EMR clusters.
Hadoop MapReduce encrypted shuffle uses TLS.
We recommend that you don't change the default ports for HTTPS endpoints.
For more information, see Hadoop in secure mode.
Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release |
---|---|---|---|---|
JobHistoryServer | mapreduce.jobhistory.webapp.https.address | 19890 | TLS | emr-7.3.0+ |
YARN NodeManager | mapreduce.shuffle.port ¹ | 13562 | TLS | emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+ |

¹ mapreduce.shuffle.port is hosted on YARN NodeManager but is only used by Hadoop MapReduce.
Presto
In Amazon EMR releases 5.6.0 and higher, internal communication between the Presto coordinator and workers uses TLS. Amazon EMR sets up all the required configurations to enable secure internal communication.
If the connector uses the Hive metastore as the metadata store, communication between the coordinator and the Hive metastore is also encrypted with TLS.
Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release |
---|---|---|---|---|
Presto Coordinator | http-server.https.port | 8446 | TLS | emr-5.6.0+, emr-6.0.0+, emr-7.0.0+ |
Presto Worker | http-server.https.port | 8446 | TLS | emr-5.6.0+, emr-6.0.0+, emr-7.0.0+ |
Trino
In Amazon EMR releases 6.1.0 and higher, internal communication between the Trino coordinator and workers uses TLS. Amazon EMR sets up all the required configurations to enable secure internal communication.
If the connector uses the Hive metastore as the metadata store, communication between the coordinator and the Hive metastore is also encrypted with TLS.
Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release |
---|---|---|---|---|
Trino Coordinator | http-server.https.port | 8446 | TLS | emr-6.1.0+, emr-7.0.0+ |
Trino Worker | http-server.https.port | 8446 | TLS | emr-6.1.0+, emr-7.0.0+ |
Hive and Tez
By default, HiveServer2, Hive metastore server, Hive LLAP Daemon web UI, and Hive LLAP shuffle all support TLS when in-transit encryption is enabled in the EMR clusters. For more information about the Hive configurations, see Configuration properties.
The Tez UI that's hosted on the Tomcat server is also HTTPS-enabled when in-transit encryption is enabled in the EMR cluster. However, HTTPS is disabled for the Tez AM web UI service because AM users don't have access to the keystore file needed to open the SSL listener. You can enable HTTPS for this service with the Boolean configurations tez.am.tez-ui.webservice.enable.ssl and tez.am.tez-ui.webservice.enable.client.auth.
Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release |
---|---|---|---|---|
HiveServer2 | hive.server2.thrift.port | 10000 | TLS | emr-6.9.0+, emr-7.0.0+ |
HiveServer2 | hive.server2.thrift.http.port | 10001 | TLS | emr-6.9.0+, emr-7.0.0+ |
HiveServer2 | hive.server2.webui.port | 10002 | TLS | emr-7.3.0+ |
HiveMetastoreServer | hive.metastore.port | 9083 | TLS | emr-7.3.0+ |
LLAP Daemon | hive.llap.daemon.yarn.shuffle.port | 15551 | TLS | emr-7.3.0+ |
LLAP Daemon | hive.llap.daemon.web.port | 15002 | TLS | emr-7.3.0+ |
LLAP Daemon | hive.llap.daemon.output.service.port | 15003 | None | Hive doesn't support in-transit encryption for this endpoint |
LLAP Daemon | hive.llap.management.rpc.port | 15004 | None | Hive doesn't support in-transit encryption for this endpoint |
LLAP Daemon | hive.llap.plugin.rpc.port | Dynamic | None | Hive doesn't support in-transit encryption for this endpoint |
LLAP Daemon | hive.llap.daemon.rpc.port | Dynamic | None | Hive doesn't support in-transit encryption for this endpoint |
WebHCat | templeton.port | 50111 | TLS | emr-7.3.0+ |
Tez Application Master | tez.am.client.am.port-range, tez.am.task.am.port-range | Dynamic | None | Tez doesn't support in-transit encryption for this endpoint |
Tez Application Master | tez.am.tez-ui.webservice.port-range | Dynamic | None | Disabled by default. Can be enabled using Tez configurations in emr-7.3.0+ |
Tez Task | N/A - not configurable | Dynamic | None | Tez doesn't support in-transit encryption for this endpoint |
Tez UI | Configurable via Tomcat server on which Tez UI is hosted | 8080 | TLS | emr-7.3.0+ |
Flink
Apache Flink REST endpoints and internal communication between Flink processes support TLS by default when you enable in-transit encryption in EMR clusters.
Amazon EMR sets security.ssl.internal.enabled to true and uses in-transit encryption for internal communication between the Flink processes. If you don't want in-transit encryption for internal communication, disable that configuration. We recommend you use the default configuration for maximum security.
Amazon EMR sets security.ssl.rest.enabled to true and uses in-transit encryption for the REST endpoints. Additionally, Amazon EMR sets historyserver.web.ssl.enabled to true.
Amazon EMR uses security.ssl.algorithms
For more information, see SSL Setup.
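To confirm that a cluster still has the Flink TLS switches named in this section turned on, you can scan the effective Flink configuration for them. The following sketch parses flat key: value lines and reports the switches that aren't set to true. The sample text is a hypothetical excerpt, and the parser deliberately ignores nested YAML.

```python
# TLS switches named in this section.
FLINK_TLS_SWITCHES = (
    "security.ssl.internal.enabled",
    "security.ssl.rest.enabled",
    "historyserver.web.ssl.enabled",
)


def parse_flat_config(text: str) -> dict:
    """Parse simple 'key: value' lines, skipping blank lines and # comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        settings[key.strip()] = value.strip()
    return settings


def disabled_tls_switches(text: str) -> list:
    """Return the TLS switches that are missing or not set to true."""
    settings = parse_flat_config(text)
    return [
        name
        for name in FLINK_TLS_SWITCHES
        if settings.get(name, "").lower() != "true"
    ]


# Hypothetical configuration excerpt, for illustration only.
sample = """
# Flink configuration excerpt
security.ssl.internal.enabled: true
security.ssl.rest.enabled: false
"""
print(disabled_tls_switches(sample))
# prints ['security.ssl.rest.enabled', 'historyserver.web.ssl.enabled']
```

A missing key is reported as disabled because the check treats any value other than true as not enabled.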
Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release |
---|---|---|---|---|
Flink History Server | historyserver.web.port | 8082 | TLS | emr-7.3.0+ |
Job Manager Rest Server | rest.bind-port, rest.port | Dynamic | TLS | emr-7.3.0+ |
HBase
Amazon EMR sets Secure Hadoop RPC to privacy. HMaster and RegionServer use SASL-based in-transit encryption. This requires that Kerberos authentication is enabled in the security configuration.
Amazon EMR sets hbase.ssl.enabled to true and uses TLS for UI endpoints. If you don't want to use TLS for UI endpoints, disable this configuration. We recommend that you use the default configuration for maximum security.
Amazon EMR sets hbase.rest.ssl.enabled and hbase.thrift.ssl.enabled to true and uses TLS for the REST and Thrift server endpoints, respectively. If you don't want to use TLS for these endpoints, disable these configurations. We recommend that you use the default configuration for maximum security.
Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release |
---|---|---|---|---|
HMaster | HMaster | 16000 | SASL + Kerberos | emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+ |
HMaster | HMaster UI | 16010 | TLS | emr-7.3.0+ |
RegionServer | RegionServer | 16020 | SASL + Kerberos | emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+ |
RegionServer | RegionServer Info | 16030 | TLS | emr-7.3.0+ |
HBase Rest Server | Rest Server | 8070 | TLS | emr-7.3.0+ |
HBase Rest Server | Rest UI | 8085 | TLS | emr-7.3.0+ |
HBase Thrift Server | Thrift Server | 9090 | TLS | emr-7.3.0+ |
HBase Thrift Server | Thrift Server UI | 9095 | TLS | emr-7.3.0+ |
Phoenix
If you enabled in-transit encryption in your EMR cluster, Phoenix Query Server supports the TLS property phoenix.queryserver.tls.enabled, which is set to true by default.
To learn more, see Configurations relating to HTTPS.
Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release |
---|---|---|---|---|
Query Server | phoenix.queryserver.http.port | 8765 | TLS | emr-7.3.0+ |
Oozie
OOZIE-3673 relates to the oozie.email.smtp.ssl.protocols property in the oozie-site.xml file. By default, if you enabled in-transit encryption, Amazon EMR uses the TLS v1.3 protocol.
OOZIE-3677 relates to the keyStoreType and trustStoreType properties in oozie-site.xml. OOZIE-3674 adds the parameter --insecure to the Oozie client so it can ignore certificate errors.
Oozie enforces TLS hostname verification, which means that any certificate you use for in-transit encryption must meet hostname verification requirements. If the certificate doesn't meet the criteria, the cluster might get stuck at the oozie share lib update stage when Amazon EMR provisions the cluster. We recommend that you update your certificates to make sure they're compliant with hostname verification. However, if you can't update the certificates, you can disable SSL for Oozie by setting the oozie.https.enabled property to false in the cluster configuration.
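Before you provision a cluster, you can sanity-check whether the names in a certificate would cover your node hostnames. The following sketch approximates the common wildcard rule, where a leading *. matches exactly one DNS label. It is a simplified illustration rather than the verifier that Oozie uses, and the hostnames are hypothetical.

```python
def hostname_matches(pattern: str, hostname: str) -> bool:
    """Simplified check: exact match, or a single leading '*.' wildcard
    that covers exactly one DNS label."""
    pattern = pattern.lower().rstrip(".")
    hostname = hostname.lower().rstrip(".")
    if pattern.startswith("*."):
        suffix = pattern[1:]  # for example ".ec2.internal"
        if not hostname.endswith(suffix):
            return False
        first_label = hostname[: -len(suffix)]
        # The wildcard must stand for one non-empty label with no dots.
        return bool(first_label) and "." not in first_label
    return pattern == hostname


def certificate_covers(san_entries, hostname: str) -> bool:
    """Return True if any subject alternative name matches the hostname."""
    return any(hostname_matches(entry, hostname) for entry in san_entries)


# Hypothetical names, for illustration only.
print(certificate_covers(["*.ec2.internal"], "ip-10-0-0-12.ec2.internal"))  # prints True
print(certificate_covers(["*.ec2.internal"], "node.sub.ec2.internal"))      # prints False
```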
Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release |
---|---|---|---|---|
EmbeddedOozieServer | oozie.https.port | 11443 | TLS | emr-7.3.0+ |
EmbeddedOozieServer | oozie.email.smtp.port | 25 | TLS | emr-7.3.0+ |
Hue
By default, Hue supports TLS when in-transit encryption is enabled in Amazon EMR clusters. For more information about Hue configurations, see Configure Hue with HTTPS / SSL.
Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release |
---|---|---|---|---|
Hue | http_port | 8888 | TLS | emr-7.4.0+ |
Livy
By default, Livy supports TLS when in-transit encryption is enabled in Amazon EMR clusters. For more information about Livy configurations, see Enabling HTTPS with Apache Livy.
Starting with Amazon EMR 7.3.0, if you use in-transit encryption and Kerberos authentication, you can't use the Livy server for Spark applications that depend on the Hive metastore. This issue is fixed in HIVE-16340. Set hive.metastore.use.SSL to false to work around this issue. For more information, see Configure applications.
Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release |
---|---|---|---|---|
livy-server | livy.server.port | 8998 | TLS | emr-7.4.0+ |
JupyterEnterpriseGateway
By default, Jupyter Enterprise Gateway supports TLS when in-transit encryption is enabled in Amazon EMR clusters. For more information about the Jupyter Enterprise Gateway configurations, see Securing Enterprise Gateway Server.
Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release |
---|---|---|---|---|
jupyter_enterprise_gateway | c.EnterpriseGatewayApp.port | 9547 | TLS | emr-7.4.0+ |
JupyterHub
By default, JupyterHub supports TLS when in-transit encryption is enabled in Amazon EMR clusters. For more information, see Enabling SSL encryption.
Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release |
---|---|---|---|---|
jupyter_hub | c.JupyterHub.port | 9443 | TLS | emr-5.14.0+, emr-6.0.0+, emr-7.0.0+ |
Zeppelin
By default, Zeppelin supports TLS when you enable in-transit encryption in your EMR cluster. For more information about the Zeppelin configurations, see SSL Configuration.
Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release |
---|---|---|---|---|
zeppelin | zeppelin.server.ssl.port | 8890 | TLS | emr-7.3.0+ |
Zookeeper
Amazon EMR sets serverCnxnFactory to org.apache.zookeeper.server.NettyServerCnxnFactory to enable TLS for the Zookeeper quorum and client communication.
secureClientPort specifies the port that listens for TLS connections. If a client doesn't support TLS connections to Zookeeper, it can connect to the insecure port 2181 specified in clientPort. You can override or disable these two ports.
Amazon EMR sets both sslQuorum and admin.forceHttps to true to enable TLS communication for the quorum and admin server. If you don't want in-transit encryption for the quorum and the admin server, you can disable those configurations. We recommend that you use the default configurations for maximum security.
For more information, see Encryption, Authentication, Authorization Options.
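The choice between the two client ports can be sketched as follows. The function builds a Zookeeper connection string that uses the TLS port when the client supports it and falls back to the insecure port otherwise. The port values are the defaults listed on this page, and the hostnames and function name are hypothetical.

```python
SECURE_CLIENT_PORT = 2281    # secureClientPort default listed on this page
INSECURE_CLIENT_PORT = 2181  # clientPort value named above


def zookeeper_connect_string(hosts, client_supports_tls: bool) -> str:
    """Return a comma-separated host:port list for a Zookeeper client."""
    port = SECURE_CLIENT_PORT if client_supports_tls else INSECURE_CLIENT_PORT
    return ",".join(f"{host}:{port}" for host in hosts)


# Hypothetical node hostnames, for illustration only.
nodes = ["zk-1.example.internal", "zk-2.example.internal"]
print(zookeeper_connect_string(nodes, client_supports_tls=True))
# prints zk-1.example.internal:2281,zk-2.example.internal:2281
print(zookeeper_connect_string(nodes, client_supports_tls=False))
# prints zk-1.example.internal:2181,zk-2.example.internal:2181
```

If you override either port in your cluster configuration, adjust the constants to match.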
Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release |
---|---|---|---|---|
Zookeeper Server | secureClientPort | 2281 | TLS | emr-7.4.0+ |
Zookeeper Server | Quorum Ports | 2888 (followers connect to the leader), 3888 (leader election) | TLS | emr-7.4.0+ |
Zookeeper Server | admin.serverPort | 8341 | TLS | emr-7.4.0+ |