Considerations with Presto on Amazon EMR
Consider the following differences and limitations when you run Presto on Amazon EMR.
Presto Command Line Executable
In Amazon EMR, PrestoDB and PrestoSQL both use the same command line executable,
presto-cli, as in the following example.
presto-cli --catalog hive
Some Presto Deployment Properties not Configurable
Depending on the version of Amazon EMR that you use, some Presto deployment
configurations may not be available. For more information about these properties,
see Deploying Presto. The following table shows the configuration status of Presto
properties files.
| File | Configurable |
|---|---|
| | PrestoDB: Configurable in Amazon EMR release versions 4.0.0 and later. Use the PrestoSQL: Configurable in Amazon EMR release versions 6.1.0 and later. Use the |
| | PrestoDB: Configurable in Amazon EMR release versions 4.0.0 and later. Use the PrestoSQL: Configurable in Amazon EMR release versions 6.1.0 and later. Use the |
| | PrestoDB: Configurable in Amazon EMR release versions 4.1.0 and later. Use the PrestoSQL: Configurable in Amazon EMR release versions 6.1.0 and later. Use the |
| | PrestoDB: Configurable in Amazon EMR release versions 5.6.0 and later. Use the PrestoSQL: Configurable in Amazon EMR release versions 6.1.0 and later. Use the |
| | Not configurable. |
EMRFS and PrestoS3FileSystem Configuration
With Amazon EMR release version 5.12.0 and later, PrestoDB can use EMRFS, and this is the default configuration. With Amazon EMR release version 6.1.0 and later, PrestoSQL also uses EMRFS as the default. For more information, see Using EMR File System (EMRFS) in the Amazon EMR Management Guide. With earlier release versions, PrestoS3FileSystem is the only option.
Using EMRFS has benefits. You can use a security configuration to set up encryption for EMRFS data in Amazon S3. You can also use IAM roles for EMRFS requests to Amazon S3. For more information, see Understanding Encryption Options and Configure IAM Roles for EMRFS Requests to Amazon S3 in the Amazon EMR Management Guide.
A configuration issue can cause Presto errors when querying underlying data in Amazon
S3 with Amazon EMR release version 5.12.0. This is because Presto fails to pick up
configuration classification values from emrfs-site.xml. As a workaround, create an
emrfs subdirectory under /usr/lib/presto/plugin/hive-hadoop2/, create a symlink in
/usr/lib/presto/plugin/hive-hadoop2/emrfs to the existing
/usr/share/aws/emr/emrfs/conf/emrfs-site.xml file, and then restart the presto-server
process (sudo presto-server stop followed by sudo presto-server start).
You can override the EMRFS default and use the PrestoS3FileSystem instead. To do this,
use the presto-connector-hive configuration classification to set
hive.s3-file-system-type to PRESTO, as shown in the following example. For more
information, see Configuring Applications.
[ { "Classification": "presto-connector-hive", "Properties": { "hive.s3-file-system-type": "PRESTO" } } ]
If you use PrestoS3FileSystem, use the presto-connector-hive configuration
classification (or prestosql-connector-hive for PrestoSQL) to configure
PrestoS3FileSystem properties. For more information about available properties, see
Amazon S3 Configuration.
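As a rough sketch of how a PrestoS3FileSystem property might be set alongside the file system override, the following configuration uses the same form as the preceding example. The hive.s3.max-connections property is taken from the Presto Hive connector documentation, and the value 1000 is purely illustrative, not a recommendation. For PrestoSQL, the classification name would be prestosql-connector-hive instead.

```json
[
  {
    "Classification": "presto-connector-hive",
    "Properties": {
      "hive.s3-file-system-type": "PRESTO",
      "hive.s3.max-connections": "1000"
    }
  }
]
```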
Default Setting for End User Impersonation
By default, Amazon EMR version 5.12.0 and later enables end user impersonation for
accessing HDFS. For more information, see End User Impersonation. To change this
setting, use the presto-config configuration classification to set the
hive.hdfs.impersonation.enabled property to false.
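A minimal sketch of a configuration that disables end user impersonation might look like the following. The classification and property names are the ones named in this section; the layout is assumed to follow the other configuration examples on this page.

```json
[
  {
    "Classification": "presto-config",
    "Properties": {
      "hive.hdfs.impersonation.enabled": "false"
    }
  }
]
```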
Default Port for Presto Web Interface
By default, Amazon EMR configures the Presto web interface on the Presto
coordinator to use port 8889 (for PrestoDB and PrestoSQL). You can change the port
by using the presto-config configuration classification to set the
http-server.http.port property. For more information, see Config Properties.
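For illustration, a configuration that moves the web interface to a different port could take the following form. The port number 8890 is a hypothetical value chosen only for this sketch; substitute any free port that suits your cluster.

```json
[
  {
    "Classification": "presto-config",
    "Properties": {
      "http-server.http.port": "8890"
    }
  }
]
```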
Issue with Hive Bucket Execution in Some Releases
Presto version 152.3 has an issue with Hive bucket execution that causes
significantly slower Presto query performance in some circumstances. This version
is included with Amazon EMR release versions 5.0.3, 5.1.0, and 5.2.0. To mitigate
this issue, use the presto-connector-hive configuration classification to set the
hive.bucket-execution property to false, as shown in the following example.
[ { "Classification": "presto-connector-hive", "Properties": { "hive.bucket-execution": "false" } } ]