EMRFS CLI Command Reference
The EMRFS CLI is installed by default on all cluster master nodes created using Amazon EMR release version 3.2.1 or later. You can use the EMRFS CLI to manage the metadata for consistent view.
Note
The emrfs command is only supported with VT100 terminal emulation. However, it may work with other terminal emulator modes.
emrfs top-level command
The emrfs top-level command supports the following structure.
emrfs [describe-metadata | set-metadata-capacity | delete-metadata | create-metadata | \
list-metadata-stores | diff | delete | sync | import ] [options] [arguments]
Specify [options], with or without [arguments], as described in the following table. For [options] specific to sub-commands (describe-metadata, set-metadata-capacity, etc.), see each sub-command below.
Option | Description | Required |
---|---|---|
| The AWS access key you use to write objects to Amazon S3 and to create or access a metadata store in DynamoDB. By default, | No |
| The AWS secret key associated with the access key you use to write objects to Amazon S3 and to create or access a metadata store in DynamoDB. By default, | No |
| Makes output verbose. | No |
| Displays the help message for the | No |
emrfs describe-metadata sub-command
Option | Description | Required |
---|---|---|
| | No |
Example emrfs describe-metadata example
The following example describes the default metadata table.
$ emrfs describe-metadata
EmrFSMetadata
read-capacity: 400
write-capacity: 100
status: ACTIVE
approximate-item-count (6 hour delay): 12
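If you capture this output in a script (for example, from a step log), it can be turned into a dictionary for further checks. The following is a minimal sketch that assumes the output has exactly the layout shown in the example above. The function name parse_describe_metadata is hypothetical and is not part of EMRFS.

```python
def parse_describe_metadata(output):
    """Parse the text printed by `emrfs describe-metadata` into a dict.

    Assumes the layout shown in the example above: the table name on the
    first non-empty line, followed by `key: value` lines.
    """
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty output")
    info = {"name": lines[0]}
    for line in lines[1:]:
        # Split on the last colon so keys such as
        # "approximate-item-count (6 hour delay)" stay intact.
        key, sep, value = line.rpartition(":")
        if not sep:
            continue  # Skip lines that are not `key: value` pairs.
        value = value.strip()
        info[key.strip()] = int(value) if value.isdigit() else value
    return info


sample = """EmrFSMetadata
read-capacity: 400
write-capacity: 100
status: ACTIVE
approximate-item-count (6 hour delay): 12
"""
info = parse_describe_metadata(sample)
print(info["read-capacity"], info["status"])
```

Numeric values are converted to integers so that they can be compared directly, for example to check whether the provisioned read capacity is below a threshold you choose.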
emrfs set-metadata-capacity sub-command
Option | Description | Required |
---|---|---|
| | No |
| The requested read throughput capacity for the metadata table. If the | No |
| The requested write throughput capacity for the metadata table. If the | No |
Example emrfs set-metadata-capacity example
The following example sets the read throughput capacity to 600 and the write capacity to 150 for a metadata table named EmrMetadataAlt.
$ emrfs set-metadata-capacity --metadata-name EmrMetadataAlt --read-capacity 600 --write-capacity 150
read-capacity: 400
write-capacity: 100
status: UPDATING
approximate-item-count (6 hour delay): 0
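When you generate emrfs commands from a script, it can help to build the argument list programmatically instead of concatenating strings. The following sketch is one possible approach, not an official helper: the function name build_emrfs_args is hypothetical, and it assumes that each keyword argument corresponds to a long option that emrfs accepts, such as the --metadata-name, --read-capacity, and --write-capacity options used in the example above. It does not validate option names.

```python
def build_emrfs_args(sub_command, *paths, **options):
    """Build an argument list for the emrfs command.

    Keyword names map to long options by replacing underscores with
    hyphens, so read_capacity=600 becomes `--read-capacity 600`.
    Options set to None are omitted. Option names are not validated.
    """
    args = ["emrfs", sub_command]
    for name, value in options.items():
        if value is None:
            continue
        args += ["--" + name.replace("_", "-"), str(value)]
    # Positional arguments (for example, an Amazon S3 path) go last.
    args += list(paths)
    return args


args = build_emrfs_args(
    "set-metadata-capacity",
    metadata_name="EmrMetadataAlt",
    read_capacity=600,
    write_capacity=150,
)
print(" ".join(args))
```

The resulting list follows the emrfs [sub-command] [options] [arguments] structure described earlier, and it can be passed to a process launcher on the master node or used as the Args value of a step.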
emrfs delete-metadata sub-command
Option | Description | Required |
---|---|---|
| | No |
Example emrfs delete-metadata example
The following example deletes the default metadata table.
$ emrfs delete-metadata
emrfs create-metadata sub-command
Option | Description | Required |
---|---|---|
| | No |
| The requested read throughput capacity for the metadata table. If the | No |
| The requested write throughput capacity for the metadata table. If the | No |
Example emrfs create-metadata example
The following example creates a metadata table named EmrFSMetadataAlt.
$ emrfs create-metadata -m EmrFSMetadataAlt
Creating metadata: EmrFSMetadataAlt
EmrFSMetadataAlt
read-capacity: 400
write-capacity: 100
status: ACTIVE
approximate-item-count (6 hour delay): 0
emrfs list-metadata-stores sub-command
The emrfs list-metadata-stores sub-command has no [options].
Example emrfs list-metadata-stores example
The following example lists your metadata tables.
$ emrfs list-metadata-stores
EmrFSMetadata
emrfs diff sub-command
Option | Description | Required |
---|---|---|
| | No |
| The path to the Amazon S3 bucket to compare with the metadata table. Buckets sync recursively. | Yes |
Example emrfs diff example
The following example compares the default metadata table to an Amazon S3 bucket.
$ emrfs diff s3://elasticmapreduce/samples/cloudfront
BOTH | MANIFEST ONLY | S3 ONLY
DIR elasticmapreduce/samples/cloudfront
DIR elasticmapreduce/samples/cloudfront/code/
DIR elasticmapreduce/samples/cloudfront/input/
DIR elasticmapreduce/samples/cloudfront/logprocessor.jar
DIR elasticmapreduce/samples/cloudfront/input/XABCD12345678.2009-05-05-14.WxYz1234
DIR elasticmapreduce/samples/cloudfront/input/XABCD12345678.2009-05-05-15.WxYz1234
DIR elasticmapreduce/samples/cloudfront/input/XABCD12345678.2009-05-05-16.WxYz1234
DIR elasticmapreduce/samples/cloudfront/input/XABCD12345678.2009-05-05-17.WxYz1234
DIR elasticmapreduce/samples/cloudfront/input/XABCD12345678.2009-05-05-18.WxYz1234
DIR elasticmapreduce/samples/cloudfront/input/XABCD12345678.2009-05-05-19.WxYz1234
DIR elasticmapreduce/samples/cloudfront/input/XABCD12345678.2009-05-05-20.WxYz1234
DIR elasticmapreduce/samples/cloudfront/code/cloudfront-loganalyzer.tgz
emrfs delete sub-command
Option | Description | Required |
---|---|---|
| | No |
| The path to the Amazon S3 bucket you are tracking for consistent view. Buckets sync recursively. | Yes |
-t | The expiration time (interpreted using the time unit argument). All metadata entries older than the | |
| The measure used to interpret the time argument (nanoseconds, microseconds, milliseconds, seconds, minutes, hours, or days). If no argument is specified, the default value is | |
| The requested amount of available read throughput used for the delete operation. If the | No |
| The requested amount of available write throughput used for the delete operation. If the | No |
Example emrfs delete example
The following example removes all objects in an Amazon S3 bucket from the tracking metadata for consistent view.
$ emrfs delete s3://elasticmapreduce/samples/cloudfront
entries deleted: 11
emrfs import sub-command
Option | Description | Required |
---|---|---|
| | No |
| The path to the Amazon S3 bucket you are tracking for consistent view. Buckets sync recursively. | Yes |
| The requested amount of available read throughput used for the import operation. If the | No |
| The requested amount of available write throughput used for the import operation. If the | No |
Example emrfs import example
The following example imports all objects in an Amazon S3 bucket into the tracking metadata for consistent view. All unknown keys are ignored.
$ emrfs import s3://elasticmapreduce/samples/cloudfront
emrfs sync sub-command
Option | Description | Required |
---|---|---|
| | No |
| The path to the Amazon S3 bucket you are tracking for consistent view. Buckets sync recursively. | Yes |
| The requested amount of available read throughput used for the sync operation. If the | No |
| The requested amount of available write throughput used for the sync operation. If the | No |
Example emrfs sync example
The following example imports all objects in an Amazon S3 bucket into the tracking metadata for consistent view. All unknown keys are deleted.
$ emrfs sync s3://elasticmapreduce/samples/cloudfront
Synching samples/cloudfront 0 added | 0 updated | 0 removed | 0 unchanged
Synching samples/cloudfront/code/ 1 added | 0 updated | 0 removed | 0 unchanged
Synching samples/cloudfront/ 2 added | 0 updated | 0 removed | 0 unchanged
Synching samples/cloudfront/input/ 9 added | 0 updated | 0 removed | 0 unchanged
Done synching s3://elasticmapreduce/samples/cloudfront 9 added | 0 updated | 1 removed | 0 unchanged
creating 3 folder key(s)
folders written: 3
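A script that runs emrfs sync may want the totals from the final "Done synching" line, for example to flag runs that removed entries. The following is a minimal sketch that assumes the output lines have exactly the format shown in the example above. The names SYNC_LINE and parse_sync_summary are hypothetical and are not part of EMRFS.

```python
import re

# Matches the per-path lines and the final summary line printed by
# `emrfs sync`, as they appear in the example above.
SYNC_LINE = re.compile(
    r"^(?P<kind>Done synching|Synching)\s+(?P<path>\S+)\s+"
    r"(?P<added>\d+) added \| (?P<updated>\d+) updated \| "
    r"(?P<removed>\d+) removed \| (?P<unchanged>\d+) unchanged$"
)


def parse_sync_summary(output):
    """Return the counts from the `Done synching` line, or None if absent."""
    for line in output.splitlines():
        match = SYNC_LINE.match(line.strip())
        if match and match.group("kind") == "Done synching":
            counts = {
                key: int(match.group(key))
                for key in ("added", "updated", "removed", "unchanged")
            }
            return {"path": match.group("path"), **counts}
    return None


sample = """Synching samples/cloudfront 0 added | 0 updated | 0 removed | 0 unchanged
Synching samples/cloudfront/input/ 9 added | 0 updated | 0 removed | 0 unchanged
Done synching s3://elasticmapreduce/samples/cloudfront 9 added | 0 updated | 1 removed | 0 unchanged
creating 3 folder key(s)
folders written: 3
"""
summary = parse_sync_summary(sample)
print(summary)
```

Lines that do not match the pattern, such as the folder key messages, are ignored.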
emrfs read-sqs sub-command
Option | Description | Required |
---|---|---|
| | Yes |
| | Yes |
emrfs delete-sqs sub-command
Option | Description | Required |
---|---|---|
| | Yes |
Submitting EMRFS CLI commands as steps
The following example shows how to use the emrfs utility on the master node by leveraging the AWS CLI or API and the command-runner.jar to run the emrfs command as a step. The example uses the AWS SDK for Python (Boto3) to add a step to a cluster, which adds objects in an Amazon S3 bucket to the default EMRFS metadata table.
import boto3
from botocore.exceptions import ClientError


def add_emrfs_step(command, bucket_url, cluster_id, emr_client):
    """
    Add an EMRFS command as a job flow step to an existing cluster.

    :param command: The EMRFS command to run.
    :param bucket_url: The URL of a bucket that contains tracking metadata.
    :param cluster_id: The ID of the cluster to update.
    :param emr_client: The Boto3 Amazon EMR client object.
    :return: The ID of the added job flow step. Status can be tracked by calling
             the emr_client.describe_step() function.
    """
    job_flow_step = {
        "Name": "Example EMRFS Command Step",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["/usr/bin/emrfs", command, bucket_url],
        },
    }

    try:
        response = emr_client.add_job_flow_steps(
            JobFlowId=cluster_id, Steps=[job_flow_step]
        )
        step_id = response["StepIds"][0]
        print(f"Added step {step_id} to cluster {cluster_id}.")
    except ClientError:
        print(f"Couldn't add a step to cluster {cluster_id}.")
        raise
    else:
        return step_id


def usage_demo():
    emr_client = boto3.client("emr")
    # Assumes the first waiting cluster has EMRFS enabled and has created metadata
    # with the default name of 'EmrFSMetadata'.
    cluster = emr_client.list_clusters(ClusterStates=["WAITING"])["Clusters"][0]
    add_emrfs_step(
        "sync", "s3://elasticmapreduce/samples/cloudfront", cluster["Id"], emr_client
    )


if __name__ == "__main__":
    usage_demo()
You can use the step_id value returned to check the logs for the result of the operation.
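Before checking the logs, you may want to wait until the step has finished. The following sketch polls the describe_step function mentioned in the docstring above until the step reaches a terminal state. The function name wait_for_step, the polling interval, and the polling limit are assumptions chosen for illustration; adjust them to your workload.

```python
import time

# Step states after which Amazon EMR no longer changes the step status.
TERMINAL_STATES = {"COMPLETED", "CANCELLED", "FAILED", "INTERRUPTED"}


def wait_for_step(emr_client, cluster_id, step_id, poll_seconds=30, max_polls=40):
    """
    Poll a step until it reaches a terminal state.

    :param emr_client: The Boto3 Amazon EMR client object.
    :param cluster_id: The ID of the cluster that runs the step.
    :param step_id: The ID returned when the step was added.
    :param poll_seconds: Seconds to wait between polls (illustrative default).
    :param max_polls: Maximum number of polls before giving up.
    :return: The terminal state of the step, such as 'COMPLETED' or 'FAILED'.
    """
    state = None
    for _ in range(max_polls):
        response = emr_client.describe_step(ClusterId=cluster_id, StepId=step_id)
        state = response["Step"]["Status"]["State"]
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)
    raise TimeoutError(
        f"Step {step_id} is still in state {state} after {max_polls} polls."
    )


# Example call, assuming emr_client, cluster_id, and step_id come from the
# add_emrfs_step example above:
#     final_state = wait_for_step(emr_client, cluster_id, step_id)
```

The client is passed in as a parameter, which matches the add_emrfs_step example and makes the function easy to exercise with a stub client in tests.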