AWS Data Pipeline is no longer available to new customers. Existing customers of AWS Data Pipeline can continue to use the service as normal. Learn more
Install additional software on your Amazon EMR cluster
EmrCluster provides the supportedProducts field
that installs third-party software on an Amazon EMR cluster, for example, it lets
you install a custom distribution of Hadoop, such as MapR. It accepts a
comma-separated list of arguments for the third-party software to read and
act on. The following example shows how to use the
supportedProducts field of EmrCluster to
create a custom MapR M3 edition cluster with Karmasphere Analytics
installed, and run an EmrActivity object on it.
{ "id": "MyEmrActivity", "type": "EmrActivity", "schedule": {"ref": "ResourcePeriod"}, "runsOn": {"ref": "MyEmrCluster"}, "postStepCommand": "echo Ending job >> /mnt/var/log/stepCommand.txt", "preStepCommand": "echo Starting job > /mnt/var/log/stepCommand.txt", "step": "/home/hadoop/contrib/streaming/hadoop-streaming.jar,-input,s3n://elasticmapreduce/samples/wordcount/input,-output, \ hdfs:///output32113/,-mapper,s3n://elasticmapreduce/samples/wordcount/wordSplitter.py,-reducer,aggregate" }, { "id": "MyEmrCluster", "type": "EmrCluster", "schedule": {"ref": "ResourcePeriod"}, "supportedProducts": ["mapr,--edition,m3,--version,1.2,--key1,value1","karmasphere-enterprise-utility"], "masterInstanceType": "m3.xlarge", "taskInstanceType": "m3.xlarge" }