EmrConfiguration - AWS Data Pipeline

EmrConfiguration

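The EmrConfiguration object is the configuration used for EMR clusters with release 4.0.0 or later. Configurations (as a list) is a parameter to the RunJobFlow API call. The configuration API for Amazon EMR takes a classification and properties. AWS Data Pipeline uses EmrConfiguration with corresponding Property objects to configure an EmrCluster application such as Hadoop, Hive, Spark, or Pig on EMR clusters launched in a pipeline execution. Because configuration can only be changed for new clusters, you cannot provide an EmrConfiguration object for existing resources. For more information, see the Amazon EMR Release Guide: https://docs.aws.amazon.com/ElasticMapReduce/latest/ReleaseGuide/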

Example

The following configuration object sets the io.file.buffer.size and fs.s3.block.size properties in core-site.xml:

[
  {
    "classification": "core-site",
    "properties": {
      "io.file.buffer.size": "4096",
      "fs.s3.block.size": "67108864"
    }
  }
]

The corresponding pipeline object definition uses an EmrConfiguration object and a list of Property objects in the property field:

{
  "objects": [
    {
      "name": "ReleaseLabelCluster",
      "releaseLabel": "emr-4.1.0",
      "applications": ["spark", "hive", "pig"],
      "id": "ResourceId_I1mCc",
      "type": "EmrCluster",
      "configuration": {
        "ref": "coresite"
      }
    },
    {
      "name": "coresite",
      "id": "coresite",
      "type": "EmrConfiguration",
      "classification": "core-site",
      "property": [
        { "ref": "io-file-buffer-size" },
        { "ref": "fs-s3-block-size" }
      ]
    },
    {
      "name": "io-file-buffer-size",
      "id": "io-file-buffer-size",
      "type": "Property",
      "key": "io.file.buffer.size",
      "value": "4096"
    },
    {
      "name": "fs-s3-block-size",
      "id": "fs-s3-block-size",
      "type": "Property",
      "key": "fs.s3.block.size",
      "value": "67108864"
    }
  ]
}
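The mapping from a classification and its properties to pipeline objects is mechanical, so the objects can be generated rather than written by hand. The following Python sketch shows one possible way to do that. The helper name to_pipeline_objects and the id scheme (dots and underscores replaced by hyphens, lowercased) are assumptions made for illustration; AWS Data Pipeline does not prescribe them.

```python
import json


def to_pipeline_objects(config_id, classification, properties):
    """Build one EmrConfiguration object plus one Property object per key.

    Hypothetical helper: the id scheme only mirrors the ids used in the
    example above and is not mandated by AWS Data Pipeline.
    """
    property_objects = []
    for key, value in properties.items():
        # e.g. "io.file.buffer.size" -> "io-file-buffer-size"
        prop_id = key.replace(".", "-").replace("_", "-").lower()
        property_objects.append({
            "name": prop_id,
            "id": prop_id,
            "type": "Property",
            "key": key,
            "value": str(value),
        })

    configuration = {
        "name": config_id,
        "id": config_id,
        "type": "EmrConfiguration",
        "classification": classification,
        # The property field holds references to the Property objects.
        "property": [{"ref": p["id"]} for p in property_objects],
    }
    return [configuration] + property_objects


objects = to_pipeline_objects(
    "coresite",
    "core-site",
    {"io.file.buffer.size": "4096", "fs.s3.block.size": "67108864"},
)
print(json.dumps({"objects": objects}, indent=2))
```

The printed objects would still need an EmrCluster object that refers to the configuration through its configuration field, as in the definition above.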

The following example is a nested configuration that sets the Hadoop environment with the hadoop-env classification:

[
  {
    "classification": "hadoop-env",
    "properties": {},
    "configurations": [
      {
        "classification": "export",
        "properties": {
          "YARN_PROXYSERVER_HEAPSIZE": "2396"
        }
      }
    ]
  }
]

The following is the corresponding pipeline definition object that uses this configuration:

{
  "objects": [
    {
      "name": "ReleaseLabelCluster",
      "releaseLabel": "emr-4.0.0",
      "applications": ["spark", "hive", "pig"],
      "id": "ResourceId_I1mCc",
      "type": "EmrCluster",
      "configuration": {
        "ref": "hadoop-env"
      }
    },
    {
      "name": "hadoop-env",
      "id": "hadoop-env",
      "type": "EmrConfiguration",
      "classification": "hadoop-env",
      "configuration": {
        "ref": "export"
      }
    },
    {
      "name": "export",
      "id": "export",
      "type": "EmrConfiguration",
      "classification": "export",
      "property": {
        "ref": "yarn-proxyserver-heapsize"
      }
    },
    {
      "name": "yarn-proxyserver-heapsize",
      "id": "yarn-proxyserver-heapsize",
      "type": "Property",
      "key": "YARN_PROXYSERVER_HEAPSIZE",
      "value": "2396"
    }
  ]
}
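To check that a pipeline definition corresponds to the intended nested configuration, the references can be followed from the EmrConfiguration object down to its Property objects. The Python sketch below does this for the hadoop-env example. It only illustrates the correspondence and is not how AWS Data Pipeline itself processes a definition; the function name resolve_configurations is hypothetical, and accepting either a single reference or a list of references in the configuration field is an assumption.

```python
def resolve_configurations(pipeline, config_id):
    """Expand an EmrConfiguration object into the nested list form."""
    by_id = {obj["id"]: obj for obj in pipeline["objects"]}

    def as_list(value):
        # A field may hold one reference object or a list of them.
        if value is None:
            return []
        return value if isinstance(value, list) else [value]

    def expand(obj_id):
        obj = by_id[obj_id]
        entry = {
            "classification": obj["classification"],
            # Each referenced Property object contributes one key/value pair.
            "properties": {
                by_id[r["ref"]]["key"]: by_id[r["ref"]]["value"]
                for r in as_list(obj.get("property"))
            },
        }
        # Sub-configurations are themselves EmrConfiguration objects.
        children = [expand(r["ref"]) for r in as_list(obj.get("configuration"))]
        if children:
            entry["configurations"] = children
        return entry

    return [expand(config_id)]


pipeline = {
    "objects": [
        {"id": "hadoop-env", "type": "EmrConfiguration",
         "classification": "hadoop-env",
         "configuration": {"ref": "export"}},
        {"id": "export", "type": "EmrConfiguration",
         "classification": "export",
         "property": {"ref": "yarn-proxyserver-heapsize"}},
        {"id": "yarn-proxyserver-heapsize", "type": "Property",
         "key": "YARN_PROXYSERVER_HEAPSIZE", "value": "2396"},
    ]
}

result = resolve_configurations(pipeline, "hadoop-env")
print(result)
```

For this input, the result has the same shape as the nested hadoop-env configuration list: an outer hadoop-env entry with empty properties and one export sub-configuration.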

The following example modifies a Hive-specific property for an EMR cluster:

{
  "objects": [
    {
      "name": "hivesite",
      "id": "hivesite",
      "type": "EmrConfiguration",
      "classification": "hive-site",
      "property": [
        { "ref": "hive-client-timeout" }
      ]
    },
    {
      "name": "hive-client-timeout",
      "id": "hive-client-timeout",
      "type": "Property",
      "key": "hive.metastore.client.socket.timeout",
      "value": "2400s"
    }
  ]
}

Syntax

This object includes the following fields.

Required Fields | Description | Slot Type
classification | Classification for the configuration. | String

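Optional Fields | Description | Slot Type
configuration | Sub-configuration for this configuration. | Reference Object, for example "configuration":{"ref":"myEmrConfigurationId"}
parent | Parent of the current object from which slots will be inherited. | Reference Object, for example "parent":{"ref":"myBaseObjectId"}
property | Configuration property. | Reference Object, for example "property":{"ref":"myPropertyId"}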

Runtime Fields | Description | Slot Type
@version | Pipeline version used to create the object. | String

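System Fields | Description | Slot Type
@error | Error message describing the ill-formed object. | String
@pipelineId | ID of the pipeline to which this object belongs. | String
@sphere | The sphere of an object denotes its place in the lifecycle: Component Objects give rise to Instance Objects, which execute Attempt Objects. | String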
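The field rules above can be turned into a quick local check before a definition is uploaded. The following Python sketch is a minimal illustration under stated assumptions: the function check_emr_configuration is hypothetical, it covers only the required and optional fields listed above, and it accepts either one reference object or a list of them, as the examples on this page do. It does not replace the validation performed by AWS Data Pipeline.

```python
def check_emr_configuration(obj):
    """Return a list of problems found in an EmrConfiguration object."""
    problems = []

    if obj.get("type") != "EmrConfiguration":
        problems.append("type must be EmrConfiguration")

    # classification is the only required field and must be a string.
    classification = obj.get("classification")
    if not isinstance(classification, str) or not classification:
        problems.append("classification is required and must be a string")

    def is_ref(value):
        # A reference object looks like {"ref": "someId"}.
        return isinstance(value, dict) and isinstance(value.get("ref"), str)

    # Optional fields must hold reference objects when present.
    for field in ("configuration", "parent", "property"):
        if field not in obj:
            continue
        value = obj[field]
        refs = value if isinstance(value, list) else [value]
        if not all(is_ref(r) for r in refs):
            problems.append(field + " must be a reference object")

    return problems


good = {
    "id": "hivesite",
    "type": "EmrConfiguration",
    "classification": "hive-site",
    "property": [{"ref": "hive-client-timeout"}],
}
bad = {
    "id": "broken",
    "type": "EmrConfiguration",
    "property": "hive-client-timeout",  # plain string instead of a reference
}

print(check_emr_configuration(good))
print(check_emr_configuration(bad))
```

An empty list means no problem was found by this sketch; each string in a non-empty list names one field that does not match the tables above.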

See also