AWS Data Pipeline is no longer available to new customers. Existing customers of AWS Data Pipeline can continue to use the service as normal. Learn more
DynamoDBDataFormat
Applies a schema to a DynamoDB table to make it accessible by a Hive query.
DynamoDBDataFormat is used with a HiveActivity object
and a DynamoDBDataNode input and output.
DynamoDBDataFormat requires that you specify all columns in your
Hive query. For more flexibility to specify certain columns in a Hive query or Amazon S3
support, see DynamoDBExportDataFormat.
Note
DynamoDB Boolean types are not mapped to Hive Boolean types. However, it is possible to map DynamoDB integer values of 0 or 1 to Hive Boolean types.
Example
The following example shows how to use DynamoDBDataFormat to
assign a schema to a DynamoDBDataNode input, which allows a
HiveActivity object to access the data by named columns and
copy the data to a DynamoDBDataNode output.
{ "objects": [ { "id" : "Exists.1", "name" : "Exists.1", "type" : "Exists" }, { "id" : "DataFormat.1", "name" : "DataFormat.1", "type" : "DynamoDBDataFormat", "column" : [ "hash STRING", "range STRING" ] }, { "id" : "DynamoDBDataNode.1", "name" : "DynamoDBDataNode.1", "type" : "DynamoDBDataNode", "tableName" : "$INPUT_TABLE_NAME", "schedule" : { "ref" : "ResourcePeriod" }, "dataFormat" : { "ref" : "DataFormat.1" } }, { "id" : "DynamoDBDataNode.2", "name" : "DynamoDBDataNode.2", "type" : "DynamoDBDataNode", "tableName" : "$OUTPUT_TABLE_NAME", "schedule" : { "ref" : "ResourcePeriod" }, "dataFormat" : { "ref" : "DataFormat.1" } }, { "id" : "EmrCluster.1", "name" : "EmrCluster.1", "type" : "EmrCluster", "schedule" : { "ref" : "ResourcePeriod" }, "masterInstanceType" : "m1.small", "keyPair" : "$KEYPAIR" }, { "id" : "HiveActivity.1", "name" : "HiveActivity.1", "type" : "HiveActivity", "input" : { "ref" : "DynamoDBDataNode.1" }, "output" : { "ref" : "DynamoDBDataNode.2" }, "schedule" : { "ref" : "ResourcePeriod" }, "runsOn" : { "ref" : "EmrCluster.1" }, "hiveScript" : "insert overwrite table ${output1} select * from ${input1} ;" }, { "id" : "ResourcePeriod", "name" : "ResourcePeriod", "type" : "Schedule", "period" : "1 day", "startDateTime" : "2012-05-04T00:00:00", "endDateTime" : "2012-05-05T00:00:00" } ] }
Syntax
| Optional Fields | Description | Slot Type |
|---|---|---|
| column | The column name with data type specified by each field for the data described by this data
node. For example, hostname STRING. For multiple values, use
column names and data types separated by a space. |
String |
| parent | The parent of the current object from which slots will be inherited. | Reference Object, such as "parent":{"ref":"myBaseObjectId"} |
| Runtime Fields | Description | Slot Type |
|---|---|---|
| @version | The pipeline version uses to create the object. | String |
| System Fields | Description | Slot Type |
|---|---|---|
| @error | The error describing the ill-formed object. | String |
| @pipelineId | The Id of the pipeline to which this object belongs. | String |
| @sphere | The sphere of an object denotes its place in the lifecycle: Component Objects give rise to Instance Objects which execute Attempt Objects. | String |