Data Types
Both Amazon DynamoDB and Apache HBase support unstructured datasets with a wide range of data types.
Amazon DynamoDB supports the data types shown in the following table:
Table 12: Amazon DynamoDB Data Types
Type | Description | Example (JSON Format) |
---|---|---|
Scalar | ||
String | Unicode with UTF-8 binary encoding | {"S": "Game01"} |
Number | Positive or negative exact-value decimals and integers | {"N": "67453"} |
Binary | Encoded sequence of bytes | {"B": "dGhpcyB0ZXh0IGlzIGJhc2U2NC1l"} |
Boolean | True or false | {"BOOL": true} |
Null | Unknown or undefined state | {"NULL": true} |
Document | ||
List | Ordered collection of values | {"L": [{"S": "Game01"}, {"N": "67453"}]} |
Map | Unordered collection of name-value pairs | {"M": {"GameId": {"S": "Game01"}, "TopScore": {"N": "67453"}}} |
Multi-valued | ||
String Set | Unique set of strings | {"SS": ["Black","Green"] } |
Number Set | Unique set of numbers | {"NS": ["42.2","-19.87"] } |
Binary Set | Unique set of binary values | {"BS": ["U3Vubnk=","UmFpbnk="] } |
Each Amazon DynamoDB attribute can be a name-value pair with exactly one value (scalar type), a complex data structure with nested attributes (document type), or a unique set of values (multi-valued set type). Individual items in an Amazon DynamoDB table can have any number of attributes.
Primary key attributes must be scalar types with a single value, and the only data types allowed are string, number, or binary. Binary type attributes can store any binary data, for example, compressed data, encrypted data, or even images.
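The key-type rule and the binary encoding can be sketched in a few lines of Python. This is a minimal illustration rather than part of any AWS SDK: the helper names `is_valid_key_attribute` and `encode_binary` are hypothetical, and the sketch assumes attribute values are already in the typed JSON format shown in Table 12, where binary data is carried as base64 text.

```python
import base64

# Type descriptors allowed for primary key attributes, per the text above:
# string (S), number (N), and binary (B).
KEY_TYPES = {"S", "N", "B"}


def is_valid_key_attribute(attribute_value: dict) -> bool:
    """Return True if a typed attribute value could serve as a primary key."""
    # A typed attribute value carries exactly one descriptor, e.g. {"S": "Game01"}.
    if len(attribute_value) != 1:
        return False
    (descriptor,) = attribute_value.keys()
    return descriptor in KEY_TYPES


def encode_binary(raw: bytes) -> dict:
    """Wrap raw bytes as a Binary attribute, using base64 text in JSON."""
    return {"B": base64.b64encode(raw).decode("ascii")}


print(is_valid_key_attribute({"S": "Game01"}))            # scalar string
print(is_valid_key_attribute({"SS": ["Black", "Green"]}))  # set type
print(encode_binary(b"Sunny"))
```

A check like this would reject document and set types before a table definition is attempted; the service itself enforces the rule regardless.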
Map is ideal for storing JSON documents in Amazon DynamoDB. For example, the Person in Table 6 could be represented as a map keyed by person ID that holds detailed information about the person: name, gender, and a list of previous addresses, each of which is also represented as a map. This is illustrated in the following JSON document:
{
  "PersonId": 1001,
  "FirstName": "Fname-1",
  "LastName": "Lname-1",
  "Gender": "M",
  "Addresses": [
    {
      "Street": "Main St",
      "City": "Seattle",
      "Zipcode": 98005,
      "Type": "current"
    },
    {
      "Street": "9th St",
      "City": "Seattle",
      "Zipcode": 98005,
      "Type": "past"
    }
  ]
}
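To show how such a document lines up with the typed attributes in Table 12, the following Python sketch converts plain values into the typed JSON form. The function `to_attribute_value` is a hypothetical helper written for this illustration, not an SDK call. It deliberately handles only null, boolean, integer or `Decimal`, string, list, and map values, and leaves out sets and binary data.

```python
from decimal import Decimal


def to_attribute_value(value):
    """Convert a plain Python value to a typed attribute value (illustrative)."""
    if value is None:
        return {"NULL": True}
    # bool is checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return {"BOOL": value}
    if isinstance(value, (int, Decimal)):
        return {"N": str(value)}  # numbers are carried as strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, list):
        return {"L": [to_attribute_value(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    raise TypeError(f"unsupported type: {type(value).__name__}")


person = {
    "PersonId": 1001,
    "FirstName": "Fname-1",
    "LastName": "Lname-1",
    "Gender": "M",
    "Addresses": [
        {"Street": "Main St", "City": "Seattle", "Zipcode": 98005, "Type": "current"},
        {"Street": "9th St", "City": "Seattle", "Zipcode": 98005, "Type": "past"},
    ],
}

# The top-level item is the content of the outer map.
item = to_attribute_value(person)["M"]
print(item["PersonId"])
print(item["Addresses"]["L"][0]["M"]["City"])
```

Each address becomes a map nested inside a list, which matches the document types described above.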
In summary, Apache HBase defines the following concepts:
- Row: An atomic byte array or key/value container.
- Column: A key within the key/value container inside a row.
- Column Family: Divides columns into related subsets of data that are stored together on disk.
- Timestamp: Apache HBase adds the concept of a fourth dimension column that is expressed as an explicit or implicit timestamp. A timestamp is usually represented as a long integer in milliseconds.
- Value: A time-versioned value in the key/value container. This means that a cell can contain multiple versions of a value that can change over time. Versions are stored in decreasing timestamp order, with the most recent first.
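The concepts above can be pictured as a nested map from row to column to timestamp to value. The Python sketch below is a toy in-memory model meant only to make that shape concrete; the class `TinyTable` and its methods are hypothetical and are not the HBase client API.

```python
class TinyTable:
    """Toy model: row -> (family, qualifier) -> {timestamp: value}."""

    def __init__(self):
        self._rows = {}

    def put(self, row: bytes, family: bytes, qualifier: bytes,
            value: bytes, ts: int) -> None:
        # Each cell keeps every version, keyed by its timestamp in milliseconds.
        cells = self._rows.setdefault(row, {})
        versions = cells.setdefault((family, qualifier), {})
        versions[ts] = value

    def get_versions(self, row: bytes, family: bytes, qualifier: bytes,
                     max_versions: int = 1):
        """Return up to max_versions (timestamp, value) pairs, newest first."""
        versions = self._rows.get(row, {}).get((family, qualifier), {})
        newest_first = sorted(versions.items(), key=lambda kv: kv[0], reverse=True)
        return newest_first[:max_versions]


table = TinyTable()
table.put(b"row1", b"info", b"city", b"Portland", ts=1000)
table.put(b"row1", b"info", b"city", b"Seattle", ts=2000)

latest = table.get_versions(b"row1", b"info", b"city")
history = table.get_versions(b"row1", b"info", b"city", max_versions=5)
print(latest)
print(history)
```

Reading with the default `max_versions=1` returns only the most recent value, which mirrors the "most recent first" ordering described for cell versions.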
Apache HBase supports a bytes-in/bytes-out interface. This means that anything that can be converted into an array of bytes can be stored as a value. Input could be strings, numbers, complex objects, or even images as long as they can be rendered as bytes.
Consequently, key/value pairs in Apache HBase are arbitrary arrays of bytes. Because row keys and column qualifiers are also arbitrary arrays of bytes, almost anything can serve as a row key or column qualifier, from strings to binary representations of longs or even serialized data structures.
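As a rough illustration of the bytes-in/bytes-out idea, the Python sketch below turns a string and a long integer into byte arrays and joins them into a composite row key. The helper names and the `user#` prefix are assumptions made for this example. The big-endian layout is chosen here so that non-negative numbers keep their numeric order when compared byte by byte.

```python
import struct


def long_to_bytes(n: int) -> bytes:
    """Encode a signed 64-bit integer as 8 big-endian bytes."""
    return struct.pack(">q", n)


def bytes_to_long(b: bytes) -> int:
    """Decode 8 big-endian bytes back into a signed 64-bit integer."""
    return struct.unpack(">q", b)[0]


# A composite row key: a UTF-8 string prefix followed by a binary long.
row_key = "user#".encode("utf-8") + long_to_bytes(1001)

# The store only ever sees bytes; the application decodes them on the way out.
prefix, person_id = row_key[:-8], bytes_to_long(row_key[-8:])
print(prefix.decode("utf-8"), person_id)
```

Because the store does not interpret the bytes, the application must apply the same encoding on every read and write.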
Column family names must consist of printable characters in human-readable format. This is because column family names are used as part of the directory name in the file system. Furthermore, column families must be declared up front at the time of schema definition. Column qualifiers are not subject to this restriction: they can consist of arbitrary bytes and can be created at runtime.