Amazon DynamoDB
Developer Guide (API Version 2012-08-10)

Titan Graph Modeling in DynamoDB

Titan stores edges and properties as column-value pairs associated with a vertex and a unique key. The DynamoDB Storage Backend for Titan stores each column-value pair in a single attribute in DynamoDB. For information on how column-value pairs are serialized, go to the Individual Edge Layout section in the Titan Data Model documentation.

The following sections describe graph modeling in the DynamoDB Storage Backend for Titan.

Titan uses the KeyColumnValueStore interface to store column-value pairs in the backend database. The DynamoDB Storage Backend for Titan comes with the following two concrete implementations of this interface:

Implementation Class | Config Property | Description
DynamoDBSingleRowStore | SINGLE | Stores all column-value pairs for a key in a single item.
DynamoDBStore | MULTI | Stores each column-value pair for a key in a different item in a table that has a composite primary key (partition key and sort key).

Both implementations store information in a DynamoDB table named edgestore.

Titan version 0.4.4 uses the following DynamoDB tables for storage, including edgestore:

Table | Description
edgestore | Stores all properties and edges (column-value pairs). One item per vertex.
edgeindex | Index of edges.
vertexindex | Index of all vertices.
titan_ids | Client IDs for each instance of the plugin.
system_properties | Storage backend properties.

Titan versions 0.5.4 and later use a different set of backend tables. Including edgestore, these versions use the following DynamoDB tables for storage:

Table | Description
edgestore | Stores all properties and edges (column-value pairs). One item per vertex.
graphindex | Index of edges and vertices.
systemlog | Titan system log.
txlog | Transaction log.
titan_ids | Client IDs for each instance of the plugin.
system_properties | Storage backend properties.

Note

Titan versions 0.5.4 and later also support user-defined transaction logs, each of which is stored in its own table.

You can select either single or multiple item storage options in the DynamoDB Storage Backend for Titan properties file. The following sections describe the two implementations.
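The recommendation later in this section to use the multiple item model for only some stores implies that the data model is chosen per store. The following Java sketch builds such a configuration, selecting MULTI for edgestore and graphindex and SINGLE for everything else. The key pattern `storage.dynamodb.stores.<store>.data-model` is an assumption based on the plugin's sample configuration, and treating the table names listed above as store names is also an assumption; confirm both against the properties file shipped with your plugin version.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.util.List;
import java.util.Properties;
import java.util.Set;

public class DataModelConfig {
    // Assumed key pattern; verify it against your plugin version's sample properties file.
    static String dataModelKey(String store) {
        return "storage.dynamodb.stores." + store + ".data-model";
    }

    // Stores most likely to hit the 400 KB item limit (Titan 0.5.4 and later).
    static final Set<String> MULTI_STORES = Set.of("edgestore", "graphindex");

    // Hypothetically treats each backend table name as a store name.
    static Properties buildConfig(List<String> stores) {
        Properties props = new Properties();
        for (String store : stores) {
            String model = MULTI_STORES.contains(store) ? "MULTI" : "SINGLE";
            props.setProperty(dataModelKey(store), model);
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        List<String> stores = List.of("edgestore", "graphindex", "systemlog",
                "txlog", "titan_ids", "system_properties");
        StringWriter out = new StringWriter();
        // Properties.store writes a comment line and a timestamp, then key=value lines.
        buildConfig(stores).store(out, "DynamoDB Storage Backend for Titan data models");
        System.out.print(out);
    }
}
```

The printed key=value lines are in the format a .properties file expects, so they can be pasted into one.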

Single Item Data Model

The single item data model stores all column-value pairs at a particular key in one DynamoDB item. The edgestore table stores all properties and edges for a vertex in a single item, whose partition key is the key in a KeyColumnValueStore (KCV) store. For details, see KeyColumnValueStore.

The following table shows how the social network graph from the preceding Working with Graph Databases section would be stored in the edgestore DynamoDB table in the single item data model. It also shows hidden properties. Titan adds a hidden property to each vertex to indicate that it exists.

Note

This is a representation of the data that is stored in a table. The actual data is serialized with compressed metadata and is not human readable.

Partition Key (pk) | Attribute | Attribute | Attribute | Attribute | Attribute
Vertex id 1 | Property - Name: Justin | Edge (out) - Friend: Anna | Edge (out) - Friend: Kris | Edge (out) - Likes: Movies | Hidden Property - Exists
Vertex id 2 | Property - Name: Anna | Edge (in) - Friend: Justin | Edge (out) - Likes: Books | Hidden Property - Exists
Vertex id 3 | Property - Name: Kris | Edge (in) - Friend: Justin | Edge (out) - Likes: Movies | Hidden Property - Exists
Vertex id 4 | Property - Name: Movies | Edge (in) - Likes: Justin | Edge (in) - Likes: Kris | Hidden Property - Exists
Vertex id 5 | Property - Name: Books | Edge (in) - Likes: Anna | Hidden Property - Exists

This table does not show all of the data that is stored in each attribute. For information about the data and data format of edges and properties stored in attributes, see the Titan Data Model page.

A limitation of this model is that, because DynamoDB has a 400 KB item size limit, storing everything in a single item limits the number of properties and edges that can be incident to each vertex.
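To make the limit concrete, the following Java sketch packs the column-value pairs for one key into a single item and estimates its size the way DynamoDB counts item size (attribute name bytes plus value bytes). The string attribute names and values are hypothetical stand-ins; Titan actually stores compact binary serializations, so real sizes differ. `maxColumns` gives a back-of-the-envelope bound: if each serialized edge or property took about 100 bytes (an assumed figure), one item could hold roughly 409,600 / 100 = 4,096 of them.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

public class SingleItemModel {
    // DynamoDB's maximum item size: 400 KB, taken here as 400 * 1024 bytes.
    static final long MAX_ITEM_BYTES = 400L * 1024;

    // Packs every column-value pair at one KCV key into one item: a "pk"
    // attribute holding the key plus one attribute per column.
    // Attribute names and string values are hypothetical stand-ins.
    static Map<String, String> toItem(String key, Map<String, String> columns) {
        Map<String, String> item = new TreeMap<>(columns);
        item.put("pk", key);
        return item;
    }

    // DynamoDB counts attribute-name bytes plus value bytes (UTF-8 for strings).
    static long estimateBytes(Map<String, String> item) {
        long total = 0;
        for (Map.Entry<String, String> e : item.entrySet()) {
            total += e.getKey().getBytes(StandardCharsets.UTF_8).length;
            total += e.getValue().getBytes(StandardCharsets.UTF_8).length;
        }
        return total;
    }

    static boolean fitsInOneItem(Map<String, String> item) {
        return estimateBytes(item) <= MAX_ITEM_BYTES;
    }

    // Upper bound on columns per key if each serialized column takes about
    // bytesPerColumn bytes and the key attribute takes keyBytes bytes.
    static long maxColumns(long bytesPerColumn, long keyBytes) {
        return (MAX_ITEM_BYTES - keyBytes) / bytesPerColumn;
    }

    public static void main(String[] args) {
        Map<String, String> columns = new TreeMap<>();
        columns.put("name", "Justin");
        columns.put("out:friend:Anna", "edge");
        columns.put("out:friend:Kris", "edge");
        Map<String, String> item = toItem("vertex-1", columns);
        System.out.println(item + " ~" + estimateBytes(item) + " bytes, fits=" + fitsInOneItem(item));
        System.out.println("max ~100-byte columns: " + maxColumns(100, 0));
    }
}
```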

Multiple Item Data Model

To avoid the 400 KB item-size limitation, the DynamoDB Storage Backend for Titan provides multiple item storage as an alternative model. If your graph has any of the following characteristics, you might want to use multiple item storage:

  • A high number of edges for each vertex

  • A large number of vertex properties

  • An individual property value that is sized close to the item size limit

In these cases, we recommend using the multiple-item model for at least the edgestore and the index stores (edgeindex and vertexindex in 0.4.4, and graphindex in 0.5.4 and later). The edgestore and index stores are the tables most likely to be affected by the item-size limit.

The multiple item data model stores each column-value pair in a separate DynamoDB item, whose partition key is the KCV key and whose sort key is the KCV column. All of the column-value pairs at a particular key are therefore stored in different items in the edgestore table.
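A minimal Java sketch of the same mapping in the multiple item model emits one item per column-value pair, with pk set to the KCV key, sk to the column, and v to the value (the attribute labels used in the following table). The string types and column names are simplifications for illustration. Because DynamoDB stores items that share a partition key in sort key order, a read of a range of columns can be expressed as a Query on the partition key with a sort key range condition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MultiItemModel {
    // One DynamoDB item: pk = KCV key, sk = KCV column, v = column value.
    // String types are a simplification of Titan's binary serialization.
    record Item(String pk, String sk, String v) {}

    // Emits one item per column-value pair stored at the key.
    static List<Item> toItems(String key, Map<String, String> columns) {
        List<Item> items = new ArrayList<>();
        for (Map.Entry<String, String> e : columns.entrySet()) {
            items.add(new Item(key, e.getKey(), e.getValue()));
        }
        return items;
    }

    public static void main(String[] args) {
        // TreeMap keeps columns sorted, mirroring how DynamoDB orders items
        // that share a partition key by their sort key.
        Map<String, String> columns = new TreeMap<>();
        columns.put("name", "Justin");
        columns.put("out:friend:Anna", "edge");
        columns.put("out:friend:Kris", "edge");
        toItems("vertex-1", columns).forEach(System.out::println);
    }
}
```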

The following table shows how the social network graph from the preceding Working with Graph Databases section would be stored in the edgestore DynamoDB table in the multiple item data model. It also shows hidden properties. Titan adds a hidden property to each vertex to indicate that it exists.

Note

This is a representation of the data that is stored in a table. The actual data is serialized with compressed metadata and is not human readable.

Partition Key (pk) | Sort Key (sk) | Value (v)
Vertex id 1 | Sort key |
Vertex id 1 | Property id | Property - Name: Justin
Vertex id 1 | Edge id | Edge (out) - Friend: Anna
Vertex id 1 | Edge id | Edge (out) - Friend: Kris
Vertex id 1 | Edge id | Edge (out) - Likes: Movies
Vertex id 1 | Property id | Hidden Property - Exists
Vertex id 2 | Sort key |
Vertex id 2 | Property id | Property - Name: Anna
Vertex id 2 | Edge id | Edge (in) - Friend: Justin
Vertex id 2 | Edge id | Edge (out) - Likes: Books
Vertex id 2 | Property id | Hidden Property - Exists
Vertex id 3 | Sort key |
Vertex id 3 | Property id | Property - Name: Kris
Vertex id 3 | Edge id | Edge (in) - Friend: Justin
Vertex id 3 | Edge id | Edge (out) - Likes: Movies
Vertex id 3 | Property id | Hidden Property - Exists
Vertex id 4 | Sort key |
Vertex id 4 | Property id | Property - Name: Movies
Vertex id 4 | Edge id | Edge (in) - Likes: Justin
Vertex id 4 | Edge id | Edge (in) - Likes: Kris
Vertex id 4 | Property id | Hidden Property - Exists
Vertex id 5 | Sort key |
Vertex id 5 | Property id | Property - Name: Books
Vertex id 5 | Edge id | Edge (in) - Likes: Anna
Vertex id 5 | Property id | Hidden Property - Exists

This table does not show all of the data that is stored in each attribute. For information about the data and the data format of edges and properties stored in attributes, go to the Titan Data Model page.

Although the multiple item data model lets you avoid the 400 KB item limit, it comes with a performance penalty. Scanning the base table to iterate over vertices in the edgestore table can take much longer in the multiple item data model than in the single item data model.

The multiple item data model overcomes the 400 KB limit by denormalizing one entity in a store into one item for each column at a key, so one key appears once for each of its columns in a multiple item data store. Scans take longer in this model because a separate item exists for each edge label, vertex property, and edge property. The edgestore_key table stores the key and revision number of each store entry, so that a scan accesses a KCV key only once with each mutating operation. As a result, any mutation to a KCV store requires at least two HTTP round trips: one for the key table and at least one for the base table, and more if the mutation only involves deleting columns.
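The scan-time difference follows from item counts alone. This Java sketch counts the items a full scan of edgestore reads in each model, using the attribute counts from the single item table earlier in this section (5, 4, 4, 4, and 3 column-value pairs for vertices 1 through 5). It ignores item sizes, read-capacity rounding, the edgestore_key table, and the per-vertex Sort key rows shown in the multiple item table, so treat it as an illustration rather than a capacity model.

```java
import java.util.Arrays;

public class ScanCost {
    // Single item model: a full scan reads one item per KCV key (vertex).
    static long itemsScannedSingle(int[] columnsPerKey) {
        return columnsPerKey.length;
    }

    // Multiple item model: a full scan reads one item per column at every key.
    static long itemsScannedMulti(int[] columnsPerKey) {
        return Arrays.stream(columnsPerKey).asLongStream().sum();
    }

    public static void main(String[] args) {
        // Column-value pairs per vertex in the illustrative single item table.
        int[] columnsPerKey = {5, 4, 4, 4, 3};
        System.out.println("single: " + itemsScannedSingle(columnsPerKey)); // 5
        System.out.println("multi:  " + itemsScannedMulti(columnsPerKey));  // 20
    }
}
```

Even for this five-vertex graph, the multiple item model reads four times as many items; the ratio grows with the average number of edges and properties per vertex.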

Storage Changes in Titan versions 0.5.4 and later

DynamoDB Storage Backend for Titan versions 0.5.4 and later stores graph data in the same way as with version 0.4.4, with the following differences:

  • Partitioned vertices are available. The partitions of a vertex are all read and written in parallel.

  • The vertexindex and edgeindex tables are combined into a single index store named graphindex.

  • Titan 0.5.4 supports user-defined transaction logs. Each user-defined transaction log corresponds to an extra DynamoDB table that needs to be configured in your .properties file or rexster.xml file.

  • Titan 1.0.0 also supports user-defined transaction logs. Each user-defined transaction log corresponds to an extra DynamoDB table that needs to be configured in your dynamodb-properties file.

Limits of the DynamoDB Storage Backend for Titan

DynamoDB imposes limits on the size of partition keys (2048 bytes) and sort keys (1024 bytes) and on total item size (400 KB). As such, the Titan DynamoDB BigTable implementation has some limits that are described in the following list. BigTable is the name of the storage abstraction for Titan backends. For details about the Titan BigTable abstraction, see BigTable.

  • When you use built-in indexes, indexed property values are limited by the maximum size of partition keys, 2048 bytes. To index larger values, such as documents, use a mixed indexer (for example, Elasticsearch, Solr, or Lucene), which also enables full-text search.

  • The maximum column value size varies because of Titan's variable ID encoding schemes and compressed object serialization, but it is limited to 400 KB in the item representation because that is the maximum item size in DynamoDB. In the single item data model, this means that all of the columns stored at one key (an out-vertex in the edgestore) of a KCVStore must together be less than 400 KB. In the multiple item data model, everything stored in one column (one key-column pair, for example, a vertex property or an edge) must be less than or equal to 400 KB in size. Because all edges coming out of a vertex are stored in one item in the single item data model, the single item model can only be used with graphs that have a small out-degree.

  • Using DynamoDB table prefixes, you can have 51 graphs per region in version 0.4.4, and 42 graphs per region in versions 0.5.4 and later, as long as there are no user-defined transaction logs. If you use user-defined transaction logs, there will be an extra table for each log, so the number of graphs you can store in a region will decrease. For more information, see user-defined transaction logs in the Titan documentation. By default, the number of DynamoDB tables is limited to 256 tables per region. If you want to have more graphs in a region, you can request an increase to your account limits. For more information about account limits, see the Limits in DynamoDB page.

The preceding limits are in addition to the limitations of Titan. For information about the limits of Titan, go to the Technical Limitations page in the Titan documentation.
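The 51 and 42 figures are consistent with dividing the default limit of 256 tables by the number of tables each graph uses: five in version 0.4.4 and six in versions 0.5.4 and later, per the table lists at the start of this section, plus one table per user-defined transaction log. The following Java sketch reproduces that arithmetic and adds a rough check of a value against the 2048-byte partition key limit for built-in indexes. The UTF-8 length is only a proxy, because Titan's serialization adds its own bytes; the class and method names are illustrative.

```java
import java.nio.charset.StandardCharsets;

public class BackendLimits {
    static final int MAX_PARTITION_KEY_BYTES = 2048;

    // Built-in indexes: indexed values are bounded by the partition key limit.
    // UTF-8 length is only a rough proxy for Titan's serialized size.
    static boolean fitsBuiltInIndex(String value) {
        return value.getBytes(StandardCharsets.UTF_8).length <= MAX_PARTITION_KEY_BYTES;
    }

    // Graphs per region = table limit / tables per graph (integer division).
    // baseTables: 5 for Titan 0.4.4, 6 for 0.5.4 and later.
    // userLogs: one extra table per user-defined transaction log.
    static int graphsPerRegion(int tableLimit, int baseTables, int userLogs) {
        return tableLimit / (baseTables + userLogs);
    }

    public static void main(String[] args) {
        System.out.println(graphsPerRegion(256, 5, 0)); // 0.4.4: 51
        System.out.println(graphsPerRegion(256, 6, 0)); // 0.5.4 and later: 42
        System.out.println(graphsPerRegion(256, 6, 2)); // two user logs: 32
        System.out.println(fitsBuiltInIndex("x".repeat(3000))); // false
    }
}
```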

Backend Data Usage

Provisioning for TitanDB backend storage depends on the graph design (for example, many vertices vs. many properties), usage (reading vs. writing vs. updating), and the storage data model (single vs. multiple item).

In any graph, the edgestore table will have the most data and usage.

The following table can help you estimate how much to provision for the edgestore table. You need to have estimates for how many of the following graph objects you will process (read, write, or update) each second:

  • Vertices: The number of vertices. Applies to the single and multiple item models.

  • Properties per vertex: The number of properties on each vertex. Applies to the multiple item model.

  • Edges out per vertex: The number of edges from a vertex to other vertices. Applies to the single and multiple item models.

    Note

    Edges are bidirectional by default. Unless you create unidirectional (out only) edges, the edges out and in will be equal.

  • Edges into vertex: The number of edges coming into the vertex.

  • Hidden properties: Properties stored by Titan. Each vertex has at least an exists property. In Titan version 0.4.4, there is at least one hidden property per vertex, and in Titan 0.5.4, there are at least two hidden properties per vertex.

Note

In the single item data model, many of the graph objects are serialized into a single item with the vertex, so they are not needed to estimate usage.

Type | Update/DeleteItem calls to edgestore, SINGLE item model | Update/DeleteItem calls to edgestore, MULTI item model
Create | vertices * edgesOutOfVertex | vertices * (vertexProperties + edgesIntoVertex + edgesOutOfVertex + titanHiddenProperties)
Updating | vertices * edgesOutOfVertex | vertices + vertexProperties + edgesIntoVertex + edgesOutOfVertex + titanHiddenProperties
Reading | vertices | vertices + vertexProperties + edgesIntoVertex + edgesOutOfVertex + titanHiddenProperties
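The Create row of this table translates directly into code. The following Java sketch evaluates it for a hypothetical workload; the method names are illustrative, and the per-second inputs are assumed figures, not measurements. The Reading row for the single item model is included for comparison.

```java
public class EdgestoreCapacity {
    // Create row, SINGLE item model: vertices * edgesOutOfVertex
    static long createCallsSingle(long vertices, long edgesOutOfVertex) {
        return vertices * edgesOutOfVertex;
    }

    // Create row, MULTI item model:
    // vertices * (vertexProperties + edgesIntoVertex + edgesOutOfVertex + titanHiddenProperties)
    static long createCallsMulti(long vertices, long vertexProperties, long edgesIntoVertex,
                                 long edgesOutOfVertex, long titanHiddenProperties) {
        return vertices * (vertexProperties + edgesIntoVertex + edgesOutOfVertex + titanHiddenProperties);
    }

    // Reading row, SINGLE item model: vertices
    static long readCallsSingle(long vertices) {
        return vertices;
    }

    public static void main(String[] args) {
        // Hypothetical workload: 1,000 new vertices per second, each with 5 properties,
        // 10 edges in, 10 edges out, and 2 hidden properties (Titan 0.5.4).
        System.out.println("SINGLE create calls/s: " + createCallsSingle(1000, 10));
        System.out.println("MULTI create calls/s:  " + createCallsMulti(1000, 5, 10, 10, 2));
        System.out.println("SINGLE read calls/s:   " + readCallsSingle(1000));
    }
}
```

For that assumed workload, the sketch prints 10,000 create calls per second for the single item model and 1,000 * (5 + 10 + 10 + 2) = 27,000 for the multiple item model.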

The preceding table separates capacity estimations by storage model and operation type. The following sections give more information on DynamoDB activity for different operation types and discuss the effects of indexes.

Loading data

Bulk loading of data is very write intensive. Loading new data into a graph requires items to be created in the backend database. Creating data is more intensive in the multiple item model because each vertex, property, and edge is written as a separate item.

Updating data

Updating data is less intensive with the multiple item storage model because it only needs to update the items for the specific property or edge that is being updated. In the single item storage model, the entire item must be updated.

Reading data

In the multiple item model, reading a subset of properties or edges can be more efficient than in the single item model because only the items you request are read, instead of the whole item.

Operations on the graphindex table

In the single item model, each item in the graphindex table is keyed by a unique vertex or edge property name and value combination, and the item's other attributes (columns) hold the identifiers of the vertices or edges that have this property value. In the multiple item model, graphindex items are still keyed by the property name and value combination, but there is a separate item for each vertex or edge that has that combination. Therefore, the single item model writes one item to graphindex per property name/value combination, while the multiple item model writes vertices * vertexProperties + edges * edgeProperties items to graphindex.
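The difference in graphindex item counts can be sketched in Java, assuming every property in the input is indexed. The `IndexedValue` record and the sample data are hypothetical: three vertices share a "city" property, two with the same value.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class GraphIndexItems {
    // One indexed property value on a vertex or edge (hypothetical representation).
    record IndexedValue(String elementId, String name, String value) {}

    // Single item model: one graphindex item per distinct name/value combination.
    static int itemsSingle(List<IndexedValue> values) {
        Set<List<String>> keys = new HashSet<>();
        for (IndexedValue iv : values) {
            keys.add(List.of(iv.name(), iv.value()));
        }
        return keys.size();
    }

    // Multiple item model: one item per element that carries each combination,
    // that is, vertices * vertexProperties + edges * edgeProperties.
    static int itemsMulti(List<IndexedValue> values) {
        return new HashSet<>(values).size();
    }

    public static void main(String[] args) {
        List<IndexedValue> values = List.of(
                new IndexedValue("v1", "city", "Seattle"),
                new IndexedValue("v2", "city", "Seattle"),
                new IndexedValue("v3", "city", "Boston"));
        System.out.println("single: " + itemsSingle(values)); // 2
        System.out.println("multi:  " + itemsMulti(values));  // 3
    }
}
```

The more elements share a value, the larger the gap becomes, but each multiple item model index item stays small, while a single item model index item for a popular value keeps growing toward the 400 KB limit.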

Metrics

Titan uses the Metrics-core package to record and emit metrics. Metrics-core supports reporting metrics over JMX, HTTP, STDOUT, CSV, SLF4J, Ganglia, and Graphite, and more reporters are available as third-party plugins. You can learn more about the Metrics-core package from the Metrics website.

You can turn on Metrics by using the following properties:

metrics.enabled=true

# prefix for metrics from titan-core. Optional. If not specified, com.thinkaurelius.titan will be used.
# Currently, the prefix for Titan system stores (system log, txlog, titan_ids, system_properties, and all user logs)
# is set to com.thinkaurelius.titan.sys and cannot be changed.
metrics.prefix=titan

# polling interval in milliseconds
metrics.csv.interval=500

# the directory to which the metrics CSV files are written
metrics.csv.directory=metrics

# The metrics prefix in titan-dynamodb allows you to change what gets prepended to the codahale metric names.
#storage.dynamodb.metrics-prefix=dynamodb

Note

Properties can be set in a properties file in the classpath, directly in the Gremlin shell using a configuration object, or in the rexster.xml file.

To set the metrics configuration properties in the Gremlin shell, type the following:


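// in-memory Apache Commons Configuration object
conf = new BaseConfiguration()
// turn on Metrics and set the prefix for titan-core metrics
conf.setProperty("metrics.enabled", "true")
conf.setProperty("metrics.prefix", "titan")
// write CSV reports every 1000 ms to the "metrics" directory
conf.setProperty("metrics.csv.interval", 1000)
conf.setProperty("metrics.csv.directory", "metrics")
// prefix prepended to the DynamoDB backend's metric names
conf.setProperty("storage.dynamodb.metrics-prefix", "dynamodb")
// pass conf, together with your storage properties, to TitanFactory.open(conf)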

Metrics-core supports a variety of measurement types. A Timer combines a Meter on the call rate with a Histogram of the latency of a piece of code. Histograms measure the distribution of a particular value and emit count, max, mean, min, stddev, p50, p75, p95, p98, p99, and p999. Meters measure a call rate (transactions per second) and emit count, mean_rate, m1_rate, m5_rate, and m15_rate. Gauges measure a value in a different thread and emit that value. Counters count the number of times a piece of code is called and emit a count.
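To show what a histogram's emitted values mean, the following Java sketch computes count, min, max, mean, stddev, and nearest-rank percentiles over a list of latency samples. This is a simplified stand-in, not Metrics-core: Metrics-core histograms summarize a sampled reservoir and use their own quantile calculation, so the values they report can differ from these exact ones. The class name and sample data are hypothetical.

```java
import java.util.Arrays;

public class HistogramSummary {
    // Nearest-rank percentile over sorted samples: rank = ceil(q * n).
    static long percentile(long[] sorted, double q) {
        if (sorted.length == 0) throw new IllegalArgumentException("no samples");
        int rank = (int) Math.ceil(q * sorted.length);
        rank = Math.max(1, Math.min(rank, sorted.length));
        return sorted[rank - 1];
    }

    static double mean(long[] samples) {
        return Arrays.stream(samples).average().orElse(0.0);
    }

    // Sample standard deviation (n - 1 denominator).
    static double stddev(long[] samples) {
        if (samples.length < 2) return 0.0;
        double m = mean(samples);
        double sumSq = 0.0;
        for (long s : samples) sumSq += (s - m) * (s - m);
        return Math.sqrt(sumSq / (samples.length - 1));
    }

    public static void main(String[] args) {
        // Hypothetical latencies in milliseconds: 1, 2, ..., 100.
        long[] sorted = new long[100];
        for (int i = 0; i < sorted.length; i++) sorted[i] = i + 1;
        Arrays.sort(sorted);
        System.out.printf("count=%d min=%d max=%d mean=%.2f stddev=%.2f%n",
                sorted.length, sorted[0], sorted[sorted.length - 1], mean(sorted), stddev(sorted));
        double[] quantiles = {0.50, 0.75, 0.95, 0.98, 0.99, 0.999};
        for (double q : quantiles) {
            System.out.println("q" + q + " = " + percentile(sorted, q));
        }
    }
}
```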

Titan emits the metrics described in the table on the Titan Metrics page.

The Amazon DynamoDB Storage Backend for Titan emits metrics in addition to those emitted by Titan. These relate to statistics on the low-level DynamoDB operations and are described in the table on the Additional Amazon DynamoDB Storage Backend for Titan Metrics page.