Data modeling example - Best Practices for Migrating from RDBMS to Amazon DynamoDB

Data modeling example

Let’s walk through an example that combines the concepts described in this section and the previous. This example will demonstrate how to use secondary indexes for efficient data access, and how to estimate both the item size and the required capacity for a DynamoDB table. For this example there is a gaming application that tracks players and the games they play. The following figure contains an Entity Relationship (ER) Diagram for the RDBMS schema of this application.

An entity relationship diagram showing how player, game, and playergameitem are connected.

RDBMS schema for gaming application

A player can play multiple games and each game can host many players, so there is a many-to-many relationship between players and games, leading to a PlayersGamesItems association table in RDBMS.

In order to perform data modeling for DynamoDB, we should first list the application’s access patterns, from the most to the least frequently used. In this case the access patterns are:

Table 4 – Application access patterns

Access pattern Frequency/sec
1 Get details of a certain game 2000
2 Get details of a certain player 1000
3 Get list of games a player registered to within a certain time range 500
4 Get list of all games a player ever played 150
5 Update the player’s score in a certain game 100
6 Add an association between a player and a game he/she played 50
7 Get list of players who ever played a specific game 1

The following is the DynamoDB table and global secondary index (GSI) structure that will support the above access patterns:

is the DynamoDB table and global secondary index (GSI) structure that will support the above access patterns.

DynamoDB schema for gaming application

In DynamoDB the user utilizes a single table with GameId being its partition key and PlayerId its sort key. This creates a one-to-many relationship between games and players and allows retrieval of data about a specific player or all players playing a particular game. In order to create a many-to-many relationship between games and players, the user creates a GSI with reversed partition and sort key. The user creates a second GSI to support additional access patterns: retrieving a list of games a player played sorted on the registration date or a list of games a player registered to between certain dates. The following figure shows DynamoDB table and GSIs with several added items, as can be seen in the NoSQL Workbench.

DynamoDB table for games.
GSI1 table.
GSI2 table.

DynamoDB table and GSIs in NoSQL Workbench

As the DynamoDB table and GSIs in NoSQL Workbench figure shows, the user denormalized the data. The game name, type, and player name are duplicated in each item in the table. This increases the storage usage, but also enables retrieving these details in a single, fast query, for example when querying for the games Player2 ever played. Depending on the requirements, we could model our data differently. If retrieving the game name and type isn’t required for the majority of our queries, we could store these details in a single root game item, instead of storing these in all the items. Similarly, we could store any generic player related data in root player item. With such data modeling the data would appear as follows:

Games_alternative DynamoDB table.
Games_alternative GSI1.
Games_alternative GSI2.

DynamoDB table and GSIs with alternative data modeling in NoSQL Workbench

With the alternative data modeling, the partition key attribute is called PK and the sort key attribute is called SK. This is because the table is now overloaded with multiple types of objects. There are root game items containing generic game details, root player items containing generic player details, and items that associate players to games they played.

Also note that these different types of items contain different attributes. DynamoDB’s schema-less data model allows us to perform this kind of table overloading and it is considered a best practice in DynamoDB. You might also notice that the PlayerID is now present in both the SK and the PlayerID attributes. This in order to use the PlayerID attribute as the GSI’s sort key, to only include players to games association items in the GSIs, making them sparse indexes. As can be seen, the user prefixes the game IDs with G# and the Player IDs with P# to ensure that game and player IDs are unique.

The next step is to list access patterns and the tables, or GSIs, that serve them. Here is the list with the fully denormalized modeling and the list with alternative modeling:

Table 5 – Access patterns mapping to DynamoDB table and GSIs with fully denormalized modeling

Access pattern Read served by Passed primary key values
1 Get game details for the given GameId Table GameId=X, limit=1
2 Get player details for the given PlayerId GSI1 PlayerId=X, limit=1
3 Get a list of games a player registered to within a time range, for the given PlayerId and time range GSI2 PlayerId=X, RegistrationDate between Y and Z
4 Get list of all games a player ever played GSI1 PlayerId=X
5 Update the player’s score in a certain game GameId=X, PlayerId=Y
6 Add an association between a player and a game he/she played GameId=X, PlayerId=Y
7 Get list of players who ever played a specific game Table GameId=X

Table 6 – Access patterns mapping to DynamoDB table and GSIs with alternative modeling

Access pattern Read served by Passed primary key values
1 Get game details for the given game ID passed in as PK Table PK=X, SK=root
2 Get player details for the given player ID passed in as PK Table PK=X, SK=root
3 Get a list of games a player registered to within a time range, for the given PlayerId and time range GSI2 PlayerId=X, RegistrationDate between Y and Z
4 Get list of all games a player ever played GSI1 PlayerId=X
5 Update the player’s score in a certain game PK=X, SK=Y
6 Add an association between a player and a game he/she played PK=X, SK=Y
7 Get list of players who ever played a specific game Table PK=X