Data modeling example
Let’s walk through an example that combines the concepts described in this section and the previous. This example will demonstrate how to use secondary indexes for efficient data access, and how to estimate both the item size and the required capacity for a DynamoDB table. For this example there is a gaming application that tracks players and the games they play. The following figure contains an Entity Relationship (ER) Diagram for the RDBMS schema of this application.
A player can play multiple games and each game can host many players, so there is a many-to-many relationship between players and games, leading to a PlayersGamesItems association table in RDBMS.
In order to perform data modeling for DynamoDB, we should first list the application’s access patterns, from the most to the least frequently used. In this case the access patterns are:
Table 4 – Application access patterns
Access pattern | Frequency/sec | |
---|---|---|
1 | Get details of a certain game | 2000 |
2 | Get details of a certain player | 1000 |
3 | Get list of games a player registered to within a certain time range | 500 |
4 | Get list of all games a player ever played | 150 |
5 | Update the player’s score in a certain game | 100 |
6 | Add an association between a player and a game he/she played | 50 |
7 | Get list of players who ever played a specific game | 1 |
The following is the DynamoDB table and global secondary index (GSI) structure that will support the above access patterns:
In DynamoDB the user utilizes a single table with GameId
being its partition
key and PlayerId
its sort key. This creates a one-to-many relationship between
games and players and allows retrieval of data about a specific player or all players playing
a particular game. In order to create a many-to-many relationship between games and players,
the user creates a GSI with reversed partition and sort key. The user creates a second GSI to
support additional access patterns: retrieving a list of games a player played sorted on the
registration date or a list of games a player registered to between certain dates. The
following figure shows DynamoDB table and GSIs with several added items, as can be seen in the
NoSQL Workbench.
As the DynamoDB table and GSIs in NoSQL Workbench figure shows, the user denormalized the data. The game name, type, and player name are duplicated in each item in the table. This increases the storage usage, but also enables retrieving these details in a single, fast query, for example when querying for the games Player2 ever played. Depending on the requirements, we could model our data differently. If retrieving the game name and type isn’t required for the majority of our queries, we could store these details in a single root game item, instead of storing these in all the items. Similarly, we could store any generic player related data in root player item. With such data modeling the data would appear as follows:
With the alternative data modeling, the partition key attribute is called PK and the sort key attribute is called SK. This is because the table is now overloaded with multiple types of objects. There are root game items containing generic game details, root player items containing generic player details, and items that associate players to games they played.
Also note that these different types of items contain different attributes. DynamoDB’s schema-less data model allows us to perform this kind of table overloading and it is considered a best practice in DynamoDB. You might also notice that the PlayerID is now present in both the SK and the PlayerID attributes. This in order to use the PlayerID attribute as the GSI’s sort key, to only include players to games association items in the GSIs, making them sparse indexes. As can be seen, the user prefixes the game IDs with G# and the Player IDs with P# to ensure that game and player IDs are unique.
The next step is to list access patterns and the tables, or GSIs, that serve them. Here is the list with the fully denormalized modeling and the list with alternative modeling:
Table 5 – Access patterns mapping to DynamoDB table and GSIs with fully denormalized modeling
Access pattern | Read served by | Passed primary key values | |
---|---|---|---|
1 | Get game details for the given GameId |
Table | GameId=X, limit=1 |
2 | Get player details for the given PlayerId |
GSI1 | PlayerId=X, limit=1 |
3 | Get a list of games a player registered to within a time range, for the given PlayerId and time range | GSI2 | PlayerId=X, RegistrationDate between Y and Z |
4 | Get list of all games a player ever played | GSI1 | PlayerId=X |
5 | Update the player’s score in a certain game | – | GameId=X, PlayerId=Y |
6 | Add an association between a player and a game he/she played | – | GameId=X, PlayerId=Y |
7 | Get list of players who ever played a specific game | Table | GameId=X |
Table 6 – Access patterns mapping to DynamoDB table and GSIs with alternative modeling
Access pattern | Read served by | Passed primary key values | |
---|---|---|---|
1 | Get game details for the given game ID passed in as PK | Table | PK=X, SK=root |
2 | Get player details for the given player ID passed in as PK | Table | PK=X, SK=root |
3 | Get a list of games a player registered to within a time range, for the given
PlayerId and time range |
GSI2 | PlayerId=X, RegistrationDate between Y and Z |
4 | Get list of all games a player ever played | GSI1 | PlayerId=X |
5 | Update the player’s score in a certain game | – | PK=X, SK=Y |
6 | Add an association between a player and a game he/she played | – | PK=X, SK=Y |
7 | Get list of players who ever played a specific game | Table | PK=X |