Data modeling example

Let’s walk through an example that combines the concepts described in this section and the previous. This example will demonstrate how to use secondary indexes for efficient data access, and how to estimate both the item size and the required capacity for a DynamoDB table. For this example there is a gaming application that tracks players and the games they play. The following figure contains an Entity Relationship (ER) Diagram for the RDBMS schema of this application.

An entity relationship diagram showing how player, game, and playergameitem are connected.

RDBMS schema for gaming application

A player can play multiple games and each game can host many players, so there is a many-to-many relationship between players and games, leading to a PlayersGamesItems association table in RDBMS.

In order to perform data modeling for DynamoDB, we should first list the application’s access patterns, from the most to the least frequently used. In this case the access patterns are:

Table 4 – Application access patterns

	Access pattern	Frequency/sec
1	Get details of a certain game	2000
2	Get details of a certain player	1000
3	Get list of games a player registered to within a certain time range	500
4	Get list of all games a player ever played	150
5	Update the player’s score in a certain game	100
6	Add an association between a player and a game he/she played	50
7	Get list of players who ever played a specific game	1

The following is the DynamoDB table and global secondary index (GSI) structure that will support the above access patterns:

DynamoDB schema for gaming application

In DynamoDB the user utilizes a single table with GameId being its partition key and PlayerId its sort key. This creates a one-to-many relationship between games and players and allows retrieval of data about a specific player or all players playing a particular game. In order to create a many-to-many relationship between games and players, the user creates a GSI with reversed partition and sort key. The user creates a second GSI to support additional access patterns: retrieving a list of games a player played sorted on the registration date or a list of games a player registered to between certain dates. The following figure shows DynamoDB table and GSIs with several added items, as can be seen in the NoSQL Workbench.

DynamoDB table and GSIs in NoSQL Workbench

As the DynamoDB table and GSIs in NoSQL Workbench figure shows, the user denormalized the data. The game name, type, and player name are duplicated in each item in the table. This increases the storage usage, but also enables retrieving these details in a single, fast query, for example when querying for the games Player2 ever played. Depending on the requirements, we could model our data differently. If retrieving the game name and type isn’t required for the majority of our queries, we could store these details in a single root game item, instead of storing these in all the items. Similarly, we could store any generic player related data in root player item. With such data modeling the data would appear as follows:

DynamoDB table and GSIs with alternative data modeling in NoSQL Workbench

With the alternative data modeling, the partition key attribute is called PK and the sort key attribute is called SK. This is because the table is now overloaded with multiple types of objects. There are root game items containing generic game details, root player items containing generic player details, and items that associate players to games they played.

Also note that these different types of items contain different attributes. DynamoDB’s schema-less data model allows us to perform this kind of table overloading and it is considered a best practice in DynamoDB. You might also notice that the PlayerID is now present in both the SK and the PlayerID attributes. This in order to use the PlayerID attribute as the GSI’s sort key, to only include players to games association items in the GSIs, making them sparse indexes. As can be seen, the user prefixes the game IDs with G# and the Player IDs with P# to ensure that game and player IDs are unique.

The next step is to list access patterns and the tables, or GSIs, that serve them. Here is the list with the fully denormalized modeling and the list with alternative modeling:

Table 5 – Access patterns mapping to DynamoDB table and GSIs with fully denormalized modeling

	Access pattern	Read served by	Passed primary key values
1	Get game details for the given `GameId`	Table	GameId=X, limit=1
2	Get player details for the given `PlayerId`	GSI1	PlayerId=X, limit=1
3	Get a list of games a player registered to within a time range, for the given PlayerId and time range	GSI2	PlayerId=X, RegistrationDate between Y and Z
4	Get list of all games a player ever played	GSI1	PlayerId=X
5	Update the player’s score in a certain game	–	GameId=X, PlayerId=Y
6	Add an association between a player and a game he/she played	–	GameId=X, PlayerId=Y
7	Get list of players who ever played a specific game	Table	GameId=X

Table 6 – Access patterns mapping to DynamoDB table and GSIs with alternative modeling

	Access pattern	Read served by	Passed primary key values
1	Get game details for the given game ID passed in as PK	Table	PK=X, SK=root
2	Get player details for the given player ID passed in as PK	Table	PK=X, SK=root
3	Get a list of games a player registered to within a time range, for the given `PlayerId` and time range	GSI2	PlayerId=X, RegistrationDate between Y and Z
4	Get list of all games a player ever played	GSI1	PlayerId=X
5	Update the player’s score in a certain game	–	PK=X, SK=Y
6	Add an association between a player and a game he/she played	–	PK=X, SK=Y
7	Get list of players who ever played a specific game	Table	PK=X

Warning Javascript is disabled or is unavailable in your browser.

To use the Amazon Web Services Documentation, Javascript must be enabled. Please refer to your browser's Help pages for instructions.

Document Conventions

Data modeling phase

Cost estimate calculation