AWS AppSync Developer Guide

Tutorial: Delta Sync

Client applications in AWS AppSync store data by caching GraphQL responses locally to disk in a mobile or web application. An example of this architecture is shown here and is the same for other platforms. The Delta Sync feature lets you specify two separate queries in the sync process: a base query and a delta query. Clients hydrate their local cache with the results of one base query, which might return a lot of records, and afterward receive only the data altered since their last query (the delta updates). Separating the base hydration of the cache from the incremental updates moves computation from your client application to the backend, which is substantially more efficient for clients that regularly switch between online and offline states.

To implement Delta Sync, both queries may use the same data source, or they may use separate data sources. Using the same data source for both queries, with only different query or filtering options, is completely fine because the backend is in your control. However, an optimal pattern when you start to reach larger scale is to use Delta Sync in conjunction with Pipeline resolvers. This enables you to partition the items available for a delta query into a separate data source, essentially creating a journal of events. In this scenario, one data source is optimized to service the base query, and the other data source is optimized to query for incremental updates. For DynamoDB, the base query might run a Scan on one table, while the delta query runs a Query with a conditional expression on a second table to pull changed events.

In addition, Delta Sync clients can receive a subscription as an argument, and the client then coordinates subscription reconnects and writes across offline-to-online transitions. Delta Sync does this by automatically resuming subscriptions (with exponential backoff and retry with jitter through different network error scenarios) and storing events in a queue. The appropriate delta or base query then runs before any events from the queue are merged, and subscriptions are finally processed as normal.
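The reconnect sequence described above can be sketched in a few lines. This is a minimal illustration, not the SDK's implementation: the names backoff_delay and resync, the "full jitter" formula, and the dict-based cache are all assumptions made for the example.

```python
import random
from collections import deque


def backoff_delay(attempt, base=0.5, cap=30.0, rng=random):
    """Return a retry delay in seconds using exponential backoff with jitter.

    Assumption: "full jitter", i.e. a uniform draw between 0 and the
    exponentially growing (but capped) ceiling for this attempt.
    """
    ceiling = min(cap, base * 2 ** attempt)
    return rng.uniform(0, ceiling)


def resync(cache, run_sync_query, queued_events):
    """Apply the sync query results first, then drain queued subscription events.

    cache          -- dict mapping post id to post (hypothetical local cache)
    run_sync_query -- callable returning posts from the base or delta query
    queued_events  -- deque of posts received over the subscription while syncing
    """
    for post in run_sync_query():
        cache[post["id"]] = post
    # Events queued during the sync are newer than the query results,
    # so they are merged afterward and win on conflict.
    while queued_events:
        post = queued_events.popleft()
        cache[post["id"]] = post
    return cache
```

Merging the queue after the query results means that an event received during reconnection is not overwritten by older data from the query.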

Documentation for client configuration options is available on the Amplify Framework website. This documentation outlines how to set up a backend using pipeline resolvers to work with the Delta Sync client for optimal data access.

One-Click Setup

To automatically set up the GraphQL endpoint in AWS AppSync with all the resolvers configured and the necessary AWS resources, use the following AWS CloudFormation template:

This stack creates the following resources in your account:

  • 2 DynamoDB tables (Base and Delta)

  • 1 AWS AppSync API with API key

  • 1 IAM Role with policy for DynamoDB tables

  • DynamoDB TTL configured on expdate attribute of Delta table

Two tables are used so that delta queries are partitioned into a second table, which acts as a journal of the events that clients missed while they were offline. To keep the queries efficient on the delta table, you can use Amazon DynamoDB TTLs to automatically groom the events as necessary. The TTL time is configurable for your needs in the second function defined for each mutation in the pipeline resolver (for example, 1 hour or 1 day).

Schema

To demonstrate Delta Sync, the sample application creates a Posts schema backed by a Base and a Delta table in DynamoDB. The mutations use pipeline resolvers with two functions each to enable writing to both tables. The queries pull records from the Base or Delta table as appropriate, and a single subscription is defined to show how clients can leverage this in their reconnection logic.

input CreatePostInput {
  author: String!
  title: String!
  content: String!
  url: String
  ups: Int
  downs: Int
}

enum DeltaAction {
  DELETE
}

type Mutation {
  createPost(input: CreatePostInput!): Post
  updatePost(input: UpdatePostInput!): Post
  deletePost(id: ID!): Post
}

type Post {
  id: ID!
  author: String!
  title: String!
  content: String!
  url: AWSURL
  ups: Int
  downs: Int
  createdDate: AWSDateTime
  aws_ds: DeltaAction
}

type Query {
  getPost(id: ID!): Post
  listPosts: [Post]
  listPostsDelta(lastSync: AWSTimestamp): [Post]
}

type Subscription {
  onDeltaPost: Post
    @aws_subscribe(mutations: ["createPost","updatePost","deletePost"])
}

input UpdatePostInput {
  id: ID!
  author: String
  title: String
  content: String
  url: String
  ups: Int
  downs: Int
}

schema {
  query: Query
  mutation: Mutation
  subscription: Subscription
}

The GraphQL schema is standard, but a couple of things are worth calling out before moving forward. First, all of the mutations use Pipeline resolvers with functions that write first to the Base table and then to the Delta table. The Base table is the central source of truth for state, while the Delta table is your journal. The listPosts query runs against the Base table and hydrates the cache. It also runs periodically as a global catch-up process for edge cases in which clients are offline longer than your configured TTL time in the Delta table. The listPostsDelta query runs against your Delta table and is used by clients to retrieve the events that changed while they were offline. Clients automatically pass the lastSync: AWSTimestamp value; the SDK manages it and persists it to disk appropriately. You do not need to specify this argument in your client code.

The aws_ds field on Post is used for DELETE operations. When clients are offline and records are removed from the Base table, this attribute notifies clients performing synchronization to evict items from their local cache. In cases where clients are offline for longer periods of time and the item has been removed before the client can retrieve this value with a Delta Sync query, the global catch-up event in the base query (configurable in the client) runs and removes the item from the cache. This field is marked optional because it returns a value only when a delta query returns deleted items.
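To illustrate how a client might act on the aws_ds field, the following sketch merges delta query results into a local cache and evicts deleted items. It is an illustration only: apply_delta and the dict-based cache are hypothetical names, not part of the AppSync SDK.

```python
def apply_delta(cache, delta_posts):
    """Merge delta query results into a local cache keyed by post id.

    Posts carrying aws_ds == "DELETE" are evicted; all others are
    inserted or overwritten. The cache is a plain dict (an assumption).
    """
    for post in delta_posts:
        if post.get("aws_ds") == "DELETE":
            # pop with a default so that an already-missing item is not an error
            cache.pop(post["id"], None)
        else:
            cache[post["id"]] = post
    return cache
```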

Mutations

As outlined, mutations use pipeline resolvers with two functions. For all of the mutations, the first function does a standard Create/Update/Delete operation in the Base table. However, the second function operates differently in each case:

  • createPost second function (createDelta): Creates a record for the new item in the Delta table that matches the item in the Base table with two additional attributes, timestamp and expdate. The function calculates the current $timestamp that was used to write to the Base table (for consistency across tables). This is written to the Delta table for the listPostsDelta query to use. The function also calculates a $exp value which is stored in the Delta table for DynamoDB TTL grooming (required to be in seconds).

  • updatePost second function (updateDelta): Updates the record in the Delta table in the same manner as an update to the Base table. A new timestamp and expdate are calculated for clients to receive the changes.

  • deletePost second function (deleteDelta): Writes the last known data as well as updated timestamp and expdate values for clients to retrieve changes. It also creates an attribute called aws_ds with a value of DELETE. AppSync clients can filter on this attribute and remove records from their local cache if they were deleted while the client was offline.

You can reduce or extend the time to keep records by modifying the $delta value in the function. For organizations with a high velocity of data, it may make sense to keep this short. Alternatively, if your clients are offline for longer periods of time, it might be prudent to keep this longer.
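As a rough illustration of what the second pipeline function writes, the sketch below builds a Delta table record from a Base table item. It is not the resolver code itself: make_delta_record is a hypothetical helper, and using epoch seconds for the timestamp attribute is an assumption (the text above only requires expdate to be in seconds).

```python
import time


def make_delta_record(item, delta_seconds, now=None, deleted=False):
    """Build the journal record written to the Delta table.

    item          -- the item as written to (or last known in) the Base table
    delta_seconds -- how long to keep the record, playing the role of $delta
    now           -- epoch seconds; defaults to the current time (assumed unit)
    deleted       -- True for deletePost, which adds aws_ds = "DELETE"
    """
    now = int(time.time()) if now is None else now
    record = dict(item)                      # copy so the Base item is untouched
    record["timestamp"] = now                # used by listPostsDelta
    record["expdate"] = now + delta_seconds  # DynamoDB TTL attribute, in seconds
    if deleted:
        record["aws_ds"] = "DELETE"
    return record
```

For example, make_delta_record(post, 3600) keeps the journal entry for one hour, and make_delta_record(post, 86400, deleted=True) records a deletion that clients can pick up for a day.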

Following is a diagram of createPost execution flow and the two functions:

Base Query

The base query is a standard DynamoDB list operation. However, it's important to remember that this is configurable, and you should evaluate your needs and DynamoDB best practices around table design and Scan/Query patterns. For many organizations this works because the base query runs only on startup and on a periodic basis thereafter. If you have heavier data access patterns or a larger number of clients, you might want to change the data access pattern, and we recommend consulting the DynamoDB best practices documentation. As a benefit of GraphQL, the AppSync client has no knowledge of the optimizations you make in the backend data fetching, because it simply caches the GraphQL type results as appropriate. Following is a diagram of the base query.

Delta Query

The delta query executes whenever the client comes back online from an offline state (as long as the periodic base query is not due to run). Clients automatically track the last time they successfully ran a query to sync data, including any messages received from a GraphQL subscription if you have configured one in your client. This last sync time is sent as an argument to the delta query configured in your schema, so that query must have a lastSync: AWSTimestamp argument defined.
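The choice between the two queries on reconnect can be sketched as follows. This is an illustration under assumptions: choose_sync_query and base_refresh_interval are hypothetical names, and the actual client configuration options are described in the Amplify Framework documentation.

```python
def choose_sync_query(last_base_sync, now, base_refresh_interval):
    """Decide which query a reconnecting client should run.

    last_base_sync        -- epoch seconds of the last base query, or None
    now                   -- current time in epoch seconds
    base_refresh_interval -- seconds between periodic base queries (assumed name)
    """
    if last_base_sync is None:
        return "base"   # first start: hydrate the cache
    if now - last_base_sync >= base_refresh_interval:
        return "base"   # the periodic catch-up is due
    return "delta"      # otherwise fetch only what changed
```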

When a delta query runs, its resolver retrieves the records and keeps only those that changed since the last time the client performed a sync (by timestamp) and whose expiration time has not yet passed. The second condition is in place because DynamoDB performs TTL operations in batch, so expired records might still be pending eviction. As stated earlier, this is just one example of how you might list the delta items. For some organizations a different DynamoDB table schema, index, or Query might make sense. For more information, see the DynamoDB best practices documentation. The client caches the appropriate GraphQL response.
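The two filter conditions can be expressed over an in-memory list standing in for the Delta table. This is a sketch, not the resolver's DynamoDB expression: list_posts_delta is a hypothetical function, and the strict comparisons and the use of epoch seconds for both attributes are assumptions.

```python
def list_posts_delta(delta_records, last_sync, now):
    """Return journal records a client has not seen and that are still valid.

    delta_records -- iterable of dicts with "timestamp" and "expdate" keys
    last_sync     -- the client's lastSync value (assumed epoch seconds)
    now           -- current time in epoch seconds
    """
    return [
        record
        for record in delta_records
        if record["timestamp"] > last_sync  # changed since the last sync
        and record["expdate"] > now         # skip expired rows awaiting TTL deletion
    ]
```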