Retention report - Clickstream Analytics on AWS

Retention report

You can use the Retention report to get insights into how frequently and for how long users engage with your website or mobile app after their first visit. The report helps you understand how well your app is doing in terms of attracting users back after their fisrt visit.

Note

This article describes the default report. You can customize the report by applying filters or comparisons or by changing the dimensions, metrics, or charts in QuickSight. For more information, refer to Visualizing data in Amazon QuickSight.

View the report

  1. Access the dashboard for your application. Refer to Access dashboard

  2. In the dashboard, choose the sheet with name of Retention.

Where the data comes from

Retention report are created based on the following QuickSight dataset:

  • lifecycle_weekly_view-<app id>-<project id>, which connects to the clickstream_lifecycle_weekly_view_v1 view in analytics engines (that is, Redshift or Athena).

  • lifecycle_daily_view-<app id>-<project id> , which connects to the clickstream_lifecycle_daily_view_v1 view in analytics engines (that is, Redshift or Athena).

  • retention_view-<app id>-<project id> that connects to the clickstream_retention_view_v1 view in analytics engines

Below is the SQL command that generates the view.

Redshift SQL:

clickstream_lifecycle_weekly_view_v1.sql

select user_pseudo_id, DATE_TRUNC('week', dateadd(ms,event_timestamp, '1970-01-01')) as time_period from {{schema}}.event where event_name = '_session_start' group by 1,2 order by 1,2), -- detect if lag and lead exists lag_lead as ( select user_pseudo_id, time_period, lag(time_period,1) over (partition by user_pseudo_id order by user_pseudo_id, time_period), lead(time_period,1) over (partition by user_pseudo_id order by user_pseudo_id, time_period) from weekly_usage), -- calculate lag and lead size lag_lead_with_diffs as ( select user_pseudo_id, time_period, lag, lead, datediff(week,lag,time_period) lag_size, datediff(week,time_period,lead) lead_size from lag_lead), -- case to lifecycle stage calculated as (select time_period, case when lag is null then '1-NEW' when lag_size = 1 then '2-ACTIVE' when lag_size > 1 then '3-RETURN' end as this_week_value, case when (lead_size > 1 OR lead_size IS NULL) then '0-CHURN' else NULL end as next_week_churn, count(distinct user_pseudo_id) from lag_lead_with_diffs group by 1,2,3) select time_period, this_week_value, sum(count) from calculated group by 1,2 union select time_period+7, '0-CHURN', -1*sum(count) from calculated where next_week_churn is not null group by 1,2;

clickstream_lifecycle_daily_view_v1.sql

select user_pseudo_id, DATE_TRUNC('day', dateadd(ms,event_timestamp, '1970-01-01')) as time_period from {{schema}}.event where event_name = '_session_start' group by 1,2 order by 1,2), -- detect if lag and lead exists lag_lead as ( select user_pseudo_id, time_period, lag(time_period,1) over (partition by user_pseudo_id order by user_pseudo_id, time_period), lead(time_period,1) over (partition by user_pseudo_id order by user_pseudo_id, time_period) from daily_usage), -- calculate lag and lead size lag_lead_with_diffs as ( select user_pseudo_id, time_period, lag, lead, datediff(day,lag,time_period) lag_size, datediff(day,time_period,lead) lead_size from lag_lead), -- case to lifecycle stage calculated as (select time_period, case when lag is null then '1-NEW' when lag_size = 1 then '2-ACTIVE' when lag_size > 1 then '3-RETURN' end as this_day_value, case when (lead_size > 1 OR lead_size IS NULL) then '0-CHURN' else NULL end as next_day_churn, count(distinct user_pseudo_id) from lag_lead_with_diffs group by 1,2,3) select time_period, this_day_value, sum(count) from calculated group by 1,2 union select time_period+1, '0-CHURN', -1*sum(count) from calculated where next_day_churn is not null group by 1,2;

clickstream_retention_view_v1.sql

WITH user_first_date AS ( SELECT user_pseudo_id, min(event_date) as first_date FROM {{schema}}.event GROUP BY user_pseudo_id ), retention_data AS ( SELECT user_pseudo_id, first_date, DATE_DIFF('day', first_date, event_date) AS day_diff FROM {{schema}}.event JOIN user_first_date USING (user_pseudo_id) ), retention_counts AS ( SELECT first_date, day_diff, COUNT(DISTINCT user_pseudo_id) AS returned_user_count FROM retention_data WHERE day_diff <= 42 -- Calculate retention rate for the last 42 days GROUP BY first_date, day_diff ), total_users AS ( SELECT first_date, COUNT(DISTINCT user_pseudo_id) AS total_users FROM user_first_date group by 1 ), retention_rate AS ( SELECT first_date, day_diff, returned_user_count, total_users FROM retention_counts join total_users using(first_date) ) SELECT * FROM retention_rate;

Athena SQL:

clickstream-lifecycle-weekly-query.sql

with weekly_usage as ( select user_pseudo_id, DATE_TRUNC('week', event_date) as time_period from {{database}}.{{eventTable}} where partition_app = ? and partition_year >= ? and partition_month >= ? and partition_day >= ? and event_name = '_session_start' group by 1,2 order by 1,2 ), lag_lead as ( select user_pseudo_id, time_period, lag(time_period,1) over (partition by user_pseudo_id order by user_pseudo_id, time_period) as lag, lead(time_period,1) over (partition by user_pseudo_id order by user_pseudo_id, time_period) as lead from weekly_usage ), lag_lead_with_diffs as ( select user_pseudo_id, time_period, lag, lead, date_diff('week',lag,time_period) lag_size, date_diff('week',time_period,lead) lead_size from lag_lead ), calculated as ( select time_period, case when lag is null then '1-NEW' when lag_size = 1 then '2-ACTIVE' when lag_size > 1 then '3-RETURN' end as this_week_value, case when (lead_size > 1 OR lead_size IS NULL) then '0-CHURN' else NULL end as next_week_churn, count(distinct user_pseudo_id) as cnt from lag_lead_with_diffs group by 1,2,3 ) select time_period, this_week_value, sum(cnt) as cnt from calculated group by 1,2 union select date_add('day', 7, time_period), '0-CHURN', -1*sum(cnt) as cnt from calculated where next_week_churn is not null group by 1,2

clickstream-lifecycle-daily-query.sql

with daily_usage as ( select user_pseudo_id, DATE_TRUNC('day', event_date) as time_period from {{database}}.{{eventTable}} where partition_app = ? and partition_year >= ? and partition_month >= ? and partition_day >= ? and event_name = '_session_start' group by 1,2 order by 1,2 ), lag_lead as ( select user_pseudo_id, time_period, lag(time_period,1) over (partition by user_pseudo_id order by user_pseudo_id, time_period) as lag, lead(time_period,1) over (partition by user_pseudo_id order by user_pseudo_id, time_period) as lead from daily_usage ), lag_lead_with_diffs as ( select user_pseudo_id, time_period, lag, lead, date_diff('day',lag,time_period) lag_size, date_diff('day',time_period,lead) lead_size from lag_lead ), calculated as ( select time_period, case when lag is null then '1-NEW' when lag_size = 1 then '2-ACTIVE' when lag_size > 1 then '3-RETURN' end as this_day_value, case when (lead_size > 1 OR lead_size IS NULL) then '0-CHURN' else NULL end as next_day_churn, count(distinct user_pseudo_id) as cnt from lag_lead_with_diffs group by 1,2,3 ) select time_period, this_day_value, sum(cnt) as cnt from calculated group by 1,2 union select date_add('day', 1, time_period) as time_period, '0-CHURN', -1*sum(cnt) as cnt from calculated where next_day_churn is not null group by 1,2;

clickstream-lifecycle-daily-query.sql

with base as ( select * from {{database}}.{{eventTable}} where partition_app = ? and partition_year >= ? and partition_month >= ? and partition_day >= ? ), user_first_date AS ( SELECT user_pseudo_id, min(event_date) as first_date FROM base GROUP BY user_pseudo_id ), retention_data AS ( SELECT user_pseudo_id, first_date, DATE_DIFF('day', first_date, event_date) AS day_diff FROM base JOIN user_first_date USING (user_pseudo_id) ), retention_counts AS ( SELECT first_date, day_diff, COUNT(DISTINCT user_pseudo_id) AS returned_user_count FROM retention_data WHERE day_diff <= 42 -- Calculate retention rate for the last 42 days GROUP BY first_date, day_diff ), total_users AS ( SELECT first_date, COUNT(DISTINCT user_pseudo_id) AS total_users FROM user_first_date group by 1 ), retention_rate AS ( SELECT first_date, day_diff, returned_user_count, total_users FROM retention_counts join total_users using(first_date) ) SELECT * FROM retention_rate

Dimensions and metrics

The report includes the following dimensions and metrics. You can add more dimensions or metrics by creating calculated field in QuickSight dateset. For more information, refer to Adding calculated fields

Field Type Descrption Data source
Daily Active User (DAU) Metric Number of active users per date QuickSight aggregation
Weekly Active User (WAU) Metric Number of active users in last 7 days Calculated field in QuickSight
Monthly Active User (MAU) Metric Number of active users in last 30 days Calculated field in QuickSight
user_pseudo_id Dimension A SDK-generated unique id for the user Query from analytics engine
user_id Dimension The user ID set via the setUserId API in SDK Query from analytics engine
DAU/WAU Metric DAU/WAU % for user stickiness Calculated field in QuickSight
WAU/MAU Metric WAU/MAU % for user stickiness Calculated field in QuickSight
DAU/MAU Metric DAU/MAU % for user stickiness Calculated field in QuickSight
Event User Type Dimension The type of user performed the event, that is, new user or existing user Calculated field in QuickSight
User first touch date Metric The first date that a user use your websites or apps Calculated field in QuickSight
Retention rate Metric Distinct active users number / Distinct active user number by User first touch date Calculated field in QuickSight
time_period Dimension The week or day for the user lifecycle Query from analytics engine
this_week_value Dimension The user lifecycle stage, that is, New, Active, Return, and Churn Query from analytics engine
this_day_value Dimension The user lifecycle stage, that is, New, Active, Return, and Churn Query from analytics engine

Sample dashboard

Below image is a sample dashboard for your reference.