Monday, October 26, 2015

Cohort Analysis made easy

Cohort analytics can be defined as groups of users with similar characteristics over time and can be a great indicator for product effectiveness and user engagement.


Some Examples:
  • Percentage of users who signup that remain active over time
  • Customers by cohort over time
  • Revenues by cohort over time
Thomas Tunguz from Redpoint has written some great blog posts on cohort analytics and how to interpret them effectively.


Cohort Analysis can be notoriously difficult to perform within the database. In this blog post, we’ll take a step-by-step guide to generating cohort analysis using Cloud9 Charts to go from raw data to the following cohort heatmap grid in a few easy steps.





The example focuses on signups that are active over a period of time. (For cohort analysis from event based data, see our documentation.)


Following is a sample of the data that we'll be working with. Assume that the data contains two fields that can be obtained using a database query: a registration date, when the user first signed up for a service and a transaction date when the user performed a transaction. For example:
registration_date
transaction_date
07/01/2014 18:01:16 PDT
07/01/2014 18:02:18 PDT
07/01/2014 18:03:50 PDT
07/01/2014 18:06:06 PDT
07/02/2014 01:43:30 PDT
07/02/2014 01:45:50 PDT
07/02/2014 01:56:27 PDT
07/02/2014 01:57:40 PDT
….
….

Our objective here to determine the monthly transactions cohorts based on the signup month.

Calculating cohorts is as simple as applying the Cloud9QL below. In case you are unfamiliar, Cloud9QL is a post processor with SQL-like familiarity that be used for analytics post processing on your data on top of your database queries.


select month(transaction_date) as transaction_date, month(registration_date) as registration_date;

select cohort(transaction_date, registration_date as Cohort Date, 1m as Cohort Period), count(*) as Total  group by Cohort Date, Cohort Period  order by Cohort Date desc, Cohort Period asc;

select date_format(cohort date,MMM-yy) as cohort date, cohort period, total; 
Here we are chaining three Cloud9QL statements, where the results of the first are passed into the second and so on. To take a deeper dive:

select month(transaction_date) as transaction_date, month(registration_date) as registration_date;
This truncates the dates to the start of the month, since we are interested in a monthly cohort. The second statement applies the Cohort function that counts the results for n months since each signup month:


select cohort(transaction_date, registration_date as Cohort Date, 1m as Cohort Period), count(*) as Total  group by Cohort Date, Cohort Period  order by Cohort Date desc, Cohort Period asc;
The last query in the chain adjusts the date format for display purposes:

select date_format(cohort date,MMM-yy) as cohort date, cohort period, total; 
Last Step: Change the visualization type to a heat map grid.




In a few simple steps, we've obtained a visual cohort grouping of your data. If you need help with determining cohorts for your own data, please feel free to contact us.