Summary
When analyzing product events analysys needs to add session numbers to events in SQL to analyze the product activity. In this article I explain how we can add sessions to events with SQL.
Introduction
When analyzing product events analysys needs session numbers of the users to analyze the product activity. Most of the time developers add the session number to the events they are sending to the database for every action the users do, such events can be in JSON format or rows in a table.
Nevertheless, there are cases in which the developers do not send session numbers with the events and then the analysts need to add session numbers to events in SQL with the help of windows function.
What is a session?
The term session refers to the period of time that the user has used the product without stopping. The session begins when the user logs in the product to perform actions on and ends when the user leaves it. If the user has not performed actions on the product in a certain period of time (for example 30 minutes) we know that the session is over.
Adding sessions to events with SQL
For the purpose of the example we will define a simple hypothetical table with the following fields:
- User_id- Username
- ts – time of the event (in minutes)
- event_action – the action performed by the user
On this hypothetical table we can run the following SQL code the create the session:
select
tab.*
,sum(ind_new_session) over
(
partition by user_id order by ts asc
rows between unbounded preceding and current row
) as session_num_for_user
,sum(ind_new_session)
over (order user_id, ts) as session_id
from
(
select
user_id
,action
,ts
,lag (ts,1) over (partition by user_id order by ts asc) as prev_event_ts
,case when (event_ts – prev_event_ts) >= 30 or prev_event_ts is null then 1 else 0
end
as ind_new_session
from
events
) as sessions
Explanation
In the sub-table (called sessions) I added a field called ind_new_session which indicates the beginning of the session using the LAG function. The LAG function helps me to get the date of the user’s next event and with that I can calculate the length of time that elapsed between the events. If 30 minutes have passed (or there the users did not have events before) then there is a new session.
In the outer table I compute two types of sessions:
- session_num_for_user – the session number in ascending order of the user (I summed with the help of cumulative sum the field I created in the subquery – ind_new_session)
- session_id – the number of sessions regardless of the user.
Note – For calculation of the time elapsed between the events I used the formula (event_ts – prev_event_ts). In each database the calculation of the time elapsed between two time variables can be different, so you must check how to calculate the length of time that elapses between events in the database you are working on.
Yuval Marnin,
Data Analyst.