For data analyst freelance services contact me [email protected]
Explanation of Window Functions
Window functions (also called analytical functions) are an extension of the standard SQL language and can perform calculations on a large number of rows (the calculation itself is performed by an internal and hidden query of the database engine). These functions return a single value that corresponds to each row in the query we wrote.
The functions are called ‘window’ because the window defines the range of rows in the calculation. The window of the function is defined by the Over command.
On the rows within the window, we can define groups by the “Partition BY” command and even specify their order the with “Order By” command. We can also narrow the window range by the commands: rows, preceding, following.
Example:
select
orders.*,
sum(sum_order) over () as sum_of_all_orders,
sum(sum_order) over (partition by customer_id) as sum_of_customer_orders,
max(sum_order) over (partition by country) as max_order_at_country,
avg(sum_order) over (partition by country) as avg_order_at_country,
case when sum_order > avg(sum_order) over (partition by country) then 1 else 0 end as order_is_above_country_avg
from
orders
Explanation
sum(sum_order) over () as sum_of_all_orders
In this function, there are no values inside the Over command and therefore the size of the window is not limited. Since there is no limit to the size of the window, the calculation of the sum function will refer to all the rows in the table, In other words, the purchase total will appear in all rows.
sum(sum_order) over (partition by customer_id) as sum_of_customer_orders
This function is similar to the previous one, but in this case, we divided the window into groups of customers according to the Customer_id (in fact, we performed an aggregation operation by Customer_id). Therefore, in each row, we will get the sum of purchases of the customer.
Types of window functions
Aggregate functions (we saw examples in the previous section):
Count, Avg, Sum, Min, Max, etc. – These functions are used for adding aggregate calculations on numbers.
Values
Lag\Lead – Return the value of the previous or next row in the table’s output.
First_value\Last_value – Return the first or last value of the window.
Rank functions
Rank/Row_Number – Used to rank numbers in a specific order.
Percentile – Used to determine percentile values.
Qualify
There are databases that allow using window function filtering. This instruction is called Qualify. In this example, filtering will be performed on all orders of customers whose total purchases are above $1,000.
select
orders.*,
sum(sum_order) over (partition by client_id) as sum_of_all_orders
from
orders
qualify
sum(sum_order) over (partition by client_id)>1000
This article was written by Yuval Marnin.
If you have a need to hire a freelancer data analyst you may contact me at: [email protected]
You may also hire me through upwork platform on that link:
https://www.upwork.com/freelancers/~018940225ce48244f0\
Further reading
The advantage of hiring a freelance data analyst.
What does a data analyst is doing and how it can help your company.