Window Functions in Power Query
Introduction
I have been searching for a while on a good way to do the equivalent of a SQL Window Function in Power BI’s Power Query, but I couldn’t find any simple solutions.
For those who don’t know, a SQL Window function would look something like this:
SELECT
EmployeeName,
Department,
Salary,
AVERAGE(Salary) OVER (PARTITION BY Department) AS AverageSalary
From dbo.Employees
Window functions help you run aggregations over the particular partition/window that you would like, while keeping the total amount of rows in place. As opposed to a GROUP BY…
SELECT
Department,
AVERAGE(Salary) AS AverageSalary
FROM dbo.Employees
GROUP BY Department
… where you lose the data at the employee level because it aggregates into one row per department.
In this simple example, using window functions instead of GROUP BY has allowed you to compare each employee’s salary to the average salary in their department.
Window functions are extremely powerful and useful, and I have grown accustomed to using them in SQL and Spark. But, for me at least, the functionality wasn’t as straightforward to figure out in Power Query. So I’m writing this article to show some simple ways to use Power Query to accomplish the same goal as SQL window functions.
This article will show how to calculate:
1. Year-To-Date (YTD) Sales by Department
2. Life-To-Date (LTD) Sales by Department
3. Previous Date’s Sales (Lag) by Department
4. Next Date’s Sales (Lead) by Department
5. Total Sales by Department
6. Row Number by Department to generate a flag column.
Three notes before I begin:
- Yes, I know that if your data source is a SQL database, you can just write the custom SQL query to execute the window functions. But not all data sources will allow custom SQL queries, so this will help with those situations.
- Some of the calculations demonstrated in this article could perhaps be performed more easily in DAX. While that may be true, this article will demonstrate how to accomplish our goal of window functions in Power Query. You can decide what works best for you.
- This YouTube video was incredibly helpful for me on this topic:
Explanation
Sample Data that will be used in each example:
Part 1: Calculate Year-To-Date Sales by Department
Step 1: Add a Year column
Add Column > Custom Column:
We now have a Year column based off the Date column.
Step 2: Group by Department and Year, with All Rows as the operation
Our data now has one row for every Department and Year, but each value in the newly created GroupedSales column is a table:
Step 3: Write custom M formula in Advanced Editor to operate on each table in GroupSales column.
Navigate to Advanced Editor to make these changes:
Our Advanced Editor now looks like this:
What we now need to do is create a function that will operate on each table, and generate the correct Year-To-Date amounts by department.
I will show the function and code that we need, and then show screenshots of what each step is doing to further explain.
Advanced Editor after adding and running the custom YTD function:
As you can see, we added a function called YTDFunction (lines 10–16) and then we called that function (line 19) on each table in the GroupedSales column.
The first line of the function (line 12) is key for this calculation to work. It is sorting the grouped table by ascending dates, so that the YTD calculation will be a running total as the year continues. Adding the RowNumber column is giving us an index that essentially will tell us how many rows of the table to sum in order to get the YTD amount for that row. The oldest date only needs to sum one row, while the latest date needs to sum every row in the table to get the YTD amount.
The second line of the function (line 13) sums up the Sales column by using the common List.Sum function, but we are combining it with the List.Range function that uses the RowNumber column to decide how many rows to sum. The three parameters passed to List.Range are 1) The column that I want to get the data from 2) The starting point for where I will sum (with a 0 starting index for first row) and 3) The length of the range I want (how many rows I want to add together starting from the 0 index).
After running this function, we have transformed the GroupedSales column, while keeping it as a table. If you click a cell in the GroupedSales column, you can see a preview of what the added columns look like to verify everything worked as intended:
We can now remove the Department and Year columns, as the tables in the GroupedSales column have all the columns we need.
Now expand (icon just to the right of GroupedSales column name) the GroupedSales column to reveal the updated table with the YTD calculation added:
Click OK and you have the final result:
Rather than using a window function in SQL to calculate YTD Sales, we have successfully used Power Query to accomplish the same goal.
Part 2: Calculate Life-To-Date Sales by Department
Note: We won’t show as many screenshots for this section as most of the steps are similar to calculating YTD Sales.
Step 1: Group by Department, with All Rows as the operation. We don’t need a Year column because it’s not needed for LTD calculations.
We now have a similar result to YTD, except now we’ve only grouped by Department, not Department and Year.
Now we are going to do basically the same thing we did in the YTD section. Create an LTD function that generates a RowNumber in ascending order of dates, and sums based on a range derived from that RowNumber.
We have accomplished the same thing we did for YTD. Generate a RowNumber column in ascending order by Date that will be used for List.Range. After running the function and expanding the GroupedSales column just like in YTD section, we have our final result for LTD:
Once again, rather than using a window function in SQL to calculate LTD Sales, we have successfully used Power Query to accomplish the same goal.
Part 3: Calculate Previous Date’s Sales by Department
Just like in the LTD example, we are going to group our initial data by Department:
Just like in previous sections, we are going to create a custom function that calculates the lag, or previous date’s sales. We are going to add a RowNumber column that will be used as index to get the previous date’s value. If you consider the Sales column sorted by ascending dates as a list, with the first row starting at index 0, then in order to get the previous date’s sales, the second row would need to grab the value for index 0 (first row), the third row would need to grab the value for index 1 (second row), and so on. Considering that, here is what our lag function would look like:
For the first part of the function (line 11) we are sorting the table by ascending Date, starting with 0, and incrementing by 1 for our LagRowNumber column.
For the second part of the function (line 12), we are getting the value of the Sales column for LagRowNumber minus 1 so that we can get the previous date’s value based on its index. The other key part to notice is that we are establishing this value as a list of length 1 (the last parameter in the List.Range function).
After running the CalculateLagFunction, we can see the preview of the data:
One thing to notice is that the first row is null. This is because there is no previous date for that row, so its previous date’s sales value doesn’t exist. The other thing to notice is that each value is a list, so we can’t actually see the values yet. That is fine, as we will expand it later. We also know there won’t be any problems doing that because we specified the length of each list to be 1, so we don’t have to worry about multiple values in a list.
When we expand the GroupedSales column, it looks like this:
In our YTD and LTD examples, we were done at this step. However, we have one additional step here which is to expand the PreviousDateSales column so that the values aren’t lists anymore. We accomplish this by selecting Expand > Expand to New Rows.
Now, we have our final result:
Once again, rather than using a window function in SQL to calculate Previous Date’s Sales, we have successfully used Power Query to accomplish the same goal.
Part 4: Calculate Next Date’s Sales by Department
Almost all of the steps in this one are the same as calculation the Previous Date’s Sales by Department in Part 3, but with just a couple of tweaks to the function.
After grouping by department, same as in Part 3, here is what our CalculateLeadFunction looks like:
Because we are looking for the next date’s sales, we don’t need the previous row’s index number, but the next row’s index number. So two main things have changed in this CalculateLeadFunction from the CalculateLagFunction: 1) The index column starts at RowNumber 1 instead of 0 and 2) In our List.Range formula, we just take the index number as it is, without subtracting 1.
After expanding the GroupedSales column and expanding the lists as we did in Part 3, we have our final result:
Notice this time that the last date’s row for a Department is null, since we are looking for the next date’s row, instead of the previous date’s row.
Once again, rather than using a window function in SQL to calculate Next Date’s Sales, we have successfully used Power Query to accomplish the same goal.
Part 5: Calculate Total Sales by Department
This example will be slightly different, as we don’t have to sort the data by ascending Date. We just want a column that shows the total sales by department.
We will group by department only, similar to previous examples.
We will create a TotalSalesFunction, similar to previous examples, but the logic is more straightforward as we just want to sum all Sales for the grouping:
After running this, we can see the preview for what each grouping looks like:
After expanding the GroupedSales column, we now have our final result:
Once again, rather than using a window function in SQL to calculate Total Sales, we have successfully used Power Query to accomplish the same goal.
Part 6: Calculate Descending RowNumber to Generate a Flag Column
If you have read this far in the article, you probably made it farther than most.
One common usage for Window functions is to generate a RowNumber column. I’ve done this before when I want to know the latest record for a particular window/partition, so that I can create a True/False flag based on the row number. Let’s look at an example of how to do that.
Just as we’ve done in most examples in this article, we will group by Department:
I now want to generate a RowNumber column that will sort the data in descending Date order, so that the most recent date for a department will be RowNumber 1, the next recent date is 2, and so on. Here is what logic will look like:
The preview of the DescRowNumber looks like this, with the most recent date being 1:
Expanding the GroupedSales column leaves us with a final result like this:
However, to create our True/False flag to indicate if the row is the latest row for a department, we can create a conditional column:
Now we have our final result:
The DescRowNumber column can be removed, but I left it in there just for explanation purposes. Now we could filter IsLatestSalesRecord to True if we only wanted to see the latest Sales by Department.
Conclusion
Congratulations if you made it this far. I hope this helps you in working with Power Query and making all the data transformations that you need.
Thanks for reading!