Window Functions in Power Query

Kyle Gibson
10 min readSep 27, 2022

--

Introduction

I have been searching for a while on a good way to do the equivalent of a SQL Window Function in Power BI’s Power Query, but I couldn’t find any simple solutions.

For those who don’t know, a SQL Window function would look something like this:

SELECT
EmployeeName,
Department,
Salary,
AVERAGE(Salary) OVER (PARTITION BY Department) AS AverageSalary
From dbo.Employees

Window functions help you run aggregations over the particular partition/window that you would like, while keeping the total amount of rows in place. As opposed to a GROUP BY…

SELECT
Department,
AVERAGE(Salary) AS AverageSalary
FROM dbo.Employees
GROUP BY Department

… where you lose the data at the employee level because it aggregates into one row per department.

In this simple example, using window functions instead of GROUP BY has allowed you to compare each employee’s salary to the average salary in their department.

Window functions are extremely powerful and useful, and I have grown accustomed to using them in SQL and Spark. But, for me at least, the functionality wasn’t as straightforward to figure out in Power Query. So I’m writing this article to show some simple ways to use Power Query to accomplish the same goal as SQL window functions.

This article will show how to calculate:
1. Year-To-Date (YTD) Sales by Department
2. Life-To-Date (LTD) Sales by Department
3. Previous Date’s Sales (Lag) by Department
4. Next Date’s Sales (Lead) by Department
5. Total Sales by Department
6. Row Number by Department to generate a flag column.

Three notes before I begin:

  1. Yes, I know that if your data source is a SQL database, you can just write the custom SQL query to execute the window functions. But not all data sources will allow custom SQL queries, so this will help with those situations.
  2. Some of the calculations demonstrated in this article could perhaps be performed more easily in DAX. While that may be true, this article will demonstrate how to accomplish our goal of window functions in Power Query. You can decide what works best for you.
  3. This YouTube video was incredibly helpful for me on this topic:

Running totals by subcategory in Power Query!! — YouTube

Explanation

Sample Data that will be used in each example:

Sample sales data

Part 1: Calculate Year-To-Date Sales by Department

Step 1: Add a Year column

Add Column > Custom Column:

Adding custom column
Adding Year custom column

We now have a Year column based off the Date column.

Table that now shows the Year custom column

Step 2: Group by Department and Year, with All Rows as the operation

Group By option
Grouping by Date and Year

Our data now has one row for every Department and Year, but each value in the newly created GroupedSales column is a table:

Newly grouped data by department and year

Step 3: Write custom M formula in Advanced Editor to operate on each table in GroupSales column.

Navigate to Advanced Editor to make these changes:

Advanced editor option

Our Advanced Editor now looks like this:

What we now need to do is create a function that will operate on each table, and generate the correct Year-To-Date amounts by department.

I will show the function and code that we need, and then show screenshots of what each step is doing to further explain.

Advanced Editor after adding and running the custom YTD function:

As you can see, we added a function called YTDFunction (lines 10–16) and then we called that function (line 19) on each table in the GroupedSales column.

The first line of the function (line 12) is key for this calculation to work. It is sorting the grouped table by ascending dates, so that the YTD calculation will be a running total as the year continues. Adding the RowNumber column is giving us an index that essentially will tell us how many rows of the table to sum in order to get the YTD amount for that row. The oldest date only needs to sum one row, while the latest date needs to sum every row in the table to get the YTD amount.

The second line of the function (line 13) sums up the Sales column by using the common List.Sum function, but we are combining it with the List.Range function that uses the RowNumber column to decide how many rows to sum. The three parameters passed to List.Range are 1) The column that I want to get the data from 2) The starting point for where I will sum (with a 0 starting index for first row) and 3) The length of the range I want (how many rows I want to add together starting from the 0 index).

After running this function, we have transformed the GroupedSales column, while keeping it as a table. If you click a cell in the GroupedSales column, you can see a preview of what the added columns look like to verify everything worked as intended:

Previous of YTD calculation

We can now remove the Department and Year columns, as the tables in the GroupedSales column have all the columns we need.

Removing Department and Year columns

Now expand (icon just to the right of GroupedSales column name) the GroupedSales column to reveal the updated table with the YTD calculation added:

Expanding GroupedSales column

Click OK and you have the final result:

Year To Date final result

Rather than using a window function in SQL to calculate YTD Sales, we have successfully used Power Query to accomplish the same goal.

Part 2: Calculate Life-To-Date Sales by Department

Note: We won’t show as many screenshots for this section as most of the steps are similar to calculating YTD Sales.

Step 1: Group by Department, with All Rows as the operation. We don’t need a Year column because it’s not needed for LTD calculations.

Group by department

We now have a similar result to YTD, except now we’ve only grouped by Department, not Department and Year.

Grouped data by department

Now we are going to do basically the same thing we did in the YTD section. Create an LTD function that generates a RowNumber in ascending order of dates, and sums based on a range derived from that RowNumber.

We have accomplished the same thing we did for YTD. Generate a RowNumber column in ascending order by Date that will be used for List.Range. After running the function and expanding the GroupedSales column just like in YTD section, we have our final result for LTD:

Life To Date final result

Once again, rather than using a window function in SQL to calculate LTD Sales, we have successfully used Power Query to accomplish the same goal.

Part 3: Calculate Previous Date’s Sales by Department

Just like in the LTD example, we are going to group our initial data by Department:

Grouped data by department

Just like in previous sections, we are going to create a custom function that calculates the lag, or previous date’s sales. We are going to add a RowNumber column that will be used as index to get the previous date’s value. If you consider the Sales column sorted by ascending dates as a list, with the first row starting at index 0, then in order to get the previous date’s sales, the second row would need to grab the value for index 0 (first row), the third row would need to grab the value for index 1 (second row), and so on. Considering that, here is what our lag function would look like:

For the first part of the function (line 11) we are sorting the table by ascending Date, starting with 0, and incrementing by 1 for our LagRowNumber column.

For the second part of the function (line 12), we are getting the value of the Sales column for LagRowNumber minus 1 so that we can get the previous date’s value based on its index. The other key part to notice is that we are establishing this value as a list of length 1 (the last parameter in the List.Range function).

After running the CalculateLagFunction, we can see the preview of the data:

Preview of previous date’s sales

One thing to notice is that the first row is null. This is because there is no previous date for that row, so its previous date’s sales value doesn’t exist. The other thing to notice is that each value is a list, so we can’t actually see the values yet. That is fine, as we will expand it later. We also know there won’t be any problems doing that because we specified the length of each list to be 1, so we don’t have to worry about multiple values in a list.

When we expand the GroupedSales column, it looks like this:

Final previous date sales where data is still a list

In our YTD and LTD examples, we were done at this step. However, we have one additional step here which is to expand the PreviousDateSales column so that the values aren’t lists anymore. We accomplish this by selecting Expand > Expand to New Rows.

Expanding list data

Now, we have our final result:

Previous Date’s Sales Final results

Once again, rather than using a window function in SQL to calculate Previous Date’s Sales, we have successfully used Power Query to accomplish the same goal.

Part 4: Calculate Next Date’s Sales by Department

Almost all of the steps in this one are the same as calculation the Previous Date’s Sales by Department in Part 3, but with just a couple of tweaks to the function.

After grouping by department, same as in Part 3, here is what our CalculateLeadFunction looks like:

Because we are looking for the next date’s sales, we don’t need the previous row’s index number, but the next row’s index number. So two main things have changed in this CalculateLeadFunction from the CalculateLagFunction: 1) The index column starts at RowNumber 1 instead of 0 and 2) In our List.Range formula, we just take the index number as it is, without subtracting 1.

After expanding the GroupedSales column and expanding the lists as we did in Part 3, we have our final result:

Next Date’s Sales final results

Notice this time that the last date’s row for a Department is null, since we are looking for the next date’s row, instead of the previous date’s row.

Once again, rather than using a window function in SQL to calculate Next Date’s Sales, we have successfully used Power Query to accomplish the same goal.

Part 5: Calculate Total Sales by Department

This example will be slightly different, as we don’t have to sort the data by ascending Date. We just want a column that shows the total sales by department.

We will group by department only, similar to previous examples.

Grouped data by department

We will create a TotalSalesFunction, similar to previous examples, but the logic is more straightforward as we just want to sum all Sales for the grouping:

After running this, we can see the preview for what each grouping looks like:

Preview of Total Sales by department

After expanding the GroupedSales column, we now have our final result:

Final result of total sales

Once again, rather than using a window function in SQL to calculate Total Sales, we have successfully used Power Query to accomplish the same goal.

Part 6: Calculate Descending RowNumber to Generate a Flag Column

If you have read this far in the article, you probably made it farther than most.
One common usage for Window functions is to generate a RowNumber column. I’ve done this before when I want to know the latest record for a particular window/partition, so that I can create a True/False flag based on the row number. Let’s look at an example of how to do that.

Just as we’ve done in most examples in this article, we will group by Department:

Grouped data by department

I now want to generate a RowNumber column that will sort the data in descending Date order, so that the most recent date for a department will be RowNumber 1, the next recent date is 2, and so on. Here is what logic will look like:

The preview of the DescRowNumber looks like this, with the most recent date being 1:

Preview of row number

Expanding the GroupedSales column leaves us with a final result like this:

Final result with Row Number

However, to create our True/False flag to indicate if the row is the latest row for a department, we can create a conditional column:

Add conditional column option
IsLatestSalesRecord conditional column logic

Now we have our final result:

IsLatestSalesRecord final result

The DescRowNumber column can be removed, but I left it in there just for explanation purposes. Now we could filter IsLatestSalesRecord to True if we only wanted to see the latest Sales by Department.

Conclusion

Congratulations if you made it this far. I hope this helps you in working with Power Query and making all the data transformations that you need.

Thanks for reading!

--

--

Kyle Gibson
Kyle Gibson

Written by Kyle Gibson

Christian, husband, father. Data Engineer at Chick-fil-A.

Responses (1)