How to Write Functions in Power Query
Introduction
I have struggled to understand the syntax of functions in Power Query. Now part of the reason for this struggle is that I struggled to understand the M Language altogether, not just functions. I feel like I have a pretty decent grasp on SQL, Python, and Spark, but not much about M Language made a whole lot of sense to me.
Now that I’ve spent some time working on learning this language, this article will attempt to simply explain how to write a function in Power Query, so that we can reuse logic and stick to the DRY principle of writing code (Don’t Repeat Yourself).
Think of a basic function and how it works. I’ll show a Python function that adds two numbers together:
def add_two_numbers(first_number, second_number):
return first_number + second_numberanswer = add_two_numbers(2,4)print(answer)6
This is a very basic function. We have a function named add_two_numbers that accepts two parameters: the first number to add (first_number), and the second number to add (second_number). The function returns the value of the two numbers added together. Therefore, when we pass in 2 and 4, the result (answer) we get is 6.
Let’s see how we would do something similar in Power Query.
Examples
Example 1: Create Simple Function to Add Two Numbers
Step 1: Navigate to Power Query Editor. Select Home > New Source > Blank Query to open a new query:
Step 2: Rename our query to something that makes sense for our function’s purpose, like AddTwoNumbers, and open Advanced Editor:
As we can see, there is nothing in our query yet, other than the default format for let and in, the standard Power Query syntax. It defaults to populating a variable named Source, which would be standard if we were getting data from somewhere. However, we are creating this query to be a function, not one that gets data, so we will change that.
Step 3: Write our function
- Just like in our Python example, we want to accept two parameters (the numbers we are adding together) and return the result.
- Let’s see what the logic would look like:
Some points to make here:
- Inside of the let expression, we are declaring the logic for our function — the parameters (first_number and second_number), data types (number), and output (first_number + second_number).
- Setting the function logic equal to Add_Two_Numbers allows us to return Add_Two_Numbers after the in expression. This means that our query returns whatever has been assigned to Add_Two_Numbers, which is our function output.
- Parameters go on the left side of the “=>” operator (inside of parentheses), and the logic for the function output goes on the right side of the “=>” operator (not inside of parentheses).
- Declaring the data types for the parameters forces them to be required parameters unless we add the add the optional keyword before them. For example, if we want the first parameter to be required, and the second parameter optional, we could write it like this:
- The other way to make a parameter optional is to not declare a data type for it. I tend to think it’s always better to declare data types to ensure your function works as intended, but it is an option. It would look like this:
- Another best practice is to assign a data type to the output of our function. This will be on the left side of the “=>” operator, but outside of the parentheses (unlike the parameters). In our example, we are declaring our output to be of number data type:
- Now for output, or what the function will return, it goes on the right side of the “=>” operator. If the logic is very simple and won’t require multiple steps, we can just put in there directly, like in this example:
- We just took our two required parameters added them together. Easy enough to understand.
- But what if we had made that second_number parameter optional, like we showed above? How can we add two numbers together if one of them doesn’t get passed to the function? Now we have to add some conditional logic to the function:
- Now we have added an if > then > else statement, where if no value is passed for second_number, meaning it’s null, then we return the value of the first_number. Otherwise, we add the two numbers together as it is supposed to do. If we don’t add this validation, we could get errors or bad results for our function. It is important to recognize that making a parameter optional means if no value gets passed to it, it’s value will be null.
- Finally, we need to return this function after the in expression. Since we have set our function equal to a variable named Add_Two_Numbers, we return Add_Two_Numbers after our in expression, so that our query will return the output of the function.
- Now that our function is written how we like, click Done in Advanced Editor to save the query:
Step 4: Test our function
- Select the query on the left, and we see there’s an option to enter values for our function’s parameters:
- Enter values for the parameters (I’ll put in 4 and 6):
- Now click Invoke to run the function with the parameters that you’ve entered:
- As you can see, Power Query created a new query named Invoked Function that used the AddTwoNumbers function, and returned a result of 10. This is what we expected by passing in 4 and 6 as the parameter values.
- If we pull up Advanced Editor for the new Invoked Function query, we can see how Power Query automatically generated the statement to use the newly created function as Source:
- We don’t need that Invoked Function query anymore, as it was just being used for testing, so let’s delete that query.
Step 5: Use our function to add a new column to a table
- In the previous step, we tested whether or not our function would work. However, just entering parameters manually and getting a single value back doesn’t really help us perform our job duties. We need a way to use a function where we are actually transforming data. Let’s see how we can use our function in a real-life scenario.
- Let’s start with some sample Sales data, where we have an Amount column and a Tax column, and we need to add them together to get a Total column.
- Select the Sales query on the left, then choose Add Column > Custom Column to add a new column:
- With the Custom Column window now open, we can call our AddTwoNumbers function, and pass in the columns as arguments, instead of just individual values like when we tested the function. It will look something like this:
- Click OK to add the new Total column.
- Now, looking at the Sales data, we notice there is a new column named Total that adds the Amount and Tax columns together:
- Our function is a lot more powerful than just returning a single value adding two numbers together — it can operate on each row of a column to generate a new column with a value for each row.
- If you are wondering how that works, you can pull up Advanced Editor for the Sales query to see what was added when we created the new custom column. You’ll notice that there is a keyword added — each. This is added to indicate we are applying this function to each row, not just singular values like when we tested the function
- That’s it. We’ve successfully created a simple function, and used it to transform our data to assist us in our analysis and reporting.
Example 2: Create More Complex Function to Add Row Number
Step 1: Write our function (We are skipping Steps 1 and 2 from Example 1, since we already know how to do them now).
- In our example, assume we have a Dates table that looks like this…
…and we want to add a RowNumber field to it, started with the earliest Date, and working our way up in ascending order. Essentially we are adding an ID or Index field to the table, which could be considered a DateID field.
- Here is how you could write a function to do this:
Some points to consider:
- This function has four total parameters — two required (table_arg and column_name_to_sort) and two optional (is_sorted_descending and row_number_column_name). Normally I would say that a function (in any language) with four parameters is probably too many, but I’m using them in this example to demonstrate how they can be used.
- We are using two new data types in this example that we didn’t in the previous — table and logical. The table data type is self-explanatory in that we are accepting an actual table as an argument and logical data type is equivalent to boolean, meaning True/False values. You’ll also notice on line 2 that we are declaring the data type of the function output to be table as well.
- Since we have two parameters that are optional, we need to make sure that we handle situations where no value is passed for them.
- Notice in this function that the output (the part to the right of the “=>” operator) now has its own let/in expression. To make the code more readable, I actually put this let/in expression on the next lines instead of all to the right of the “=>” operator. But the let/in is following that operator — whether it’s to the right or underneath is just a matter of preference. The main takeaway — if your function requires multiple steps, as this one does, you will need to have a let/in expression for your output, nested inside of the main let expression for the query.
- Here is the general idea of how this function works:
- Accept a table parameter(table_arg) that will have a row number added to it, and accept a column name parameter (column_name_to_sort) that will be used to sort on.
- There is an optional parameter named is_sorted_descending where we can pass True or False to indicate how we want the column_name_to_sort sorted. If we don’t pass anything for this parameter, we will default to sorting in ascending order (see Lines 6 and 7 in the code above).
- There is an optional parameter named row_number_column_name where we can pass what we want our new row number column to be named. If we don’t pass anything for this parameter, it will default to being RowNumber (see Line 4 in the code above).
- Considering all of that logic, the Add_Index step handles each scenario, whether it’s sorting ascending or descending, and using the Row_Number_Name step to apply the new column name.
- Note that our function output let/in returns the Add_Index step after the in expression (see Lines 10 and 11 in the code above). This ensures that our function output is returning the value from the last step of our let expression.
- But we still have to return our function step Add_Row_Number after the main in expression (see Lines 2, 12, and 13 in the code above).
- After writing our function, click Done in Advanced Editor to save the function.
Step 2: Test our function
- Select the AddRowNumber function, and enter values for the parameters. Notice that because we declared the table_arg as table data type, it provides a drop-down to choose from available tables to test on.
- Click Invoke, and look at the results:
- Just as in Example 1, Power Query creates a new query called Invoked Function, but now our output is a table (just as we defined in our function), and you can see that DateID was added. You’ll also notice that the DateID values are in ascending order by Date, just as we specified.
- Delete the Invoked Function query since we don’t need it anymore, and let’s use this function in a more traditional way.
Step 3: Use our function to add a new column to a table
- Select the table that needs a row number column added, and open advanced editor:
- The code for our sample Dates data looks like this. There are two columns: Date and Day Name. The last step in the query is named #“Inserted Day Name” so that is what is returned for the in expression.
- We will add a new step after line 4 that calls our function, and then change the name of the step for the in expression to be the step we just added. It will look something like this:
- Notice that we added a new step on line 5 named #“Add Row Number”
- Inside of that step, we called our custom AddRowNumber function.
- We passed parameters to it just like we did when testing our function:
- table_arg parameter: the name of the previous step that returned a table
- column_name_to_sort parameter: “Date”, meaning we will sort by Date column
- is_sorted_descending parameter: false, because we want to sort in ascending order. We could’ve also passed null here, since the default behavior of our function is to sort ascending.
- row_number_column_name parameter: “DateID” since we want our new column to be named DateID.
- Lastly, make sure that we change the step for our in expression to be #“Add Row Number”, the new step we added.
- After doing all of that, click OK in Advanced Editor to save the updates.
- We now see the new DateID column:
- That’s it!
Conclusion
You’ve now seen two examples, one simple and one a little more complex, of how to write functions in Power Query. I hope this helps you and allows you to create reusable code for all your datasets and models.
Thanks for reading!