Handling Multiple Excel Files and Multiple Sheets in Power Query
Problem
Bringing in Excel files one at a time in Power Query is typically no problem. You do your transformations on that one file, and then if you need another file, you import that one in a separate query and do transformations to it. On a small scale, this works fine.
But what if you need to build a report off a folder containing 40 files? It would get tedious to create a new query for each one, especially if they all contain the same type of data. And if someone ever added a new file to that folder, you would have to update your Power Query editor to get the new file — it wouldn’t be dynamic at all.
But another problem arises if you try to fix that by connecting to a folder and expanding the data for all files instead of doing it individually. You sometimes end up with a mess like this:
Data is all over the place; column headers are repeated in row values; text values are in same columns with numeric values — this would take a while to clean up properly.
There’s a better and cleaner way to do this. That’s what this article is about.
Note: Yes, I know that it would be better to use a different place to store the data (like a database) than in 40 different Excel files. This article isn’t dealing with convincing users to migrate their data to a database, but rather how to deal with the situation if you’re required to work with the data in Excel format.
Scenario
Here is the scenario addressed in this article:
- We are needing to build a report on sales
- The source data is maintained by sales managers in Excel files
- Each manager maintains an Excel file, and creates sheet/tab in that file for each sales person that reports to them.
- Each manager has been given a template for what columns should be included and updated for each sales person: Date, Product, Quantity, Unit Price, and Total Sale.
- They save their finished files into a shared folder that will be used for reporting. (Shared drive, SharePoint, etc.)
- The amount of sales managers and salespeople reporting to a sales manager could change at any time.
- We need to find a way to import a dynamic amount of files with a dynamic amount of sheets.
- We also don’t know if the managers will add their own columns or sheets to do their own analysis that don’t have anything to do with our reporting.
Let’s look at a potential way to handle this. This solution should apply in either Power BI or Excel.
Solution
Step 1: Import the folder
Power BI gives at least two options when searching “folder” in connectors:
Import the folder containing the sales data and you should see something like this:
Step 2: Remove unnecessary columns
Select the Content, Name, and Extension columns at the same time (using ctrl or shift), right-click, and select Remove Other Columns:
Now the table should look like this:
Step 3: Filter to Excel files only
Note: There’s more than one way to do this.
Click the drop down for the Extension column, choose Text Filters > Equals:
Enter (or choose from the drop-down) .xlsx for the filter value:
Now the table should look like this:
Step 4: Add custom column to get Excel file contents
Select Add Column > Custom Column:
Give the column a name, and use this formula:
= Excel.Workbook([Content])
The table should now look like this:
Step 5: Remove and rename columns
The Content and Extension columns aren’t needed anymore, so they can be removed:
There’s also a column named Name in the table of the custom column that was just added, so let’s rename the Name column here to File Name to avoid confusion.
The table should now look like this:
Step 6: Expand the custom column
Select the icon on the right side of the Excel File Content column:
Make sure that Expand is selected, all column names selected, and Use original column name as prefix is not selected:
Click OK.
The table should now look this this:
Step 7: Filter the results
For this example, let’s assume we only want to bring in Sheet objects, and not tables or named ranges. Let’s also assume we don’t want to bring in hidden sheets either, where the sales manager hid the sheet for whatever reason.
We want to filter the Kind column to Sheet and the Hidden column to FALSE. This can be implemented in the same way as we filtered the file extension in Step 3.
The table should now look like this:
Step 8: Remove and rename columns
The Item, Kind, and Hidden columns aren’t needed anymore, so they can be removed, similar to how columns were removed in Step 5.
Let’s also rename the Name column to Sheet Name for clarity.
The table should now look like this:
Step 9: Create a list of columns needed from the Excel sheets
We don’t know what extra columns a sales manager may add to this sheet, but we know the columns that we need for our report.
Select Home > New Source > Blank Query:
Right-click the new query, and select Advanced Editor:
In the Advanced Editor, create a list of the column names needed for the report:
let
Source = {"Date", "Product", "Quantity", "Unit Price", "Total Sale"}
in
Source
Click Done.
Right-click the new query, and give it a more descriptive name, like ExcelColumnNames.
The new list query should look like this:
Step 10: Create a custom function to transform each sheet
Similar to step 9, select Home > New Source > Blank Query, and open the Advanced Editor for the new query.
In the Advanced Editor, use this function:
let
// Accept a table as argument and return a table as output
TableTransformations = (table_arg as table) as table =>
let
// Promote row 1 values as headers
PromoteHeaders = Table.PromoteHeaders(table_arg),
// Select the columns we need. If the sheet doesn't have a column name, it will show as null
SelectedColumns = Table.SelectColumns(PromoteHeaders, ExcelColumnNames, MissingField.UseNull)
in
SelectedColumns
in
TableTransformations
This function accepts a table as its argument and returns a table as its output.
For each table passed to it, it promotes the values in the first row to column names, and then selects specific column names. Notice that the column names being selected in the Table.SelectColumns statement are the ones found in the ExcelColumnNames list just created in the previous step.
This is important because the sheet named Tyrone in the Bob.xlsx file has extra columns in it that we don’t want:
The MissingField.UseNull option passed in the Table.SelectColumns statement means that if any table passed to it doesn’t have a column name being selected, it will return null for that column value instead of raising an error.
This is important because the sheet named Notes in the Sarah.xlsx file only has one column named Notes:
Click Done.
Right-click your new query, select Rename, and name it something more descriptive like TableTransformations:
Step 11: Add custom column to get transformed data
Select Add Column > Custom Column, give the column a name like Transformed Data, and enter the following:
= TableTransformations([Data])
Click OK.
Remove the Data column as it’s not needed anymore.
The table should now look like this:
Step 12: Expand the custom column of transformed data
Select the expand button on the Transformed Data column:
Make sure that Expand is selected, all column names selected, and Use original column name as prefix is not selected:
Click OK.
The table should now look like this:
That’s it! From here, you should be able to do your own analysis.
However, there’s a few other steps that we can show to make these transformations even better. Keep reading if you want to see them.
Bonus Section
Step 13: Use the list of column names instead of hard-coded column names for expanding data
In the step we just completed, if you look at the formula bar, you can see that our columns to expand are hard-coded.
This may be fine for a quick example, but for maintaining this long-term, that probably won’t be the best solution. If we start asking the sales managers to include additional columns in their files, we want to be able to update that ExcelColumnNames list, and then everything downstream updates to use the new column names.
So, in the formula bar, delete the lists of hard-coded column names, and instead put in the name of the query that contains the list of column names, like ExcelColumnNames:
= Table.ExpandTableColumn(#"Removed Columns Not Needed_4", "Transformed Data", ExcelColumnNames)
My previous step was named Removed Columns Not Needed_4, which is why that’s in the formula.
Now our expanding of the transformed data is dynamic.
However, we still have a problem to solve.
As you can clearly see, the Notes sheet in the Sarah.xlsx doesn’t need to be in our data. It was used by the manager for their own notes, and doesn’t pertain to our reporting on sales data.
But how can we dynamically know which rows to filter out? That’s what we’ll figure out.
Step 14: Add custom column to get count of non-null columns
Select Add Column > Custom Column, give it a name like Count of Non Null Columns and enter the following:
= List.NonNullCount(Record.ToList(_))
Click OK.
The custom column should look like this:
As you can see, the last row only has 2 non-null columns which we know are File Name and Sheet Name. Every other column value is null, meaning it has no sales data.
We need a way to dynamically filter these rows out.
Step 15: Filter out rows where all columns needed are null
Open Advanced Editor for your main query. It should look something like this, although your steps may be named differently:
let
Source = Folder.Files("\\MyComputer\DataMadness\Desktop\Power BI Multiple Excel Files\Manager Sales Data"),
#"Removed Columns Not Needed" = Table.SelectColumns(Source,{"Content", "Name", "Extension"}),
#"Filter to Excel Files Only" = Table.SelectRows(#"Removed Columns Not Needed", each [Extension] = ".xlsx"),
#"Added Excel Workbook Column" = Table.AddColumn(#"Filter to Excel Files Only", "Excel File Content", each Excel.Workbook([Content])),
#"Removed Columns Not Needed_2" = Table.RemoveColumns(#"Added Excel Workbook Column",{"Content", "Extension"}),
#"Renamed to File Name" = Table.RenameColumns(#"Removed Columns Not Needed_2",{{"Name", "File Name"}}),
#"Expanded Excel File Content" = Table.ExpandTableColumn(#"Renamed to File Name", "Excel File Content", {"Name", "Data", "Item", "Kind", "Hidden"}, {"Name", "Data", "Item", "Kind", "Hidden"}),
#"Filtered to Sheets Only" = Table.SelectRows(#"Expanded Excel File Content", each [Kind] = "Sheet"),
#"Filtered to Sheets Not Hidden" = Table.SelectRows(#"Filtered to Sheets Only", each [Hidden] = false),
#"Removed Columns Not Needed_3" = Table.RemoveColumns(#"Filtered to Sheets Not Hidden",{"Item", "Kind", "Hidden"}),
#"Renamed to Sheet Name" = Table.RenameColumns(#"Removed Columns Not Needed_3",{{"Name", "Sheet Name"}}),
#"Added Transformed Data Column" = Table.AddColumn(#"Renamed to Sheet Name", "Transformed Data", each TableTransformations([Data])),
#"Removed Columns Not Needed_4" = Table.RemoveColumns(#"Added Transformed Data Column",{"Data"}),
#"Expanded Transformed Data" = Table.ExpandTableColumn(#"Removed Columns Not Needed_4", "Transformed Data", ExcelColumnNames),
#"Added Non Null Count Column" = Table.AddColumn(#"Expanded Transformed Data", "Count of Non Null Columns", each List.NonNullCount(Record.ToList(_)))
in
#"Added Non Null Count Column"
Now we need a way to dynamically filter the Count of Non Null Columns to get rid of rows we don’t need.
We need:
- The count of column names in our ExcelColumnsList
- The count of column names in the table of the step prior to adding the Count of Non Null Columns column
- Take the difference between the two
- If the count of null columns is greater than that difference, then we keep that row. Otherwise, we don’t need it.
Add another step to your Advanced Editor that looks like this:
#"Filter out null rows" = Table.SelectRows(#"Added Non Null Count Column", each [Count of Non Null Columns] > (Table.ColumnCount(#"Expanded Transformed Data") - List.Count(ExcelColumnNames)))
The full query will look like this:
let
Source = Folder.Files("\\MyComputer\DataMadness\Desktop\Power BI Multiple Excel Files\Manager Sales Data"),
#"Removed Columns Not Needed" = Table.SelectColumns(Source,{"Content", "Name", "Extension"}),
#"Filter to Excel Files Only" = Table.SelectRows(#"Removed Columns Not Needed", each [Extension] = ".xlsx"),
#"Added Excel Workbook Column" = Table.AddColumn(#"Filter to Excel Files Only", "Excel File Content", each Excel.Workbook([Content])),
#"Removed Columns Not Needed_2" = Table.RemoveColumns(#"Added Excel Workbook Column",{"Content", "Extension"}),
#"Renamed to File Name" = Table.RenameColumns(#"Removed Columns Not Needed_2",{{"Name", "File Name"}}),
#"Expanded Excel File Content" = Table.ExpandTableColumn(#"Renamed to File Name", "Excel File Content", {"Name", "Data", "Item", "Kind", "Hidden"}, {"Name", "Data", "Item", "Kind", "Hidden"}),
#"Filtered to Sheets Only" = Table.SelectRows(#"Expanded Excel File Content", each [Kind] = "Sheet"),
#"Filtered to Sheets Not Hidden" = Table.SelectRows(#"Filtered to Sheets Only", each [Hidden] = false),
#"Removed Columns Not Needed_3" = Table.RemoveColumns(#"Filtered to Sheets Not Hidden",{"Item", "Kind", "Hidden"}),
#"Renamed to Sheet Name" = Table.RenameColumns(#"Removed Columns Not Needed_3",{{"Name", "Sheet Name"}}),
#"Added Transformed Data Column" = Table.AddColumn(#"Renamed to Sheet Name", "Transformed Data", each TableTransformations([Data])),
#"Removed Columns Not Needed_4" = Table.RemoveColumns(#"Added Transformed Data Column",{"Data"}),
#"Expanded Transformed Data" = Table.ExpandTableColumn(#"Removed Columns Not Needed_4", "Transformed Data", ExcelColumnNames),
#"Added Non Null Count Column" = Table.AddColumn(#"Expanded Transformed Data", "Count of Non Null Columns", each List.NonNullCount(Record.ToList(_))),
#"Filter out null rows" = Table.SelectRows(#"Added Non Null Count Column", each [Count of Non Null Columns] > (Table.ColumnCount(#"Expanded Transformed Data") - List.Count(ExcelColumnNames)))
in
#"Filter out null rows"
As you can see, we used the Table.ColumnCount and List.Count functions to get the counts we needed. We took the difference, and filtered our Count of Non Null Columns column with that difference.
The table should now look like this:
It filtered out the row for the Sheet Name of Notes since it didn’t contain any sales data.
Step 16: Remove the custom column used to get non null counts
Right-click the Count of Non Null Columns column, and select Remove.
The final data should look like this:
That’s it!
This article was a little lengthy, but hopefully showed some ways that you could be dynamic in working with multiple Excel files containing multiple sheets.
Thanks for reading!