Handling Multiple Excel Files and Multiple Sheets in Power Query

Kyle Gibson
12 min readMar 9, 2023

--

Problem

Bringing in Excel files one at a time in Power Query is typically no problem. You do your transformations on that one file, and then if you need another file, you import that one in a separate query and do transformations to it. On a small scale, this works fine.

But what if you need to build a report off a folder containing 40 files? It would get tedious to create a new query for each one, especially if they all contain the same type of data. And if someone ever added a new file to that folder, you would have to update your Power Query editor to get the new file — it wouldn’t be dynamic at all.

But another problem arises if you try to fix that by connecting to a folder and expanding the data for all files instead of doing it individually. You sometimes end up with a mess like this:

Data is all over the place; column headers are repeated in row values; text values are in same columns with numeric values — this would take a while to clean up properly.

There’s a better and cleaner way to do this. That’s what this article is about.

Note: Yes, I know that it would be better to use a different place to store the data (like a database) than in 40 different Excel files. This article isn’t dealing with convincing users to migrate their data to a database, but rather how to deal with the situation if you’re required to work with the data in Excel format.

Scenario

Here is the scenario addressed in this article:

  • We are needing to build a report on sales
  • The source data is maintained by sales managers in Excel files
  • Each manager maintains an Excel file, and creates sheet/tab in that file for each sales person that reports to them.
  • Each manager has been given a template for what columns should be included and updated for each sales person: Date, Product, Quantity, Unit Price, and Total Sale.
  • They save their finished files into a shared folder that will be used for reporting. (Shared drive, SharePoint, etc.)
  • The amount of sales managers and salespeople reporting to a sales manager could change at any time.
  • We need to find a way to import a dynamic amount of files with a dynamic amount of sheets.
  • We also don’t know if the managers will add their own columns or sheets to do their own analysis that don’t have anything to do with our reporting.

Let’s look at a potential way to handle this. This solution should apply in either Power BI or Excel.

Solution

Step 1: Import the folder

Power BI gives at least two options when searching “folder” in connectors:

folder options in Power Query

Import the folder containing the sales data and you should see something like this:

All files from folder listed in Power Query editor view

Step 2: Remove unnecessary columns

Select the Content, Name, and Extension columns at the same time (using ctrl or shift), right-click, and select Remove Other Columns:

Removing columns not needed

Now the table should look like this:

Content, Name, and extension columns only ones left

Step 3: Filter to Excel files only

Note: There’s more than one way to do this.

Click the drop down for the Extension column, choose Text Filters > Equals:

Text filter option for filtering column

Enter (or choose from the drop-down) .xlsx for the filter value:

Filter rows window to filter to .xlsx files

Now the table should look like this:

Power Query editor now only showing excel files

Step 4: Add custom column to get Excel file contents

Select Add Column > Custom Column:

Custom Column option in Power Query editor

Give the column a name, and use this formula:

= Excel.Workbook([Content])
Custom column editor to get Excel Workbook content

The table should now look like this:

Power Query editor data with new column of Excel content

Step 5: Remove and rename columns

The Content and Extension columns aren’t needed anymore, so they can be removed:

Remove extension and content columns

There’s also a column named Name in the table of the custom column that was just added, so let’s rename the Name column here to File Name to avoid confusion.

Rename option for columns

The table should now look like this:

Name has been renamed to File Name

Step 6: Expand the custom column

Select the icon on the right side of the Excel File Content column:

Expand option highlighted on column name

Make sure that Expand is selected, all column names selected, and Use original column name as prefix is not selected:

Expand Options editor

Click OK.

The table should now look this this:

Expanded table that now shows sheets for each Excel file

Step 7: Filter the results

For this example, let’s assume we only want to bring in Sheet objects, and not tables or named ranges. Let’s also assume we don’t want to bring in hidden sheets either, where the sales manager hid the sheet for whatever reason.

We want to filter the Kind column to Sheet and the Hidden column to FALSE. This can be implemented in the same way as we filtered the file extension in Step 3.

Filter rows editor to filter Kind to sheet
Filter rows editor that filters Hidden to false

The table should now look like this:

Data with kind and hidden filters applied

Step 8: Remove and rename columns

The Item, Kind, and Hidden columns aren’t needed anymore, so they can be removed, similar to how columns were removed in Step 5.

Let’s also rename the Name column to Sheet Name for clarity.

The table should now look like this:

Data with File Name, Sheet Name, and Data columns remaining

Step 9: Create a list of columns needed from the Excel sheets

We don’t know what extra columns a sales manager may add to this sheet, but we know the columns that we need for our report.

Select Home > New Source > Blank Query:

Blank Query option

Right-click the new query, and select Advanced Editor:

Advanced Editor option for query

In the Advanced Editor, create a list of the column names needed for the report:

let
Source = {"Date", "Product", "Quantity", "Unit Price", "Total Sale"}
in
Source
Advanced editor showing list of column names

Click Done.

Right-click the new query, and give it a more descriptive name, like ExcelColumnNames.

Query rename option

The new list query should look like this:

Renamed query and shows list in editor

Step 10: Create a custom function to transform each sheet

Similar to step 9, select Home > New Source > Blank Query, and open the Advanced Editor for the new query.

In the Advanced Editor, use this function:

let
// Accept a table as argument and return a table as output
TableTransformations = (table_arg as table) as table =>
let
// Promote row 1 values as headers
PromoteHeaders = Table.PromoteHeaders(table_arg),
// Select the columns we need. If the sheet doesn't have a column name, it will show as null
SelectedColumns = Table.SelectColumns(PromoteHeaders, ExcelColumnNames, MissingField.UseNull)
in
SelectedColumns
in
TableTransformations
Table transformations function in advanced editor

This function accepts a table as its argument and returns a table as its output.

For each table passed to it, it promotes the values in the first row to column names, and then selects specific column names. Notice that the column names being selected in the Table.SelectColumns statement are the ones found in the ExcelColumnNames list just created in the previous step.

This is important because the sheet named Tyrone in the Bob.xlsx file has extra columns in it that we don’t want:

sample data from Tyrone worksheet with extra columns that we don’t want

The MissingField.UseNull option passed in the Table.SelectColumns statement means that if any table passed to it doesn’t have a column name being selected, it will return null for that column value instead of raising an error.

This is important because the sheet named Notes in the Sarah.xlsx file only has one column named Notes:

sample sheet showing notes column

Click Done.

Right-click your new query, select Rename, and name it something more descriptive like TableTransformations:

Renamed query for the TableTransformations function

Step 11: Add custom column to get transformed data

Select Add Column > Custom Column, give the column a name like Transformed Data, and enter the following:

= TableTransformations([Data])
Custom Column editor to apply TableTransformations function

Click OK.

Remove the Data column as it’s not needed anymore.

The table should now look like this:

Data with new transformated data column

Step 12: Expand the custom column of transformed data

Select the expand button on the Transformed Data column:

Expand option for transformed data column

Make sure that Expand is selected, all column names selected, and Use original column name as prefix is not selected:

Expand options for column

Click OK.

The table should now look like this:

Expanded data for transformed data column

That’s it! From here, you should be able to do your own analysis.

However, there’s a few other steps that we can show to make these transformations even better. Keep reading if you want to see them.

Bonus Section

Step 13: Use the list of column names instead of hard-coded column names for expanding data

In the step we just completed, if you look at the formula bar, you can see that our columns to expand are hard-coded.

Expanded data with hard-coded column names

This may be fine for a quick example, but for maintaining this long-term, that probably won’t be the best solution. If we start asking the sales managers to include additional columns in their files, we want to be able to update that ExcelColumnNames list, and then everything downstream updates to use the new column names.

So, in the formula bar, delete the lists of hard-coded column names, and instead put in the name of the query that contains the list of column names, like ExcelColumnNames:

= Table.ExpandTableColumn(#"Removed Columns Not Needed_4", "Transformed Data", ExcelColumnNames)
Expanded column with dynamic list of column names

My previous step was named Removed Columns Not Needed_4, which is why that’s in the formula.

Now our expanding of the transformed data is dynamic.

However, we still have a problem to solve.

As you can clearly see, the Notes sheet in the Sarah.xlsx doesn’t need to be in our data. It was used by the manager for their own notes, and doesn’t pertain to our reporting on sales data.

But how can we dynamically know which rows to filter out? That’s what we’ll figure out.

Step 14: Add custom column to get count of non-null columns

Select Add Column > Custom Column, give it a name like Count of Non Null Columns and enter the following:

= List.NonNullCount(Record.ToList(_))
Custom Column editor to get count of non null columns

Click OK.

The custom column should look like this:

Count of Non Null Columns column

As you can see, the last row only has 2 non-null columns which we know are File Name and Sheet Name. Every other column value is null, meaning it has no sales data.

We need a way to dynamically filter these rows out.

Step 15: Filter out rows where all columns needed are null

Open Advanced Editor for your main query. It should look something like this, although your steps may be named differently:

let
Source = Folder.Files("\\MyComputer\DataMadness\Desktop\Power BI Multiple Excel Files\Manager Sales Data"),
#"Removed Columns Not Needed" = Table.SelectColumns(Source,{"Content", "Name", "Extension"}),
#"Filter to Excel Files Only" = Table.SelectRows(#"Removed Columns Not Needed", each [Extension] = ".xlsx"),
#"Added Excel Workbook Column" = Table.AddColumn(#"Filter to Excel Files Only", "Excel File Content", each Excel.Workbook([Content])),
#"Removed Columns Not Needed_2" = Table.RemoveColumns(#"Added Excel Workbook Column",{"Content", "Extension"}),
#"Renamed to File Name" = Table.RenameColumns(#"Removed Columns Not Needed_2",{{"Name", "File Name"}}),
#"Expanded Excel File Content" = Table.ExpandTableColumn(#"Renamed to File Name", "Excel File Content", {"Name", "Data", "Item", "Kind", "Hidden"}, {"Name", "Data", "Item", "Kind", "Hidden"}),
#"Filtered to Sheets Only" = Table.SelectRows(#"Expanded Excel File Content", each [Kind] = "Sheet"),
#"Filtered to Sheets Not Hidden" = Table.SelectRows(#"Filtered to Sheets Only", each [Hidden] = false),
#"Removed Columns Not Needed_3" = Table.RemoveColumns(#"Filtered to Sheets Not Hidden",{"Item", "Kind", "Hidden"}),
#"Renamed to Sheet Name" = Table.RenameColumns(#"Removed Columns Not Needed_3",{{"Name", "Sheet Name"}}),
#"Added Transformed Data Column" = Table.AddColumn(#"Renamed to Sheet Name", "Transformed Data", each TableTransformations([Data])),
#"Removed Columns Not Needed_4" = Table.RemoveColumns(#"Added Transformed Data Column",{"Data"}),
#"Expanded Transformed Data" = Table.ExpandTableColumn(#"Removed Columns Not Needed_4", "Transformed Data", ExcelColumnNames),
#"Added Non Null Count Column" = Table.AddColumn(#"Expanded Transformed Data", "Count of Non Null Columns", each List.NonNullCount(Record.ToList(_)))
in
#"Added Non Null Count Column"

Now we need a way to dynamically filter the Count of Non Null Columns to get rid of rows we don’t need.

We need:

  • The count of column names in our ExcelColumnsList
  • The count of column names in the table of the step prior to adding the Count of Non Null Columns column
  • Take the difference between the two
  • If the count of null columns is greater than that difference, then we keep that row. Otherwise, we don’t need it.

Add another step to your Advanced Editor that looks like this:

#"Filter out null rows" = Table.SelectRows(#"Added Non Null Count Column", each [Count of Non Null Columns] > (Table.ColumnCount(#"Expanded Transformed Data") - List.Count(ExcelColumnNames)))

The full query will look like this:

let
Source = Folder.Files("\\MyComputer\DataMadness\Desktop\Power BI Multiple Excel Files\Manager Sales Data"),
#"Removed Columns Not Needed" = Table.SelectColumns(Source,{"Content", "Name", "Extension"}),
#"Filter to Excel Files Only" = Table.SelectRows(#"Removed Columns Not Needed", each [Extension] = ".xlsx"),
#"Added Excel Workbook Column" = Table.AddColumn(#"Filter to Excel Files Only", "Excel File Content", each Excel.Workbook([Content])),
#"Removed Columns Not Needed_2" = Table.RemoveColumns(#"Added Excel Workbook Column",{"Content", "Extension"}),
#"Renamed to File Name" = Table.RenameColumns(#"Removed Columns Not Needed_2",{{"Name", "File Name"}}),
#"Expanded Excel File Content" = Table.ExpandTableColumn(#"Renamed to File Name", "Excel File Content", {"Name", "Data", "Item", "Kind", "Hidden"}, {"Name", "Data", "Item", "Kind", "Hidden"}),
#"Filtered to Sheets Only" = Table.SelectRows(#"Expanded Excel File Content", each [Kind] = "Sheet"),
#"Filtered to Sheets Not Hidden" = Table.SelectRows(#"Filtered to Sheets Only", each [Hidden] = false),
#"Removed Columns Not Needed_3" = Table.RemoveColumns(#"Filtered to Sheets Not Hidden",{"Item", "Kind", "Hidden"}),
#"Renamed to Sheet Name" = Table.RenameColumns(#"Removed Columns Not Needed_3",{{"Name", "Sheet Name"}}),
#"Added Transformed Data Column" = Table.AddColumn(#"Renamed to Sheet Name", "Transformed Data", each TableTransformations([Data])),
#"Removed Columns Not Needed_4" = Table.RemoveColumns(#"Added Transformed Data Column",{"Data"}),
#"Expanded Transformed Data" = Table.ExpandTableColumn(#"Removed Columns Not Needed_4", "Transformed Data", ExcelColumnNames),
#"Added Non Null Count Column" = Table.AddColumn(#"Expanded Transformed Data", "Count of Non Null Columns", each List.NonNullCount(Record.ToList(_))),
#"Filter out null rows" = Table.SelectRows(#"Added Non Null Count Column", each [Count of Non Null Columns] > (Table.ColumnCount(#"Expanded Transformed Data") - List.Count(ExcelColumnNames)))
in
#"Filter out null rows"

As you can see, we used the Table.ColumnCount and List.Count functions to get the counts we needed. We took the difference, and filtered our Count of Non Null Columns column with that difference.

The table should now look like this:

Filtered table without null rows

It filtered out the row for the Sheet Name of Notes since it didn’t contain any sales data.

Step 16: Remove the custom column used to get non null counts

Right-click the Count of Non Null Columns column, and select Remove.

The final data should look like this:

Table without the non null count column

That’s it!

This article was a little lengthy, but hopefully showed some ways that you could be dynamic in working with multiple Excel files containing multiple sheets.

Thanks for reading!

--

--

Kyle Gibson
Kyle Gibson

Written by Kyle Gibson

Christian, husband, father. Data Engineer at Chick-fil-A.

No responses yet