Combine or Append Data from Files

Combine, Consolidate or Append any or multiple txt, csv or Excel File

The Scenario

If you ever need to do any of the following:

  • Append Data
  • Consolidate Data
  • Combine Data

that comes from either an Excel, text or csv file then this is the best solution for you to do so.
In this pattern you’ll get the most optimal and easiest way to combine your files from an specific folder and combine them all together if you’d like. That’s right! combine data from a TXT, CSV and Excel all together. No limits on the size of the file or how many files you’d like to combine – it’ll simply work!

Download the Workbook

Our Goal

AppendGoalWhat we need is a way to extract all the data from our files like:

  • CSV Files
  • Text Files
  • Excel Files

and then somehow consolidate or append data in one tall table. This was a rather complex scenario that we could solve with VBA or SQL, but now we have a more efficient and user friendly way of doing this. Don’t forget to download the workbook in order to follow along!

Step 1: Unzip the downloaded File and explore!

Before we begin with the real pattern, make sure that you download the workbook in order to follow along. The workbook comes with the files that you can see in the previous image inside a folder called PQExample which is basically the folder where we’re going to point our Power Query solution to work on. The file that contains the actual Power Query solution is called Ultimate Combination.xlsx and you should be able to see it immediately after you unzip the downloaded file.
We encourage you to explore all files and see what’s in it. Overall, they all share the same column header names:

  • Product
  • Date
  • Gross Sales
  • Amount

Now we can head over to the real Pattern.

Step 2: Find the Query that does the Magic

Once you open the Ultimate Combination.xlsx file, you’ll notice that it has no data. The Power Query solution has been stored as a connection only and its awaiting your command to load its data to your Excel workbook. In order to view this query, you’ll need to go to the Power Query ribbon, click on the Show Pane icon and then on the right side you’ll see the Query Pane with a query called Ultimate Combination. Right click that query and then click on Edit to open the Query Editor Window and analyze the solution.

QueryError
You’ll immediately notice that there is an error with the query, but don’t be alarmed. The reason behind such error its because the solution is pointing at my (Miguel) local folder instead of yours. In order to fix that you’ll need to head over to the first step called Source and click on the gear icon right next to it. That should pop up a new window with a user friendly folder browser. Go ahead and find the folder you just unzipped called PQExample.

FromFolderAnd once you do that, you’ll notice that the solution will start to run and do its magic.

You could end it here and call it a day since all the files were combined already (click on the step called Expanded to see the final result), but instead, we ‘re going to show how simple creating a solution like this was.

Understanding the Query

Instead of writing a long paragraph, we’ve divided this into sections so you can understand each step of the query on its own. You can clic on the toggles to expand and view the content and collapse it back if you want.

Source

The way that we start the query is by selecting the folder where all my files are stored. This is easily done through the Power Query Ribbon as shown in the next picture:

FromFolderPQone thing to take in consideration is that Power Query will also grab the files from any subfolder, but you can filter those out by using the Path field.

Get the files

In this step we simply filter the file extension file so we only get the following extensions:

  • extension equals .csv
  • extension equals .txt
  • extension begins with .xls

The way that you’d do this is by selection the filter icon from the field and simply do a filter like you’d normally do in Excel.FilterField

Removed Other Columns

In this case, we’ll be removing some columns that we don’t need, but instead of selecting the columns that we don’t want we’ll be selecting the ones that we want and tell Power Query that we just want to keep those. Take a look at the following picture to find the button that does the trick but make sure that you first select the columns that you want to keep.

RemoveOtherColumns

Trans1

Here’s where we’ll need to get to know a bit about Power Query functions. We know that we have some Excel files in our query but, how do we extract the data from them?

Take a look at the columns that we have available. You’ll notice that we have a column called Content that holds a binary. That binary is the actual Excel file and in order to interpret that binary we need a function called Excel.Workbook(). Using the following formula:
if Text.StartsWith([Extension], ".xls") then Excel.Workbook( [Content]) else null

ExtractWorkbook

we get a new column that is basically the one that shows me all the data that the Excel file (on each row) holds.

You can click on any of the Table values found on the Custom column to find out what’s inside of each of them.

More often, we deal with 3 different kinds of Data inside an Excel Workbook:

  1. A Sheet
  2. A Table
  3. A Named Range

Note: You’ll notice that in some files we’ll have tables, sheets and in others we’ll only have sheets. This is the moment where we define if we just want to combine the tables, the sheets, the named ranges or a combination of them. Be sure to check that you’re not combining the same data twice as a table is part of a sheet and could potentially get combined in the wrong way.

Trans2

Now we need to expand that Custom column so we can get all the Excel data in our Query and choose the ones that we want.

ExpandExcelthe result of that gives us 4 new columns:

  • Custom.Data = the actual data found inside the Excel workbook (represented as a table)
  • Custom.Name = the name of the worksheet where the data is stored
  • Custom.Item = this is the name of the item, if it’s a named Range then it’ll be the name of that range, for Sheets then it’ll be the name of the Sheet and for Tables it’ll be the name of the table
  • Custom.Kind = it’s the name of the object that was found from the excel workbook. Most commons are Sheet, Table and DefinedName

The Formula and other steps

So far we managed to extract the data from the Excel workbook into objects. Now we need to find a way to extract the data from the CSV and TXT file and being able to combine that with the data found in the Excel file.

We’ll create a new column with this formula that will do the trick:

if [Extension] = ".csv" then
Table.PromoteHeaders(Csv.Document([Content]))
else if [Extension] = ".txt"
then Table.PromoteHeaders(Csv.Document([Content],null,"," ))
else if [Custom.Kind] <> "Table"
then Table.PromoteHeaders([Custom.Data])
else
[Custom.Data]

Note: we are assuming that the txt file is delimited by a comma, but you can change that by changing the comma in this line of code for something else like a pipe (|), bars (/\), semicolon or other.
Csv.Document([Content],null,"," ))

The result of that formula gives us a column that holds only tables with the correct headers. You might be tempted to expand this column right now, but we won’t do that just yet. Instead, we’ll clean the table that we currently have and delete unnecessary columns like Content and Custom.Data since all the data that I need is stored on the Custom column. We will also add a new column that will give us some important information, total row count, about the tables that we’re about to combine. Using this formula we get the total row count for those tables:
Table.RowCount([Custom])
and this is what you see at the Almost there step.

Almost there

Dynamically get a List of all headers in FIles

Fundamentally, that’s what the formula at the MyList step does. It basically grabs all the headers from all the tables found in the previous step, gathers them in a table and then creates a table of distinct values. It later transform that Table into a List as we’ll need that List later. That list will become a parameter so we can dynamically combine all the files regardless if they have all the same structure or not.

A common use case to do something like this is that perhaps we have some columns in some files but that are not present in others, but those columns might be needed for that specific analysis and this MyList pattern does just that. It dynamically creates a unique list of all the headers found thorough all the tables in the previous step.

Here’s the code:
=Table.ToList(Table.RemoveColumns(Table.Distinct(Table.ExpandListColumn(Table.AddColumn(#"Almost there", "ColumnHeaders", each Table.ColumnNames([Custom])), "ColumnHeaders"),{"ColumnHeaders"}),{"Name", "Extension", "Custom.Name", "Custom.Item", "Custom.Kind", "Custom", "Custom.1"}))
in order to insert this code you’ll need to create a custom step. You can create a custom step by clicking on the fx icon in the formula bar. Once you activate that custom step, you’ll have to paste the code above to make it work.

DynamicHeaderListHow does this Code do that?

Let’s explain how it does what it does. First, we need to extract the column headers from each table and in order to do that we create a new column using the Table.ColumnNames on the column named Custom that holds all the tables.

ColumnHeaders

The result of that will look like this:

ColumnHeadersList

and what we need to do now is simply expand that ColumnHeaders column using the icon next to the name of that column that has the opposite arrows icon and the result of that will look like this:

RemoveDuplicatesCH

as you can see on the previous image, the expand action basically created a list of all the columns found on each table. so in the CSV File.csv we found 8 columns headers and what we want to do now is simply create a unique list of ColumnHeaders.

In order to do so, we just select the column that holds the values that we want to make unique and click on he Remove Duplicates icon found in the Home tab.

The result of that will look like this:

DistinctNamesa nd what we need to do now is remove all the other columns so we only get a 1 column table that holds the unique values of ColumnNames. We select the columns that we want to deleted and select remove columns or, as shown in the picture below, we select the columns that we want to keep and click where it says Remove Other Columns.

RemoveOtherColumnsCH

the result of that operation should look like this:

UniqueListofCH

and now that we have this 1 column table with the unique values for the ColumnNames, our job is to transform that table into a List so we can use that List as an argument for a function later on. That’s why we use the Table.ToList function but the only way right now to do that is by creating a custom step (hitting the fx icon) and inserting it manually and that’s basically what the whole code does.

The Final Steps

The 2 final steps:

  • Here we go
  • Expanded

The Here we go step simply does some renaming on the columns by just double clicking the name of those columns and the Expanded just simply expands the Custom column by simply clicking on the opposite arrows icon next to the name of that column.

The resulf of this solution / query should look like this:

FinalStep

Load To

all you have to do now is select where you want to load this either a Worksheet or your Data Model

Power Query Patterns
Power Query Book
Power Query Workshop

this work was shared 0 times
 700
About

 Power Query Training

  (13 articles)

We are a team committed to delivering the best content for new and seasoned users of a tool that we believe in: Power Query. We firmly believe that this is THE best tool to be added to Microsoft Excel in a long time.

  • Got the book and watching the youtube. But I am not getting the append function when using XML, yes i know XML smells but its what I am stuck using. i have created my own function and a custom column to use in the query but how do I actually get an append to stop a combined binaries error?

    • First you’ll need to extract the xml data that you want to combine and set it up as a table. This video might help: https://www.youtube.com/watch?v=XPWnPAhQEiQ
      The main idea here is to get all the tables that you want to combine in just 1 column or as part of the query and then use Table.Combine. For sure the video will help 🙂

  • JustinS

    Hey, Miguel! Do you have an example here or in the book in which the different files don’t have the exact same columns? Or that they’re in different orders?

    For example, say you had FileA.csv with columns Ship, Captain, and Total Crew, and FileB with columns ShipClass, Ship, Captain, Rank, and Total Crew. When I try to combine them, I end up getting data from different columns combined in the final combined version.

    In my real life situation, I have dozens of these files that have evolved over time, so manually transforming each one would be a bit tedious.

    • Justin, sorry for the late reply! I didn’t check this comments in time and my apologies for that.

      The video shown on the other comment might help you as well! 🙂

  • Mary Ma

    How to change the name of APPLIED STEPS to ” Almost there” ? It has impact on the formula of dynamic list?

    • Ken Puls, FCPA, FCMA

      You can right click the step name in the Applied Steps window. After that though, if you need to modify any M code, you’ll need to refer to the step as #”Almost There”. The #” ” syntax is required since there is a space in the name you’re choosing. (I.e. if you chose AlmostThere then you wouldn’t need to escape it with #” “

close

The Best Power Query Course in the World!


  • circle
    Best Content Worldwide
  • circle
    Top Level Support
  • circle
    Over 12 hours of content
  • circle
    More rewards!