When working with Import Mode in Power BI, you should look at the size of your data model and make sure that it’s not bigger than necessary. There are many techniques you can apply to optimize your model, but there are two extremely simple transformation steps, that have the biggest impact on your model size. You can either reduce the number of rows (by filtering for example) or reduce the number of columns by removing them from dataset.
While this is not article about details of Power BI’s storage engine, it is important to know that Power BI stores the data column by column instead of row by row. That is why number of columns is more critical for the model size than the number of rows.
That’s why you should definitely consider removing unnecessary columns from the model. Today, I’ll walk you through the available options in the Power Query UI and make a case for why each is important.
Scope of the article
Before we start
Closer look at available functions
Choose Columns
- List of all columns available at that stage.
- Select All Columns – you can have all columns selected and unselect a few if needed or unselect all of them and select a few. Either way, this function allows you to choose columns as quick as possible.
- Search Bar – when you have a lot of columns it makes it very easy to find columns you want to keep.
- Sort Columns – very often overlooked feature. Not only you can select the columns you need but also sort them alphabetically. It helps to work in Data Preview in case you have a lot of columns.
Of course, the result of the function is that it keeps the columns that are selected. For the purpose of further discussion, let’s have a look at the code generated for this step in Formula Bar:
Remove Columns
Remove Other Columns
There is one interesting behavior while using Remove Other Columns functions, that could prove to be useful to you. When you are selecting columns to keep, you can click on them in an order that you would like to see them. Power Query is smart enough to understand what you are doing, and once your columns are selected, they are also sorted in the order of your clicks:
Maybe this is not revolutionary feature, but I am pretty sure you can make good use of it, and not only reduce the model size but also organize the columns to make them easier to work with, all in a single transformation step.
Explaining the title
- Choose Columns and Remove Other Columns – calls Table.SelectColumns function accepting two arguments. First is the table that function will work on. The name you see here is the name of the last step in Applied Steps list (the result of transformation step is a table in a given stage of transformation). Second argument is a list of columns that you want to keep.
- Remove Columns – calls Table.RemoveColumns function. It also accepts two arguments. First again is the table like discussed above. Second argument is the list of columns that you want to remove.
Which one to choose and when
Use Choose Columns / Remove Other Columns at the beginning of transformations. Especially when you work with less structured data like excel files, maybe these manual exports from transactional systems, done by multiple people and you can’t guarantee that all the files contain the same set of columns. If you use the Remove Columns step here, and you hardcode the names of columns that are needed, in case one of the columns is missing, you will get an error. Why would anyone like to be bothered by the fact that the column you don’t need is missing? On the other hand, if you select the columns you need, because these are columns needed for the Semantic Model, or for further transformations in Power Query, you would like to know that something has changed in data source, and at least one of the columns is no longer available.