Hello Power Query fans. As promised in latest post, I am coming back with full article, presenting my take on very interesting Power Query challenge. You may navigate here to check full details of that challenge. Without further ado, let’s have some fun.
Presenting the challenge
As a part of the challenge, we receive an excel file, with following Table as a data source:
Table on the top represents the data source. Table at the bottom represents expected outcome. I will do my best to summarize the challenge in few bullet points:
- Result table must contain only unique Client IDs.
- Source table contains 3 Client IDs with more than 1 row of data.
- Orange color in source table marks columns where we have different data entries for the same Client ID. Green columns indicate that data entry is the same.
- In our result table, we must concatenate two different values appearing for the same Client ID. If values in the column are the same (green color), we must show the value only once.
- Bonus task: we should try to make our solution work for any column that is added/removed in our dataset.
I hope it’s clear what we need to do, so, let’s roll up our sleeves and get to work.
Connect to the data
Our data source resides in a table within excel file. For those who are using Power Query mainly with Power BI, there is a function that is useful in Excel only. We can use Excel.CurrentWorkbook() function, to display all the tables available in current workbook. Or simply use the UI, to start Power Query from selected table:
Data transformation
Believe it or not, but even with meeting the requirement, to have our query working for all the columns dynamically, we need only 4 steps to complete the task. Yes, this is not a mistake, but the power of Power Query.
We could leave assigning data types for now, and later use our dynamic function created in previous article, but here we could just change the type of our “static column” which is “Client ID”. The rest will be done automatically within one of next steps.
We know for sure, that Client ID is something that must define the “grain” of our data, thus will be the only “static column” in our query. We must have it always, for the rest, we will try to make our query dynamic. How can we perform operations on all our columns, without calling them by their names? In Power Query we can perform powerful transformation like “Unpivot”, without writing any code. However, you must be careful which option you are using here. For us, we will simply select the Client ID column, and use “Unpivot Other Columns” option:
As a result, we receive the following table:
When use this option as presented above, we will always receive only 3 columns (more if we decide to have more “static columns”). First column is our Client ID, which we wanted to keep. Next, we have two columns generated by Unpivot function. Attribute contains all the column names from previous schema, Values of course has all the values from these columns. These generic names can be easily changed in the formula bar, but it’s not that important.
Why it is important to select “static columns” and then use the “Unpivot Other Columns” option? As you can see in the formula bar above the unpivoted table, “Client ID” is the only column name that is “hardcoded”. The rest was transformed automatically. Meaning, if columns are added/removed, our query will still work fine. Now, we must somehow aggregate our data, so we get unique combination of “Client ID – Attribute” value pairs. Instead of jumping into solution, let’s build it.
If we hear about aggregation in Power Query, what is the first thing that pops up in our mind? It is a Group by function. Let’s use it:
Even if you are familiar with Power Query functions, you still might want to use the UI to generate some of the code for you. This is exactly what I am doing here. We are grouping by Client ID and Attribute, and want to aggregate the Values, but we don’t have any “combine” option here. When you play with functions in Power Query, it’s good to check different options, especially how they are built in formula bar. In screenshot you can see I highlighted “Max” option (“Min” is fine as well). This option is interesting, as it uses List function to aggregate the data:
This means, that aggregation function takes all the values in a column, treats it as a list of Values, and calls List.Max function to return only one value as a result. If we have a list of values, we might be able to combine all the values from the list instead? We can use List.Combine function here. List.Combine takes “lists as a list” for a parameter. Therefore, our [Value] column is a list of values, so we must wrap it with curly brackets (representing a list structure) and change our List.Max to List.Combine function:
Now we have a list of all the values, instead of only one value returned as a result of aggregation. We could use now the Text.Combine function, which takes a list of values as a parameter, and combines it using given delimiter. But, in our list of values we could technically have some nulls or duplicates. We could wrap our List.Combine function with List.RemoveNulls and List.Distinct to resolve both problems:
All is left is to wrap it with Text.Combine function, with comma as a delimiter:
Of course, we used 5 different functions within this step (Group by, Text.Combine and 3 List functions), but two of them were added just to avoid future problems. So, technically this solution could be a bit simpler. Still, what we have as a result is properly aggregated data, where different values are concatenated, and the same values appear only once. Mission is almost accomplished, as we must now return to previous schema, from before Unpivot operation. What is the reverse operation for Unpivot? It’s Pivot of course. Again, let’s select our “static column” which is Client ID, and use Pivot function from UI:
- Select two columns we want to Pivot (like in Pivot Tables in excel)
- Go to Transform tab
- Select “Pivot Column”
- Specify “Values” column
- Select “Don’t Aggregate”, as we already made use that there are only unique value pairs of Client ID and Attribute.
Once columns are pivoted, we are done with data transformations:
Below I am providing full code in Power Query for the entire solution:
let
// 1. Connect to data residing in Excel Workbook
Source = Excel.CurrentWorkbook(){[Name="tblData"]}[Content],
// 2. Data type is changed only for our "static column"
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Client ID#", type text}}),
// 3. I am performing Unpivoting, because then I can work on generic column names, to satisfy the requirement so the query works for any number of columns
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Changed Type", {"Client ID#"}, "Attribute", "Value"),
// 4. We must create a table where we are a unique combination of "Client ID - Attribute" value pairs.
// Text.Combine takes care of assigning "Text" data type to our "Value" columns
#"Grouped Rows" = Table.Group(
#"Unpivoted Other Columns",
{"Client ID#", "Attribute"},
{{"Value", each Text.Combine(List.Distinct(List.RemoveNulls([Value])), ","),type text}}
),
// 5. Pivoting values to bring back former columns
#"Pivoted Column" = Table.Pivot(#"Grouped Rows", List.Distinct(#"Grouped Rows"[Attribute]), "Attribute", "Value")
in
#"Pivoted Column"
Conclusion
Even though the entire code is not very long, this was quite interesting challenge to solve. We managed to meet all the requirements in a very elegant way.
It’s important to mention, that solution like this one, where all new columns are accepted automatically, isn’t probably the best approach in real life scenarios. They are interesting for a challenge, but all the new data added to the report, should be added consciously. Still, we might want to use it to some extent. Pick up dynamically all columns with specific patter in naming, that could indicate that columns belong to the same “family” and could be further treated in the same way.
I hope it was interesting exercise to follow along. In case you have any questions, feel free to contact me, or leave a comment below.
Thank you for reading, and till next time 🙂