intro to happy — how we approach teaching (and learning) power pivot & power query

power pivot first or power query first

On day one of a typical two-day training we start our students off with a nice, clean dataset and teach them the basics of loading data into Power Pivot, building a data model, writing some DAX, making a pivot table report. We don’t start with Power Query but inevitably during day one someone makes a comment that their data isn’t very clean and probably couldn’t be loaded right into Power Pivot. Let’s be real, this fictitious person is most of us. Most of us aren’t working off clean data sources. It’s a common experience to have messy data in Excel, CSV, & XML files that doesn’t fit into Power Pivot. And even if we have a better data source, like a data warehouse, there’s still usually some transformation that needs to happen to get our data in good shape for analysis. So if our data need some cleaning to any degree then we need Power Query before we can make much progress with Power Pivot.

If it’s the case that Power Query is a prerequisite for Power Pivot, then why don’t we teach you Power Query first in order to prepare you for Power Pivot? I’ve got the answer for that. It’s the answer we’ve been telling ourselves for a while now — Power Query doesn’t really make sense until you understand the shape your data needs to be in to build a good data model in Power Pivot. Said another way, once you’ve come to appreciate what the significance of the Power Pivot data model then you’ve got the right motivation to learn Power Query. Even as I’m writing this I’m wondering if that is still the right approach.

From out in the field …

In the past two weeks I’ve been in two trainings where we’ve broken protocol and introduced Power Query very early in day one. In both of these trainings we were working with the client’s data as opposed to a canned Adventure Works database, and in both of these trainings the client’s data was really messy right from the beginning. Instead of leading with Power Pivot, we led with Power Query. The training went well but it was more difficult than it had to be and here’s why:

Whenever I’m communicating a new concept, I prefer to start with why.

The why of Power Query is “You need to use this tool to transform your raw data into the shape that the Power Pivot data model likes, in the shape of flat files, with data tables and lookup tables.” Now this statement really makes zero sense if you’re an Excel person just being introduced to these tools. There’s a lot of new information packed into this why — importantly there’s this idea of a data model which to a Power Pivot beginner is an enigma wrapped in a mystery. So without having introduced the data model, teaching Power Query comes with this disclaimer “Trust me, all this data cleaning and transformation is what you need to be doing. You’ll see later why it’s important.” Which is basically some version of “Because I said so” which has been my least favorite reason for doing anything since 1992 when my mother told me I couldn’t watch the Simpsons.

The why of Power Pivot, on the other hand, is different — it’s not abstract. For the Excel user it’s the experienced pain of 1) not being able to refresh data, 2) writing VLOOKUP or INDEX/MATCH over and over again, 3) not being able to deal with large datasets, 4) making reports that are very difficult to maintain/edit, 5) struggling to create ratios that aggregate well, 6) dealing with data from multiple data sources, on and on. There are lots of constraints in old Excel and when you show Excel people the solutions in Power Pivot you can see the “why” on their faces. People light up. Lightbulbs go off. And when people understand Power Pivot they’re ready to do whatever it takes to get that raw data into the Power Pivot data model.

start with why

So starting with Power Query, before any Power Pivot, means starting withouta tangible why for most Excel users. But there’s a middle way — teach as much Power Pivot as needed to first establish the why. That may be as much as showing somebody a series of pivot tables with slicers and sparklines that get them excited. For others it may be showing them some time intelligence function like “YoY % Change in Net Income”. It may be showing them Power BI on a phone. For many it may be more in-depth than that. Either way, the best learning happens when people are motivated and so my first motion as a trainer is always to establish the why.

Just as a reference point — we start all of our training off with this slide …

Austin is VP of Operations & Growth at PowerPivotPro — check us out http://www.powerpivotpro.com

3 things that used to suck in old excel that you can now easily do with power query

I’d like to dedicate this article to Joe Dambrino who in 2012 was the first person to explain to me and show me the beauty of working with flat files. We didn’t have Power Query then but we do now!

I was introducing Power Query to a client this morning and found myself saying things that needed to be shared with a broader audience. That client was paying my full rate, you’re benefiting for free. You’re welcome!

  1. Merging datasets together
  2. Breaking a dataset into its component parts / Deduplication
  3. Unpivoting a dataset

For folks coming from a tech background these activities have NEVER been hard — you have tools like SQL that can crank through this stuff. For the Excel crowd these tasks have NEVER been easy. But Excel folks are hackers and they’ve figure out how to get these things done. For example, I was showing a client “unpivoting” this morning and he said “let me show you a trick” which then proceeded into a set of three-button keyboard shortcuts, cascading menus, and non-obvious button clicks that did in fact unpivot the dataset. It was cool in the same way that these household hacks are cool …

 

50% “sweet!”+ 50% “wtf, srsly?!” Most Excel hacks share one common characteristic — they aren’t very automated, meaning you have to do that same arcane set of button clicks and keyboard shortcuts every time you deal with the data. Sure you feel bad ass for knowing that secret code in the same way you’d feel bad ass for biking the Swiss Alps with a fixed gear bike. I’m looking at the whole situation and thinking “Ouch!” So anyways, yes I was impressed with the hack but he was WAY more impressed when we walked through how to do it in Power Query.

I’m not going to spend much time explaining how to do these things — but if the community expresses some interest in a topic I could definitely write up a post 🙂

So I was saying … there are three things that are now easier with Power Query, here they are! Power Query encourages us “to be good to our future selves” meaning you spend some time building something the right way now so next time you do it you just have to click a button. Let’s get into it.


Merging datasets together

You have three tables: a products table, a a products category table, and a products subcategory table. You need to get all of this data into one table so that you can make sense of it. What’s your first move?

For the Excel crowd you’re probably going with VLOOKUP or INDEX/MATCH. Right now there’s somebody reading this who has a crazy hack to get this done. I know! I know! Calm down person with the crazy hack. I’m suggesting how 99% of Excel users would solve this problem. The problem with this standard VLOOKUP approach is that you start with the products table and then you have to write a formula for every column of data that you want to merge into it. And when a new column of data shows up or you realize you left one out, you write another formula.

The SQL people in the room all have their noses in the air and are like “LEFT JOIN, duh!” Look SQL folks, historically we Excel people haven’t had that LEFT JOIN luxury. We’re lowly data janitors just scrubbing these floors with this busted ass VLOOKUP mop. But no longer, PQ to the rescue! In Power Query you can smash two tables together based on a common column and load that merged table into a worksheet — it’s just a few button clicks.


Breaking a dataset into its component parts / Deduplication

Situation Number Two. You have a table that has all of your sales data (data, sales amount, customer, product) and you need to break it out into a table for sales, a table of unique customers, and a table of unique products. Alright Excel people, what’s your first move?

Really the best way to deal with this is to start copying and pasting. Sounds a little nutty but any hack that you use here is going to take a lot more time than doing the work manually. Wait what? You have a crazy hack for this?! 🙂 I’m sure somebody has figured it out, but moving on … You’re going to copy and paste the sales data into two extra tabs, remove the columns you don’t need for the products table / same for the customers table, and use the remove duplicates function to get a table of unique products/customers. Then you can go back to the sales table and remove the products/customers columns that you don’t need. Voila! When you get the data for next month you have to do this all over again and you better pay close attention so you don’t fat finger anything!

Once again our SQL people are just dying to introduce us noobs to the SELECT statement but Excel people don’t be intimidated with all this fancy SQL talk. In Power Query you can take the original table, break it out into its component tables, deduplicate the products/customers , and load all the results into a worksheet in Excel. When the new data comes in next month you just click the refresh button and you’re done. No need to write one line of code — it’s all just button clicking.


Unpivoting a dataset

Finally Situation Number Three. The shape of your data matters, and there are basically two approaches to shaping a dataset— really wide tables or really tall tables. Here’s what I mean …

This is a WIDE table, all the dates are in separate columns going off to the right
This is a TALL table, all the dates and sales amounts are in one column (I cut off lots of the rows)

The wide table is a better format for reading the data; it’s a more “human” format; it’s a reporting format. The tall table is a better format for a computer to deal with the data; it’s a better format to do analysis of the data with a pivot table and with Power Pivot. I’m not going to go too much into why the second way is better, so assuming you want option 2 how do you “unpivot” the wide table to re-shape it into the tall table?

There are definitely some good hacks for this and of course there’s VBA but once again for 90% of Excel people you’re probably going to start copying and pasting. It’s how I was doing it in 2011 — no judgment. It’s not an exaggeration to say that unpivoting is one button click in Power Query. For lots of Excel people if this is all that PQ did it would be a great tool, like something you would pay money for. But of course it does much more.

It still puts a smile on my face when a client shows me a three-button keyboard shortcut that brings up some menu that I’ve never seen. 🙂

To close it out — Power Query solves a number of headaches that us Excel people have just come to live with over the years. We’ve accepted our fate and come to idolize people that know all the crazy hacks. And yes cheers to all the crazy hacks. It still puts a smile on my face when a client shows me a three-button keyboard shortcut that brings up some menu that I’ve never seen. 🙂 But in some ways the era of hacks is coming to an end as Microsoft is delivering real solutions to these problems with Power Query, Power Pivot, & Power BI. Currently the best way to learn Power Query is to just dive in, and if you have any questions you know where to reach us.

Austin is VP of Operations & Growth at PowerPivotPro — check us out http://www.powerpivotpro.com