Tuesday, October 18, 2016

Why we might be reading the data wrong

Among other things, the problem might lie with our inadequate understanding of the domain we are trying to map through data

Recently, we got into programmatic selling of ads on Cartoq.com through AdX. We knew from others’ experiences that this could boost our ad revenues significantly—according to some, by almost 3x.

Unlike direct sales or selling through other ad networks, you get to set your price on AdX, and that too dynamically. This brings in a lot more variables into play. And if you enjoy data analysis, which my team and I have certainly started doing, it can be a lot of fun.

But having fun does not always guarantee the right results. In the initial few weeks, try as we might, our revenue actually dropped—by almost 33%.

We were shocked. We had done so well over the past two years because we knew our Facebook and web analytics so well. Today, Cartoq generates over 100 million social post impressions and nearly 25 million ad impressions every month. 

We got there by digging deep into our website and social data, and then relentlessly fine-tuning our analysis. We knew this game well. So why were we not able to get AdX right?

In this article, I will take our recent experiences with programmatic ad selling to illustrate why we sometimes end up reading our data wrong.

Not knowing the domain well enough

This could be one of the most insidious reasons why we end up getting the wrong results from our analysis: not really understanding the domain itself. Trying to read data about an activity we do not understand is obviously going to lead us nowhere.

We got into programmatic selling of online ads, but didn’t really spend time understanding how a programmatic platform differs from other ad networks.

Just as we had been doing earlier, we believed the essential problem to solve was optimizing pricing and inventory variables so as to maximize Price x Volume. In a way, that is all there is to it. But in reality, on AdX, when we got our ad rates to go up (eCPM, or effective cost per thousand impressions), our fill rate (the share of our ad inventory that actually got sold) would fall. Whatever permutations of price and volume we tried, we couldn’t find the sweet spot where the product of the two, our total revenue, actually improved.
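
To make that trade-off concrete, here is a minimal sketch in Python with made-up numbers. The demand response (fill rate falling as the floor price rises) is a hypothetical illustration, not how AdX actually behaves:

```python
# Illustrative only: a toy model of the eCPM vs fill-rate trade-off.
# The fill-rate response to price is a hypothetical assumption,
# not AdX's actual pricing behaviour.

def revenue_per_thousand_requests(floor_cpm, fill_rate):
    """Revenue earned per 1,000 ad requests at a given floor price."""
    return floor_cpm * fill_rate  # the rate applies only to impressions actually sold

# Assume (hypothetically) that fill rate falls as the floor price rises.
scenarios = [
    (60.0, 0.90),   # low floor (Rs per thousand), most requests filled
    (100.0, 0.55),  # higher floor, fewer buyers clear it
    (140.0, 0.35),  # premium floor, fill rate collapses
]

for floor, fill in scenarios:
    print(f"floor Rs{floor:>6.0f}  fill {fill:.0%}  "
          f"revenue/1000 requests Rs{revenue_per_thousand_requests(floor, fill):.0f}")
```

With those made-up numbers, a higher floor barely moves revenue per thousand requests, which is roughly the wall we kept running into.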

We knew that in any marketplace, if you increase the price, it will affect the volume. But if you knew the right buttons to press, programmatic exchanges were known to improve ad revenue realizations quite a bit. What were we missing?

Well, as we eventually discovered, in programmatic buying and selling, it is all about optimization at a particular point in time. It’s a flow concept—you are not trying to improve a static set of variables.
We had to start thinking about pricing as a function of inventory available at that time, and what variables would be relevant in this dynamic context. That real-time interplay of new variables is what needed optimization. 
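
A rough sketch of what that shift in thinking looked like, again in Python. Everything here is hypothetical (the function, the thresholds, the adjustment rule); the point is only that the floor price becomes a function of conditions at that moment rather than a single static number:

```python
# Hypothetical sketch: the floor price as a function of current conditions.
# None of this is AdX's real API; the thresholds are invented for illustration.

def dynamic_floor(base_floor_cpm, recent_fill_rate, unsold_inventory_share):
    """Nudge the floor up when demand is strong, down when inventory is piling up."""
    floor = base_floor_cpm
    if recent_fill_rate > 0.80:          # buyers are clearing almost everything
        floor *= 1.10                    # room to ask for more
    elif recent_fill_rate < 0.40:        # most requests going unsold
        floor *= 0.90                    # price ourselves back into the auction
    if unsold_inventory_share > 0.50:    # large backlog of unfilled slots right now
        floor *= 0.95
    return round(floor, 2)

# Example: a strong-demand hour vs a weak one (numbers are made up).
print(dynamic_floor(100.0, recent_fill_rate=0.85, unsold_inventory_share=0.10))  # 110.0
print(dynamic_floor(100.0, recent_fill_rate=0.30, unsold_inventory_share=0.60))  # 85.5
```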

Once we caught on to this fact, we did indeed discover a completely new set of variables. We had to rework our analytics reports because the questions we were now asking of our data had changed completely.

Within a week, we got our eCPM up to Rs125, which was close to the premium rates we were getting in direct deals with brands. And yes, we did increase our revenue by 2x because we also increased our inventory sold by almost 80%.
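
As a back-of-the-envelope check on how those two factors combine (my arithmetic from the figures above, not an additional figure from our reports): revenue scales as volume times rate, so an 80% rise in inventory sold needs only a modest rate improvement to roughly double revenue.

```python
# Back-of-the-envelope: revenue factor = volume factor x rate factor.
volume_factor = 1.80          # inventory sold up ~80%
revenue_factor = 2.00         # revenue roughly doubled
implied_rate_factor = revenue_factor / volume_factor
print(f"implied eCPM improvement: ~{(implied_rate_factor - 1):.0%}")  # ~11%
```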

So if our data is not delivering results, chances are the problem lies in our inadequate understanding of the business.

Be sceptical about sharp upward movements in data

This one has more to do with the confirmation bias with which we all tend to operate. And I have found that this bias kicks in quite strongly when we encounter a sudden improvement in our data.

This big uptick is the validation that we wait for, and when that happens, we latch on to this data a little too enthusiastically. Eventually, this leads to erroneous conclusions and decisions down the line.

On AdX, we often see a sudden jump in ad revenue, which may continue for a few days or even longer. Our first thought when this happens is that our current optimization technique has started working better. So we start gathering more data to corroborate this thinking. This insidious validation continues until the spike flattens out just as suddenly. But we haven’t changed anything, so why did the spurt not continue?

Faced with this one-off event, we then start looking for ad hoc factors that might have contributed. More often than not, it turns out to be a big-budget brand launch campaign, or competing brand promotions vying for more digital inventory. A temporary spike in demand, not our optimization, is the real reason for the price increase.

So we now have this rule of thumb: whenever we see a sudden jump in ad revenue, we assume it is a temporary event. We then go about looking for the data that will support this assumption.

What helps in such a situation is extensive disaggregated analysis. We dissect each variable, relevant or seemingly irrelevant; we then slice that data in multiple new ways until we are able to isolate, with a fair degree of confidence, what caused the spike.
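
For illustration, a minimal sketch of what that slicing can look like in pandas. The column names, the date cut-off and the input file are hypothetical; the point is simply to break an aggregate revenue jump down by dimension until one segment explains most of it:

```python
# Hypothetical example of disaggregating a revenue spike by dimension.
# The CSV, its columns and the spike date are made up for illustration.
import pandas as pd

df = pd.read_csv("adx_daily.csv")  # e.g. date, advertiser_vertical, geo, device, ad_unit, revenue

baseline = df[df["date"] < "2016-10-01"]   # assumes ISO-formatted date strings
spike = df[df["date"] >= "2016-10-01"]

for dim in ["advertiser_vertical", "geo", "device", "ad_unit"]:
    before = baseline.groupby(dim)["revenue"].mean()
    during = spike.groupby(dim)["revenue"].mean()
    change = ((during - before) / before).sort_values(ascending=False)
    print(f"\n--- {dim}: biggest relative jumps ---")
    print(change.head(3))
```

If one segment, say a single advertiser vertical, accounts for most of the jump, that points to a temporary demand spike rather than a genuine improvement in our own optimization.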

In our experience, sudden improvements in performance don’t sustain. So it is prudent to treat sharp changes with healthy scepticism.

Correlation not the same as causation

This is a classic error, and I don’t think we have found any foolproof way of avoiding this mistake. So I will just share a couple of things we try to follow to mitigate this error.

One, whenever we think we have found a reason to explain a particular piece of data, we try to keep that conclusion on hold for a while. We tell ourselves that this could be mere correlation, so let’s wait a few more days before we treat it as a cause. What this does is create a little space in our heads to revisit our conclusions.

As a second step, we then get someone in the team to dig out all possible anomalies around that analysis. Assigning someone formally to play the devil’s advocate helps.

It’s something we try to do when an important decision hinges on that conclusion. Having said this, in most cases we haven’t really ended up changing our original conclusions. That makes me suspect that the problem of correlation and causation in data is more insidious than we perhaps imagine.

What works for us is building this awareness about all the ways we could be reading our data wrong. And in that knowledge lies the clue to mitigating those errors.


Regards

Pralhad Jadhav
Senior Manager @ Library
Khaitan & Co
