Why we might be reading the data wrong
Among other things, the problem might lie with our inadequate understanding of the domain we are trying to map through data.
Recently, we got into programmatic selling of
ads on Cartoq.com through AdX. We knew from others’ experiences that this could
boost our ad revenues significantly—according to some, by almost 3x.
Unlike direct sales or other ad networks, AdX lets you set your own price, and set it dynamically. This
brings a lot more variables into play. And if you enjoy data analysis, which
my team and I have certainly started doing, it can be a lot of fun.
But having fun does not always guarantee the
right results. In the initial few weeks, try as we might, our revenue actually
dropped—by almost 33%.
We were shocked. We had done so well over the
past two years because we knew our Facebook and web analytics so well. Today,
Cartoq generates over 100 million social post impressions and nearly 25 million
ad impressions every month.
We got there by digging deep into our website
and social data, and then relentlessly fine-tuning our analysis. We knew this
game well. So why were we not able to get AdX right?
In this article, I will draw on our recent
experience with programmatic ad selling to illustrate why we sometimes end up
reading our data wrong.
Not knowing the domain well enough
This could be one of the most insidious
reasons why we end up getting all the wrong results from our analysis—not
really understanding the domain itself. Trying to read data about an activity we barely
understand is obviously going to lead us nowhere.
We got into programmatic selling of online
ads, but didn’t really spend time to understand how a programmatic platform was
different from other ad networks.
Just as we had been doing earlier, we
believed the essential problem to solve was optimizing pricing and inventory
variables such that we could maximize Price x Volume. In a way, that is all
there is to it really. But in reality, on AdX, when we got our ad rates to go
up (eCPM, or effective cost per thousand impressions), our fill rate (the share of ad
inventory that we sold) would fall. Whatever permutations of price and volume
we tried, we couldn't find the sweet spot where the two together would improve
overall revenue.
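To make that trade-off concrete, here is a minimal sketch of how revenue depends on both eCPM and fill rate. The floors, rates and inventory figures below are purely illustrative assumptions, not our actual AdX data.

```python
# Hypothetical illustration: revenue = impressions_sold * eCPM / 1000,
# where impressions_sold = available_inventory * fill_rate.
# All numbers are made up purely to show the trade-off.

available_inventory = 1_000_000  # ad impressions offered in a day

scenarios = [
    # (label, price floor in Rs, resulting eCPM in Rs, resulting fill rate)
    ("low floor",   40,  45, 0.90),
    ("mid floor",   80,  90, 0.55),
    ("high floor", 120, 130, 0.30),
]

for name, floor, ecpm, fill_rate in scenarios:
    impressions_sold = available_inventory * fill_rate
    revenue = impressions_sold * ecpm / 1000
    print(f"{name:10s}  floor=Rs{floor:<4}  eCPM=Rs{ecpm:<4}  "
          f"fill={fill_rate:.0%}  revenue=Rs{revenue:,.0f}")
```

With these invented numbers, the middle floor earns the most; that is the kind of sweet spot we kept hunting for and kept missing when we treated price and volume as static settings.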
We knew that in any marketplace, if you
increase the price, it will affect the volume. But if you knew the right
buttons to press, programmatic exchanges were known to improve ad revenue
realizations quite a bit. What were we missing?
Well, as we eventually discovered, in
programmatic buying and selling, it is all about optimization at a particular
point in time. It’s a flow concept—you are not trying to improve a static set
of variables.
We had to start thinking about pricing as a
function of inventory available at that time, and what variables would be
relevant in this dynamic context. That real-time interplay of new variables is
what needed optimization.
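A hypothetical sketch of what "pricing as a flow" could look like: instead of a fixed floor price, the floor for the next time window is nudged up or down based on the fill rate just observed. The update rule, thresholds and numbers are our own illustrative assumptions, not AdX settings or our production logic.

```python
# Hypothetical sketch: adjust the price floor per time window based on the
# fill rate observed in the previous window, instead of setting it once.
# Thresholds and step size are invented for illustration only.

def next_floor(current_floor, last_fill_rate,
               target_fill=0.7, step=0.1,
               min_floor=20.0, max_floor=200.0):
    """Nudge the floor up when demand is strong, down when inventory goes unsold."""
    if last_fill_rate > target_fill + 0.1:      # selling almost everything: ask for more
        candidate = current_floor * (1 + step)
    elif last_fill_rate < target_fill - 0.1:    # too much unsold inventory: ease off
        candidate = current_floor * (1 - step)
    else:
        candidate = current_floor
    return max(min_floor, min(max_floor, candidate))

# Example: hourly fill rates observed during a day
floor = 60.0
for hour, fill in enumerate([0.92, 0.88, 0.74, 0.52, 0.41, 0.67, 0.85]):
    floor = next_floor(floor, fill)
    print(f"hour {hour}: observed fill {fill:.0%} -> next floor Rs{floor:.0f}")
```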
Once we caught on to this fact, we did indeed
discover a completely new set of variables. We had to rework our analytics
reports from scratch because the questions we were now asking of our data had
changed completely.
Within a week, we got our eCPM up to Rs125,
which was close to the premium rates we were getting in direct deals with
brands. And yes, we did increase our revenue by 2x because we also increased
our inventory sold by almost 80%.
So if our data is not delivering results,
chances are the problem lies in our inadequate understanding of the business.
Be sceptical about sharp upward movements in data
This one has more to do with the confirmation
bias with which we all tend to operate. And I have found this bias kicks in
quite strongly when we encounter a sudden improvement in our data.
This big uptick is the validation that we
wait for, and when that happens, we latch on to this data a little too
enthusiastically. Eventually, this leads to erroneous conclusions and decisions
down the line.
On AdX, we often see a sudden jump in ad
revenue, which may continue for a few days or even more. Our first thought when
this happens is that our current optimization technique is working better now.
So we start gathering more data to corroborate this thinking. This insidious
validation continues—until the spike flattens out just as suddenly. But we
didn’t change anything, so why did the spurt not continue?
Faced with this one-off event, we then start
looking for some ad hoc factors that might have contributed. More often than
not, it turns out to be a big-budget brand launch campaign, or competing
brand promotions vying for more digital inventory. A temporary spike in
demand, not anything we changed, was the real reason for the price increase.
So we now have this rule of thumb: whenever
we see a sudden jump in ad revenue, we assume it is a temporary event. We then
go about looking for the data that will support this assumption.
What helps in such a situation is extensive
disaggregated analysis. We dissect each variable, relevant or seemingly
irrelevant; we then slice that data in multiple new ways until we are able to
isolate, with a fair degree of confidence, what caused the spike.
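As an illustration of that kind of disaggregated slicing, the sketch below compares a spike window against a baseline window, dimension by dimension, to see which segment actually moved. The file name, column names and dates are hypothetical placeholders; the approach simply assumes a daily revenue log with a few categorical dimensions.

```python
# Hypothetical sketch of disaggregated analysis: compare the spike period
# against a baseline period slice by slice, to isolate which segment moved.
import pandas as pd

df = pd.read_csv("adx_daily_revenue.csv", parse_dates=["date"])
# assumed columns: date, revenue, advertiser_category, geo, device, ad_unit

baseline = df[(df["date"] >= "2016-07-01") & (df["date"] < "2016-07-15")]
spike    = df[(df["date"] >= "2016-07-15") & (df["date"] < "2016-07-22")]

for dim in ["advertiser_category", "geo", "device", "ad_unit"]:
    base = baseline.groupby(dim)["revenue"].mean()
    spk  = spike.groupby(dim)["revenue"].mean()
    lift = ((spk - base) / base).sort_values(ascending=False)
    print(f"\nAverage daily revenue lift by {dim}:")
    print(lift.head(5).round(2))
```

If the lift is concentrated in one advertiser category or one geography, that points to a temporary demand event rather than our optimization suddenly working better.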
In our experience, sudden improvements in
performance don’t sustain. So it is prudent to treat sharp changes with healthy
scepticism.
Correlation is not the same as causation
This is a classic error, and I don’t think we
have found any fool-proof way of avoiding it. So I will just share a
couple of practices we try to follow to mitigate the error.
One, whenever we think we have found a reason to
explain a particular piece of data, we try to keep that conclusion on hold for
a while. We just tell ourselves, this could be mere correlation, so let's wait
a few more days before we treat it as a cause. What this does is create
a little space in our heads to revisit our conclusions.
As a second step, we then get someone in the
team to dig out all possible anomalies around that analysis. Assigning someone
formally to play the devil’s advocate helps.
It’s something we try to do when there is
some important decision that hinges on that conclusion. Having said this,
in most cases, we haven’t really ended up changing our original conclusions.
That makes me suspect that the problem of correlation and causation in data is
more insidious than we perhaps imagine.
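One simple way to operationalise that waiting period is to check whether the relationship that suggested the conclusion still holds in a later, unseen window. The sketch below is a hypothetical illustration with placeholder data and column names, not a test we ran in this exact form.

```python
# Hypothetical sketch: before treating a correlation as a cause, check whether
# it still holds in a later window. File and column names are placeholders.
import pandas as pd

df = pd.read_csv("daily_metrics.csv", parse_dates=["date"]).sort_values("date")
# assumed columns: date, social_referrals, ad_revenue

discovery = df.iloc[:30]    # window where the pattern was first noticed
holdout   = df.iloc[30:60]  # the "wait a few more days" window

r_discovery = discovery["social_referrals"].corr(discovery["ad_revenue"])
r_holdout   = holdout["social_referrals"].corr(holdout["ad_revenue"])

print(f"correlation in discovery window: {r_discovery:.2f}")
print(f"correlation in holdout window:   {r_holdout:.2f}")
# If the correlation collapses out of sample, the original link was likely
# coincidental, or driven by something else, rather than causal.
```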
What works for us is building this awareness
about all the ways we could be reading our data wrong. And in that knowledge
lies the clue to mitigating those errors.
Source | http://www.livemint.com/Opinion/7VmajB4pe0QZrOTttI0ZbM/Why-we-might-be-reading-the-data-wrong.html