How much data do you really need?
Data stewards continue to argue for giving users unrestricted access to all the data, as the best means of gaining value from analysis.
With the advent of Big Data and better technology to store it, are we justified in holding large quantities of it? What are the benefits and drawbacks of doing so, and how can organizations strike the right balance?
More than a decade ago, a large telecom operator was discussing strategy with its outsourcing partner. The question was how much data its data warehouse should hold: three, four or six months' worth. Holding more data meant more expenditure. Even though more data also meant more business value, the outsourcing partner prevailed, and the operator's "usage and retention" team had to settle for three months.
How would this question be settled today?
We have all read forecasts telling us that data volume is growing at an unprecedented pace. We have also heard about the three Vs of data (if not, here they are: variety, velocity and volume) fuelling this growth. Digitization continues to be one of the biggest drivers; the Internet of Things is another, and the list goes on. A variety of application areas, such as customer management, product development, marketing spend and risk management, continue to fuel consumption of this data. Data stewards continue to argue for giving users unrestricted access to all the data, as the best means of gaining value from analysis.
How should organizations face this data deluge? Doesn't storing more data mean more cost? While hardware costs have been falling, the same cannot be said for the overall cost of data. Unless you are using open-source software, your software costs are not going down either. Moreover, there are people costs and other overheads. Then again, what if you store all the data but don't make use of it? Conversely, if you don't store the data, you miss out on those million-dollar insights. So how does one resolve this quandary?
Let's start with the easiest part: what data needs to be stored. The data you keep is driven directly by the use case. If you want to understand customer behaviour, you need to store transactions, profiles, previous purchases and the like; if you want to do profitability analytics, you will need finance, revenue and cost data.
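Because the use case drives what you keep, one simple way to make that decision explicit is to list the data sets each planned use case needs and store their union. The Python sketch below is only an illustration under that assumption: the mapping `USE_CASE_DATA` and the function `datasets_to_store` are hypothetical names, and the entries simply restate the two examples above.

```python
# Hypothetical mapping from analytics use case to the data sets it needs.
# The entries restate the two examples in the text; extend it for your own use cases.
USE_CASE_DATA = {
    "customer_behaviour": {"transactions", "customer_profiles", "purchase_history"},
    "profitability": {"finance", "revenue", "cost"},
}


def datasets_to_store(use_cases):
    """Return the union of data sets required by the planned use cases.

    Raises KeyError for a use case that has not been mapped yet, which
    forces the gap to be filled rather than silently ignored.
    """
    required = set()
    for use_case in use_cases:
        required |= USE_CASE_DATA[use_case]
    return required


if __name__ == "__main__":
    print(sorted(datasets_to_store(["customer_behaviour", "profitability"])))
```

Taking the union means a data set shared by several use cases is stored once, and anything no planned use case needs is a candidate for not being stored at all.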
How much you store is also driven, to a certain extent, by your industry. Statutory reporting (for example, under Basel II) and fraud analysis typically require five to seven years of data, whereas customer cross-sell requires one to three years. For most customer analytics, financial services companies typically store three years' data and retail companies two to three years, while telecom service providers look at less than a year's data.
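These ranges can be written down as an explicit retention policy that storage jobs consult. The following is a minimal Python sketch, not a recommended policy: `RETENTION_YEARS`, `retention_years` and `retention_cutoff` are hypothetical names; the sketch takes the upper end of each range quoted above, approximates "less than a year" for telecom as one year, and treats a year as 365 days. Actual periods must come from your own regulatory and legal requirements.

```python
from datetime import date, timedelta

# Hypothetical retention policy in years, keyed by (industry, use case).
# Values are the upper end of the ranges quoted in the text; the telecom
# figure of 1 year is an assumption standing in for "less than a year".
RETENTION_YEARS = {
    ("any", "statutory_reporting"): 7,
    ("any", "fraud_analysis"): 7,
    ("any", "customer_cross_sell"): 3,
    ("financial_services", "customer_analytics"): 3,
    ("retail", "customer_analytics"): 3,
    ("telecom", "customer_analytics"): 1,
}


def retention_years(industry: str, use_case: str) -> int:
    """Look up the retention period, falling back to an industry-agnostic rule.

    Raises KeyError if neither an industry-specific nor a generic rule exists.
    """
    specific = (industry, use_case)
    if specific in RETENTION_YEARS:
        return RETENTION_YEARS[specific]
    return RETENTION_YEARS[("any", use_case)]


def retention_cutoff(industry: str, use_case: str, today: date) -> date:
    """Oldest date that must still be kept, treating a year as 365 days."""
    return today - timedelta(days=365 * retention_years(industry, use_case))


if __name__ == "__main__":
    print(retention_years("telecom", "customer_analytics"))
    print(retention_cutoff("retail", "customer_analytics", date(2016, 8, 20)))
```

Keeping the policy as data rather than scattering the numbers through code makes it easy to review with the business and to change when regulations do.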
The next task is to determine whether you need to store unstructured data, for example, Web logs, chats, text, social media and voice. If your industry is going digital or you are focusing more on online channels, expect the answer to be "yes".
Finally, bear in mind that not all data is created equal. Some data is accessed more routinely than others. What portion of your data is expected to be accessed very frequently, what portion irregularly, and what only once in a while? In the telecom industry, call data records (CDRs) are the lifeline of the business, yet their usage can differ starkly. The usage and retention teams will ask for 90 days of CDR data at high velocity for multiple types of customer analysis; the statutory requirements team will need to store more than five years' data but access it infrequently; and the revenue management team may want one to three years of data and churn through it moderately. Each industry will have its own version of this scenario.
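One common way to act on these differing access patterns is to place records in storage tiers by age, often called hot, warm and cold storage (terms not used in the text above). The Python sketch below applies that idea to the CDR example under stated assumptions: the names `storage_tier`, `HOT_DAYS`, `WARM_DAYS` and `COLD_DAYS` are hypothetical, the seven-year statutory limit is borrowed from the five-to-seven-year range quoted earlier, and a year is treated as 365 days.

```python
from datetime import date

# Hypothetical tier boundaries for call data records (CDRs), in days.
HOT_DAYS = 90          # usage and retention teams: 90 days, high-velocity access
WARM_DAYS = 3 * 365    # revenue management: up to three years, moderate access
COLD_DAYS = 7 * 365    # statutory requirements: assumed seven-year limit, rare access


def storage_tier(record_date: date, today: date) -> str:
    """Assign a CDR to a storage tier based on its age in days."""
    age_days = (today - record_date).days
    if age_days <= HOT_DAYS:
        return "hot"       # frequent, high-velocity access
    if age_days <= WARM_DAYS:
        return "warm"      # moderate, recurring access
    if age_days <= COLD_DAYS:
        return "cold"      # rare access, kept mainly for compliance
    return "expired"       # outside every team's retention window


if __name__ == "__main__":
    today = date(2016, 8, 20)
    for d in (date(2016, 7, 1), date(2015, 1, 1), date(2011, 1, 1), date(2005, 1, 1)):
        print(d, storage_tier(d, today))
```

Each tier can then be mapped to storage with matching cost and performance, so that only the small hot slice sits on the most expensive platform.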
Based on the answers to the questions above, you can choose a combination of the following to host your data environment: general-purpose transactional database management systems, or DBMSs (Oracle, MS SQL Server), along with specialized analytics DBMSs (Teradata, IBM Netezza) and/or open-source platforms (NoSQL DBMSs, the Hadoop Distributed File System).
If your organization is moving towards digital transformation, it is more likely than not that your data environment will be a loosely integrated mix of all of these, called the data lake. Welcome to the new data foundation supporting the digital organization.
Source | http://www.livemint.com/Opinion/yXNJvQVce5D6zEyky1GalN/How-much-data-do-you-really-need.html
** All my posts are dedicated to Sir Dr. S R Ranganathan on the occasion of his 125th birth anniversary
Regards
Pralhad Jadhav
Senior Manager @ Library
Khaitan & Co
Upcoming Events | BOSLA-NIFT ANNUAL LECTURE SERIES-2016 on Saturday, 20th August 2016 at 10.00 hrs in National Institute of Fashion Technology, Kharghar, Navi Mumbai.
Note | If anyone forwards these posts on social media or covers them in a newsletter, please give due credit to those who put in the effort to compile them.