Big bad data
As much as big data offers us numerous benefits, we must remain keenly aware of the harm that it can cause.
Frederick Hoffman was a statistician employed by the Prudential Life Insurance Company, tasked with uncovering risk patterns in medical data. So good was he at his job that, even though he had no medical training, he managed to uncover the harmful side-effects of asbestos, identify silicosis as a real disease that was causing fatalities among American workers, and establish the causal relationship between smoking tobacco and lung cancer. By the time he died in 1946, he had 28 books and nearly 1,200 published articles to his credit.
Despite his prolific output in the later part of his life, he is best remembered for his first book, Race Traits and Tendencies of the American Negro. Originally published in 1896, it is arguably the social science study that most profoundly impacted turn-of-the-century American society.
Hoffman conducted a detailed analysis of disease rates among freed slaves and concluded that black people, as a race, were sicker and more disease-prone than whites, and were therefore on a downward spiral to extinction. Time has shown that this analysis was flawed, but given the conviction with which he presented the data, it was, at the time, all that was needed to render the entire African-American community effectively uninsurable. As a result, sick African Americans were unable to afford healthcare and got sicker, cruelly converting his flawed report into a self-fulfilling prophecy.
We now know that Hoffman's mistake was in confusing correlation with causation. Blinded by prejudice, he never stopped to think that it was poverty and injustice, rather than race, that were to blame. But his faulty conclusion had deep consequences, reinforcing a wrong that inflicted lasting damage on an entire community, damage that is felt to this day.
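The statistical trap Hoffman fell into can be seen in a small simulation (a hypothetical sketch, not his actual data, with all probabilities invented for illustration): when a hidden confounder such as poverty drives illness, mere group membership appears strongly "predictive" of disease even though it has no causal effect at all.

```python
import random

random.seed(0)

# Hypothetical world: illness depends ONLY on poverty, never on group.
# But injustice makes group B far more likely to be poor, so a naive
# analysis finds group B "sicker" and blames the group, not the poverty.
n = 10_000
rows = []
for _ in range(n):
    in_group_b = random.random() < 0.5
    # confounder: poverty rates differ by group (assumed numbers)
    poor = random.random() < (0.8 if in_group_b else 0.2)
    # illness is caused by poverty alone (assumed numbers)
    ill = random.random() < (0.6 if poor else 0.1)
    rows.append((in_group_b, ill))

rate_b = sum(ill for g, ill in rows if g) / sum(1 for g, _ in rows if g)
rate_a = sum(ill for g, ill in rows if not g) / sum(1 for g, _ in rows if not g)

# Group B shows a markedly higher illness rate despite having
# no causal link to illness whatsoever.
print(f"group A: {rate_a:.2f}, group B: {rate_b:.2f}")
```

Conditioning on the confounder (comparing poor with poor, non-poor with non-poor) makes the spurious group effect vanish, which is precisely the step a prejudiced analysis never takes.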
The problem with data is that it can be presented in myriad ways, and while individual elements of data are immutable, in aggregate a database can be arranged to mean different things in different contexts. The Hoffman report is a telling example of the harm that can be wrought by drawing false conclusions from data sets. As we increase our reliance on data for decision-making, using big data and machine learning to help us determine the appropriate level of premiums we should pay on our insurance or our eligibility for a job, we would do well to ensure that, in our eagerness to become more scientific in our decision-making, we don't end up seeing only those patterns that we want to see.
The Crime and Criminal Tracking Network and Systems (CCTNS) is the Indian government's attempt at implementing a form of predictive policing: using big data to identify potential criminals before they commit their crimes. The government plans to connect the 14,000 police stations across the country in order to facilitate rapid investigation and detection of crime. In doing so it will, for the first time, correlate existing databases of criminals and history-sheeters with geographical information, first information reports and allied data, in order to better anticipate criminal activity.
As we start down this path of data-assisted crime prevention, we must be mindful of the historical biases inherent in our criminal databases and ensure that, when they are trawled by machines, we don't allow machines to institutionalize our prejudice.
Take, for instance, the Pardhis, a denotified "criminal" tribe whose members are routinely rounded up by the police whenever there is a crime in the area. Members of the tribe populate criminal databases around the country, sometimes just by virtue of belonging to that tribe. If our computers were to blindly rely on these historical databases, they could, much like Hoffman, reinforce historical bias and force the community into persistent machine-determined discrimination.
One of the often-touted ancillary benefits of digital payments is the extent to which this technology can revolutionise micro-lending. People who have, till now, been unable to provide evidence of credit-worthiness will be able to present a trail of their digital transactions, providing evidence of their ability to repay. But as the machines begin to amass greater volumes of transactional data, they will be able to build more specific personal profiles of our behaviour, allowing them to take nuanced decisions and make subtle discriminations between otherwise similarly situated individuals.
As much as I appreciate the many benefits that data offers us, I remain acutely conscious of the harm that it can cause. The current thinking is that because big data and machine intelligence can solve so many of our most pressing problems, we should focus on the good and not worry about the potential harms. I worry that in rushing blindly after short-term gains, we will end up institutionalizing a new data-based caste system that will be that much harder to unravel.
Source | Mint – The Wall Street Journal | 11 January 2017
Regards,
Pralhad Jadhav
Senior Manager @ Library
Khaitan & Co