From Digits to Decisions: Mastering the Art of Data-Derived Wisdom
Author: Patrick Bangert – SVP, Data, Analytics and AI, Searce
“Your innovation capability will increase by 21% if you do this,” the analyst said. To which I responded, “That sounds great but is that worth it, and in what units is innovation measured?” Some consternation and two days later, it turned out that the figure of 21% was based on a survey where 21% of respondents indicated their belief that, if this action was taken, the innovation capacity of their company would increase, by an unspecified amount. The conclusion is significantly less definitive than the way it was presented.
Such conversations occur daily in enterprises across the globe. Analytics results are presented and a course of action is proposed as if it followed naturally from established scientific consensus, with the numerical value lending it an air of being beyond question. Most analytics reports do not objectively present the world as it is but attempt to dictate your conclusions.
In consuming analytics, whether presented in a report or during a presentation, it is incumbent on the audience to think critically about where the data came from, how the analysis was performed, what the numbers mean, and whether the proposed conclusion follows from the analytics. This constitutes data literacy and we propose that this is missing in executive education.
What you see is what you want
“It will cost $1m.” One might say, “Only that much? That’s great.” Someone else may reply, “That’s way too much.” What we see when we look at statistics is heavily colored by our assumptions and what we want to see.
There is no such thing as objective analytics. Data itself might be somewhat objective but it is too large to present in raw form; analytics is essential for compressing data into a format that can be interpreted and potentially lead to insights. As many of us learned from Darrell Huff’s 1954 classic “How to Lie with Statistics,” analytics always come with an agenda or a bias. Passive consumers of analytics are thus in danger of placidly accepting, in the best case, the conscious agenda and, in the worst case, the unconscious prejudice of the analyst.
The choice of metrics, or key performance indicators (KPIs), is the starting point for an analytics report. If, for example, we report on revenue and neglect cost, a positive picture might emerge even though the real situation is dire, because loss-making deals are being struck to boost the very KPI being watched, namely revenue. Naturally, we want a pretty picture and so we act in ways that make our dashboards or analytics look good. As such, analysts will choose the analytics that best agree with the desired narrative. The data-driven decision taken by the board then has less to do with the data and more to do with how it was presented.
Something similar came to light at Wells Fargo Bank in 2016, when it emerged that employees had opened many unauthorized bank accounts to meet aggressive sales targets. The unethical behavior achieved the KPI but ultimately led to disaster for the business.
A Quality Basis
Analytics is rooted in raw data. To understand the analytics, it is often helpful to understand the data first. This does not have to be a lengthy process and often a single presentation slide is enough to collect the salient information: the number of entries, how recently they were collected, which items (columns) each entry contains, the accuracy of the entries, and how balanced the data is. The balance of the data refers to the distribution of entries across different groupings. If, for example, the data relates to people, we would want to know how many entries came from different genders or ethnicities, and so on.
Data is expected to be clean; where it is not, it must be cleansed. Inaccurate or incomplete records must be treated and duplicate records removed. Often this is harder than it appears.
In any evolving situation, we should also ask how the company plans to keep this data accurate, representative, complete, and trustworthy. The storage of the data should be secure to prevent unauthorized access, and compliance with privacy and other regulations must be maintained and documented. Processes around obtaining, storing, modifying, and removing data must be defined and followed. All of these activities constitute data governance and represent essential work for the data stewards.
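The one-slide summary described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the field names (collected, gender, income), and the profile helper are all hypothetical, and the share of missing values is used as a rough stand-in for quality, since true accuracy can only be checked against a trusted reference source.

```python
from collections import Counter
from datetime import date

# Hypothetical records; field names and values are illustrative only.
records = [
    {"id": 1, "collected": date(2023, 1, 5), "gender": "female", "income": 52000},
    {"id": 2, "collected": date(2023, 3, 9), "gender": "male", "income": None},
    {"id": 3, "collected": date(2022, 11, 2), "gender": "male", "income": 61000},
    {"id": 4, "collected": date(2023, 6, 30), "gender": "male", "income": 48000},
]


def profile(rows, group_key, date_key):
    """Collect the one-slide facts: size, recency, columns, completeness, balance."""
    columns = sorted({key for row in rows for key in row})
    # Share of entries with no value, per column (a proxy for completeness).
    missing = {
        col: sum(1 for row in rows if row.get(col) is None) / len(rows)
        for col in columns
    }
    # Number of entries per group, to show how balanced the data is.
    balance = Counter(row.get(group_key) for row in rows)
    return {
        "entries": len(rows),
        "most_recent": max(row[date_key] for row in rows),
        "columns": columns,
        "missing_share": missing,
        "balance": dict(balance),
    }


summary = profile(records, group_key="gender", date_key="collected")
for name, value in summary.items():
    print(f"{name}: {value}")
```

Even on four invented rows, the summary makes two things visible at a glance: a quarter of the income values are missing, and one group outnumbers the other three to one.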
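Why removing duplicates is harder than it appears can be shown with a small sketch. The customer records and the two normalization helpers below are hypothetical, and the matching rule (same name and email after light normalization) is one assumed choice among many; real cleansing usually needs rules agreed with the people who know the data.

```python
import re

# Hypothetical customer records; names and fields are illustrative only.
customers = [
    {"name": "Acme Corp", "email": "sales@acme.example"},
    {"name": "ACME Corp.", "email": "Sales@Acme.example "},
    {"name": "Borealis Ltd", "email": "info@borealis.example"},
    {"name": "Borealis Ltd", "email": None},
]


def normalize_name(name):
    """Lowercase and drop punctuation so near-identical names compare equal."""
    return re.sub(r"[^a-z0-9 ]", "", name.lower()).strip()


def normalize_email(email):
    """Trim whitespace and lowercase; missing emails stay missing."""
    return email.strip().lower() if email else None


# Exact matching finds no duplicates at all in this sample.
exact_unique = {(c["name"], c["email"]) for c in customers}

seen = set()
cleaned, incomplete = [], []
for customer in customers:
    email = normalize_email(customer["email"])
    if email is None:
        # Incomplete records are set aside for review, not silently dropped.
        incomplete.append(customer)
        continue
    key = (normalize_name(customer["name"]), email)
    if key not in seen:
        seen.add(key)
        cleaned.append(customer)

print(f"unique by exact match: {len(exact_unique)}")
print(f"unique after normalization: {len(cleaned)}")
print(f"incomplete, needs review: {len(incomplete)}")
```

Exact comparison treats all four rows as distinct, while light normalization reveals that two of them describe the same customer. Whether the record without an email is a third duplicate is a judgment call that code alone cannot settle.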
A study was conducted in 2011 to determine the US unemployment rate by scouring social media and looking for keywords like “unemployment”, “jobs” and “classifieds.” The correlation to other sources of unemployment data seemed reasonable until a huge spike occurred. It took a long time to realize that what had happened was that Steve Jobs died. The word “jobs” thus occurred a lot more often and had nothing to do with the unemployment rate.
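The failure mode is easy to reproduce. The sketch below is not the method used in that study; the posts are invented, and the keyword pattern and the single exclusion rule for one proper name are assumptions made purely to illustrate how a keyword count can be inflated by an unrelated event.

```python
import re

# Hypothetical posts; the text is invented for illustration.
posts = [
    "Lost my job today, checking the classifieds",
    "Any jobs going in Ohio? Unemployment is rough",
    "RIP Steve Jobs, thank you for everything",
    "Steve Jobs changed how we use phones",
]

KEYWORDS = re.compile(r"\b(unemployment|jobs|classifieds)\b", re.IGNORECASE)
PERSON = re.compile(r"\bSteve Jobs\b")


def naive_count(texts):
    """Count posts containing any keyword, ignoring context entirely."""
    return sum(1 for text in texts if KEYWORDS.search(text))


def filtered_count(texts):
    """Remove one known proper name before matching keywords."""
    return sum(1 for text in texts if KEYWORDS.search(PERSON.sub("", text)))


print(f"naive keyword hits: {naive_count(posts)}")
print(f"hits after excluding the name: {filtered_count(posts)}")
```

The exclusion rule only works because we already know what went wrong. The harder problem, and the reason domain knowledge matters, is noticing the spike and asking what else could explain it.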
Data-Driven Decision Making
“Ice cream causes drownings,” an analyst claimed based on finding that there is a statistically significant correlation between swimming pool drowning rates and ice cream sales. While the correlation exists, it is spurious because neither variable causes the other. Both are the result of warm summertime weather, which creates a desire for swimming and for ice cream. The drownings are merely a consequence of many more people swimming.
Distinguishing genuine from spurious correlations, or correlation from causation, can be difficult in practice, yet a causal relationship is very easy to claim with statistical analysis and charts. If such claims are accepted too readily, decisions will be made to change one thing in order to achieve an effect elsewhere. We might just outlaw ice cream in a desire to stop drownings. In the 1920s, the USA made the same error with the prohibition of alcohol, thinking that this law would make teetotalers of the population, which of course it did not.
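A small simulation makes the role of the hidden common cause concrete. Everything here is synthetic: the temperature range, the coefficients, and the noise levels are assumptions chosen for illustration, and the two outcome series are in arbitrary units. By construction, temperature drives both series and neither influences the other.

```python
import random
from math import sqrt

random.seed(42)  # fixed seed so the sketch is repeatable


def mean(xs):
    return sum(xs) / len(xs)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def residuals(ys, xs):
    """Remove the linear effect of xs from ys using a least-squares fit."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]


# Synthetic days: temperature drives both series; neither drives the other.
temperature = [random.uniform(0, 35) for _ in range(2000)]
ice_cream_sales = [2.0 * t + random.gauss(0, 5) for t in temperature]
drowning_index = [0.5 * t + random.gauss(0, 2) for t in temperature]

raw = pearson(ice_cream_sales, drowning_index)
controlled = pearson(
    residuals(ice_cream_sales, temperature),
    residuals(drowning_index, temperature),
)

print(f"raw correlation: {raw:.2f}")
print(f"after controlling for temperature: {controlled:.2f}")
```

The raw correlation is strong, yet it all but disappears once the effect of temperature is removed from both series. The catch in practice is that this check only works for confounders someone has thought to measure.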
If I can’t see you, you can’t see me
The absence of things is often as important as their presence. In statistical reports, we are often overwhelmed by charts and figures but important additional charts may be missing in the report and those are difficult to spot without deep understanding and critical thinking.
For example, in 2015 the US Congress accused Planned Parenthood of irregularities and presented a chart showing an increase in abortions and a decrease in cancer screenings. The graphic showed both changes as equally large, even though the change in cancer screenings was 25 times as large. How did they do this? They simply omitted the vertical axis from the plot and drew simple arrows, completely misleading the audience.
Similarly, with a popular graph of global air temperature change, you can appear to demonstrate that climate change is real, not real, or has no significant impact, simply through the year in which you choose to start your timeline. If you start early enough, before the mid-1990s, it’s clear that climate change is real. If you start in 1998 and end in 2012, the trend looks close to flat once measurement uncertainties are taken into account.
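The effect of the chosen window can be reproduced with a toy series. The numbers below are synthetic and are not temperature measurements: a steady rise of 0.015 units per year with one unusually high value in 1998, both assumptions picked only to show how much a fitted trend depends on where the timeline starts.

```python
def ols_slope(xs, ys):
    """Slope of the least-squares line through the points (xs, ys)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )


years = list(range(1980, 2016))
# Synthetic series, NOT real data: steady rise plus one unusually high year.
values = [0.015 * (year - 1980) + (0.5 if year == 1998 else 0.0) for year in years]


def slope_between(start, end):
    """Fitted trend per year using only the years from start to end inclusive."""
    pairs = [(y, v) for y, v in zip(years, values) if start <= y <= end]
    xs, ys = zip(*pairs)
    return ols_slope(xs, ys)


full = slope_between(1980, 2015)
short = slope_between(1998, 2012)

print(f"trend 1980-2015: {full:.4f} per year")
print(f"trend 1998-2012: {short:.4f} per year")
```

The underlying rise is identical in both windows. Starting the short window on the high year pulls the fitted line down to a fraction of the full-period trend, which is why the start and end dates of any time series chart deserve a question.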
Knowledge and Training
A Forrester study concludes that 87% of employees believe that data skills are important for their jobs but only 40% feel that they have these skills. Similarly, an Accenture study found that 75% of executives believe that their staff can work proficiently with data; only 21% of their staff shared this confidence.
While working with data and communicating data insights and analyses are important, we believe that reading data analytics reports is a critical skill as well. Many more people consume reports than create them, and an increasing number of management decisions are made on the basis of analytics reports.
The important aspects of data literacy are choosing the right metrics, ensuring high-quality source data, reading the charts and understanding their implications, thinking critically about what the analytics say, injecting domain knowledge to examine whether the implied correlations and causations are genuine, and communicating all this so that others can understand the decision. The most important audience for data literacy is the C-Suite, because this is where the pivotal decisions are taken.