Mistakes In Data Visualization: A Short Practical Review (1)

Aamir Ahmad Ansari
Jan 29, 2022

Data Visualization

The brain responds more strongly to visuals than to plain text. This is why you remember the plot of a movie you've seen better than one you've only read about (at least, that's how it is for me). Here we will learn about the uses and importance of visuals, as well as an approach to getting them spot on. Visualization dates back to the time of Babylon. Michael Florent van Langren, a Flemish astronomer, is credited with the first visualization of tabular data, in 1644[1], which looks like this:

Figure: First Tabular Plot

Today's data is immense and diverse, so to convey it and help people understand it at a glance, we need precise and informative visualizations. Without further ado, let's get started!

Mistakes in Data Visualization

I have always wanted to plot something really fancy to visualize my data and convey information. I did a data modelling project (you can see it here) and created several visuals that I thought looked good and were very informative. Looking back now, I can see their flaws, and I will share them with you; practical learning is the best!

Figure 1

In Figure 1, I plotted time-series data of COVID-19 cases in India alongside the number of cured patients over time. The graph has the basic elements of a good graph:

a) Revealing title

b) Clearly Labeled Axes

c) A legend showing which line is which.

Yet it has two inherent flaws! First, the choice of colours: at first glance I could hardly tell there were two lines, even though I plotted the graph myself. Second, can you tell me what that monstrous coloured region is? If you are familiar with statistics, you will recognize it as a confidence interval. If you aren't, and I don't annotate it, you will be thoroughly confused, right?

That said, the basics still matter. Always include a title that gives a clear idea of what the graph is about. Axis labels are just as important, so that the reader knows what kind of data is shown and what it represents. Also notice that I have not used every date for which data is present, only dates at intervals. This is handy and keeps your plots clean.

The shaded region shows uncertainty, which can arise in the data from measurement errors or simply from the true values not being reported. Is it relevant here? That is up for discussion, and I don't think it is. So rather than plotting it, we can state the percentage of uncertainty in a side note or footnote, which makes the graph cleaner and easier to understand. If it is relevant in your case and you do plot it, annotating it is good practice[3]. I did add a note in the notebook saying 'Note the solid lines are the average of the observations at each step and the cluster around it is the 95% confidence Interval.', but the graph itself still makes a good example of this mistake.
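To make the footnote idea concrete, here is a minimal sketch in Python (standard library only) that replaces the shaded band with a one-line note. It assumes each time step has several repeated observations. It uses a normal-approximation 95% interval (mean ± 1.96 × standard error). With only a handful of observations per step, a Student's t critical value would be more appropriate, and plotting libraries may compute their bands differently, for example by bootstrapping. The function names, the choice to report the widest half-width as a percentage of the mean, and the data are all hypothetical.

```python
import math
import statistics

Z_95 = 1.96  # two-sided 95% critical value of the standard normal distribution


def mean_ci95(values):
    """Return (mean, half_width) of a normal-approximation 95% CI for the mean."""
    if len(values) < 2:
        raise ValueError("need at least two observations per step")
    mean = statistics.mean(values)
    # Standard error of the mean: sample standard deviation / sqrt(n)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, Z_95 * sem


def uncertainty_footnote(series):
    """Summarise per-step CIs as a single footnote instead of a shaded band.

    `series` maps each time step to its repeated observations (hypothetical
    format). The widest CI half-width, as a percentage of that step's mean,
    is reported.
    """
    worst_pct = 0.0
    for values in series.values():
        mean, half_width = mean_ci95(values)
        if mean:  # skip steps whose mean is zero (percentage undefined)
            worst_pct = max(worst_pct, 100 * half_width / abs(mean))
    return ("Note: solid lines are the mean of the observations at each step; "
            f"the 95% confidence interval is within ±{worst_pct:.1f}% of the mean.")


# Hypothetical observations (illustrative numbers, not real COVID-19 data)
observations = {
    "2021-05-01": [100, 104, 96, 100],
    "2021-05-02": [110, 112, 108, 110],
}
print(uncertainty_footnote(observations))
```

Printing this note under the chart keeps the uncertainty information while removing the unlabelled band. Reporting the worst case is a conservative choice; an average or a per-period range would also work.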

Figure 2

In Figure 2, if you set aside the unannotated CI and look for other flaws, you'll find that the x-axis label is a little confusing. We usually read "time frame" as a span from one point in time to another; for example, the next 5 days is a time frame. Since the axis actually shows months, a more appropriate label would be 'Month of the Year'.
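Here is a small sketch of this labelling fix, again in plain Python. It places one tick at the first observed date of each month and pairs the ticks with the clearer axis label. An optional `step` keeps only every `step`-th month, which matches the earlier point about showing dates at intervals. The helper `monthly_ticks` and the dates are hypothetical. If you plot with matplotlib, `matplotlib.dates.MonthLocator` and `matplotlib.dates.DateFormatter` can handle the same job directly.

```python
from datetime import date, timedelta

X_LABEL = "Month of the Year"  # names what the ticks show, unlike a vague "time frame"


def monthly_ticks(dates, step=1, fmt="%b %Y"):
    """Return (ticks, labels) at the first observed date of every `step`-th month."""
    ticks, labels, seen = [], [], set()
    for d in sorted(dates):
        month = (d.year, d.month)
        if month not in seen:  # first observation in a new month
            seen.add(month)
            ticks.append(d)
            labels.append(d.strftime(fmt))
    # Thin the ticks to every `step`-th month so the axis stays uncluttered
    return ticks[::step], labels[::step]


# Hypothetical daily series covering mid-March to mid-June 2020 (dates only)
days = [date(2020, 3, 15) + timedelta(days=i) for i in range(90)]
ticks, labels = monthly_ticks(days)
print(X_LABEL, labels)
```

With matplotlib, you could pass the results to `ax.set_xticks(ticks)`, `ax.set_xticklabels(labels)` and `ax.set_xlabel(X_LABEL)`.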

Colours

Colour is such an important aspect of a graph that I wanted to discuss it separately.

Figure 3[4]

The graph above compares the ratings of several teams, but the choice of colours makes it needlessly complex. A reader will assume that each colour encodes some kind of relationship, when in fact the colours represent nothing. Second, even if the colours did represent categorical data, they would be very inefficient, because the categories cannot be clearly told apart. These colours will also fail for colour-blind readers. Instead, use perceptually uniform colour palettes, whose colours differ in chroma, luminance and hue. For a continuous variable, use a sequential gradient that varies along these three properties. For a discrete variable, use colours spaced equally through the palette along them. Keep these three properties in mind and choose colours that suit your graph.
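The sketch below shows one way to build such palettes without a plotting library. It picks colours in CIE LCh (lightness, chroma, hue), an approximately perceptually uniform space, and converts them to sRGB hex codes. The categorical palette keeps lightness and chroma fixed and spaces the hues evenly. The sequential palette fixes the hue and raises lightness in equal steps. The function names and default parameter values are illustrative assumptions, and colours outside the sRGB gamut are simply clipped. In practice, ready-made options such as matplotlib's `viridis` colormap or seaborn's `husl` palette are usually the easier choice.

```python
import math

# D65 reference white in CIE XYZ, with Y normalised to 1
XN, YN, ZN = 0.95047, 1.0, 1.08883
DELTA = 6 / 29  # breakpoint of the CIELAB companding function


def lch_to_hex(L, C, h_deg):
    """Convert a CIE LCh(ab) colour to an sRGB hex string (out-of-gamut clipped)."""
    a = C * math.cos(math.radians(h_deg))
    b = C * math.sin(math.radians(h_deg))
    # CIELAB -> XYZ
    fy = (L + 16) / 116
    fx, fz = fy + a / 500, fy - b / 200

    def f_inv(t):
        return t ** 3 if t > DELTA else 3 * DELTA ** 2 * (t - 4 / 29)

    x, y, z = XN * f_inv(fx), YN * f_inv(fy), ZN * f_inv(fz)
    # XYZ -> linear sRGB using the standard D65 sRGB matrix
    rgb = (
        3.2406 * x - 1.5372 * y - 0.4986 * z,
        -0.9689 * x + 1.8758 * y + 0.0415 * z,
        0.0557 * x - 0.2040 * y + 1.0570 * z,
    )

    def encode(c):
        c = min(max(c, 0.0), 1.0)  # clip channels that fall outside the sRGB gamut
        return 12.92 * c if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055

    return "#" + "".join(f"{round(encode(c) * 255):02x}" for c in rgb)


def categorical_palette(n, L=65, C=40, start_hue=0):
    """n colours sharing lightness and chroma, with hues spaced 360/n degrees apart."""
    return [lch_to_hex(L, C, start_hue + 360 * i / n) for i in range(n)]


def sequential_palette(n, hue=250, C=30, L_min=30, L_max=90):
    """n colours of one hue whose lightness rises in equal steps (dark to light)."""
    step = (L_max - L_min) / (n - 1) if n > 1 else 0
    return [lch_to_hex(L_min + step * i, C, hue) for i in range(n)]


print(categorical_palette(5))  # e.g. one colour per team
print(sequential_palette(5))   # e.g. low to high rating
```

This categorical palette varies only hue, so colour-blind readers may still struggle to separate neighbouring categories. Varying lightness as well, as recommended above, makes the distinction more robust.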

References

[1] Data Visualization: History and origins: https://thinkinsights.net/digital/data-visualization-history/

[2] Visualizations That Really Work: https://hbr.org/2016/06/visualizations-that-really-work

[3] Uncertainty and graphicacy: How should statisticians, journalists, and designers reveal uncertainty in graphics for public consumption? https://ec.europa.eu/eurostat/cros/powerfromstatistics/OR/PfS-OutlookReport-Cairo.pdf

[4] How to Choose Colors for Data Visualizations: https://chartio.com/learn/charts/how-to-choose-colors-data-visualization/

Aamir Ahmad Ansari

Sharing knowledge is gaining knowledge. Data Science Enthusiast and Master AI & ML fellow @ Univ.Ai