Monday, March 13, 2017

When to Use Sequential and Diverging Palettes

Introduction

I wanted to take some time to talk about an important rule for the use of colour in data visualization.

The more I've worked in visualization, the more I have come to feel that one of the most overlooked and under-discussed facets (especially for novices) is the use of colour. A major pet peeve of mine, and a mistake I see all too often, is the use of a diverging palette instead of a sequential one or vice-versa. 

So what is the difference between a sequential and a diverging palette, and when is it correct to use each? The answer is one that arises very often in visualization: it all depends on the data, and what you're trying to show.

Sequential vs. Diverging Palettes

First of all, let's define what we are discussing here. 

Sequential Palettes
A sequential palette ranges between two colours, typically shades of one "main" colour, running from white or a lighter shade to a darker one. It is produced by varying one or more of the parameters in the HSV/HSL colour space (usually only saturation, value/lightness, or both).

For me, at least, varying hue means going between two very distinct colours, and is usually not good practice if your data vary linearly, as the result is much closer to a diverging palette, which we will discuss next. There are other reasons why this is bad visualization practice and, of course, exceptions to this rule, which we will discuss later in the post.

A sequential palette (generated in R)
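
If you'd like to build such a palette by hand, here is a minimal Python sketch using only the standard library. It is not how the figure above was made (that was done in R); the function name, the number of steps, and the saturation and value ranges are all my own illustrative choices. It holds hue fixed and varies only saturation and value.

```python
import colorsys

def sequential_palette(hue, n=7):
    """Return n hex colours from near-white to a dark shade of one hue.

    hue is in [0, 1] (the colorsys convention); n must be at least 2.
    Only saturation and value change, hue stays fixed.
    """
    colours = []
    for i in range(n):
        t = i / (n - 1)               # 0.0 at the light end, 1.0 at the dark end
        saturation = 0.05 + 0.95 * t  # nearly white -> fully saturated
        value = 1.0 - 0.5 * t         # bright -> darker
        r, g, b = colorsys.hsv_to_rgb(hue, saturation, value)
        colours.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return colours

greens = sequential_palette(hue=1 / 3)  # a hue of 1/3 is green in HSV
print(greens)
```

Each step is a little more saturated and a little darker than the last, so the colours read as "more" of the same thing rather than as different categories.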

Diverging Palettes
In contrast to a sequential palette, a diverging palette ranges between three or more colours with the different colours being quite distinct (usually having different hues). 

While technically a diverging palette could have as many colours in it as you'd like (such as the rainbow palette, which has been the default in some tools, such as MATLAB), diverging palettes usually range only between two contrasting colours at either end, with a neutral colour or white in the middle separating the two.

A diverging palette (generated in R)
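
The same idea can be sketched in Python: two contrasting end colours with a neutral colour in the middle. Again, this is only an illustration and not the R code behind the figure; the function names and the particular red, white, and green RGB values are arbitrary choices of mine.

```python
def lerp_colour(a, b, t):
    """Linearly interpolate between two RGB tuples (0-255 per channel)."""
    return tuple(round(x + (y - x) * t) for x, y in zip(a, b))

def diverging_palette(low, mid, high, n=7):
    """Return n RGB tuples running low -> mid -> high.

    n should be odd (and at least 3) so the neutral colour lands exactly
    in the centre of the palette.
    """
    half = n // 2
    left = [lerp_colour(low, mid, i / half) for i in range(half)]
    right = [lerp_colour(mid, high, i / half) for i in range(1, half + 1)]
    return left + [mid] + right

# Illustrative colours only: a dark red, white, and a dark green.
RED, WHITE, GREEN = (178, 24, 43), (255, 255, 255), (27, 120, 55)
palette = diverging_palette(RED, WHITE, GREEN)
print(palette)
```

Note that the hue changes only once, at the neutral centre. Everything to the left is a shade of one colour and everything to the right is a shade of the other.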

When to Use Which

So now that we've defined the two different palette types of interest, when is it appropriate and inappropriate to use them?

The rule for the use of diverging palettes is very simple: they should only be used when there is a value of importance around which the data are to be compared.

This central value is typically zero, with negative values corresponding to one hue and positive the other, though this could also be done for any other value, for example, comparing numbers around a measure of central tendency or reference value.
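
To make the role of the central value concrete, here is a small Python sketch of how a data value could be placed on a diverging scale. The function name and the example numbers are hypothetical, and this is not a description of what any particular tool does internally; it simply maps the chosen centre to 0 and each side of it to its own half of the palette.

```python
def diverging_position(value, vmin, centre, vmax):
    """Map value to [-1, 1], with centre mapped to exactly 0.

    Values below centre are scaled against (centre - vmin) and values above
    against (vmax - centre), so each side uses its full half of the palette.
    """
    if not vmin < centre < vmax:
        raise ValueError("need vmin < centre < vmax")
    value = min(max(value, vmin), vmax)  # clamp to the colour range
    if value < centre:
        return (value - centre) / (centre - vmin)
    return (value - centre) / (vmax - centre)

# Made-up range of -500 to 2000 around a centre of zero.
print(diverging_position(-250, -500, 0, 2000))  # halfway into the negative hue
print(diverging_position(0, -500, 0, 2000))     # the neutral colour
print(diverging_position(1000, -500, 0, 2000))  # halfway into the positive hue
```

A result of -1 would get the full "negative" colour, 0 the neutral colour, and +1 the full "positive" colour. Passing a median or a reference value as the centre works the same way as passing zero.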

A Simple Example
For example, looking at the Superstore dataset in Tableau, a visualizer might be tempted to make a map such as the one below, with colour encoding the sales in each city:


Here points on the map correspond to the cities and are sized by total number of sales and coloured by total sales in dollars. Looks good, right? The cities with the highest sales clearly stick out in the green against the dark red?

Well, yes, but do you see a problem? Look at the generated palette:


The scale ranges from the minimum sales in dollars ($4.21) to max (~$155K), so we cover the whole range of the data. But what about the midpoint? It's just the dead center point between the two, which doesn't correspond to anything meaningful in the data - so why would the hue change from red to green at that point?

This case is better suited to a sequential palette, since all the values are positive and we're not highlighting a meaningful value around which the data fall. A sequential version is shown below:


Here, the range is fully covered and there is no midpoint; the palette simply ranges from light green to dark. The extreme values still stand out in dark green, but there is no well-defined center where the hue arbitrarily changes, so this is a better choice.

There are other ways we could improve this visualization's encoding of quantity as colour. One is to use endpoints that would be more meaningful to business users instead of just the range of the data (say, $0 to $150K+); another we will discuss later.

Taking a look at the two palettes together, it's clearer which is a better choice for encoding the always positive value of the metric sales dollars across its range:


Going Further
Okay, so when would we want to use a diverging palette? As per the rule, when there is a meaningful midpoint or other important value you want to contrast the data around.

For example, in our Superstore data, sales dollars are always positive, but profit can be positive or negative, so it is appropriate to use a diverging palette in this case, with one hue corresponding to negative values and another to positive, and the neutral colour in the middle occurring at zero:


Here it is very clear which values fall at the extremes of the range, but also which are closer to the meaningful midpoint (zero): that one city in Montana is in the negative, and the others don't seem to be very profitable either; we can tell they are close to zero by how washed out their colours are.

Tableau is smart enough to know to set the midpoint at zero for our diverging palette. Again, you could tinker with the range to make the end-points more meaningful (e.g. round values), or change its extent: sometimes a symmetrical range for a diverging palette is easier to interpret from a numerical standpoint, though of course you have to keep in mind how this is going to affect the perceptual salience of the colours for the corresponding data.
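
As a rough sketch of what a symmetrical range means in practice, the Python snippet below picks limits that are equally far from the centre and still cover all the data. The helper name and the profit figures are made up for illustration; they are not Superstore values.

```python
def symmetric_limits(values, centre=0.0):
    """Return (low, high) limits equally far from centre that cover all values."""
    reach = max(abs(v - centre) for v in values)
    return centre - reach, centre + reach

profits = [-1200.0, -50.0, 300.0, 4800.0]  # made-up numbers, not Superstore values
low, high = symmetric_limits(profits)
print(low, high)
```

With limits like these, equal distances from zero get equally intense colours on both sides, which makes the legend easy to read. The trade-off is that the largest loss here only reaches a quarter of the way into the negative hue, so losses will look more washed out than they would with an asymmetric range.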

So could we use a diverging palette for the always positive sales data? Sure. There just needs to be a point around which we are comparing the values. For example, I happen to know that the median sales per city over the time period in question is $495.82 - this would be a meaningful value to use for the midpoint of a diverging palette, and we can redo our original sales map as such:


Now we have a better version of our original sales map, where the cities coloured in red are below the median value per city, and those coloured in green are above. Much better!

But now something strange seems to be going on with the palette - what's that all about?

No Simple Answers
So what is going on with the palette in the last map from our example above? And what of my promise to discuss other ways the palette scaling can be improved, and of exceptions to the rule of not using differing hues in a continuous scale?


Well, the reason that the map looks good above but the scale looks wrong has to do with how the data are distributed: the distribution of sales by city is not normal, but follows a power law, with most of the data falling in the low end, so this is how our palette looks when the colours are scaled linearly with the data:
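
A quick back-of-the-envelope check in Python shows how lopsided a linear scale is for these data. It uses the figures quoted earlier in the post (a minimum of $4.21, a maximum of roughly $155K, and a median of $495.82); the maximum is only the approximate value from the legend, so treat the result as a rough indication rather than an exact number.

```python
def linear_position(value, vmin, vmax):
    """Fraction of the way from vmin to vmax at which value falls."""
    return (value - vmin) / (vmax - vmin)

VMIN, VMAX = 4.21, 155_000.0  # VMAX is the approximate ~$155K from the legend
MEDIAN = 495.82

pos = linear_position(MEDIAN, VMIN, VMAX)
print(f"median sits {pos:.2%} of the way along the linear scale")
```

The median lands well under one percent of the way along the scale. In other words, half of the cities are squeezed into a sliver at the bottom of the range, and the rest of the range is spent on a handful of large values.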


One way to fix this is to transform the data by taking the log; the resulting palette then looks more like we'd expect:


Though of course now the range is between transformed values. It's interesting to note that in this case the midpoint comes out nearly correct automatically (2.907 vs. log10(495.82) ≈ 2.695).
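
These numbers can be checked with a few lines of Python. The sketch assumes a base-10 log and assumes the midpoint is simply halfway between the logs of the minimum and maximum; the maximum is again the approximate ~$155K figure, so small differences in the last decimal place are to be expected.

```python
import math

VMIN, VMAX = 4.21, 155_000.0  # VMAX is the approximate ~$155K from the legend
MEDIAN = 495.82

# Midpoint of the log-transformed range (assumed: halfway between the end-points).
log_mid = (math.log10(VMIN) + math.log10(VMAX)) / 2
# Where the median falls on the same log scale.
log_median = math.log10(MEDIAN)

print(f"log midpoint: {log_mid:.3f}, log median: {log_median:.3f}")
print(f"midpoint in dollars: {10 ** log_mid:.2f}")
```

Under these assumptions the midpoint comes out at about 2.907 and the log of the median at about 2.695, in line with the values above. Converting the midpoint back to dollars gives a figure in the hundreds rather than the tens of thousands, which is why it lands so much closer to the median than the linear midpoint did.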

Further complicating all this is the fact that human perception of colour is not linear, but follows something like the Weber-Fechner law, depending on the colour property in question. Robert Simmon wrote on this in an excellent series of posts while he was at NASA, which is definitely worth a read (and multiple re-reads).

There he also describes an exception to my statement that you shouldn't use continuous palettes with different hues: sometimes even that can be appropriate, as he explains in the section on figure-ground when talking about earth surface temperature.

Conclusion

So there you have it. Once again: use diverging palettes only when there is a meaningful point around which you want to contrast the other values in your data.

Remember, it all depends on the data. What is the ideal palette for a given data set, and how should you choose it? That's not an easy question to answer. It is always left up to the visualization practitioner, and answering it well only comes with knowledge of proper visualization technique and the theoretical foundations that inform it.

There are no right or wrong answers, only better or worse choices. It's all about the details.

References and Resources

Subtleties of Colour (by Robert Simmon)

Understanding Sequential and Diverging Palettes in Tableau

How to Choose Colours for Maps and Heatmaps
