Saturday, March 29, 2014

Interactive Visualization: Explore the 2014 "Sunshine List"

The "Sunshine List", a list of Ontario's public service who make more than $100,000 a year, was brought in under the Harris government in 1996, in an effort to increase transparency about just how much the top paid public servants were earning.

Now the list is released each year by the Ontario Ministry of Finance.

However, there has been some frustration that the data are not in the most easily accessible format (HTML & PDF? Really guys?).

Stuart A. Thompson was kind enough to provide the data in an easier to digest format (Nick Ragaz also provided it in CSV), as well as producing a tool for exploring it on The Globe and Mail.

I thought it'd be great to get a more visual exploration of the data at fine granularity, so have produced this interactive visualization below.

You can filter the data by searching by employer, name, or position. You can also filter the list by selecting points or groups of points on the scatterplot on the left, and highlight groups of points (or individual employees) by selecting components in the bar graph at the right. The bar graph on the right can also be expanded and collapsed to view the aggregate salary and benefits by employer, or to view the quantities for individual employees.

Hovering over data points on either graph will display the related data in a tooltip - I've found this is handy for looking at individual points of interest on the scatterplot. Zoom and explore to find interesting patterns and individuals. Give it a try!



I've plotted benefit against salary with the latter having a logarithmic axis so that the data are easier visualized and explored (note that I am in no way suggesting that benefits are a function of salary).

Using combinations of all these possible interactions mentioned above you can do some interesting visual analysis: for instance, how do the top salaries and benefits earned by Constables across police departments in Ontario differ (seriously, take a look)? What are the relative pay and benefit levels of professors at Ontario Universities on the list? How much does Rob Ford make?

Something interesting I've already noticed is that for many employers there are long horizontal bands where employees' salaries vary but they are fixed into the same benefit buckets. Others have a different relationship, for example, the benefit / salary ratios of those at my alma mater vs those of employees of the City of Toronto:


Hope this tool will be interesting and useful for those interested in the list. As always feedback is welcome in the comments.

Thursday, March 27, 2014

Perception in Data Visualization - A Quick 7 Question Test

When most people think of data, they probably think of a dry, technical analysis, without a lot of creativity or freedom. Quite to the contrary, data visualization encompasses choices of design, creative freedom, and also (perhaps most interestingly) elements of cognitive psychology, particularly related to the science of visual perception and information processing.

If you read any good text on dataviz, like TufteFew, or Cairo, you will, at some point, come across a discussion of the cognitive aspects of data visualization (the latter two devoting entire chapters to this topic). This will likely include a discussion of the most elemental ways to encode information visually, and their respective accuracies when quantity is interpreted from them, usually referencing the work of Cleveland & McGill [PDF].


Mulling over the veracity of my brief mention of the visual ways of encoding quantity in my recent talk, and also recently re-reading Nathan Yau's discussion of the aforementioned paper, I got to thinking about just how different the accuracy of interpretation between the different encodings might be.

I am not a psychologist or qualitative researcher, but given the above quickly put together a simple test of 7 questions in Google Docs, to examine the accuracy of interpreting proportional quantities when encoded visually; and I humbly request the favour of your participation. If there are enough responses I will put together what analysis is possible in a future post (using the appropriate visualization techniques, of course).

Apologies in advance for the grade-school wording of the questions, but I wanted to be as clear as possible to ensure consistency in the results. Thanks so much in advance for contributing! Click below for the quiz:


EDIT: The quiz will now be up indefinitely on this page.

Saturday, March 8, 2014

colorRampPaletteAlpha() and addalpha() - helper functions for adding transparency to colors in R

colorRampPalette is a very useful function in R for creating colors vectors to use as the palette, or to pass as an argument to a plotting function; however, a weakness lies in that it disregards the alpha channel of the colors passed to it when creating the new vector.

I have also found that working with the alpha channel in R is not always the easiest, but is something that scientists and analysts may often have to do - when overplotting, for example.

To address this I've quickly written the helper functions addalpha and colorRampPaletteAlpha, the former which makes passing a scalar or vector to a vector of colors as the alpha channel easier, and the latter as a wrapper for colorRampPalette which preserves the alpha channel of the colors provided.

Using the two functions in combination it is easy to produce plots with variable transparency such as in the figure below:



The code is on github.

I've also written examples of usage, which includes the figure above.

# addalpha() and colorRampPaletteAlpha() usage examples
# Myles Harrison
# www.everydayanalytics.ca

library(MASS)
library(RColorBrewer)
# Source the colorRampAlpha file
source ('colorRampPaletteAlpha.R')

# addalpha()
# ----------
# scalars:
col1 <- "red"
col2 <- rgb(1,0,0)
addalpha(col2, 0.8)
addalpha(col2,0.8)

# scalar alpha with vector of colors:
col3 <- c("red", "green", "blue", "yellow")
addalpha(col3, 0.8)
plot(rnorm(1000), col=addalpha(brewer.pal(11,'RdYlGn'), 0.5), pch=16)

# alpha and colors vector:
alpha <- seq.int(0, 1, length.out=4)
addalpha(col3, alpha)

# Simple example
x <- seq.int(0, 2*pi, length=1000)
y <- sin(x)
plot(x, y, col=addalpha(rep("red", 1000), abs(sin(y))))

# with RColorBrewer
x <- seq.int(0, 1, length.out=100)
z <- outer(x,x)
c1 <- colorRampPalette(brewer.pal(11, 'Spectral'))(100)
c2 <- addalpha(c1,x)
par(mfrow=c(1,2))
image(x,x,z,col=c1)
image(x,x,z,col=c2)

# colorRampPaletteAlpha()
# Create normally distributed data
x <- rnorm(1000)
y <- rnorm(1000)
k <- kde2d(x,y,n=250)

# Sample colors with alpha channel
col1 <- addalpha("red", 0.5)
col2 <-"green"
col3 <-addalpha("blue", 0.2)
cols <- c(col1,col2,col3)

# colorRampPalette ditches the alpha channel
# colorRampPaletteAlpha does not
cr1 <- colorRampPalette(cols)(32)
cr2 <- colorRampPaletteAlpha(cols, 32)

par(mfrow=c(1,2))
plot(x, y, pch=16, cex=0.3)
image(k$x,k$y,k$z,col=cr1, add=T)
plot(x, y, pch=16, cex=0.3)
image(k$x,k$y,k$z,col=cr2, add=T)

# Linear vs. spline interpolation
cr1 <- colorRampPaletteAlpha(cols, 32, interpolate='linear') # default
cr2 <- colorRampPaletteAlpha(cols, 32, interpolate='spline')
plot(x, y, pch=16, cex=0.3)
image(k$x,k$y,k$z,col=cr1, add=T)
plot(x, y, pch=16, cex=0.3)
image(k$x,k$y,k$z,col=cr2, add=T)

Hopefully other R programmers who work extensively with color and transparency will find these functions useful.