Friday, May 9, 2014

Big Data Week Toronto 2014 Recap - Meetup #3: Big Data Visualization

For those of you who don't know, this past week was Big Data Week: a week of talks and events held worldwide to "unite the global data communities through series of events and meetups".

Viafoura put on the events this year for Toronto and was kind enough to invite me to be one of the speakers, talking about data visualization and how it relates to all this "Big Data" stuff.

Paul spoke on detecting fraud online using visualization and data science techniques. Something I often think about when presenting is how to make your message clear and connect with both the least technical people in the audience (who, quite often, have attended strictly out of curiosity) and the most knowledgeable and technically-minded people present.

I was really impressed with Paul's visual explanation of the Jaccard coefficient. Not everyone understands set theory, but almost everyone will understand a Venn diagram if you put it in front of them.

So to explain the Jaccard index as a measure of similarity between two sets when giving a presentation, which is better? You could put the definition up on a slide:

 J(A,B) = \frac{|A \cap B|}{|A \cup B|}

which is fine for the mathematically-minded in your audience but would probably lose a lot of others. Instead, you could use a visualization like this figure Paul included:

The two depict the same quantity, but the latter is far more accessible to a wide audience. Great stuff.
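For the code-minded, the same quantity is only a few lines of Python. This is just a minimal sketch of the definition above, not anything from Paul's talk: the function name `jaccard`, the example sets, and the choice to treat two empty sets as identical are all my own assumptions for illustration.

```python
def jaccard(a, b):
    """Jaccard index of two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumed convention: two empty sets count as identical.
        return 1.0
    # Size of the overlap divided by the size of everything combined.
    return len(a & b) / len(union)


# Hypothetical example data: pages visited by two user accounts.
user_a = {"home", "login", "checkout", "profile"}
user_b = {"home", "login", "checkout", "help"}

print(jaccard(user_a, user_b))  # 3 shared items out of 5 distinct items
```

In Venn diagram terms, the numerator is the overlapping region and the denominator is the whole area covered by both circles.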

I spoke on "Practical Visualizations for Visualizing Big Data", which included some fundamentals (thinking about data and perception in visualization / visual encoding), the challenges the three "V"s of Big Data present when doing visualization and analysis, and some thoughts on how to address them.

This prompted some interesting discussions afterward. I found most people were much more interested in the fundamentals part (how to do visualization effectively, what constitutes a visualization, and the perceptual elements of dataviz) and less interested in the data science aspects of the talk.

Overall it was a great evening and I was happy to get up and talk visualization again. Thanks to the guys from Viafoura for putting this on and inviting me, and to the folks at the Ryerson DMZ for hosting.

Mini-gallery culled from Twitter below: