Wednesday, April 17, 2013

Problems of Measurement

The other day I was walking in the mall amongst the office buildings and I saw something that I thought was odd.

In the atrium, where the clear glass surrounded all and allowed the sunlight and view of the bustling city streets to trickle in, there sat a woman.

Even from far away I could tell that she looked weary – despondent even, from the slump of her shoulders. But the thing that worried me the most, that was so odd, was that unlike all the others on the platform, she sat. She sat on the ground with her legs dangling over the edge and her eyes staring out the window, with a certain indifference to all that occurred around her.

As I got closer I saw she had a clipboard and immediately my concern dissipated. The clipboard sat on the floor to her right, and the paper on it had a table partially filled in with scribbles of blue pen. In her right hand was a black thumb-counter, the kind security guards at concerts and bouncers at nightclubs use to count patrons coming and going.

I strode past her, in a hurry as always, but finally my curiosity got the better of me and I turned on my heel.

“What are you counting?” I said, trying my best to be inquisitive in the friendliest way possible.

“People getting on and off the buses,” she said, without looking up. I followed her gaze and saw that from where we stood we had a direct view of stop 214 on the main street outside. There was a constant flurry of activity which she was responsible for recording – buses stopping, buses leaving and people getting off and on. It was a continual flow of humanity in transit.

Which of course got me thinking about problems of measurement.

As I watched the people mill off and on the buses I thought about the city transit department and their problems of measurement. Surely there must be a better way to track how many people got off and on these buses. Some sort of automated system - a motion sensor or card reader.

But the flow of people getting off and on was a continual stream – it was not a separate series of blips on a radar screen – which made me think that of course the problem was not that simple, otherwise it would have already been solved.

A fact about analytics which I did not always appreciate is that before there are problems of data, and before there are problems of analysis, there are problems of measurement.

Before you set off to gain insight about the world and your chosen topic of study, you first have to stop and say “What is it we want to know?” which leads to the question, “What is it we need to measure?” and then finally, and sometimes most importantly: “How will we measure it?”

In the world of web analytics this has a lot to do with what is known as implementation. You need to make sure all your eVars and sProps are in a row, otherwise it’s going to be really hard to actually figure out what’s going on.

After all, if your measuring stick is a bathroom scale it’s very hard to figure out how tall people are.

And then I got to thinking about everyday analytics again. If you’re turning your analysis inward, and you want to learn about the one thing that analysts are not paid the big bucks to analyze – you – then you need to ask the same questions.

You have your own problems of measurement.

You need to start at the beginning and ask yourself – “What is it I want to know?”

What percent of your income do you spend on rent? How many coffees do you drink in a month? How much weight have you lost on your diet?

Then the second question is – “What is it I need to measure?”

Income and expenditures? Number of trips to Starbucks? Weight in pounds over time?

And now we come to the last question – “How will I measure it?”

Income? Dollars on my pay stub. Number of trips to Starbucks? Self-explanatory. Weight in pounds over time?  On a bathroom scale.
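To make those three questions concrete, here is a minimal sketch of what the measurements might look like once written down. Every number and date in it is made up for illustration, and the function names (`rent_share`, `coffees_in_month`, `weight_change`) are just hypothetical helpers, not part of any tool mentioned here.

```python
from datetime import date

# All values below are hypothetical, invented purely for illustration.
monthly_income = 4000.00  # dollars on the pay stub
monthly_rent = 1200.00    # dollars paid in rent

# One entry per trip to the coffee shop.
coffee_trips = [
    date(2013, 4, 1),
    date(2013, 4, 3),
    date(2013, 4, 3),
    date(2013, 4, 10),
]

# (date, weight in pounds) readings from the bathroom scale.
weigh_ins = [
    (date(2013, 4, 15), 179.5),
    (date(2013, 4, 1), 182.0),
]


def rent_share(income, rent):
    """Percent of income spent on rent."""
    return 100.0 * rent / income


def coffees_in_month(trips, year, month):
    """Number of recorded coffee trips in the given month."""
    return sum(1 for d in trips if d.year == year and d.month == month)


def weight_change(readings):
    """Latest weight minus earliest weight (negative means weight lost)."""
    ordered = sorted(readings)  # tuples sort by date first
    return ordered[-1][1] - ordered[0][1]


print(rent_share(monthly_income, monthly_rent))
print(coffees_in_month(coffee_trips, 2013, 4))
print(weight_change(weigh_ins))
```

The point is not the arithmetic, which is trivial, but that each answer depends on something having been recorded first: a pay stub, a log of trips, a reading on the scale.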

However, I would argue that when we come to the last question now we should treat it differently. We should treat it differently because now this question can be the starting point. Because in the last question, for you, the “it” is your life.

It’s not a website. It’s not a startup. It’s not a product.

It’s your life. 

And it’s yours and yours alone. So you get to decide what is really important to you – and you’re the only one who can. You get to solve your own problems of measurement, and figure out how you are going to measure your life.

So regardless of whether you practice quantified self, or care about everyday analytics, or not, the one question I will leave you with is this - How will you measure your life?

Wednesday, April 3, 2013

Toronto Licensed Cats & Dogs 2012 Data Visualization

It's raining cats and dogs! No, I lied, it's not.

But I wanted to do some more data viz and work with some more open data.

So for this quick plot I present, Cat and Dog Licenses in the City of Toronto for 2012, visualized!

Above in the top pane is the number of licensed cats and dogs per postal code (or Forward Sortation Area, FSA). I really would have liked to produce a filled map (choropleth) of the different postal code areas. Unfortunately, Tableau does not have Canadian postal code boundaries, just lat/lon, and getting geographic data in is a bit of an arduous process.

I needed something to plot given that I just had counts of cat and dog licenses per FSA, so I threw up a scatter plot, and there is an amazing correlation! Surprise, surprise - this is most likely just a third variable at work, and I bet that if you found a map of (human) population density by postal code you'd see why the two quantities are so closely related. Or perhaps not - this is just my assumption - maybe some areas of the GTA are better about getting their pets licensed or have more cats and dogs. Interesting food for thought.
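For anyone who wants to put a number on that kind of relationship, here is a minimal sketch of a Pearson correlation coefficient in plain Python. The counts are made up and the area labels are placeholders, not the actual Toronto license data, and this is not how the chart here was produced (that was done in Tableau).

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical (cat licenses, dog licenses) per area; NOT real data.
licenses_by_area = {
    "area_1": (100, 250),
    "area_2": (200, 480),
    "area_3": (300, 760),
    "area_4": (400, 990),
}

cats = [c for c, _ in licenses_by_area.values()]
dogs = [d for _, d in licenses_by_area.values()]
r = pearson_r(cats, dogs)
print(round(r, 3))
```

A value of r close to 1 only says the two counts rise together. It says nothing about why, which is exactly where a third variable like human population would come in.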

The second chart shows the number of licenses per breed type. Note that the scale is logarithmic for both graphs, as the "hairs" (domestic shorthair, domestic mediumhair and domestic longhair) dominate for cats and I wanted to keep the two graphs consistent.

The graphs are searchable by keyword, try it out!

Also, I find it shocking that the second most popular breed of dog was Shih Tzu and the fourth most popular type of cat was Siamese - really?


Toronto Licensed Cat & Dog Reports (at Toronto Open Data Portal)

Toronto Animal Services