Saturday, September 14, 2013

Analysis of the TTC Open Data - Ridership & Revenue 2009-2012


I would say that the relationship between the citizens of Toronto and public transit is a complicated one. Some people love it. Other people hate it and can't stop complaining about how bad it is. The TTC want to raise fare prices. Or they don't. It's complicated.

I personally can't say anything negative about the TTC. Running a business is difficult, and managing a complicated beast like Toronto's public transit system (and trying to keep it profitable while keeping customers happy) cannot be easy. So I feel for them.

I rely extensively on public transit - in fact, I used to ride it every day to get to work. All things considered, for what you're paying, this way of getting around the city is a hell of a good deal (if you ask me) compared to the insanity that is driving in Toronto.

The TTC's ridership and revenue figures are available as part of the (awesome) Toronto Open Data initiative for accountability and transparency. As I noted previously, I think the business of keeping track of things like how many people ride public transit every day must be a difficult one, so you have to appreciate having this data, even if it is likely more of an approximation and is in a highly summarized format.

There are larger sources of open data related to the TTC which would probably be a lot cooler to work with (as my acquaintance Mr. Branigan has done) but things have been busy at work lately, so we'll stick to this little analysis exercise.


The data set comprises numbers for: average weekday ridership (in 000's), annual ridership (peak and off-peak), monthly and budgeted monthly ridership (in 000's), and monthly revenue, actual and budgeted (in millions of dollars). More info here [XLS doc].


First we consider the simplest data: annual peak and off-peak ridership. Looking at this simple line graph, you can see that off-peak ridership has grown faster than peak ridership since 2009, with peak and off-peak ridership increasing by 4.59% and 12.78% respectively. Total ridership over the period increased by 9.08%.
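If you want to check figures like these yourself, percentage change is just (end - start) / start × 100. Below is a minimal Python sketch. The ridership numbers are made up for illustration and are not the TTC's. `blended_growth` is a hypothetical helper that assumes total ridership is simply peak plus off-peak. Under that assumption the total is a ridership-weighted mix of the two, so its growth has to fall between the peak and off-peak rates. That makes a handy sanity check on the 9.08% figure.

```python
def pct_change(start: float, end: float) -> float:
    """Percentage change from start to end."""
    return (end - start) / start * 100.0


def blended_growth(peak_start, peak_end, off_start, off_end):
    """Growth of total ridership, ASSUMING total = peak + off-peak."""
    return pct_change(peak_start + off_start, peak_end + off_end)


# Hypothetical annual ridership in millions (NOT the actual TTC figures)
peak_2009, peak_2012 = 200.0, 209.18
off_2009, off_2012 = 250.0, 281.95

print(f"peak:     {pct_change(peak_2009, peak_2012):.2f}%")
print(f"off-peak: {pct_change(off_2009, off_2012):.2f}%")
print(f"total:    {blended_growth(peak_2009, peak_2012, off_2009, off_2012):.2f}%")
```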

Below we plot the average weekday ridership by month. As you can see, this reflects the increasing demand on the TTC system that we saw summarized yearly above. Unfortunately Google Docs doesn't have built-in trendlines like Excel does (hint hint, Google). Unsurprisingly, though, if you fit a regression line the trend is highly significant (p < 0.001), and the slope works out to an increase of approximately 415 weekday passengers per month on average.
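Since Google Docs won't do it for you, here is a rough sketch of the trendline. It fits the monthly values against a month index by ordinary least squares and reports the t-statistic of the slope. `ols_trend` is a hypothetical helper written in plain Python (no NumPy or SciPy), and the series is invented. The data set reports ridership in thousands, so a fitted slope of about 0.415 would correspond to the ~415 passengers per month mentioned above.

```python
import math


def ols_trend(y):
    """Fit y = a + b*t by ordinary least squares, with t = 0, 1, ..., n-1.

    Returns (slope, intercept, t_statistic_of_slope).
    """
    n = len(y)
    if n < 3:
        raise ValueError("need at least 3 points")
    t = list(range(n))
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    # Residual sum of squares; the slope's standard error uses n - 2 degrees of freedom
    sse = sum((yi - (intercept + slope * ti)) ** 2 for ti, yi in zip(t, y))
    if sse == 0:
        return slope, intercept, math.inf  # perfect fit, no residual noise
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    return slope, intercept, slope / se_slope


# Invented average weekday ridership in thousands for 48 months
# (a gentle upward trend plus an alternating wobble; NOT the TTC data)
ridership = [2400 + 0.4 * i + (8 if i % 2 else -8) for i in range(48)]
slope, intercept, t_stat = ols_trend(ridership)
print(f"slope: {slope * 1000:.0f} riders/month, t = {t_stat:.1f}")
```

One caveat: this kind of significance test assumes independent residuals, and monthly ridership is clearly seasonal. The nominal significance is better read as "the upward trend is very clear" than as an exact probability.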

Next we come to monthly ridership. If you plot it over the whole period, you can see a distinct periodic behavior:

Averaging each calendar month across the years shows the periodicity more clearly. There are peaks in March, June and September, and a mini-peak in December:
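Taking the monthly averages just means grouping every observation by calendar month and averaging across years. Here is a minimal sketch. It assumes the data has been loaded into a dict keyed by `datetime.date`. `monthly_profile` and `local_peaks` are hypothetical helpers. `local_peaks` treats the year as circular, so December is compared with January. The invented pattern is only meant to mimic the shape described above.

```python
from collections import defaultdict
from datetime import date


def monthly_profile(series):
    """Average a {date: value} series by calendar month (1-12)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for day, value in series.items():
        sums[day.month] += value
        counts[day.month] += 1
    return {m: sums[m] / counts[m] for m in sorted(sums)}


def local_peaks(profile):
    """Months whose average beats both neighbours, wrapping Dec <-> Jan."""
    months = sorted(profile)
    n = len(months)
    peaks = []
    for i, m in enumerate(months):
        prev_value = profile[months[i - 1]]  # i - 1 == -1 wraps to the last month
        next_value = profile[months[(i + 1) % n]]
        if profile[m] > prev_value and profile[m] > next_value:
            peaks.append(m)
    return peaks


# Invented monthly ridership for two years (NOT the TTC data)
pattern = [90, 95, 110, 100, 105, 112, 98, 96, 115, 104, 101, 106]
series = {}
for year, bump in ((2010, 0), (2011, 4)):
    for month, value in enumerate(pattern, start=1):
        series[date(year, month, 1)] = value + bump

profile = monthly_profile(series)
print(local_peaks(profile))
```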

This pattern is also present in both the revenue (as one would expect) and the monthly budget (which means the TTC is aware of it). I can't immediately say why this happens, though I am curious to know the answer. This is where it would be great to have finer-grained data (daily or hourly), or data broken down by geographic area or by station, to look for interesting outliers and patterns.

Alternatively, if we look at average weekday ridership averaged by calendar month over the years (an average of averages, I am aware, but the best we can do with the data we have), you can see a different periodic behavior. There is a distinct downturn over the summer, reaching a low in August before recovering to its maximum in September. This is interesting and I'm not exactly sure what to make of it, so I will do what I normally do, which is attribute it to students.
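On the average-of-averages point: each month has a different number of weekdays. One way to soften the problem is to weight each monthly average by its weekday count before combining them. Here is a sketch under that assumption. It ignores statutory holidays, which the real figures would presumably reflect, and the helper names are hypothetical.

```python
import calendar
from datetime import date


def weekdays_in_month(year, month):
    """Number of Monday-Friday days in the given month (holidays ignored)."""
    _, ndays = calendar.monthrange(year, month)
    return sum(1 for day in range(1, ndays + 1) if date(year, month, day).weekday() < 5)


def weighted_weekday_mean(monthly_avgs):
    """Combine {(year, month): average weekday ridership} into one mean,
    weighting each month by how many weekdays it contained."""
    total = 0.0
    weight = 0
    for (year, month), avg in monthly_avgs.items():
        k = weekdays_in_month(year, month)
        total += avg * k
        weight += k
    return total / weight


# Invented averages in thousands (NOT the TTC data)
print(weighted_weekday_mean({(2012, 8): 2300.0, (2012, 9): 2500.0}))
```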

Lastly, we come to the matter of the financials. As I said, the TTC's monthly revenue and budget follow the same periodic pattern as ridership, and on the plus side, increased ridership has brought increased revenue. If you subtract the budgeted (target) revenue from the actual revenue, you can see that this difference decreases over time:
Again, a linear regression shows this trend is highly significant (p < 0.001). Does this mean that the TTC is becoming less profitable over time? Maybe. Or perhaps they are just getting better at setting their targets? I acknowledge that I'm not an economist, and what's been done here is likely a gross oversimplification of the financials of something as massive as the TTC.

That being said, the city itself acknowledges [warning - large PDF] that while the total cost per hour for an in-service transit vehicle has decreased, the operating cost has increased, which they attribute to rising wages and fuel prices. Operating public transit is also apparently more expensive here in TO than in other cities in the province, because we have streetcars and the subway, whereas most other cities only have buses. Either way, as I said before, it's complicated.


I always enjoy working with open data, and I definitely appreciate the city's initiative to be more transparent and accountable by providing the data for public use.

This was an interesting little analysis and visualization exercise and some of the key points to take away are that, over the period in question:
  • Off-peak usage of the TTC is increasing at a greater rate than peak usage
  • Usage as a whole is increasing, with about 415 more weekday riders per month on average, and a growth of ~9% from 2009 - 2012
  • Periodic behavior in the actual ridership per month over the course of the year
  • Different periodicity in average weekday ridership per month, with a peak in September
It would be really interesting to investigate the patterns in the data in finer detail, which will hopefully be possible in the future if more granular time-series, geographic, and categorical data become available. I may also consider digging into some of the larger data sets, which others have used to produce beautiful visualizations such as this one.

I, for one, continue to appreciate the convenience of public transit here in Toronto and wish the folks running it the best of luck with their future initiatives.

1 comment:

  1. Using control charts to distinguish between "special cause" variation and "common cause" variation might be helpful in your analysis.
    Reference: Out of the Crisis
    Author: W. Edwards Deming