At the beginning of November I accepted a job offer from Gambit Research. They’re a gambling company that places sports bets algorithmically to make a profit. To be completely honest, it’s not one of the things I’m generally interested in, but it was good to find out about these markets. I think some (most?) of the principles can be applied to finance and trading stocks.
The biggest gotcha of this job was the commute from Southampton to London every day, and it is the reason I left. I was commuting by coach, mostly for financial reasons but also for the convenience of where the stops are. Taking the train might have made things slightly better, but not by much once you take into account the distance to the train station.
I used National Express. In a nutshell, the average delay at the destination was 17 minutes. The average delay at departure, which is also important, was 19 minutes. This should, of course, be computed per station.
I logged most of the journeys, and here are some statistics:
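| | Departure delay (mean) | Departure delay (std dev) | Arrival delay (mean) | Arrival delay (std dev) |
|---|---|---|---|---|
| OUT | 13 min | 11 min | 17 min | 8 min |
| RTN | 27 min | 11 min | 17 min | 11 min |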
The departure standard deviations are pushed to 11 minutes because the maximums are high (38 minutes OUT and 45 minutes RTN). If those maximums were ignored, then the deviations would be 3 minutes and 8 minutes, respectively. If I had measurements for, say, 6 months and the results were similar it would be clear that the timetable needs adjustments.
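To show how a single long delay inflates the standard deviation, here is a minimal sketch. The delays in `sample` are made up for illustration and are not the journeys I logged, and the helper names are hypothetical. It also assumes the sample standard deviation (`statistics.stdev`); the population version would give slightly smaller numbers.

```python
from statistics import mean, stdev


def summarise(delays):
    """Return (mean, sample standard deviation) of delays in minutes."""
    return mean(delays), stdev(delays)


def summarise_without_max(delays):
    """Same summary, but with the single largest delay dropped."""
    trimmed = sorted(delays)[:-1]
    return mean(trimmed), stdev(trimmed)


# Hypothetical departure delays in minutes (NOT the logged data):
# five ordinary journeys and one very late one.
sample = [8, 10, 9, 12, 11, 38]

full_mean, full_sd = summarise(sample)
trim_mean, trim_sd = summarise_without_max(sample)

print(f"all journeys:    mean={full_mean:.1f} min, sd={full_sd:.1f} min")
print(f"without maximum: mean={trim_mean:.1f} min, sd={trim_sd:.1f} min")
```

With so few journeys, one outlier dominates the deviation, which is why it is worth looking at the figures both with and without the maximum.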
Timetable adjustment problem
This dataset (6 OUT and 7 RTN journeys) is too small to draw any conclusions. Even if I had logged all the buses I’ve been on, it would not be enough. What got my attention was the consistency of the departure delay. It makes me wonder if this kind of data is systematically logged and periodically analysed, or logged and analysed at all.
Traffic, events, temporary road closures, improvement works, accidents and overcrowding are all sources of delays. In other words, there is plenty of randomness in predicting actual bus times. There might be more challenges in actually improving the timetable, but given enough data this might be a relatively simple task. It is better to occasionally arrive early and wait than to be constantly late, making passengers late to their destinations or leaving them waiting too long for the bus.
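One naive way to turn logged delays into a timetable adjustment is to pick the padding that would have covered a chosen share of past journeys. This is only a sketch of the idea, not how any coach company actually does it: the function name, the `on_time_share` parameter and the delays are all hypothetical.

```python
import math


def suggested_padding(delays, on_time_share=0.9):
    """Smallest observed delay (minutes) that covers at least
    `on_time_share` of the logged journeys."""
    if not delays:
        raise ValueError("need at least one observed delay")
    if not 0 < on_time_share <= 1:
        raise ValueError("on_time_share must be in (0, 1]")
    ordered = sorted(delays)
    # Index of the first value with enough journeys at or below it.
    index = math.ceil(on_time_share * len(ordered)) - 1
    return ordered[index]


# Hypothetical arrival delays in minutes (NOT the logged data).
observed = [5, 12, 15, 17, 18, 20, 22, 25, 30, 45]

for share in (0.5, 0.8, 1.0):
    padding = suggested_padding(observed, share)
    print(f"covering {share:.0%} of journeys needs {padding} min of padding")
```

Choosing a share below 100% keeps one-off incidents from dictating the timetable, at the cost of the bus still being late on the worst days.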
Transport data rambling
Some of the data should be available if National Express keeps a log of their coach tracker. However, it has periods when it goes down for various reasons (when I tried it was due to EE network issues). Also there are probably lots of variables in predicting the actual bus times, as it depends mostly on traffic. One-off incidents might introduce a certain level of noise, at least until enough data is aggregated.
From the exposure I had to the railway data, and the problems railway and railway related companies are currently facing, I think coach companies might have the same kind of issues. Also because they are much less centralised, I assume there is no (or much less) external interest in solving these problems. The issues I’m talking about are mostly easy fixes for tech people - putting together a few data sources, logging information, making information more easily accessible, and finally analysing the information to gain valuable insights. **
For trains you can access lots of data with ease*, and for free. There is no such thing for coaches, likely because companies are not willing to share historical delay data with the public, as they think it will affect their image. It might, but it will also bring benefits if this data is ever put to some use.
* It takes lots of time to get the datafeeds into something useful, but the data is there, which is certainly the first big step.
** It’s harder than it sounds, but certainly doable; there are some obstacles along the way.