I’ve been mining the British Orienteering event results pages and have produced a website that presents the results in a more effective way, i.e. athlete-focused rather than event-focused. I’m also having a go at recalculating the ranking score based on this data.

http://oobrien.com/stats/

Unfortunately there are a couple of flaws:

  • The BOF ID is not available on the source website, so I have had to construct a key based on name (which can be misspelled on results uploads from time to time) and club (ditto). This mainly works, except where people change club, in which case the results they ran under other clubs, which still contribute to their ranking score, won’t be included.
  • It turns out that, with each new result upload, the ranking points for all events going back the whole of the last year (possibly more) are recalculated. This causes old scores to drift slightly, and I wasn’t expecting the points to fluctuate in such a way. The effect is mainly small: so far one of my scores has drifted by 1 point, but another person’s score has drifted by 7 points. I could mitigate this by scraping all results over the last year every night, but that would mean over 5000 page requests over the course of several hours, which would put strain on BOF’s servers, and they would probably not appreciate it. So, instead, I’m updating the most recent 25 events nightly and may manually resync the whole year on an ad-hoc basis. The result is that, after a while, the scores don’t match precisely with those on the source website.
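To illustrate the first flaw, here is a minimal Python sketch of a name-and-club key. The function name `athlete_key`, the normalisation steps (accent stripping, lowercasing, whitespace collapsing) and the sample names are assumptions for illustration, not necessarily what the site does.

```python
import unicodedata


def athlete_key(name: str, club: str) -> str:
    """Build a lookup key from name and club (hypothetical helper).

    Assumption: light normalisation only. It does not repair genuine
    misspellings, and it cannot link results run under a different club.
    """
    def norm(text: str) -> str:
        # Strip accents, lowercase, and collapse runs of whitespace.
        decomposed = unicodedata.normalize("NFKD", text)
        no_accents = "".join(
            ch for ch in decomposed if not unicodedata.combining(ch)
        )
        return " ".join(no_accents.lower().split())

    return f"{norm(name)}|{norm(club)}"


# Same person, slightly different formatting in two uploads: same key.
same = athlete_key("  Jane   Doe ", "SLOW") == athlete_key("Jane Doe", "slow")

# Same person after a club change: different key, so results are split.
split = athlete_key("Jane Doe", "SLOW") != athlete_key("Jane Doe", "LOK")
```

The second comparison shows the limitation described above: with no stable ID, a change of club produces a new key, and the older results are no longer attached to the athlete.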

The toughness scores for each event are just a bit of fun and based on the details of the course, not how well people did on it. The urban shading is also just based on the name of the event, rather than any specific metadata on the event that I am accessing. Such metadata may be available in the event details section of the source website but I am just using the results information here.
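As a rough sketch of name-based urban shading, something like the following would do. The keyword list and the function name `looks_urban` are hypothetical; the actual words matched on the site are not stated here.

```python
# Hypothetical keyword list, chosen only for illustration.
URBAN_KEYWORDS = ("urban", "city", "street")


def looks_urban(event_name: str) -> bool:
    """Guess whether an event is urban from its name alone."""
    name = event_name.lower()
    return any(keyword in name for keyword in URBAN_KEYWORDS)
```

A check like this will inevitably misclassify some events, which is why proper event metadata would be the better source if it were being used.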

The collation of a large number of results has highlighted various data problems, such as results appearing as HH:MM rather than MM:SS, or x,xxx km instead of x.xxx km. Unfortunately one of my own (few) event result uploads suffered from the first problem. This doesn’t affect the points at all, because the times within each course are only used on a relative, not absolute, basis, but it does stop me from, for example, totalling the “yearly run hours” for each athlete without first cleaning up the data on my side.
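The kind of clean-up that would be needed might look like the Python sketch below. It rests on assumptions: that a mis-entered time shows up as something like `45:30:00` where `45:30` was meant, that anything over 12 hours with zero seconds is such a case, and that a comma in a distance is a decimal separator. The function names and the threshold are illustrative only.

```python
def parse_time_seconds(text: str, max_plausible_hours: int = 12) -> int:
    """Parse 'MM:SS' or 'H:MM:SS' into seconds, with a sanity check.

    Assumption: a three-part time longer than max_plausible_hours with
    zero seconds was really MM:SS entered as HH:MM, so it is reinterpreted.
    """
    parts = [int(p) for p in text.strip().split(":")]
    if len(parts) == 2:
        minutes, seconds = parts
        return minutes * 60 + seconds
    if len(parts) == 3:
        hours, minutes, seconds = parts
        total = hours * 3600 + minutes * 60 + seconds
        if total > max_plausible_hours * 3600 and seconds == 0:
            # Treat the 'hours' field as minutes and 'minutes' as seconds.
            return hours * 60 + minutes
        return total
    raise ValueError(f"Unrecognised time format: {text!r}")


def parse_distance_km(text: str) -> float:
    """Parse 'x.xxx km' or 'x,xxx km' into kilometres as a float."""
    cleaned = text.lower().replace("km", "").strip()
    # Assumption: a comma here is a decimal separator, not a thousands mark.
    return float(cleaned.replace(",", "."))


# Totalling run time for a handful of made-up results, in hours.
times = ["45:30", "45:30:00", "1:05:30"]
total_hours = sum(parse_time_seconds(t) for t in times) / 3600
```

Because the heuristic only guesses at the uploader’s intent, any reinterpreted value would be worth flagging for a manual check rather than trusting silently.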

You can see the stats at the link above: type in your name and club to see your stats. See the notes on the search page, e.g. most Level D events are not included. You can also compare two people, looking at where they ran the same courses at the same event.

Join the Conversation

4 Comments

  1. Interesting stuff. I started out by being pretty depressed at apparently only having been to 9 events in 2011, and actually only having finished 7 of them. Then I read the part about Level D events. With kids in tow, that’s where we spend most of our time these days… Perhaps I should set up an orienteering club for families. 😉

  2. Excellent. I’m currently monitoring NE juniors by keeping my own spreadsheet but this could transform that. The bottom line result for me would be % time behind the winner.

  3. I’ve just come back to this after an enforced lay-off from the sport, and see that it doesn’t seem to have pulled in any data in the last couple of years… Is this because of deliberate choice, some unknown failure, change in the source data breaking things…? It’s a shame, I used to enjoy looking at this. Maybe a fix is possible/not too onerous…?

    1. BOF changed their process: they now rescore every event in the last couple of years, every week. I therefore can’t get this data without scraping every event, which would take several hours and also cause a huge bandwidth spike on the BOF website. They would likely not appreciate that.
