Racing Database Redesign - Part 10 - Tables (Update)

So much for taking some time out from blogging to create the database! As soon as I started I noticed an issue with the Betfair data that would upset things slightly.

The VENUE table is, for all intents and purposes, pretty static in that the data it contains isn't going to change very much if at all. As a result, this table can be easily pre-populated as part of the design process using the information available from TimeForm and the spreadsheet file I flagged up in my last post.

If you look back to the ERD, you'll see that the VenueID is the key field. This identifier is not directly available from the downloadable Betfair results files but I wanted to use that as the key (i) because I assumed it was unique and (ii) I could use it within any future interface to call up the TimeForm race card by programically creating the URL.

The image below illustrates why I intended to use it. You'll see the CourseID of 1.9 in the URL which identifies Catterick Bridge.

Each course will have a corresponding CourseID and it would be a trivial process to retrieve that from the VENUE table and build the corresponding URL.

However, all is not as straight forward as it seems. There are a number of issues I discovered when populating the aforementioned spreadsheet file with these identifiers in preparation for importing to the database:

1. Timeform treats Newmarket as two separate courses, namely Newmarket (Rowley) and Newmarket (July) with CourseIDs of 1.36 and 1.60 respectively yet the downloadable results files, the aforementioned spreadsheet and as far as I can make out, API software all treat them as one using the short form name of 'Newm'. This then makes it difficult to marry the results files to the correct venue in the case of Newmarket.

2. After the shortlived Great Leighs course arose phoenix-like from the ashes with the new name of Chelmsford City, Timeform simply reassigned the same CourseID to the name of Chelmsford City. A reasonable thing to do you might think except, if you change the dates on the Timeform site to 8th Jan 2009 you'll see that a meeting that was held at Great Leighs was apparently held at Chelmsford City - long before such an entity existed. You and I know that they are one and the same location, but software doesn't.

While this may not necessarily be an issue in and of itself, the historical results files for the time that Great Leighs was operational refer to it's orginal name not its reincarnation. If this issue isn't taken care of, the importation of all those races at Great Leighs will fail.

3. It is not an operational issue but you'll also find the venue long name in the spreadsheet file differs from the long name used by Timeform. For example, Epsom v Epsom Downs; Fontwell v Fontwell Park; Stratford v Stratford-on-Avon and so on. You may want to edit the spreadsheet accordingly before importing into your database.

So, what do I do about it? Fortunately there is a relatively simple solution that doesn't involve a major redesign.

If you look at the short name forms in the spreadsheet and the downloadable results file you'll see they are the same. More importantly, they are unique. Consequently, the BFShortName field in the VENUE table can be used as the table key and the VenueID field can be demoted to a normal field containing the code needed to programmatically build the URL to Timeform.

This means that the Great Leighs/Chelmsford City schizophrenia is taken care of by having two entries each with the same VenueID and it turns the Newmarket issue into a minor anomaly where creating its URL for Timeform will present a little, not insurmountable problem.

I won't present an updated ERD at this point just in case I come across other issues with the data that force a change on me. Instead, I'll hold back on that until I've created all the tables and I'm satisfied there aren't any more issues.

No comments: