Racing Database Redesign - Part 12 - Data Import Considerations

As I am using MySQL to host my database and will be automating it with PHP (or Python) scripts much of the work I have to do now will be somewhat specific and therefore only of interest to those employing the same technologies. With this in mind, I'll try and keep what future posts I make somewhat more general in nature. I certainly don't intend to post screeds of PHP scripts on here.

I'm assuming you've created your tables and are keen to put them into action but before you do, there are a number of things you need to be aware of:

1. The data contained within the downloadable files needs to be split up into the corresponding tables. However, that splitting must be done in a particular order. Specifically, any data that belongs in a table on the 'one' side of a one-to-many relationship must be entered first.

(i) To this end, the VENUE table must contain an entry corresponding to the 'BFShortName' before the event information can be entered into the EVENT table. Fortunately the VENUE table can be populated in advance by referring to the spreadsheet I introduced to you previously.
(ii) The event information must be entered into the EVENT table before all the individual horse data for each event can be entered into the EVENTDETAILS table.
(iii) The same applies to the HORSE table though the JOCKEY and TRAINER tables are OK assuming you've specified that the foreign keys 'JockeyID' and 'TrainerID' in the EVENTDETAILS table can be NULL.

2. The data from the Win market must be entered into the EVENTDETAILS table before you insert the corresponding Place market data.

3. Make sure you are familiar with the downloadable files and how they are structured. Specifically:

(i) The EVENT_DT field is formatted as a string like this - "DD-MM-YYYY HH:MM". If you are using MySQL it will expect the format to be "YYYY-MM-DD HH:MM". This may be the case for other database management software as well.
(ii) The MENU_HINT field contains the course name in short form that is required by the 'BFShortName' field in the database. It also contains the country code at the beginning and the date at the end. Both these items need to be stripped away to leave the short form course name.
(iii) You'll see figures for BSP, PPWAP etc are specified to a large number of significant places. There's obviously no need to go to that level and 2-3 decimal places are fine.
(iv) The first item on each row of the data file is the EVENT_ID. Note that the data contained is not necessarily in order nor are the runners necessarily grouped together by EVENT_ID.

Once your database is up and running, it is perfectly feasible to manually import the previous day's data by manually downloading the file, modify it as required above if necessary and import it using whatever means your database software supports. It's a rather time consuming process but it is possible.

However, with well over 4500 win and place market data files for the UK alone - at the time of writing - stretching back to 2008, doing this manually is clearly impractical.

Obviously the sensible thing to do is create some automation that will (at least initially) download, edit and import all the results from the previous day's racing. Once that is fully working it should be a trivial matter to expand it to loop through all the downloadable files and install the data in one fell swoop.

That's what I'll be looking at next time around and pointing to the things you need to look out for when doing so.

No comments: