Racing Database Redesign - Part 7 - Tables (continued)

Having identified issues with the initial ERD model it's now time to present you with my preferred option. As you'll see from the ERD below, this new arrangement reduces the number of tables required to six including the need for only one joining table. How was this possible?

If one considers the previous model, one would note that the JOCKEY, HORSE and TRAINER tables were all involved in many-to-many relationships with the EVENT table through corresponding joining tables. The data contained within those tables are inextricably linked if one views that data from the point of view of each individual event. Consequently, it is not unreasonable to consolidate those joining tables into one joining table which I've chosen to call EVENTDETAIL.

The interesting benefit of this arrangement is the elimination of the need to have a joining table between JOCKEY and HORSE tables (and the historical partnership issue that came with it). That partnership is now automatically maintained within the EVENTDETAIL table - as is the trainer/horse pairings as well. They are now both directly tied to the individual event concerned. Consequently, not only is the number of tables within the database reduced but the complexity of some of the querying that would need to be performed to establish those partnerships is also eliminated.


A number of points about the ERD before discussing it in more detail:

1. In line with what I was discussing last time, I've renamed the COURSE table as VENUE. Not only did it seem sensible but there's another reason as you'll see in a later post. You can of course call your tables whatever you like but it makes sense to call them what they represent.
2. I've started to finalise the design by incorporating the non-key field names for each table.
3. In the EVENTDETAIL table, I've chosen to follow strick procedure and show a joint-key between all four foreign key fields, i.e. EventID, HorseID, JockeyID and TrainerID. In fact only the first two are required to uniquely identify each record within that table, as discussed below.

While the above table arrangement represents what I want to model, there's always going to be an issue with historical data, specifically for trainers and jockeys. As I said earlier in this process, unless I can find a source from where I can pull that information in en masse then I'm not that concerned with modelling that aspect no matter how nice it would be to have it.

As I ultimately want to be able to use this database to guide my selection processes on the day of a race, that day's information can be retrieved programatically either through the API or web scraping techniques. It is purely the historical information that may cause issues. However, the absence of such historical data may cause a problem where the JockeyID and TrainerID fields within the EVENTDETAIL table can be empty.

Any key field within a table cannot be empty. However, those fields in their role within the EVENTDETAIL table are NOT key fields in and of themselves. As a result, depending on the database software you are using, you may find that individual entries within that field can indeed be empty. Keep in mind that having empty fields within a database still takes up storage space which is one reason why they should be avoided.

If you find that your database software complains if those fields are empty, you have a couple of options - well three if you decide to scrap the idea of recording trainer and jockey info.

1. Betfair assigns unique numerical codes to these items. These are usually six or more digits. Within each of the TRAINER and JOCKEY tables you could assign a dummy entry with the ID number of -1 say and the name of 'NA' and use that to represent all of the historical records that you are unable to get. As you build up those parent tables, the dummy entry will be used less and less.
2. Since the EventID and HorseID are enough of a joint key to uniquely define each record within the EVENTDETAIL table, one could simply remove the other two as being members of that joint key. Note that they still need to be contained within the table. This would allow the TrainerID and JockeyID fields to be empty.

I actually prefer option 1 or a combination of the two which would eliminate the possibility of those fields being empty in any event. As I said, check your database software as it may force you into one option or the other.

Next time, I'll look at finalising the table design by looking at what needs to be done to hold the Betfair race data.

No comments: