Racing Database Redesign - Part 3 - Tables

Before I get to the details of the database tables I should point out that I won't be discussing the technical aspects of database table design here. However, I will be using terms like Entity Relationship Diagram (ERD), one-to-many and many-to-many relationships, attributes and data types. If you wish to learn more about these things (and I humbly suggest you may find it useful not just here but for any other databases you may be designing) then let me point you to a series of tutorials I wrote on my business website.

Now that that is out of the way, what about the horse racing database?

In actual fact, the database is a very simple one and needs only a handful of tables to hold the information I want. Even then, it may not need everything described below. If you recall from the post where I outlined the specification there are some things I'd like to include but only if I can find an online resource that allows me to pull that information in en masse. Otherwise, I do not wish to waste my time with it. For now, I'll include it in the model and I'll flag up those areas that fall into this category.

Below is a list of table names and their corresponding attributes that describe each item held within the corresponding table. At the moment, there is no attempt to fit it into a Betfair paradigm; I'm simply establishing the tables, the data they hold and how they relate to each other.

1. HORSE: HorseID, Name, DoB, Colour, Sex, RunningStyle
2. COURSE: CourseID, Name, Location, LayoutURL
3. EVENT: EventID, Date, Time, RaceDistance, RaceType, Going
4. JOCKEY: JockeyID, Name, DoB, Sex
5. TRAINER: TrainerID, Name, Location

Those are the main tables. There will be one or two others which I'll introduce and explain at a later date. They'll be used as temporary tables on a day-by-day basis and are not necessary to define the database itself.

Of those five tables, the last two are the most problematic. I've yet to find an online resource that will allow me to build the contents of these tables quickly and automatically as well as relate them to the corresponding Betfair ID numbers. It may be that the API is my best bet.

The attributes from those tables and some of the others are available from the timeform website, so that could be scraped each day. The feasibility of doing that for historical data going back to 2009 is open to question. I've still to research the best way to obtain the historical data that isn't included in the results files downloadable from Betfair - assuming that it is possible. The data obviously exists, but whether Joe Public can get at it is another matter.

The first draft Entity Relationship Diagram below shows the relationship between all five tables. The first thing to note is the three many-to-many relationships. These cannot be handled easily within a relational database model.

If you take the time to follow the database tutorials on my business website, and then refer back here, see if you can work out why I've created the relationships in this way.

This won't be the final solution - far from it. Over and above the many-to-many relationship issues outlined above there are additional problems with this initial model which I'll discuss in subsequent posts.

No comments: