Racing Database Redesign - Part 12 - Data Import Considerations

As I am using MySQL to host my database and will be automating it with PHP (or Python) scripts much of the work I have to do now will be somewhat specific and therefore only of interest to those employing the same technologies. With this in mind, I'll try and keep what future posts I make somewhat more general in nature. I certainly don't intend to post screeds of PHP scripts on here.

I'm assuming you've created your tables and are keen to put them into action but before you do, there are a number of things you need to be aware of:

1. The data contained within the downloadable files needs to be split up into the corresponding tables. However, that splitting must be done in a particular order. Specifically, any data that belongs in a table on the 'one' side of a one-to-many relationship must be entered first.

(i) To this end, the VENUE table must contain an entry corresponding to the 'BFShortName' before the event information can be entered into the EVENT table. Fortunately the VENUE table can be populated in advance by referring to the spreadsheet I introduced to you previously.
(ii) The event information must be entered into the EVENT table before all the individual horse data for each event can be entered into the EVENTDETAILS table.
(iii) The same applies to the HORSE table though the JOCKEY and TRAINER tables are OK assuming you've specified that the foreign keys 'JockeyID' and 'TrainerID' in the EVENTDETAILS table can be NULL.

2. The data from the Win market must be entered into the EVENTDETAILS table before you insert the corresponding Place market data.

3. Make sure you are familiar with the downloadable files and how they are structured. Specifically:

(i) The EVENT_DT field is formatted as a string like this - "DD-MM-YYYY HH:MM". If you are using MySQL it will expect the format to be "YYYY-MM-DD HH:MM". This may be the case for other database management software as well.
(ii) The MENU_HINT field contains the course name in short form that is required by the 'BFShortName' field in the database. It also contains the country code at the beginning and the date at the end. Both these items need to be stripped away to leave the short form course name.
(iii) You'll see figures for BSP, PPWAP etc are specified to a large number of significant places. There's obviously no need to go to that level and 2-3 decimal places are fine.
(iv) The first item on each row of the data file is the EVENT_ID. Note that the data contained is not necessarily in order nor are the runners necessarily grouped together by EVENT_ID.

Once your database is up and running, it is perfectly feasible to manually import the previous day's data by manually downloading the file, modify it as required above if necessary and import it using whatever means your database software supports. It's a rather time consuming process but it is possible.

However, with well over 4500 win and place market data files for the UK alone - at the time of writing - stretching back to 2008, doing this manually is clearly impractical.

Obviously the sensible thing to do is create some automation that will (at least initially) download, edit and import all the results from the previous day's racing. Once that is fully working it should be a trivial matter to expand it to loop through all the downloadable files and install the data in one fell swoop.

That's what I'll be looking at next time around and pointing to the things you need to look out for when doing so.

Back Or Lay Picks 28th-30th April

The season is slowly drawing to a close but note, these picks don't. I continue during the summer with the Scandinavian leagues though the number of picks will be somewhat less.


These picks should be backed if decimal odds at KO are 2.00 or over.
They should be laid if their odds are at 1.90 or under.

Racing Database Redesign - Part 11 - Design And Admin Software

Just a quick note about design and admin software you can use on your project.

All the modelling I've talked about so far does not require any use of design software. Indeed, you don't need to go anywhere near your database application until you have to physically create the tables.

If you are using MS Access as your database, you can graphically create your relationship between tables and the software will prompt you accordingly for various settings. However, the tables have to be created beforehand - or they did in the last version of Access I used in anger. So you would have to do the modelling up front before creating the tables. You can of course invest in ERD design software but for small projects like this, it is unnecessary.

If you are using a MySQL server based database system like I am, particularly hosted on a Linux box, you have a number of options available to you. Note that MySQL is also available for Windows servers:

1. Webmin - A web based Linux server administration tool that lives on your server which comes with a MySQL administration module. It's primary role is for administering the server rather than creating databases but it does allow basic tasks such as creating/deleting tables, running SQL queries and the like. There is no ERD tool however.

2. phpMyAdmin - Another web based tool that lives on your server (Linux or Windows) but this is specifically designed for MySQL server admin and database creation. As such, it is much more sophisticated and offers complete control over your database design process, including an ERD module in which relationships can be graphically created and configured.

3. MySQL Workbench - A cross-platform client administration and database design tool giving you complete control over your MySQL server and database design process. The modelling feature is particularly useful in that it allows the design of the database to be done, including ERD, without the need to create the tables (the table fields and settings still have to be defined as part of the process). Once you are happy with the design the tables can be automatically created on the server with just a few mouse clicks. It's particularly useful for designing large databases over multiple servers.

While I like MySQL Workbench, I use phpMyAdmin as my tool of choice. Not only am I more familiar with it, but given the scale of databases I tent to create it suits my purposes perfectly. Also, because it is web based, hosted on my server, I can access it from any device without having to worry about installing software on each client.

The image above was created using phpMyAdmin's designer tool. You'll notice that there's a few additions to the EVENT table from last time. This is to allow me to analyse some of the in-running detail on a race-by-race basis. More of that nearer the time.

Back Or Lay Picks 25th-26th April

An unusually quiet weekend this time around...


These picks should be backed if decimal odds at KO are 2.00 or over.
They should be laid if their odds are at 1.90 or under.

Racing Database Redesign - Part 10 - Tables (Update)

So much for taking some time out from blogging to create the database! As soon as I started I noticed an issue with the Betfair data that would upset things slightly.

The VENUE table is, for all intents and purposes, pretty static in that the data it contains isn't going to change very much if at all. As a result, this table can be easily pre-populated as part of the design process using the information available from TimeForm and the spreadsheet file I flagged up in my last post.

If you look back to the ERD, you'll see that the VenueID is the key field. This identifier is not directly available from the downloadable Betfair results files but I wanted to use that as the key (i) because I assumed it was unique and (ii) I could use it within any future interface to call up the TimeForm race card by programically creating the URL.

The image below illustrates why I intended to use it. You'll see the CourseID of 1.9 in the URL which identifies Catterick Bridge.

Each course will have a corresponding CourseID and it would be a trivial process to retrieve that from the VENUE table and build the corresponding URL.

However, all is not as straight forward as it seems. There are a number of issues I discovered when populating the aforementioned spreadsheet file with these identifiers in preparation for importing to the database:

1. Timeform treats Newmarket as two separate courses, namely Newmarket (Rowley) and Newmarket (July) with CourseIDs of 1.36 and 1.60 respectively yet the downloadable results files, the aforementioned spreadsheet and as far as I can make out, API software all treat them as one using the short form name of 'Newm'. This then makes it difficult to marry the results files to the correct venue in the case of Newmarket.

2. After the shortlived Great Leighs course arose phoenix-like from the ashes with the new name of Chelmsford City, Timeform simply reassigned the same CourseID to the name of Chelmsford City. A reasonable thing to do you might think except, if you change the dates on the Timeform site to 8th Jan 2009 you'll see that a meeting that was held at Great Leighs was apparently held at Chelmsford City - long before such an entity existed. You and I know that they are one and the same location, but software doesn't.

While this may not necessarily be an issue in and of itself, the historical results files for the time that Great Leighs was operational refer to it's orginal name not its reincarnation. If this issue isn't taken care of, the importation of all those races at Great Leighs will fail.

3. It is not an operational issue but you'll also find the venue long name in the spreadsheet file differs from the long name used by Timeform. For example, Epsom v Epsom Downs; Fontwell v Fontwell Park; Stratford v Stratford-on-Avon and so on. You may want to edit the spreadsheet accordingly before importing into your database.

So, what do I do about it? Fortunately there is a relatively simple solution that doesn't involve a major redesign.

If you look at the short name forms in the spreadsheet and the downloadable results file you'll see they are the same. More importantly, they are unique. Consequently, the BFShortName field in the VENUE table can be used as the table key and the VenueID field can be demoted to a normal field containing the code needed to programmatically build the URL to Timeform.

This means that the Great Leighs/Chelmsford City schizophrenia is taken care of by having two entries each with the same VenueID and it turns the Newmarket issue into a minor anomaly where creating its URL for Timeform will present a little, not insurmountable problem.

I won't present an updated ERD at this point just in case I come across other issues with the data that force a change on me. Instead, I'll hold back on that until I've created all the tables and I'm satisfied there aren't any more issues.

Racing Database Redesign - Part 9 - Tables (Final Design?)

The table design is drawing to a conclusion now, but there's one more important thing to do before reviewing the final design for any omissions or errors.

When setting up a database, it is not enough to establish the tables and the relationships between them. One must also consider the data that is being stored; what 'type' it is and what 'size' it is. This is important as it helps to ensure that the amount of storage space required is minimised and only the correct type of information is stored in the corresponding fields. To illustrate this point, let me take a simple example...

Imagine you create a table field that is intended to hold the names of Jockeys. As you don't know how big peoples' names may be you create that field so that it can hold 200 characters. As your table fills up over time you may find that the maximum length of any Jockey's name is 25 characters. This means that that one field in your database is wasting 175 characters worth of data storage for EVERY record in that table.

If you've done the same type of thing in all fields, over all tables throughout the database I'm sure you'll see that that wasted space can mount up very quickly.

Also, specifying the type of data each field will hold helps to eliminate data input errors. So, if a field should only contain numbers any attempt to enter text will result in an error.

For a full list of data types available reference should be made to the specific database software you are using. In any event, you can use the ERD to document what data types and settings are required for each and every field within your database.

I use the following format:

datatype(size if applicable), NULL [yes|no], DefaultValue

So, using the EventID field as an example, its datatype is set to 'int' (that's an integer) with a size of 20 characters; NULL is set to 'no' - in otherwords it cannot be empty; and finally there is no default value set - in otherwords its value must be specified manually or programmatically.

The final ERD below lists all the settings required, though some will need further explanation.

Generally speaking, it is best to make sure each field cannot be NULL, i.e. empty. This is not possible with this database simply due to the nature of the data that is being recorded and where it is coming from. So you'll see all of the tables except EVENTDETAIL have at least one field specified to allow a NULL value. Unfortunate, but it cannot really be helped in this case - at least not until I can find sources for that historical information.

Also note the use of the 'varchar' data type. I'm sure MS Access has a similar data type but I cannot remember what it is called. When using MySQL as your database you will find that indexing fields that contain text performs much quicker if the field is configured as a 'varchar' rather than 'text'. I'm not privy to the technical reasons this is so though I'm sure there will be some explanation on the Internet for those who are interested. Suffice to say, if there's any chance you may wish to perform any form of querying on a text field, configure it as a 'varchar'.

I should flag up one important point that you need to consider with regard to the data types that you specify:

Virtually all of this data is being provided by a third party, namely Betfair. As such, neither you or I have control over that data. Currently, Betfair provide internal IDs such as EventID as a numerical field. Though unlikely, there is nothing to stop them changing that to an alphanumeric field for example. In that event, any attempt to store such EventIDs into the current design would fail.

I don't think Betfair would implement such changes, but I feel it is important that you are aware of such a possibility. In other words, be aware of where your data is coming from, how much control you have over it and the potential impact it may have on your design.

Table and Design Review:

At this point it is worth doing a final review of the design to ascertain if everything is covered or indeed if anything has been missed not only from the design but the original specification. In no particular order...

1. You should note from the ERD I've changed my mind about having separate date and time fields in the EVENT table. I'm not sure why I suggested it in the first place as, when I checked my old database, I had indeed stored them as one in 'datetime' format.

2. Note that I've included two fields in the VENUE table that ostensibly hold the same information, namely Name and BFShortName. The Name field will hold the full name of the venue whereas BFShortName will hold the abbreviated venue name as contained within the downloadable BF results files as well as the menus you might have seen in trading software. A full list of these pairings can be downloaded from Betfair as an MS Excel file. Using this file, the VENUE table can be pre-populated though you still need to find the corresponding VenueID values.

3. Although the place market downloadable result files indicate whether a runner was placed they do not indicate its finishing position. Consequently there is no way to ascertain a horse's past form figures from those files. The form figures are available from the Betfair API or numerous websites and the best place to place that information would be in a text field stored in the HORSE table. This field would need updating every time the horse ran.

4. A general rule of thumb within database design is this: if something can be calculated from existing data then there is no need to store that result within the database. Depending on the complexity of the calculation required, this can increase processing time for every executed query. If storing that info upfront reduces the overall processing time then it may be worth ignoring this general rule.

Take for example the well known Lay The Field strategy. As an exercise, consider how you would go about analysing the historical record using the ERD design above in order to ascertain which venues, race distances, field sizes, race types, etc were statistically good for implementing the LTF strategy - answers/suggestions in the comment section please.

5. It is worthwhile reviewing other sources of information or websites to assess whether other suitable information exists that might be usefully included. For example, at the bottom of every card on the Timeform site is listed the 'Betfair betting forecast' - a list of projected BSP for each runner. Could that be used to develop any future strategy? If you do not record it you'll never be able to analyse the data so you'll never know.

6. Note that the design is generic. While I am principally focussing on UK horse racing, you'll see that the downloadable datafiles all have the same format. Consequently, races from Australia, Ireland, South Africa and USA could be included. As an added bonus, UK greyhound races (granted jockey info would be irrelevant here) could also be recorded within the same database - or more suitably a database with similar structure. Don't say I don't give you value for money!

As you can see from the above points, not only can a final review flag up any omissions but it can lead to expansion beyond the remit of the original specification. The normalisation process which I discussed in an earlier post helps to facilitate this but the database designer must be able to draw the line somewhere and for me, this is it. I'll make a few tweaks as outlined in the review above but the design is now essentially finished.

I hope you've found this series of some use and feel able to have a go yourself. I'm going to take a break for a couple of weeks as I start to implement the design. I'll return with a few posts to discuss some of the automation side of things. In the meantime, let me wish you good luck with your endeavours.

Back Or Lay Picks 21st-22nd April

Back to Milton Keynes for the single selection from this week's midweek games. Here's hoping they do a little worse than their 6-1 drubbing of Leyton Orient at the weekend.


These picks should be backed if decimal odds at KO are 2.00 or over.
They should be laid if their odds are at 1.90 or under.

Back Or Lay Picks 18th-19th April

Going through a little flat spot at the moment so here's hoping this lot will pick things up again. Currently 20.3 points in profit since the beginning of August.


These picks should be backed if decimal odds at KO are 2.00 or over.
They should be laid if their odds are at 1.90 or under.

Racing Database Redesign - Part 8 - Tables (continued)

The main tables are essentially finished in terms of providing a model for a horse racing database. However it is not quite there yet as far as recording the results data from Betfair - the main reason for doing the thing in the first place.

The data files located on Betfair are provided as comma delimited files (csv files) which can be imported into a spreadsheet or, in my case, a database. Before that can be achieved it is necessary to take a look at their structure. The above link will take you to a page listing all the downloadable files going back to late 2008. It also has a table describing the file contents.

For your conveinience and with thanks to Betfair, I've reproduced the table below. I've also taken the liberty of including an extra column containing the data of one record from one of those files to which I will refer during the following discussion.

EVENT_IDBetfair internal event ID118128024
MENU_HINTContains the name of the race meetingGB / Aint 9th Apr
EVENT_NAMEThe race name as it appears on the Betfair menu - place markets are always 'To Be Placed'2m4f Grd1 Nov Chs
EVENT_DTThe start time of the race09-04-2015 13:40
SELECTION_IDBetfair internal selection ID5310463
SELECTION_NAMEThe name of the runnerCash And Go
WIN_LOSE1 if the runner won (or placed), 0 if it lost (or was unplaced)0
BSPBetfair Starting Price22
PPWAPWeighted Average Price of all bets placed pre-off20.4506469383
MORNINGWAPWeighted Average Price of all bets placed before 11am GMT19.8992724833
PPMAXMaximum price placed before the off (only bets of a payout of more than GBP100 included)23
PPMINMinimum price placed before the off (only bets of a payout of more than GBP100 included)17.5
IPMAXMaximum price placed in-play (only bets of a payout of more than GBP100 included)1000
IPMINMinimum price placed in-play (only bets of a payout of more than GBP100 included)19
MORNINGTRADEDVOLAmount traded before 11am GMT8902.98
PPTRADEDVOLAmount traded before the off42758.9
IPTRADEDVOLAmount traded in-play3419.04

The first four fields obviously relate to what will be stored in the EVENT and VENUE tables within the database. However, with the exception of the EVENT_ID field none of these are in the form required by the database:

MENU_HINT - Needs the country and date information removed to leave just the venue name.
EVENT_NAME - Contains both the race distance and race type. These need to be separated and placed within the corresponding fields of the EVENT table.
EVENT_DT - Contains both the date and time of the race which need to be separated and placed within the corresponding fields of the EVENT table. (Note that it is perfectly feasible to keep this data in the one field because databases essentially treat date and time data in the same way. This can involve extra work on the programming side when querying data so I prefer to split them up front. This is a matter of personal choice).

SELECTION_ID and SELECTION_NAME belong in the HORSE table but what about the rest? At first glance there doesn't seem to be a table for this information.

This is were we make use of the joining table EVENTDETAIL. Generally speaking, joining tables only consist of foreign keys, but as I alluded to in earlier posts, there is nothing preventing them from also acting as normal tables in their own right. Consequently, as these other fields are directly related to each runners Betfair performance in each race, the EVENTDETAIL table is the sensible place to put them.

As an aside, all the fields relating to traded volume and weighted average price can be ommitted as it wasn't part of the original specification. I cannot at the moment visualise how it can be used. However, I'm going to include those fields to give me the opportunity to carry out some analysis to see if there are any angles for new strategies.

That covers the information for each win market, but I also want to record some of the place market results. I'm not interested in the volume traded figures for the place markets, nor do I have any need for the max and min prices. Consequently, I'll be limiting place market information to the WIN_LOSE and BSP fields.

Note that the data files provide no direct link between the EVENT_ID for a win market and the corresponding EVENT_ID in the place market. This would need to be ascertained programatically by comparing the MENU_HINT and EVENT_DT information keeping in mind that occasionally there may be no corresponding place market available.

So, with all that in place, I can now present you with the ERD containing all the table fields to date:

There's still a little work to do with regard to the table fields. Specifically, the data types and sizes still have to be specified. In addition, it is worthwhile reviewing the design and identifying anything that may be missing. I'll do all that in the next installment.

Back Or Lay Picks 14th-16th Apr

The weekend proved to be something of a non-event with a tiny 0.07pt loss so it's now the turn of the mid-week games to pick things up a bit...


These picks should be backed if decimal odds at KO are 2.00 or over.
They should be laid if their odds are at 1.90 or under.

Back Or Lay Picks 10th-12th Apr

Only a handful for this weekend but hopefully these will do enough to recover the small loss from midweek matches.


These picks should be backed if decimal odds at KO are 2.00 or over.
They should be laid if their odds are at 1.90 or under.

Racing Database Redesign - Part 7 - Tables (continued)

Having identified issues with the initial ERD model it's now time to present you with my preferred option. As you'll see from the ERD below, this new arrangement reduces the number of tables required to six including the need for only one joining table. How was this possible?

If one considers the previous model, one would note that the JOCKEY, HORSE and TRAINER tables were all involved in many-to-many relationships with the EVENT table through corresponding joining tables. The data contained within those tables are inextricably linked if one views that data from the point of view of each individual event. Consequently, it is not unreasonable to consolidate those joining tables into one joining table which I've chosen to call EVENTDETAIL.

The interesting benefit of this arrangement is the elimination of the need to have a joining table between JOCKEY and HORSE tables (and the historical partnership issue that came with it). That partnership is now automatically maintained within the EVENTDETAIL table - as is the trainer/horse pairings as well. They are now both directly tied to the individual event concerned. Consequently, not only is the number of tables within the database reduced but the complexity of some of the querying that would need to be performed to establish those partnerships is also eliminated.

A number of points about the ERD before discussing it in more detail:

1. In line with what I was discussing last time, I've renamed the COURSE table as VENUE. Not only did it seem sensible but there's another reason as you'll see in a later post. You can of course call your tables whatever you like but it makes sense to call them what they represent.
2. I've started to finalise the design by incorporating the non-key field names for each table.
3. In the EVENTDETAIL table, I've chosen to follow strick procedure and show a joint-key between all four foreign key fields, i.e. EventID, HorseID, JockeyID and TrainerID. In fact only the first two are required to uniquely identify each record within that table, as discussed below.

While the above table arrangement represents what I want to model, there's always going to be an issue with historical data, specifically for trainers and jockeys. As I said earlier in this process, unless I can find a source from where I can pull that information in en masse then I'm not that concerned with modelling that aspect no matter how nice it would be to have it.

As I ultimately want to be able to use this database to guide my selection processes on the day of a race, that day's information can be retrieved programatically either through the API or web scraping techniques. It is purely the historical information that may cause issues. However, the absence of such historical data may cause a problem where the JockeyID and TrainerID fields within the EVENTDETAIL table can be empty.

Any key field within a table cannot be empty. However, those fields in their role within the EVENTDETAIL table are NOT key fields in and of themselves. As a result, depending on the database software you are using, you may find that individual entries within that field can indeed be empty. Keep in mind that having empty fields within a database still takes up storage space which is one reason why they should be avoided.

If you find that your database software complains if those fields are empty, you have a couple of options - well three if you decide to scrap the idea of recording trainer and jockey info.

1. Betfair assigns unique numerical codes to these items. These are usually six or more digits. Within each of the TRAINER and JOCKEY tables you could assign a dummy entry with the ID number of -1 say and the name of 'NA' and use that to represent all of the historical records that you are unable to get. As you build up those parent tables, the dummy entry will be used less and less.
2. Since the EventID and HorseID are enough of a joint key to uniquely define each record within the EVENTDETAIL table, one could simply remove the other two as being members of that joint key. Note that they still need to be contained within the table. This would allow the TrainerID and JockeyID fields to be empty.

I actually prefer option 1 or a combination of the two which would eliminate the possibility of those fields being empty in any event. As I said, check your database software as it may force you into one option or the other.

Next time, I'll look at finalising the table design by looking at what needs to be done to hold the Betfair race data.

Racing Database Redesign - Part 6 - Tables (continued)

I promised last time I'd introduce you to a second option; indeed, my preferred model. I'm going to renege on that promise for now as there are a couple of things I'd like to discuss in relation to the ERD I presented last time.

You may have seen the comment to the last post made by Dave regarding making a database '3nf'.

Dave was referring to the process of database 'table normalisation', specifically '3rd Normal Form'. As ever, you can find further information on my business website or feel free to google 'database normalisation'.

Suffice to say any database you design will be well served by getting your model into 3rd normal form. It ensures the integrity of the data contained within; eliminates repeating fields within tables; guarantees that each piece of data is stored in only one place; confirms tables only include data that belongs to the entity the table is describing and thereby makes accurate querying and data manipulation much, much easier.

Using Entity Relationship Diagrams as a technique not only forces you to think about the tables and relationships but you'll find that the design almost automatically falls into 3rd normal form. It certainly makes it easier to spot when it isn't.

Going back to the ERD presented last time, and not withstanding the issue with past jockey/horse partnerships, you'll find that the database is in third normal form. This leads me on to another important point about the normalisation process... It makes the expansion of your database beyond the remit of your original specification much easier without the need for a major redesign.

By way of example, at some later date I may decide I wanted to record further information about each track at each course. Off the top of my head: number of jumps; space between jumps; length of run in from last jump; is the course undulating or flat; is the run in up hill and so on. Or perhaps I might want to record information about a horse's breeding or detailed information about a trainer's yard.

To incorporate that data it is simply a matter of adding the additional tables required and creating the necessary relationships between them and the existing tables. The current data remains intact and no redesign is required to the existing structure.

Finally, let me revisit the jockey/horse partnership issue from the current model.

As indicated last time, the best way (I say best way in the context of the current ERD and the fact I can think of no other way within those confines) to handle it would be to treat it in the same way as the trainer/horse partnership and incorporate some date information within the joining table, HORSEJOCKEY. Those dates would reflect the date each partnership took place, i.e. the date of the race in question and therein lies the rub.

That date information is already contained within the EVENT table and incorporating it elsewhere would, while allowing the functionality I'm looking for, create a cardinal sin by storing the same data in multiple places within the database. This can lead to additional work in both programming the database as well as its operation but more importantly, it can lead to errors creeping in when recording and querying data. Every effort must be made to avoid that situation arising by eliminating the possibilty at the design stage. In otherwords, you should always ensure that data is recorded in one place and one place only.

That now leads me to the end of this little diversion and back on track. Next time I'll take a look at the revised ERD I've been promising you.

Back Or Lay Picks 6th-9th Apr

Sorry for this being a little late but you've still got time to follow the ones playing today...


These picks should be backed if decimal odds at KO are 2.00 or over.
They should be laid if their odds are at 1.90 or under.

Stephen Lohoar - Showjumper

Currently competing at the Scottish National Equestrian Centre, where my daughter is working.

This ignorant fuck has blocked my daughter in with his horse lorry for over 6 hours. Despite being asked on numerous occasions to move it, he refuses to do so. He simply jumps on another horse and rides off again.

Unfortunately I cannot find a twitter account for him otherwise I would've contacted him directly.

Some people just cannot help being utter arseholes.

Racing Database Redesign - Part 5 - Tables (continued)

Last time I left you to mull over what might be wrong with the ERD I had left you with. In particular, I queried the inability to marry a jockey/horse partnership to the EVENT table.

Hopefully, you'll have spotted the big clue I left. The ERD as presented is incomplete in that there should be another relationship in there.

Just as a list of horses running in an event can be ascertained through the relationship between the HORSE and EVENT tables, a similar many-to-many relationship exists between the JOCKEY and EVENT tables. That is to say, any one jockey can ride in a number of events and any one event clearly has any number of jockeys taking part.

Once that is established, a query can be run on the HORSE - EVENT relationship to establish a list of horses; a query can be run on the JOCKEY - EVENT relationship to establish a list of jockeys; and finally, the jockey/horse partnerships can be established by comparing those two lists through a query on the JOCKEY - HORSE relationship.

Exactly the same situation exists with regard to the TRAINER table. A many-to-many relationship between the TRAINER and EVENT tables must be catered for.

Those changes are illustrated in the revised ERD below in which you'll notice two things:

1. The number of tables required has doubled due to the number of many-to-many relationships that need to be handled.
2. I've started to label each table with field names. For now, I'm limiting myself to listing 'Key Fields' and those involved in the creation of the relationships.

Again, I'd point you to the tutorials on my business website for more details but it's worthwhile giving a brief explanation at this point regarding 'Key Fields'.

Put simply, each record within a database table must be unique. It is the job of the table key to provide that functionality. If one looked down the key field (or column) in a table, no entry will ever be repeated.

In an ERD, I show the keys in bold type. Relationships are generally established between table fields containing the same data. To enable that to happen, the key field from the ONE side of the relationship is exported to the table on the MANY side of the relationship as a 'Foreign Key'. This name is perhaps a little misleading in that 'Foreign Key' is NOT a key field within the new table. You'll see the CourseID field in the EVENT table is not shown in bold but its presence is necessary to establish the relationship between the COURSE and EVENT tables.

Now, consider the joining tables used to model the various many-to-many relationships by taking the JOCKEYEVENT table as an example:

These have two fields both of which are embolden because AS A PAIR they make up a unique table key. Individually however, they are not unique. Each EventID will be repeated a certain number of times equal to the number of runners in that event. Each JockeyID will repeat equivalent to the number of races he/she has participated in to date. But taken together, there will be no repeating combinations.

Moving back to the model as it now stands, there is still a problem with it...

Imagine for a particular race we wanted to list the horses taking part together with trainers concerned. If a particular horse was competing in a race where its previous trainer also had a runner then any query done on the current model would list both those partnerships.

To get around that, a record of when a horse was transferred to a new trainer would have to be included in the database. The easiest thing to do would be to put that date information into the HORSETRAINER joining table. Any subsequent query would simply ask for the most recent combination in order to eliminate the old partnership from the list.

Whether that data is readily available may be an issue for the purposes of this database.

The same problem rears its ugly head elsewhere. When trying to list the jockey/horse combinations for a particular race it is quite feasible that a horse may be up against a previous jockey riding another horse. This one isn't quite so easily catered for as the horse/trainer example above.

In my last post I indicated that there were a couple of solutions to the initial model proposed. (There may well be others that I haven't considered). Next time, I'll introduce my preferred method that will both reduce the number of tables required and, once the database is operational, make querying the data easier.

In the meantime, if you have any questions or if you are finding my ramblings even a little bit useful then please leave a comment below.

Back Or Lay Picks 4th-5th April

A busy Easter weekend coming up but only four selections for consideration...


These picks should be backed if decimal odds at KO are 2.00 or over.
They should be laid if their odds are at 1.90 or under.

Wading Through Treacle

As regular readers will know, all my horse race trading tends to be in-running but this new Betfair API continues to make that more and more difficult.

I've done three races today so far and it's only 14:37 as I type this. Every race has been incredibly sticky with Fairbot not refreshing for up to 10 secs. It is virtually impossible to trade when the API is behaving like this.

The new API has been with us for nearly six months and these issues have been reported to Betfair by loads of traders yet they seem incapable or unwilling to resolve them. It really isn't acceptable. I'm only a small player at this. I hate to think how the professionals are feeling.

Here's a short video I just recorded showing the 14:40 at Towcester on 1st April. April Fool's day indeed.

Racing Database Redesign - Part 4 - Tables (continued)

Let me start by saying if anyone has any questions or would like to exchange ideas with me or anyone else that is reading these tutorials then feel free to post a comment.

In part three of this series, I started to introduce the tables and discuss the relationship they have with each other. Hopefully you've taken the opportunity to look at the tutorials on my business website which will give you a basic insight into database table design.

The initial ERD I introduced last time is deliberately flawed so that I could illustrate the important link between the objectives and specification of the project; the data modelling presented within the table design and the internal relationships with the data being modelled.

If you recall, the 1st draft ERD indicates a number of many-to-many relationships within the design. This type of relationship is not unusual in a database but they need to be modelled differently because Relational Database Management Systems (RDMS) cannot handle them efficiently. The solution is quite simple however. Each many-to-many relationship can be modelled by splitting the relationship into two one-to-many relationships between the existing tables and a new intermediary table known as a 'joining' table.

The ERD diagram below represents the new arrangement where each joining table contains the key fields from corresponding tables that 'feed' it.

The presence of many-to-many relationships within the model as it stands is not an issue. I'm more interested in the relationships between jockeys, trainers and horses. First, ignore what you see in the above diagram and simply consider the relationship between jockeys and horses.

It is reasonable to assume a relationship between the two exists in that any one jockey can ride any number of horses in a career lasting many years and any one horse could be, over its career, ridden by a number of different jockeys. This is the very definition of a many-to-many relationship. Yet does this relationship fit in with the objectives of the project? How does one go about listing the jockey and horse partnership in any one event?

Clearly, for any given event, the list of horses can easily be obtained thanks to the relationship that exists between the EVENT table and the HORSE table. One might then think that the relationship between the HORSE table and the JOCKEY table gives us access to the jockey information. It does, in that it lists all those jockeys who have every ridden those horses. Unfortunately, because there is no direct link between the EVENT and JOCKEY tables it is impossible to assertain which jockey was riding which horse in any one particular event.

If you recall from the last post, the EVENT table consists of information about each race, namely Betfair's unique ID number (EventID) as well as date and time information. We need a way to relate the jockey table to that event information before jockey/horse partnerships for any particular event can be identified.

Now a very similar argument exits for the relationship between the HORSE and TRAINER and its resolution will use the same technique as that used between HORSE and JOCKEY.

Such is the way of things in the world of database modelling that there is not necessarily a single solution. In this case there are a couple of solutions that spring to mind. With that said, I'm going to finish this post here and give you the opportunity to mull over what those solutions might be and I'll address them next time.