Racing Database Redesign - Part 8 - Tables (continued)

The main tables are essentially finished in terms of providing a model for a horse racing database. However it is not quite there yet as far as recording the results data from Betfair - the main reason for doing the thing in the first place.

The data files located on Betfair are provided as comma delimited files (csv files) which can be imported into a spreadsheet or, in my case, a database. Before that can be achieved it is necessary to take a look at their structure. The above link will take you to a page listing all the downloadable files going back to late 2008. It also has a table describing the file contents.

For your conveinience and with thanks to Betfair, I've reproduced the table below. I've also taken the liberty of including an extra column containing the data of one record from one of those files to which I will refer during the following discussion.

FIELDDESCRIPTIONEXAMPLE
EVENT_IDBetfair internal event ID118128024
MENU_HINTContains the name of the race meetingGB / Aint 9th Apr
EVENT_NAMEThe race name as it appears on the Betfair menu - place markets are always 'To Be Placed'2m4f Grd1 Nov Chs
EVENT_DTThe start time of the race09-04-2015 13:40
SELECTION_IDBetfair internal selection ID5310463
SELECTION_NAMEThe name of the runnerCash And Go
WIN_LOSE1 if the runner won (or placed), 0 if it lost (or was unplaced)0
BSPBetfair Starting Price22
PPWAPWeighted Average Price of all bets placed pre-off20.4506469383
MORNINGWAPWeighted Average Price of all bets placed before 11am GMT19.8992724833
PPMAXMaximum price placed before the off (only bets of a payout of more than GBP100 included)23
PPMINMinimum price placed before the off (only bets of a payout of more than GBP100 included)17.5
IPMAXMaximum price placed in-play (only bets of a payout of more than GBP100 included)1000
IPMINMinimum price placed in-play (only bets of a payout of more than GBP100 included)19
MORNINGTRADEDVOLAmount traded before 11am GMT8902.98
PPTRADEDVOLAmount traded before the off42758.9
IPTRADEDVOLAmount traded in-play3419.04

The first four fields obviously relate to what will be stored in the EVENT and VENUE tables within the database. However, with the exception of the EVENT_ID field none of these are in the form required by the database:

MENU_HINT - Needs the country and date information removed to leave just the venue name.
EVENT_NAME - Contains both the race distance and race type. These need to be separated and placed within the corresponding fields of the EVENT table.
EVENT_DT - Contains both the date and time of the race which need to be separated and placed within the corresponding fields of the EVENT table. (Note that it is perfectly feasible to keep this data in the one field because databases essentially treat date and time data in the same way. This can involve extra work on the programming side when querying data so I prefer to split them up front. This is a matter of personal choice).

SELECTION_ID and SELECTION_NAME belong in the HORSE table but what about the rest? At first glance there doesn't seem to be a table for this information.

This is were we make use of the joining table EVENTDETAIL. Generally speaking, joining tables only consist of foreign keys, but as I alluded to in earlier posts, there is nothing preventing them from also acting as normal tables in their own right. Consequently, as these other fields are directly related to each runners Betfair performance in each race, the EVENTDETAIL table is the sensible place to put them.

As an aside, all the fields relating to traded volume and weighted average price can be ommitted as it wasn't part of the original specification. I cannot at the moment visualise how it can be used. However, I'm going to include those fields to give me the opportunity to carry out some analysis to see if there are any angles for new strategies.

That covers the information for each win market, but I also want to record some of the place market results. I'm not interested in the volume traded figures for the place markets, nor do I have any need for the max and min prices. Consequently, I'll be limiting place market information to the WIN_LOSE and BSP fields.

Note that the data files provide no direct link between the EVENT_ID for a win market and the corresponding EVENT_ID in the place market. This would need to be ascertained programatically by comparing the MENU_HINT and EVENT_DT information keeping in mind that occasionally there may be no corresponding place market available.

So, with all that in place, I can now present you with the ERD containing all the table fields to date:


There's still a little work to do with regard to the table fields. Specifically, the data types and sizes still have to be specified. In addition, it is worthwhile reviewing the design and identifying anything that may be missing. I'll do all that in the next installment.

2 comments:

Dave said...

Excellent, didn't realise Betfair let you get that amount of data out.

Alistair said...

Hi Dave,

Yes the csv files they freely provide are extremely useful. I wish they'd provide all the jockey and trainer info I'll be looking for as well.

Cannot have everything I suppose.