Racing Database Redesign - Part 3 - Tables

Before I get to the details of the database tables I should point out that I won't be discussing the technical aspects of database table design here. However, I will be using terms like Entity Relationship Diagram (ERD), one-to-many and many-to-many relationships, attributes and data types. If you wish to learn more about these things (and I humbly suggest you may find it useful not just here but for any other databases you may be designing) then let me point you to a series of tutorials I wrote on my business website.

Now that that is out of the way, what about the horse racing database?

In actual fact, the database is a very simple one and needs only a handful of tables to hold the information I want. Even then, it may not need everything described below. If you recall from the post where I outlined the specification there are some things I'd like to include but only if I can find an online resource that allows me to pull that information in en masse. Otherwise, I do not wish to waste my time with it. For now, I'll include it in the model and I'll flag up those areas that fall into this category.

Below is a list of table names and their corresponding attributes that describe each item held within the corresponding table. At the moment, there is no attempt to fit it into a Betfair paradigm; I'm simply establishing the tables, the data they hold and how they relate to each other.

1. HORSE: HorseID, Name, DoB, Colour, Sex, RunningStyle
2. COURSE: CourseID, Name, Location, LayoutURL
3. EVENT: EventID, Date, Time, RaceDistance, RaceType, Going
4. JOCKEY: JockeyID, Name, DoB, Sex
5. TRAINER: TrainerID, Name, Location

Those are the main tables. There will be one or two others which I'll introduce and explain at a later date. They'll be used as temporary tables on a day-by-day basis and are not necessary to define the database itself.

Of those five tables, the last two are the most problematic. I've yet to find an online resource that will allow me to build the contents of these tables quickly and automatically as well as relate them to the corresponding Betfair ID numbers. It may be that the API is my best bet.

The attributes from those tables and some of the others are available from the timeform website, so that could be scraped each day. The feasibility of doing that for historical data going back to 2009 is open to question. I've still to research the best way to obtain the historical data that isn't included in the results files downloadable from Betfair - assuming that it is possible. The data obviously exists, but whether Joe Public can get at it is another matter.

The first draft Entity Relationship Diagram below shows the relationship between all five tables. The first thing to note is the three many-to-many relationships. These cannot be handled easily within a relational database model.


If you take the time to follow the database tutorials on my business website, and then refer back here, see if you can work out why I've created the relationships in this way.

This won't be the final solution - far from it. Over and above the many-to-many relationship issues outlined above there are additional problems with this initial model which I'll discuss in subsequent posts.

Racing Database Redesign - Part 2 - Implementation

Before I get on to talking about the actual database design (which will appear in subsequent posts) I wanted to expand on the technologies I'll be using to create the database and it's functionality.

I'm guessing many of those following this - assuming anyone is following this - will be interested in duplicating the database within a desktop database program such as MS Access. Much of what I'll be doing can be applied to any desktop database design application but I won't be doing so myself.

Although I have a lot of experience in designing MS Access databases and indeed, training others how to do so, I no longer use MS Access at home. I converted my systems to Linux many years ago and have no need to use MS Access, good though it is.

I'll be using a server based database system called MySQL. This is open source software and freely available to anyone. It is used by many large organisations as it is stable, fast, and very scalable. Granted, its enterprise level functionality is somewhat overkill for this small project but there are a number of reasons for going with it other than familiarity:

1. There's a number of things that need to be done automatically without user intervention from me. As my Linux server is on 24/7, it makes sense to use a server based database management system.
2. It will easily handle the number of records and data that I'm likely to through at it.
3. It integrates easily with a variety of programming languages and web technologies.
4. It has its own powerful database language in SQL which makes the process of storing, manipulating and querying of data a breeze.
5. It's easy to backup and transfer to other systems if need be.
6. As it is freely available with many web hosting providers if I ever decide to make it generally available as a service, it would be a relatively straight forward process of transferring that design to the Internet.


The second crucial element in the implementation of this project is the automation.

When the old Betfair API was functional my database used PHP scripts to provide a whole range of functionality from downloading the previous day's results from https://promo.betfair.com/betfairsp/prices to analysing various strategies and placing bets in the market.

Those scripts that use the old API will no longer function and, if I was honest, I don't relish having to relearn much of PHP to fit in with the new API. PHP is such a cluncky language - especially for a non-programmer like me - that I'm sorely tempted to redo everything in Python. That means learning something new - which will be fun.

However, in order to get much of the functionality up and running as quickly as possible, the intention is to reuse the existing PHP scripts or modified version thereof which will give me time to learn what I need to learn of Python.

So in summary, the whole system will be hosted on an old Dell Dimension 5100 desktop with 4GB RAM and 250GB harddisk drive (The data going back to 2009 will be around 100MB if that).

It will be configured as a LAMP stack (Linux, Apache web server, MySQL database server, PHP/Python scripting language) in addition to its current role as a file and music server.

Next time, I get to the nitty-gritty of discussing the tables and how/what to store in the database.

Back Or Lay Picks 27th-28th Mar

A very quiet weekend due to the numerous international matches.

ROCHDALE
SHEFF. UTD
RANGERS
ALBION


These picks should be backed if decimal odds at KO are 2.00 or over.
They should be laid if their odds are at 1.90 or under.

What On Earth I'm I Doing?

Not something I'd normally be interested in watching far less have a bet on.

The Eurovision song contest is looming and I've had a very strong suggestion from a member of the SoccerLotto forum that Poland are strong candidates if not to win then an E/W possibility.

I understand the poster has done quite well out of the competition in the past so a cheeky fiver at odds of 600.0 won't go amiss. If they get past the semi-finals then the odds should come down a fair bit at which point I may start laying little bits off.

Racing Database Redesign - Part 1 - The Objectives

Although this is a new project, I already have much of the database design in place, together with a lot of the automation. However, it was created sometime ago and I’ve since altered my requirements inspired by additional strategies I have developed in the meantime. In addition, much of the automation relied on calls to the old Betfair API which is no longer operational. It was the decommissioning of that API which has ultimately led to the desire to completely revamp things.

With this in mind, I want to take the opportunity to tweak the database design somewhat so that it fits in more with the way I’m thinking now. I’ve still to decide how to automate much of the functionality - do I throw myself into learning the new API or do I scrape the Timeform site (among others). There’s pros and cons for both methods which I may discuss in a later article.  For the moment, I simply want to document the objectives together with any optional extras as well as identify any potential issues.

Objectives - In no particular order:

1. The principal aim is to record the horse racing data freely available from Betfair at https://promo.betfair.com/betfairsp/prices into a more suitable format to allow ease of data input and analysis with a view to generating new strategies and automatically suggesting suitable strategies to use on a race by race basis.

2. Automate the data gathering process so that data entry is kept to a minimum if not eliminated completely.

3. Allow for expansion of the database to include data not provided in the aforementioned Betfair data files. This might include information about jockeys and trainers as well as the horses themselves. This need only be implemented if a suitable source of historical data can be found that would allow its automated importation. Manually inputting such data for thousands of races going back to 2009 is clearly impractical.

4. It should be possible to analyse the data to investigate long term suitability of any number of strategies both known and unknown. Existing strategies such as DOBbing, LTF, Back to Lay, Dutching and scalping all spring to mind.

5. Although trading in the place market is not a requirement, accommodation for the place market data from the Betfair website should be made. Specifically, whether a runner was placed or not in a particular race should be recorded.

6. A web based interface is required to reduce the amount of time needed to manually assess a race card. It should be capable of: displaying the runner graphs from the Betfair site on a race by race basis (similar to the existing graphing shown here); providing a summary of the runners in a race with a view to suggesting the best strategy to use for each race; providing a daily summary of all runners with a view to identifying possible long term swing trades.

7. The database and associated interface is not intended to be a betting bot, automated or otherwise. However, the design should not prevent that possibility in the future.

8. The database and associated interface should be cross-platform and accessible from any device available. Indeed, the implementation should not dismiss the possibility of accessing the system remotely over the Internet or exclude the possibility of making the system available to the general public as an Internet based service.

9. The system should automatically download and import all results on a daily basis without user interaction though there should still be the facility to have that process manually initiated.

Those are the main areas I want the system to be capable of. No doubt others may well spring to mind but I imagine they will be relatively minor.

Before I go on to discussing the design of the database itself, I’ll be mulling over how I intend to implement the system to meet the aforementioned objectives. I shall outline those thoughts in the next instalment so stay tuned for that.

Back Or Lay Picks 20th-22nd Mar

A relatively quiet weekend this time around...

BRENTFORD
WATFORD
BRAINTREE
LYON
EMPOLI
LAZIO
B. MUNICH


These picks should be backed if decimal odds at KO are 2.00 or over.
They should be laid if their odds are at 1.90 or under.

Back Or Lay Picks 17th-18th Mar

Four selections from the midweek offerings for you...

BRISTOL C.
NORTHAMPTN
WYCOMBE
BARCELONA


These picks should be backed if decimal odds at KO are 2.00 or over.
They should be laid if their odds are at 1.90 or under.

Racing Database Redesign

Now that the Cheltenham Festival is over it is time to turn at least some of my attention to the redesign of my horse racing database. As I alluded to in my New Year Resolutions post, I'm going to document most of it on this blog in the form of tutorials so that you can follow along and create your own version if you so wish.

Progress is likely to be somewhat haphazard as I am not intending to dedicate myself to this full time. Rather it is a project that will slowly mature into what I want.

I am just in the process of collecting my thoughts and draughting a full specification which will be published in the next week or so. In the meantime, here's some highlights - in no particular order...

1. Automatically record the Betfair historical data freely available from the Betfair site.
2. Design database so that it can be easily expanded with additional information such as jockey and trainer detail when available.
3. Use recorded data to reduce the time taken to assess runners for various in-running strategies such as dutching, back to lay and lay the field. This to be done automatically without user interaction.
4. Make data available from any device and platform.
5. Provide overview of all Betfair graphs for each race.
6. Use as basis for future analysis with a view to developing new strategies and potentially, automated betting bots.

Although the initial database design can be implemented in any modern desktop database application such as MS Access or LibreOffice Base, I'll be using mySQL installed on my Linux server. This machine is on 24/7 and as such can be configured so that the desired functionality can be run automatically without the need to turn my desktop pc on.

The automation is likely to be done in PHP as I'm reasonably familiar with that. Python was/is another option but I've never used it and I don't want to spend too much time learning yet another programming language when I'm not particularly into programming. I've yet to confirm the language to be used and I'm open to suggestions/advice. In any case, the intention is to post as much of the database related code as I can (without giving my secrets away of course).

The draft specification is currently being worked on and I'll post it as soon as it is ready. Stay tuned for that.

Back Or Lay Picks 13th-15th Mar

The midweek pick of Hibernian was something of a tease. Conceding an 88min equaliser the money was heading my way only for Hibs to go straight up the park from the restart and grab a winner. Very frustrating.

There's a large selection for this weekend from which amends can be made - hopefully.

ARSENAL
BOURNEMTH
BRISTOL C.
PRESTON
NORTHAMPTN
HEARTS
RANGERS
FOREST G.
GUISELEY
EBBSFLEET
B.DORTMUND
INT MILAN
WOLFSBURG

These picks should be backed if decimal odds at KO are 2.00 or over.
They should be laid if their odds are at 1.90 or under.

Back Or Lay Picks 12th Mar

Only the one back or lay pick from the midweek matches.

HIBERNIAN

These picks should be backed if decimal odds at KO are 2.00 or over.
They should be laid if their odds are at 1.90 or under.

Back Or Lay Picks 7th-8th Mar

Very unusual to have only two picks over a weekend.

TORQUAY
PARIS ST-G

These picks should be backed if decimal odds at KO are 2.00 or over.
They should be laid if their odds are at 1.90 or under.

Back Or Lay Picks 3rd-4th March

The good run was brought to a crashing halt over the weekend unfortunately. Hopefully these midweek games will fare a little better...

MORTON
LIVERPOOL
MAN. CITY
NORWICH
CELTIC

These picks should be backed if decimal odds at KO are 2.00 or over.
They should be laid if their odds are at 1.90 or under.