
Contact Me

I can be contacted via Tony.Corke@gmail.com

 


Latest Posts
Tuesday
Mar 23, 2010

Why April's Conceivably Better Than March

It's an unlikely scenario I know, but if the players on the AFL Seniors lists ever got to talking about shared birthdays I'd wager they'd find themselves perplexed. As chestnuts go, the Birthday problem is about as hoary as they come. It concerns the probability that, in a group of randomly selected people, at least two share a birthday, and its longevity is due to the amazement most people express on discovering that you need just 23 randomly selected people to make it more likely than not that two or more of them will share a birthday. I'll venture that few if any of the 634 players on the current Seniors lists know that but, even if any of them did, they'd probably still be startled by what I'll call the AFL Birthday phenomenon.
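For the curious, the 23-person result is easy to verify with a few lines of code, assuming the usual simplification of 365 equally likely birthdays:

```python
# The classic calculation, assuming 365 equally likely birthdays.
def p_shared_birthday(n: int) -> float:
    """Probability that at least two of n people share a birthday."""
    p_all_distinct = 1.0
    for k in range(n):
        p_all_distinct *= (365 - k) / 365
    return 1 - p_all_distinct

# Find the smallest group for which a shared birthday is more likely
# than not.
n = 1
while p_shared_birthday(n) <= 0.5:
    n += 1
print(n, round(p_shared_birthday(n), 3))  # -> 23 0.507
```

At 22 people the probability is still a touch under 48%; adding the 23rd person tips it past half.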


Saturday
Mar 20, 2010

The Other AFL Draft

Drafting is a tactic well-known to cyclists and motor-racers and confers an advantage on the drafter by reducing the effort that he or she (or his or her vehicle) needs to expend in order to move. There's a similar concept in round-robin sports, where it's called the carry-over effect: the effect on a team's performance in a particular game of the opponent its current opponent faced in the previous round. Often, for example, it's considered advantageous to play teams the week after they've faced a difficult opponent, the rationale being that they'll have been 'softened up', demoralised and generally made to feel blah by the previous match.


Tuesday
Mar 9, 2010

Using a Ladder to See the Future

The main role of the competition ladder is to provide a summary of the past. In this blog we'll be assessing what it can tell us about the future. Specifically, we'll be looking at what can be inferred about the make-up of the finals by reviewing the competition ladder at different points of the season.

I'll be restricting my analysis to the seasons 1997-2009 (which sounds a bit like a special category for Einstein Factor, I know) as these seasons all had a final 8, twenty-two rounds and were contested by the same 16 teams - not that this last feature is particularly important.

Let's start by asking the question: for each season, and on average, how many of the teams in the top 8 at a given point in the season go on to play in the finals?

The first row of the table shows how many of the teams that were in the top 8 after the 1st round - that is, of the teams that won their first match of the season - went on to play in September. A chance result would be 4, and in 7 of the 13 seasons the actual number was higher than this. On average, just under 4.5 of the teams that were in the top 8 after 1 round went on to play in the finals.
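That chance result of 4 can be sanity-checked with a quick simulation (my illustration, not part of the original analysis): draw the season's 8 finalists at random from the 16 teams and count how many of the 8 Round 1 winners they include.

```python
import random

# Pick the 8 finalists at random from 16 teams and count the overlap
# with the 8 Round 1 winners, averaged over many draws. Each winner
# has an 8-in-16 chance of being a finalist, so we expect 8 * 8/16 = 4.
teams = list(range(16))
round1_winners = set(teams[:8])

draws = 100_000
total = sum(len(round1_winners & set(random.sample(teams, 8)))
            for _ in range(draws))
average_overlap = total / draws   # settles near 4
```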

This average number of teams from the current Top 8 making the final Top 8 grows steadily as we move through the rounds of the first half of the season, crossing 5 after Round 2, and 6 after Round 7. In other words, historically, three-quarters of the finalists have been determined after less than one-third of the season. The 7th team to play in the finals is generally not determined until Round 15, and even after 20 rounds there have still been changes in the finalists in 5 of the 13 seasons.

Last year is notable for the fact that the composition of the final 8 was revealed - not that we knew - at the end of Round 12 and this roster of teams changed only briefly, for Rounds 18 and 19, before solidifying for the rest of the season.

Next we ask a different question: if your team's in ladder position X after Y rounds where, on average, can you expect it to finish?

Regression to the mean is on abundant display in this table with teams in higher ladder positions tending to fall and those in lower positions tending to rise. That aside, one of the interesting features about this table for me is the extent to which teams in 1st at any given point do so much better than teams in 2nd at the same point. After Round 4, for example, the difference is 2.6 ladder positions.

Another phenomenon that caught my eye was the tendency for teams in 8th position to climb the ladder while those in 9th tend to fall, contrary to the overall tendency for regression to the mean already noted.

One final feature that I'll point out is what I'll call the Discouragement Effect (but might, more cynically and possibly accurately, have called it the Priority Pick Effect), which seems to afflict teams that are in last place after Round 5. On average, these teams climb only 2 places during the remainder of the season.

Averages, of course, can be misleading, so rather than looking at the average finishing ladder position, let's look at the proportion of times that a team in ladder position X after Y rounds goes on to make the final 8.

One immediately striking result from this table is the fact that the team that led the competition after 1 round - which will be the team that won with the largest ratio of points for to points against - went on to make the finals in 12 of the 13 seasons.

You can use this table to determine when a team is a lock or is no chance to make the final 8. For example, no team has made the final 8 from last place at the end of Round 5. Also, two teams ranked as low as 12th after 13 rounds have gone on to play in the finals, and one team that was ranked 12th after 17 rounds still made the September cut.

If your team is in 1st or 2nd place after 10 rounds, history is on your side for a top 8 finish; and if they're higher than 4th after 16 rounds you can sport a similarly warm inner glow.

Lastly, if your aspirations for your team are for a top 4 finish here's the same table but with the percentages in terms of making the Top 4 not the Top 8.

Perhaps the most interesting fact to extract from this table is how unstable the Top 4 is. For example, even as late as the end of Round 21 only 62% of the teams in 4th spot have finished in the Top 4. In 2 of the 13 seasons a Top 4 spot has been grabbed by a team in 6th or 7th at the end of the penultimate round.

Monday
Mar 1, 2010

Testing the HELP Model

It had been rankling me that I'd not come up with a way to validate any of the LAMP, HAMP or HELP models whose development I chronicled in earlier blogs.

In retrospect, what I probably should have done is build the models using only the data for seasons 2006 to 2008 and then test the resulting models on 2009 data but you can't unscramble an egg and, statistically speaking, my hen's albumen and vitellus are well and truly curdled.

Then I realised that there is another way to test the models - well, for now at least, to test the HELP model.

Any testing needs to address the major criticism that could be levelled at the HELP model, which is a criticism stemming from its provenance. The final HELP model is the one that, amongst the many thousands of HELP-like models that my computer script briefly considered (and, as we'll see later, of the thousands of billions that it might have considered), was able to be made to best fit the line betting data for 2008 and 2009, projecting one round into the future using any or all of 47 floating window models available to it.

From an evolutionary viewpoint the HELP model represents an organism astonishingly well-adapted to the environment in which its genetic blueprint was forged, but whether HELP will be the dinosaur or the horseshoe crab equivalent in the football modelling universe is very much an open question.

Test Details
With the possible criticism I've just described in mind, what I've done to test the HELP model is to estimate the fit that could be achieved with the modelling technique used to find HELP had the line betting result history been different but similar. Specifically, what I've done is taken the actual timeseries of line betting results, randomised it, run the same script that I used to create HELP, and then calculated how well the best model the script can find fits the alternative timeseries of line betting outcomes.

Let me explain what I mean by randomising the timeseries of line betting results by using a shortened example. If, say, the real, original timeseries of line betting results were (Home Team Wins, Home Team Loses, Home Team Loses, Home Team Wins, Home Team Wins) then, for one of my simulations, I might have used the sequence (Home Team Wins, Home Team Wins, Home Team Wins, Home Team Loses, Home Team Loses), which is the same set of results but in a different order.
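To make that concrete, here's a minimal sketch of the randomisation step, using a made-up five-game history rather than the real data:

```python
import random

# Randomise a line betting history while, by construction, preserving
# the proportions of wins and losses: a simple permutation of the
# original sequence. The history below is hypothetical.
results = ["Home Team Wins", "Home Team Loses", "Home Team Loses",
           "Home Team Wins", "Home Team Wins"]

shuffled = results[:]        # copy so the original is untouched
random.shuffle(shuffled)     # same set of results, new order
```

Because a permutation only reorders the sequence, the win/loss proportions are identical to the original's, which is the property the next paragraph explains the importance of.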

From a statistical point of view, it's important that the randomised sequences used have the same proportions of "Home Team Wins" and "Home Team Loses" as the original, real series, because part of the predictive power of the HELP model might come from its exploiting the imbalance between these two proportions. To give a simple example, if I fitted a model to the roll of a die and its job was to predict "Roll is a 6" or "Roll is not a 6", a model that predicted "Roll is not a 6" every time would achieve an 83% hit rate solely from picking up on the imbalance in the data to be modelled. The next thing you know, someone introduces a Dungeons & Dragons 20-sided die and your previously impressive Die Prediction Engine self-immolates before your eyes.

Having created the new line betting results history in the manner described above, the simulation script proceeds by creating the set of floating window models - that is, the models that look only at the most recent X rounds of line betting and home team bookie price data for X ranging from 6 to 52 - then selects a subset of these models and determines the week-to-week weighting of these models that best fits the most recent 26 rounds of data. This optimal linear combination of floating window models is then used to predict the results for the following round. You might recall that this is exactly the method used to create the HELP model in the first place.
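Schematically, one step of that procedure looks something like the sketch below. Everything here is fabricated for illustration: the component model predictions are random numbers, and ordinary least squares stands in for whatever fitting criterion the actual script optimises.

```python
import numpy as np

# Fabricated stand-ins for three chosen floating window models'
# predictions over roughly 26 rounds of 8 games.
rng = np.random.default_rng(0)
n_games, n_components = 26 * 8, 3

# Each column holds one component model's predicted probability that
# the home team wins on line betting, for each recent game.
component_preds = rng.uniform(0.3, 0.7, (n_games, n_components))
outcomes = rng.integers(0, 2, n_games)  # 1 = home team won on line betting

# Find the linear weighting of the component models that best fits
# the recent outcomes (least squares here, as one plausible choice).
weights, *_ = np.linalg.lstsq(component_preds, outcomes, rcond=None)

# Apply those weights to the components' predictions for the next
# round's 8 games.
next_round = rng.uniform(0.3, 0.7, (8, n_components))
combined = next_round @ weights
```

The weights would be re-estimated each week as the trailing 26-round window rolls forward.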

The model that is built using the currently best-fitting linear combination of floating window models is, after a fixed number of models have been considered, declared the winner and the percentage of games that it correctly predicts is recorded. The search then recommences with a different set of line betting outcomes, again generated using the method described earlier and another, new winning model is found for this line betting history and its performance noted.

In essence, the models constructed in this way tell us to what extent the technique I used to create HELP can be made to fit a number of arbitrary sequences of line betting results, each of which has the same proportion of Home Team wins and losses as the original, real sequence. The better the average fit that I can achieve to such an arbitrary sequence, the less confidence I can have that the HELP model has actually modelled something inherent in the real line betting results for seasons 2008 and 2009 and the more I should be concerned that all I've got is a chameleonic modelling technique capable of creating a model flexible enough to match any arbitrary set of results - which would be a bad thing.

Preliminary Test Results
You'll recall that there are 47 floating window models that can be included in the final model, one floating window model that uses only the past 6 weeks, another that uses the past 7 weeks, and so on up to one that uses the past 52 weeks. If you do the combinatorial maths you'll discover that there are almost 141,000 billion different models that can be constructed using one or more of the floating window models.
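That count is just the number of non-empty subsets of the 47 floating window models, which is easily checked:

```python
from math import comb

# A candidate model is any non-empty subset of the 47 floating window
# models, so the count is C(47,1) + C(47,2) + ... + C(47,47),
# which equals 2**47 - 1.
n_models = sum(comb(47, k) for k in range(1, 48))
print(f"{n_models:,}")  # -> 140,737,488,355,327: almost 141,000 billion
```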

The script I've written evaluates one candidate model about every 1.5 seconds, so it would take about 6.7 million years for it to evaluate them all. That would allow us to get a few runs in before the touted heat death of the universe, but it is at the upper end of most people's tolerance for waiting. Now, undoubtedly, my code could do with some optimisation, but unless I can find a way to make it run about 13 million times faster it'll be quicker to revert to the original idea of letting the current season serve as the test of HELP rather than use the test protocol I've proposed here. Consequently I'm forced to accept that any conclusions I come to about whether or not the HELP model's performance is all down to chance are of necessity only indicative.

That said there is one statistical modelling 'trick' I can use to improve the potential for my script to find "good" models and that is to bias the combinations of floating window models that my script considers towards those combinations that are similar to those that have already demonstrated above-average performance during the simulation so far. So, for example, if a model using the 7-round, 9-round and 12-round floating window models looks promising, the script might next look at a model using the 7-round, 9-round and 40-round floating window models (ie change just one of the underlying models) or it might look at a model using the 7-round, 9-round, 12-round and 45-round floating window models (ie add another floating model). This is a very rudimentary version of what the trade calls a Genetic Algorithm and it is exactly the same approach I used in finding the final HELP model.
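A toy version of that search move might look like the following; it's purely illustrative, and the real script's scoring and selection logic is more involved:

```python
import random

# The 47 floating window sizes, from 6 rounds up to 52.
ALL_WINDOWS = list(range(6, 53))

def mutate(subset):
    """Either add a window model to the subset or swap one out."""
    subset = list(subset)
    candidates = [w for w in ALL_WINDOWS if w not in subset]
    if candidates and random.random() < 0.5:
        subset.append(random.choice(candidates))             # add a model
    elif candidates:
        leaving = random.choice(subset)                      # swap a model
        subset[subset.index(leaving)] = random.choice(candidates)
    return sorted(subset)

# Starting from a promising subset, e.g. the 7-, 9- and 12-round models:
neighbour = mutate([7, 9, 12])
```

Each new candidate is then scored, and promising candidates seed further mutations.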

You might recall that HELP achieved an accuracy of 60.8% across seasons 2008 and 2009, so the statistic that I've had the script track is how often the winning model created for a given set of line betting outcomes performs as well as or better than the HELP model.
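In permutation-testing terms, that tracked statistic is an estimated p-value. Schematically, and with entirely fabricated accuracy figures:

```python
# The fraction of best-fit models for shuffled histories that do at
# least as well as HELP's actual 60.8%. These accuracies are made up.
help_accuracy = 0.608
shuffled_best = [0.55, 0.62, 0.58, 0.61, 0.54, 0.57]

p_value = sum(a >= help_accuracy for a in shuffled_best) / len(shuffled_best)
# Here 2 of the 6 fabricated runs match or beat HELP, so p_value is 1/3.
```

The smaller this fraction, the harder it is to explain HELP's real-world performance as an artefact of the flexibility of the modelling technique.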

From the testing I've done so far my best estimate of the likelihood that the HELP model's performance can be adequately explained by chance alone is between 5% and 35%. That's maddeningly wide and spans the interval from "statistically significant" to "wholly unconvincing", so I'll continue to run tests over the next few weeks, whenever I can tie up my computer for the purpose.

Thursday
Feb 18, 2010

Another Day, Another Model

In the previous blog I developed models for predicting victory margins and found that the selection of a 'best' model depended on the criterion used to measure performance.

In this blog I'll review the models that we developed and then describe how I created another model, this one designed to predict line betting winners.

The Low Average Margin Predictor

The model that produced the lowest mean absolute prediction error (MAPE) was constructed by combining the predictions of two other models. Both constituent models are of a type I called floating window models: one looked only at the victory margins and bookie's home team prices for the last 22 rounds, and the other looked at the same data but only for the most recent 35 rounds.

On their own neither of these two models produces especially small MAPEs, but optimally combined they produce an overall model with a 28.999 MAPE across seasons 2008 and 2009 (I know that the three decimal places is far more precision than is warranted, but any rounding's going to nudge it up to 29 which just doesn't have the same ability to impress. I consider it my nod to the retailing industry, which persists in believing that price proximity is not perceived linearly and so, for example, that a computer priced at $999 will be thought meaningfully cheaper than one priced at $1,000).
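For clarity, here's how the two error criteria used in this post are computed, using entirely made-up predicted and actual margins: the mean absolute error is the MAPE figure quoted here, and the median version features in the next section.

```python
import statistics

# Made-up predicted and actual victory margins for four games.
predicted = [12.0, -3.5, 27.0, 8.5]
actual    = [18.0, -30.0, 24.0, 10.0]

abs_errors = [abs(p - a) for p, a in zip(predicted, actual)]

mean_ape   = statistics.mean(abs_errors)    # the MAPE criterion
median_ape = statistics.median(abs_errors)  # the median criterion
```

Note how a single badly wrong prediction (the second game) drags up the mean but barely moves the median, which is exactly the trade-off between the two models described here.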

The optimal weightings in the overall model were created by calculating the linear combination of the underlying models that would have performed best over the most recent 26 weeks of the competition, and then using those weights for the current week's predictions. These weights will change from week to week as one model or the other tends to perform better at predicting victory margins; that is what gives this model its predictive chops.

This low-MAPE model I shall henceforth call the Low Average Margin Predictor (or LAMP, for brevity).
 

The Half Amazing Margin Predictor

Another model we considered produced margin predictions with a very low median absolute prediction error. It was similar to the LAMP but used four rather than two underlying models: the 19-, 36-, 39- and 52-round floating window models.

It boasted a 22.54 point median absolute prediction error over seasons 2008 and 2009, and its predictions have been within 4 goals of the actual victory margin in a tick over 52% of games. What destroys its mean absolute prediction error is its tendency to produce victory margin predictions that are about as close to the actual result as calcium carbonate is to coagulated milk curd. About once every two-and-a-half rounds one of its predictions will prove to be 12 goals or more distant from the actual game result.

Still, its median absolute prediction error is truly remarkable, which in essence means that its predictions are amazing about half the time, so I shall name it the Half Amazing Margin Predictor (or HAMP, for brevity).

In their own highly specialised ways, LAMP and HAMP are impressive but, like left-handed chess players, their particular specialities don't appear to provide them with any exploitable advantage. To be fair, TAB Sportsbet does field markets on victory margins and it might eventually prove that LAMP or HAMP can be used to make money on these markets, but I don't have the historical data to test this now. I do, however, have line market data that enables me to assess LAMP's and HAMP's ability to make money on this market, and they exhibit no such ability. Being good at predicting margins is different from being good at predicting handicap-adjusted margins.

Nonetheless, I'll be publishing LAMP's and HAMP's margin predictions this season.

HELP, I Need Another Model

Well, if we want a model that predicts line market winners we really should build a dedicated model for the purpose, and that's what I'll describe next.

The type of model that we'll build is called a binary logit. These can be used to fit a model to any phenomenon that is binary - that is, two-valued - in nature. You could, for example, fit one to model the characteristics of people who do or don't respond to a marketing campaign. In that case, the binary variable is campaign response. You could also, as I'll do here, fit a binary logit to model the relationship between home team price and whether or not the home team wins on line betting.

Fitting and interpreting such models is a bit more complicated than fitting and interpreting models fitted using the ordinary least squares method, which we covered in the previous blog. For this reason I'll not go into the details of the modelling here. Conceptually, though, all we're doing is fitting an equation that relates the Home team's head-to-head price to its probability of winning on line betting.
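For the statistically curious, here's a minimal, self-contained sketch of a binary logit fit of this general kind. The data are fabricated, and the gradient-ascent fitting loop is just to show the mechanics; a statistics package would normally do that step for you.

```python
import numpy as np

# Fabricated data: home team head-to-head price as the single
# predictor, and a 1/0 indicator of whether the home team won on
# line betting as the binary outcome.
rng = np.random.default_rng(1)
price = rng.uniform(1.2, 5.0, 400)
true_prob = 1 / (1 + np.exp(-(1.0 - 0.4 * price)))   # invented relationship
won_line = (rng.random(400) < true_prob).astype(float)

# Fit the intercept and slope by maximising the log-likelihood with
# plain gradient ascent.
X = np.column_stack([np.ones_like(price), price])
beta = np.zeros(2)
for _ in range(20_000):
    p = 1 / (1 + np.exp(-X @ beta))
    beta += 0.01 * X.T @ (won_line - p) / len(price)

# The fitted equation relating Home team price to the probability of
# a Home team line betting win.
def p_line_win(home_price):
    return 1 / (1 + np.exp(-(beta[0] + beta[1] * home_price)))
```

The fitted curve is S-shaped, so predicted probabilities always stay between 0 and 1, which is the main attraction of the logit over a straight line for binary outcomes.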

For this modelling exercise I have again created 47 floating window models of the sort I've just described, one model that uses price and line betting result data from only the last 6 rounds, another that uses the same data for the last 7 rounds, and so on up to one that uses data from the last 52 rounds.

Then, as I did in creating HAMP and LAMP, I looked for the combination of floating window models that best predicts winning line bet teams.

The overall model I found to perform best combines 24 of the 47 floating window models - I'll spare you the Lotto-like list of those models' numbers here. In 2008 this model predicted the line betting winner 57% of the time and in 2009 it predicted 64% of such winners. Combined, that gives it a 61% average across the two seasons. I'll call this model the Highly Evolved Line Predictor (or HELP), the 'highly evolved' part of the name in recognition of the fact that it was selected because of its fitness in predicting line betting winners in the environment that prevailed across the 2008 and 2009 seasons.

Whether HELP will thrive in the new environment of the 2010 season will be interesting to watch, as indeed will be the performance of LAMP and HAMP.

In my previous post I drew the distinction between fitting a model and using it to predict the future and explained that a model can be a good fit to existing data but turn out to be a poor predictor. In that context I mentioned the common statistical practice of fitting a model to one set of data and then measuring its predictive ability on a different set.

HAMP, LAMP and HELP are somewhat odd models in this respect. Certainly, when I've used them to predict they're predicting for games that weren't used in the creation of any of their underlying floating window models. So that's a tick.

They are, however, fitted models in that I generated a large number of potential LAMPs, HAMPs and HELPs, each using a different set of the available floating window models, and then selected those models which best predicted the results of the 2008 and 2009 seasons. Accordingly, it could well be that the superior performance of each of these models can be put down to chance, in which case we'll find that their performances in 2010 will drop to far less impressive levels.

We won't know whether or not we're witnessing such a decline until some way into the coming season but in the meantime we can ponder the basis on which we might justify asserting that the models are not mere chimeras.

Recall that each of the floating window models uses as predictive variables nothing more than the price of the Home team. The convoluted process of combining different floating window models with time-varying weights for each means that, in essence, the predictions of HAMP, LAMP and HELP are all just sophisticated transformations of one number: the Home team price for the relevant game.

So, for HAMP, LAMP and HELP to be considered anything other than statistical flukes it needs to be the case that:

  1. the TAB Sportsbet bookie's Home team prices are reliable indicators of Home teams' victory margins and line betting success
  2. the association between Home team prices and victory margins, and between Home team prices and line betting outcomes varies in a consistent manner over time
  3. HAMP, LAMP and HELP are constructed in such a way as to effectively model these time-varying relationships

On balance I'd have to say that these conditions are unlikely to be met. Absent the experience gained from running these models live during a fresh season, then, there's no way I'd be risking money on any of these models.

Many of the algorithms that support MAFL Funds have been developed in much the same way as I've described in this and the previous blog, though each of them is based on more than a single predictive variable and most of them have been shown to be profitable in testing using previous seasons' data and in real-world wagering.

Regardless, a few seasons of profitability doesn't rule out the possibility that any or all of the MAFL Fund algorithms have just been extremely lucky.

That's why I'm not retired ...