MAFL Online - Probability, Stats and AFL Footy

What Types of Statistical Models Does MAFL Use for Prediction?

Though MAFL has explored a wide variety of statistical modelling techniques, perhaps best exemplified by a post from 2011 in which I pitted a number of them head-to-head in predicting game margins, only three are currently in regular use in MAFL:

Conditional Inference Tree Forests: this technique is used to create probability predictions for the current Head-to-Head and Line Funds, and for the WinPred and ProPred Predictors. Empirically, Conditional Inference Tree Forests appear capable of capturing the non-linear relationships inherent in football data without toppling into the abyss of overfitting.
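You can think of a single conditional inference tree as a decision tree whose splits are chosen using statistical significance tests, which guards against favouring variables merely because they offer many possible split points. A forest is an ensemble of such trees, each built using a subset of the available data, with the trees' predictions combined to produce results that are both accurate and robust. For more details, try googling "conditional inference tree".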
Neural Networks: to the best of my recollection, I'd never used a neural network for MAFL before 2011. That changed with the introduction of two such networks in that year, each designed to produce Margin predictions. Both were created using Phil Brierley's Tiberius tool.
Empirical Loss Minimisers: aside from the two neural networks, the models used by all Margin predictors were created using Eureqa Formulize, which is described on its website as "a scientific data mining software package that searches for mathematical patterns hidden in your data". The inputs to Eureqa were the actual Margins from a number of games and the predictions from other statistical and empirical models - for example, the probability predictions of the WinPred and ProPred Predictors.
To fit a model using Eureqa, an analyst needs to select a loss function, or error metric, which Eureqa uses to assess how well each model it is considering fits the training data. To be honest, I don't recall exactly which error metric I used to create the Margin predictors with Eureqa.
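
To illustrate why the choice of error metric matters, here's a minimal Python sketch. It is not Eureqa, and it is not the method used to build the MAFL Margin predictors. It assumes a hypothetical one-parameter model, margin = k x (p - 0.5), where p is a probability prediction, and finds k by a simple grid search under two different error metrics. The data, the model form and the function names are all made up for illustration.

```python
# Illustrative only: hypothetical data and a hypothetical one-parameter model.
# Eureqa also searches over model forms; here the form is fixed and only k is tuned.

def mean_absolute_error(actuals, preds):
    return sum(abs(a - p) for a, p in zip(actuals, preds)) / len(actuals)

def mean_squared_error(actuals, preds):
    return sum((a - p) ** 2 for a, p in zip(actuals, preds)) / len(actuals)

def fit_k(probs, margins, error_metric, k_grid):
    """Return the k in k_grid that minimises error_metric for margin = k * (p - 0.5)."""
    def error_for(k):
        preds = [k * (p - 0.5) for p in probs]
        return error_metric(margins, preds)
    return min(k_grid, key=error_for)

# Made-up probability predictions and actual margins (the fifth game is a blowout)
probs = [0.55, 0.60, 0.70, 0.45, 0.80, 0.65]
margins = [4, 12, 18, -6, 95, 14]

# Candidate values for k: 0.0, 0.5, 1.0, ..., 400.0
k_grid = [k / 2 for k in range(0, 801)]

k_mae = fit_k(probs, margins, mean_absolute_error, k_grid)
k_mse = fit_k(probs, margins, mean_squared_error, k_grid)

print("k minimising mean absolute error:", k_mae)
print("k minimising mean squared error: ", k_mse)
```

On this made-up data the squared-error fit chooses a noticeably larger k than the absolute-error fit, because squaring the errors gives the single blowout game much more influence. The same training data can therefore yield quite different models depending on the metric selected.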

Another modelling technique that's regularly trotted out in the pages of MAFL is the binary logit, which is used to model binary outcomes such as winning versus losing. I've more often used this to fit historical data than to make predictions, but that distinction's probably moot.
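
As a rough sketch of what fitting a binary logit involves, the Python below estimates a win probability from a single predictor by gradient ascent on the log-likelihood. The predictor (a hypothetical pre-game rating difference), the data, the learning rate and the function names are all assumptions for illustration and aren't taken from any MAFL model. In practice you'd use a statistics package's built-in routine instead.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit(xs, ys, learning_rate=0.1, iterations=5000):
    """Fit P(win) = sigmoid(b0 + b1 * x) by gradient ascent on the mean log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        grad0 = 0.0
        grad1 = 0.0
        for x, y in zip(xs, ys):
            residual = y - sigmoid(b0 + b1 * x)  # actual outcome minus fitted probability
            grad0 += residual
            grad1 += residual * x
        b0 += learning_rate * grad0 / n
        b1 += learning_rate * grad1 / n
    return b0, b1

def predict_win_probability(b0, b1, x):
    return sigmoid(b0 + b1 * x)

# Made-up data: rating difference (home minus away) and whether the home team won (1) or lost (0)
rating_diff = [-3.0, -2.0, -1.5, -0.5, 0.0, 0.5, 1.0, 2.0, 2.5, 3.0]
home_won = [0, 0, 1, 0, 1, 0, 1, 1, 1, 1]

b0, b1 = fit_logit(rating_diff, home_won)
p = predict_win_probability(b0, b1, 1.0)
print(f"Fitted intercept: {b0:.3f}, fitted slope: {b1:.3f}")
print(f"Estimated P(home win | rating difference = 1.0): {p:.3f}")
```

The fitted slope is positive for this data, so a larger rating difference maps to a higher estimated chance of winning, and the output is always a probability between 0 and 1, which is what makes the logit suitable for win-versus-loss outcomes.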

Last updated on January 2, 2013 by TonyC