Contact Me

I can be contacted via Tony.Corke@gmail.com

What Variables Are Used in MAFL Statistical Models?

In a fine example of making a virtue of necessity, I've steadfastly defended my practice of excluding all player information from the statistical models I build. The thought of scanning the papers every week for news of player ins and outs makes me shudder even now.

The predictors I do use for modelling game outcomes are:

  • Variables reflecting the Venue
    • Which team is the home team
    • Where the game is being played
    • How much experience each team has at the venue
  • Measures of Team Strength and Recent Form
    • The TAB Bookmaker's pre-game head-to-head prices
    • The teams' MARS Ratings
    • The teams' recent scoring record

 

VENUE VARIABLES 

Home Team

In most sports, teams have a designated and unique home ground so it's always clear which team is playing at home and which is playing away. This was true of the VFL/AFL competition for most of its history too until, in recent seasons, teams began sharing home grounds, calling multiple venues home, and playing designated "home" games at neutral venues - or even, wackiest of all, playing them at their opponent's home ground.

These recent practices muddy the waters because, empirically, there is a Home Ground Advantage (HGA) in the AFL, so the choice of which team to label the Home team matters.

In the early years of MAFL I decided, at the start of each season, which team I'd recognise as the Home team in every contest for that season. That began to feel a bit arbitrary, so I now simply treat as the Home team, for MAFL purposes, whichever team the AFL designates as the home team. In any case, other variables I use can adjust for some of the anomalies this might introduce when, for example, Richmond play the Gold Coast at "home" at Cazaly's Stadium.

Interstate Status

With the introduction of the Gold Coast in 2011 and Greater Western Sydney in 2012, the AFL competition now includes 8 non-Victorian and 10 Victorian teams, with home grounds spanning 5 States. Interstate travel by the Away team increases the HGA enjoyed by the Home team, though the size of this incremental advantage has been diminishing in recent years.

To capture explicitly the advantage the Home team gains when the Away team travels interstate, I include in many models a 'dummy' variable that takes the value:

  • +1 if the Home team is playing in its home State and the Away team is playing outside its home State
  • 0 if the Home team and the Away team are both playing in or both playing outside their home States
  • -1 if the Home team is playing outside its home State and the Away team is playing in its home State
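As a rough illustration, the three cases above can be coded as a small function. This is a minimal sketch rather than MAFL's actual code: it assumes each team has a single home State and that States are passed in as hypothetical abbreviations such as "VIC" or "QLD".

```python
def interstate_status(home_team_state: str, away_team_state: str, venue_state: str) -> int:
    """Return +1, 0 or -1 for the interstate-travel variable.

    Assumption: each team has exactly one home State, identified by a
    hypothetical abbreviation such as "VIC", "SA", "WA", "QLD" or "NSW".
    """
    home_at_home = home_team_state == venue_state
    away_at_home = away_team_state == venue_state
    if home_at_home and not away_at_home:
        return 1   # only the Away team is playing outside its home State
    if away_at_home and not home_at_home:
        return -1  # only the Home team is playing outside its home State
    return 0       # both teams in, or both outside, their home States
```

For the Richmond v Gold Coast game at Cazaly's Stadium in Cairns, `interstate_status("VIC", "QLD", "QLD")` returns -1, because it is the designated Home team that has travelled.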

Venue Experience

One of the reasons posited for the existence of a Home Ground Advantage is the Home team's greater familiarity with the game venue compared to the Away team. We can treat familiarity as a binary concept simply by recognising which team is the home team and which is the away team, or we can, instead or as well, treat familiarity as a continuous variable.

To do this, I include two variables, one for each team, that record the number of games each team has played at the current game's venue at any time in the previous 12 calendar months.
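A minimal sketch of the venue-experience count follows. The record layout for past games and the exact window (from the same date a year earlier up to, but not including, the current game) are my assumptions about how "the previous 12 calendar months" might be implemented; the function names are hypothetical.

```python
from datetime import date
from typing import Iterable, Tuple

# Hypothetical record layout: (game date, venue, home team, away team)
Game = Tuple[date, str, str, str]


def one_year_earlier(d: date) -> date:
    """Return the same calendar date 12 months earlier (29 Feb maps to 28 Feb)."""
    try:
        return d.replace(year=d.year - 1)
    except ValueError:
        return d.replace(year=d.year - 1, day=28)


def venue_experience(team: str, venue: str, game_date: date,
                     past_games: Iterable[Game]) -> int:
    """Count games `team` played at `venue` in the 12 months before `game_date`.

    Assumed window: on or after the same date a year earlier, and strictly
    before the current game.
    """
    start = one_year_earlier(game_date)
    return sum(
        1
        for played_on, where, home, away in past_games
        if where == venue and team in (home, away) and start <= played_on < game_date
    )
```

A model would call this twice per game, once for the Home team and once for the Away team, giving the two variables described above.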

MEASURES OF TEAM STRENGTH

Head-to-Head Prices

Bookmakers are exceptionally good at quantifying the relative strengths of teams, which they convey in the head-to-head prices they offer. MAFL's bookmaker of choice is the TAB Sportsbet market-maker, whose pricing opinions I usually gather at noon on Wednesdays.

Bookmaker prices can be translated into a measure of the teams' relative strengths in a number of ways, but the two formulations I use most commonly are: 

  • The Home Team Implicit Probability, defined as the Away Team's Price divided by the sum of the Home Team's and the Away Team's Prices.
  • The Home Team Log Probability Ratio, defined as the base-10 log of the Home Team Implicit Probability divided by the Away Team Implicit Probability (equivalently, the base-10 log of the Away Team's Price divided by the Home Team's Price).

Most of the possible formulations that convert prices into measures of relative strength are highly correlated for all prices except the very small (say less than $1.10) and the very large (say greater than $10). Outside that range, the formulations can diverge sharply. For example, consider two head-to-head markets, the first priced at $1.05 and $10, the second at $1.04 and $11. The Implicit Probability measures for the favourites in these two markets are 90.5% and 91.4%, about a 1% difference, whereas the Log Probability Ratio measures are 0.98 and 1.02, about a 5% difference. The Odds Ratio, another potential measure, defined as (Away Team Price - 1) / (Home Team Price - 1), gives relative strength measures of 180 and 250, a 39% difference. Finding a way to incorporate bookmaker opinion reliably in statistical models across the full range of team prices is an area of ongoing interest to me.
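To make the arithmetic above easy to check, here is a small sketch that computes all three measures for the two example markets, treating the favourite as the Home team. The function names are hypothetical; note that the log must be base 10 to reproduce the 0.98 and 1.02 figures.

```python
import math


def implicit_probability(home_price: float, away_price: float) -> float:
    """Home Team Implicit Probability: Away Price / (Home Price + Away Price)."""
    return away_price / (home_price + away_price)


def log_probability_ratio(home_price: float, away_price: float) -> float:
    """Base-10 log of (Home Implicit Probability / Away Implicit Probability)."""
    p_home = implicit_probability(home_price, away_price)
    return math.log10(p_home / (1 - p_home))  # Away probability is 1 - p_home


def odds_ratio(home_price: float, away_price: float) -> float:
    """(Away Price - 1) / (Home Price - 1)."""
    return (away_price - 1) / (home_price - 1)


for home, away in [(1.05, 10.0), (1.04, 11.0)]:
    print(f"${home:.2f} v ${away:.2f}: "
          f"prob {implicit_probability(home, away):.1%}, "
          f"log ratio {log_probability_ratio(home, away):.2f}, "
          f"odds ratio {odds_ratio(home, away):.0f}")
```

Running it prints 90.5%, 0.98 and 180 for the first market and 91.4%, 1.02 and 250 for the second, matching the figures quoted above and showing how much more the Odds Ratio stretches at the extremes.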

MARS Ratings

As a supplement to bookmaker prices, and as a replacement for them when I need to make projections for games whose prices haven't yet been set, I've created my own team Rating system.

This is an ELO-style system, so it rewards teams for their game-day performance relative to what might be expected of them given the strength of their opponents and whether they face them at home, away, or at a neutral venue. The ELO approach includes a number of user-selectable parameters and, as you'll see in that earlier link, I've developed a variety of Ratings systems based on ELO principles.
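For readers unfamiliar with the ELO approach, the sketch below shows a generic margin-based ELO-style update. It is illustrative only: the parameter names and values (`hga`, `points_per_rating`, `k`) are hypothetical and are not MARS's actual settings, which aren't given here.

```python
def elo_style_update(home_rating: float, away_rating: float, actual_margin: float,
                     hga: float = 6.0, points_per_rating: float = 1.0,
                     k: float = 0.1) -> tuple[float, float]:
    """One generic ELO-style Ratings update (hypothetical parameters, not MARS's).

    actual_margin is the Home team's score minus the Away team's score.
    """
    # Margin the Home team is expected to win by: Ratings gap plus a
    # home ground allowance in points (use hga=0 for a neutral venue).
    expected_margin = points_per_rating * (home_rating - away_rating) + hga
    # Reward or penalise performance relative to expectation.
    change = k * (actual_margin - expected_margin)
    # Zero-sum: whatever the Home team gains, the Away team loses.
    return home_rating + change, away_rating - change
```

Because each update is zero-sum, the average Rating across all teams stays wherever it starts, for example at 1,000.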

The system I use most frequently, however, and whose Ratings I publish each week during the regular season, is the version named MARS, which has proven quite good at predicting game margins.

An average MARS Rating is 1,000, while a Rating above 1,025 is exceptional and one below 975 is relatively poor.

Recent Scoring Record

Empirically, Bookmaker prices and MARS Ratings seem to capture teams' short-term and medium-term performance, but neither seems to capture fully the predictive content of teams' longer-term performance. Accordingly, I include in many models variables reflecting the home and away teams' average for-and-against differential (points scored minus points conceded) over the most recent 16 rounds of the current season.

Until the 16th round of a season has been completed, I set these variables equal to each team's average for-and-against differential over all completed rounds of the current season and, for the first round of the season, when no games have yet been played, I set them equal to zero.
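The rule above might be sketched as follows. It assumes per-game scores for the current season listed oldest first, and it counts a team's last 16 games rather than rounds, so byes are simply skipped; both choices, and the function name, are my assumptions rather than documented MAFL behaviour.

```python
def recent_scoring_differential(for_points: list[float], against_points: list[float],
                                window: int = 16) -> float:
    """Average (points for - points against) over a team's recent games.

    Inputs are the team's completed games this season, oldest first.
    Uses the last `window` games, every completed game if fewer have been
    played, and 0.0 before the team's first game of the season.
    """
    diffs = [f - a for f, a in zip(for_points, against_points)]
    recent = diffs[-window:]  # slicing returns all games when fewer than `window`
    return sum(recent) / len(recent) if recent else 0.0
```

One value is computed for the home team and one for the away team before each game.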

Last updated on January 3, 2013 by TonyC