In the last article, we compared the predictions of the Elo model with the actual average results for each percentile and found that it predicts the outcome of future matches quite well, but that there is still room for improvement. This article introduces a new prediction model that is based on both the Elo system and the Poisson distribution. The new prediction method is then examined quantitatively.

Several sources state that the number of goals scored by a team can be described by the Poisson distribution. My goal is to combine the Poisson model with the Elo system to optimise the prediction of football results.

The Poisson distribution has one parameter, which is equal to the expected value of the distribution. I assume that the expected number of goals scored by the home or away side depends on the Elo difference between the two teams. The following graph shows the average number of goals scored by the home and the away team. The x-axis represents the probability implied by the Elo difference.

You can see that the average number of goals scored by the home team increases steadily as its probability of winning increases; analogously, the away team is expected to score fewer goals in the same game. This sounds trivial, but the exact shape of the curves is interesting and important. The blue curves are their approximations, which will be used for the prediction model. For every Elo difference we now have an expected value for the number of home goals and for the number of away goals, and the two are assumed to be independent of each other. The Poisson distribution can now provide us with the probability of every possible result. The following example is for Anzhi vs Kuban on July 22, 2012:

Home goals (rows) / Away goals (columns) | 0 | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|---|
0 | 7.44% | 7.30% | 3.57% | 1.17% | 0.29% | 0.06% |
1 | 12.04% | 11.80% | 5.79% | 1.89% | 0.46% | 0.09% |
2 | 9.73% | 9.55% | 4.68% | 1.53% | 0.37% | 0.07% |
3 | 5.25% | 5.15% | 2.52% | 0.82% | 0.20% | 0.04% |
4 | 2.12% | 2.08% | 1.02% | 0.33% | 0.08% | 0.02% |
5 | 0.69% | 0.67% | 0.33% | 0.11% | 0.02% | 0.01% |

We can now sum up the probabilities for an Anzhi victory, a draw and a Kuban victory. I use rounded values to avoid the impression of high precision:

Anzhi | Draw | Kuban |
---|---|---|
52% | 25% | 23% |
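For readers who want to reproduce this kind of table, here is a minimal Python sketch of the two steps: building the grid of score probabilities from two independent Poisson distributions, then summing the cells into home win, draw and away win. The two expected goal values are not quoted in the article; they are my own back-calculation from the ratios of neighbouring cells in the score table (for example 12.04% / 7.44% for the home side), so treat them as assumptions. The function names and the cut-off of 10 goals per side are also just illustrative choices.

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals when lam goals are expected."""
    return lam ** k * exp(-lam) / factorial(k)


def score_grid(lam_home: float, lam_away: float, max_goals: int = 10):
    """grid[h][a] = P(home scores h, away scores a), assuming independence."""
    return [
        [poisson_pmf(h, lam_home) * poisson_pmf(a, lam_away)
         for a in range(max_goals + 1)]
        for h in range(max_goals + 1)
    ]


def outcome_probabilities(grid):
    """Sum the grid into (home win, draw, away win)."""
    home = draw = away = 0.0
    for h, row in enumerate(grid):
        for a, p in enumerate(row):
            if h > a:
                home += p
            elif h == a:
                draw += p
            else:
                away += p
    return home, draw, away


# Assumed expected goals, inferred from the table above (not given in the text).
LAM_HOME = 1.618  # Anzhi
LAM_AWAY = 0.981  # Kuban

grid = score_grid(LAM_HOME, LAM_AWAY)
home, draw, away = outcome_probabilities(grid)
print(f"0-0: {grid[0][0]:.2%}, 1-0: {grid[1][0]:.2%}")
print(f"Anzhi {home:.0%} | Draw {draw:.0%} | Kuban {away:.0%}")
```

Cutting the grid off at 10 goals per side leaves out only a negligible amount of probability for expected values in this range, which is why the three outcomes sum to very nearly 100%.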

For two-leg matches, it is even possible to calculate the probability that a team qualifies, either based on the result of the first leg or for both legs combined. All these probabilities are slightly different from what the Elo model suggests. Let us see if these new probabilities do a better job. As in the previous article, we compare the average result for each percentile (a draw counting as half a win and half a loss). The ideal curve is shown in grey, the old prediction model in white and the new prediction model in yellow. All the games are grouped into percentiles according to what each model predicts. The y-axis denotes the actual average result.
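The comparison described here can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the predicted value for a game is taken as P(home win) + 0.5 × P(draw), the actual result is coded as 1, 0.5 or 0, and the games are split into equally sized groups after sorting by prediction. The function name and the tiny input lists are hypothetical and serve only to show the mechanics; they are not real match data.

```python
def calibration_by_group(predicted, actual, n_groups=100):
    """Sort games by predicted result, split them into equally sized groups,
    and return (mean prediction, mean actual result) for each group."""
    pairs = sorted(zip(predicted, actual))
    n = len(pairs)
    rows = []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not chunk:
            continue  # fewer games than groups
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        mean_actual = sum(a for _, a in chunk) / len(chunk)
        rows.append((mean_pred, mean_actual))
    return rows


# Made-up illustrative inputs: predicted expected results and
# actual results (1 = home win, 0.5 = draw, 0 = away win).
example_predicted = [0.2, 0.8, 0.3, 0.7]
example_actual = [0.0, 1.0, 0.5, 1.0]

for mean_pred, mean_actual in calibration_by_group(
        example_predicted, example_actual, n_groups=2):
    print(f"predicted {mean_pred:.2f} -> actual {mean_actual:.2f}")
```

A well calibrated model gives points that lie close to the diagonal, where the mean prediction equals the mean actual result.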

Using the Poisson distribution definitely improves the predictions. From now on, the new prediction model will be used for future fixtures. Two-leg games will also be predicted: both the individual legs and the probabilities for each club to qualify. However, the new prediction model will not be used to determine how many Elo points a team wins for a victory; this will still be done by the Elo model, so the rankings and Elo values will not change.

*Many of my readers have been asking for the equations of the curve approximation. Here they are:*

Goals for the home team:

$$
\text{Home goals} =
\begin{cases}
0.2 + 1.1\sqrt{P/0.5} & \text{if } P < 0.5 \\
\dfrac{1.69}{1.12\sqrt{2 - P/0.5} + 0.18} & \text{otherwise}
\end{cases}
$$

Goals for the away team:

$$
\text{Away goals} =
\begin{cases}
-0.96 + \dfrac{1}{0.1 + 0.44\sqrt{(P + 0.1)/0.9}} & \text{if } P < 0.8 \\
0.72\sqrt{(1 - P)/0.3} + 0.3 & \text{otherwise}
\end{cases}
$$

*Here $P$ ("Proba") is the probability (winning expectancy) from the Elo formula, ranging from 0 to 1.*
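For convenience, here is the curve approximation written as a small Python sketch. The two goal functions follow the equations above directly. The helper `elo_win_expectancy` is the standard Elo formula and is an addition of mine: the article does not say how home advantage enters the Elo difference, so the difference you pass in should already include whatever adjustment you use. All function names are illustrative.

```python
from math import sqrt


def elo_win_expectancy(elo_diff: float) -> float:
    """Standard Elo winning expectancy for a rating difference
    (home minus away). Any home-advantage adjustment is assumed
    to be included in elo_diff already."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def expected_home_goals(proba: float) -> float:
    """Expected home goals for a winning expectancy between 0 and 1."""
    if proba < 0.5:
        return 0.2 + 1.1 * sqrt(proba / 0.5)
    return 1.69 / (1.12 * sqrt(2 - proba / 0.5) + 0.18)


def expected_away_goals(proba: float) -> float:
    """Expected away goals for the home side's winning expectancy."""
    if proba < 0.8:
        return -0.96 + 1 / (0.1 + 0.44 * sqrt((proba + 0.1) / 0.9))
    return 0.72 * sqrt((1 - proba) / 0.3) + 0.3


# Example with an assumed winning expectancy of 0.702. This value is my own
# back-calculation from the Anzhi vs Kuban table, not a figure from the article.
proba = 0.702
print(f"home: {expected_home_goals(proba):.2f}, "
      f"away: {expected_away_goals(proba):.2f}")
```

The two results are the Poisson parameters for the home and the away side, which can then be used to compute the probability of each scoreline.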