How did Quintessa’s Sports Rating Algorithm Perform for the 2022 FIFA World Cup?

16 Jan 2023

Quintessa mathematicians and scientists enjoy analysing numbers. Over the last few months, Simon Rookyard and Jodie Stone have put Quintessa’s Sports Rating Algorithm to the test, providing predictions for the 2022 FIFA World Cup. Now that the competition is complete, it’s time to evaluate the algorithm’s performance.

It’s the moment of truth! We used our “N-Estimates” algorithm to predict the result of every game at the 2022 FIFA World Cup. Each prediction was accompanied by a plot showing the probability of every possible scoreline, with the most likely outcome marked with a cross. These plots have now been updated so that the actual result is shaded green. The final plots for all the 2022 FIFA World Cup matches can be found at the end of this news story.

When assessing the performance of the algorithm during the competition, we have considered three performance metrics:

the percentage of correct outcome (win/draw/loss) predictions;
the percentage of correct goal difference predictions;
and the percentage of correct exact scoreline predictions.
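As an illustration, the three metrics can be computed from pairs of predicted and actual scorelines. The sketch below is a minimal Python example and not Quintessa’s own code; the names Match, outcome and success_rates are hypothetical, and the two matches are the ones shown in the plots at the end of this story.

```python
from typing import NamedTuple, Tuple

Score = Tuple[int, int]  # (goals for first-named team, goals for second-named team)


class Match(NamedTuple):
    predicted: Score
    actual: Score


def outcome(score: Score) -> int:
    """Return 1 for a win by the first-named team, 0 for a draw, -1 for a loss."""
    first, second = score
    return (first > second) - (first < second)


def success_rates(matches):
    """Fraction of matches with the correct outcome, goal difference and scoreline."""
    hits = {"outcome": 0, "goal_difference": 0, "scoreline": 0}
    for m in matches:
        hits["outcome"] += outcome(m.predicted) == outcome(m.actual)
        hits["goal_difference"] += (
            m.predicted[0] - m.predicted[1] == m.actual[0] - m.actual[1]
        )
        hits["scoreline"] += m.predicted == m.actual
    return {name: count / len(matches) for name, count in hits.items()}


matches = [
    Match(predicted=(0, 2), actual=(0, 2)),  # Qatar vs. Ecuador
    Match(predicted=(0, 4), actual=(0, 2)),  # Senegal vs. Netherlands
]
rates = success_rates(matches)
print(rates)
```

For these two matches the outcome is correct both times, while the goal difference and exact scoreline are correct only for the first.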

As the data used to train the algorithm consist primarily of matches that finish after 90 minutes (regardless of the score), it is most appropriate to compare our predictions against the scores at the end of normal time (this choice, as opposed to using scores after extra time, only made a difference in one match during the tournament). Figure 1 compares the algorithm’s performance over the whole of the 2022 World Cup competition against a benchmark in each of our three metrics. The benchmarks used for the correct outcome, goal difference and scoreline have been generated by a Monte Carlo simulation and represent the expected success rates if scores had been predicted by chance.
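A chance benchmark of this kind could be sketched as follows. This is a minimal illustration under a stated assumption, not the simulation actually used: here a “chance” prediction is a scoreline drawn at random from the pool of observed results, and the function name chance_benchmark and the placeholder scores are hypothetical. Only the correct-outcome metric is shown; the other two follow the same pattern.

```python
import random
import statistics


def outcome(score):
    """Return 1 for a win by the first-named team, 0 for a draw, -1 for a loss."""
    first, second = score
    return (first > second) - (first < second)


def chance_benchmark(actual_scores, n_trials=10_000, seed=0):
    """Mean and standard deviation of the correct-outcome rate for random guesses."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_trials):
        # Assumption: each random guess is sampled, with replacement,
        # from the scorelines that actually occurred.
        guesses = rng.choices(actual_scores, k=len(actual_scores))
        hits = sum(outcome(g) == outcome(a) for g, a in zip(guesses, actual_scores))
        rates.append(hits / len(actual_scores))
    return statistics.mean(rates), statistics.pstdev(rates)


# Placeholder scores for illustration only, not tournament data.
toy_scores = [(1, 0), (0, 0), (2, 1), (0, 2)]
mean_rate, sd_rate = chance_benchmark(toy_scores, n_trials=2000)
print(mean_rate, sd_rate)
```

The spread returned alongside the mean is what allows an observed success rate to be expressed as a number of standard deviations above chance.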

We can see that the algorithm performed well in predicting the correct outcomes, goal differences and scorelines. The correct outcome (not including penalties) was predicted in 56% of matches, approximately 22 percentage points more than we would expect by chance. The correct goal difference and correct scoreline were predicted in approximately 9 and 8 percentage points more matches (respectively) than expected by chance. A closer investigation reveals that the N-Estimates success rate lies 3.7 standard deviations above the chance expectation for match outcomes, 1.9 standard deviations above it for goal differences and 2.5 standard deviations above it for scorelines.
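The outcome figure can be sanity-checked with a short calculation. The sketch below assumes a simple binomial model for the chance benchmark, which is our assumption here rather than the Monte Carlo spread used for Figure 1. It reads the quoted 22% margin as percentage points, which puts the chance rate near 34%, and uses the 64 matches played in the tournament. The function name z_score is hypothetical.

```python
import math


def z_score(observed_rate, chance_rate, n_matches):
    """Standard deviations between an observed success rate and the chance rate.

    Assumes the number of correct chance predictions is binomially distributed.
    """
    sd = math.sqrt(chance_rate * (1 - chance_rate) / n_matches)
    return (observed_rate - chance_rate) / sd


# 56% observed; chance rate of about 34% inferred from the quoted 22-point margin.
z = z_score(observed_rate=0.56, chance_rate=0.34, n_matches=64)
print(round(z, 1))
```

Under these assumptions the result is about 3.7, consistent with the figure quoted for match outcomes.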

It is also interesting to compare the algorithm’s performance with that of an expert. BBC pundit Chris Sutton predicted the result of each match during the tournament, and his success rates against our three metrics are also shown in Figure 1. It was very close, with both Chris and N-Estimates making the same number of correct outcome and scoreline predictions. Chris just pipped the algorithm by one correct goal difference prediction. Congratulations Chris!

Links to Chris Sutton’s predictions: Group Stages 1, Group Stages 2, Group Stages 3, Last 16, Quarter Finals, Semi Finals, Final & Third Place Playoff

There is scope for further improvement of N-Estimates. There appears to be a source of variability in the results that has not been captured by the algorithm. As the algorithm was tuned predominantly to qualifying matches for the continental championships and World Cups, this suggests a source of variability that is only significant at major competitions. One possible cause is the lack of intercontinental games: the vast majority of matches in the training data set are between teams from the same continent. This could leave the algorithm short of data with which to judge the relative average strengths of different continents. One response could be to introduce a two-tier rating system, in which teams are rated within their continent and the few intercontinental matches in the data set are used to rate continents against each other. Given that the number of continents is much lower than the number of teams, the algorithm should, in principle, be able to make a better assessment of continent ratings with such a sparse data set.
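One way such a two-tier scheme might look is sketched below. This is purely illustrative and is not part of N-Estimates: it assumes ratings are expressed in goals, that a team’s overall strength is its within-continent rating plus a continent offset, and that the offsets are fitted to whatever goal difference the within-continent ratings fail to explain. The function name continent_offsets, the team names and all numbers are hypothetical.

```python
from collections import defaultdict


def continent_offsets(matches, team_rating, continent_of, sweeps=100):
    """Estimate one rating offset per continent from intercontinental matches.

    matches: list of (team_a, team_b, goal_difference), goal difference = a minus b.
    """
    links = defaultdict(list)  # continent -> [(opposing continent, residual)]
    for a, b, gd in matches:
        ca, cb = continent_of[a], continent_of[b]
        if ca == cb:
            continue  # same-continent matches carry no information on offsets
        # Goal difference not explained by the within-continent ratings.
        residual = gd - (team_rating[a] - team_rating[b])
        links[ca].append((cb, residual))
        links[cb].append((ca, -residual))

    if not links:
        return {}

    offset = {c: 0.0 for c in links}
    for _ in range(sweeps):
        # Each continent's offset becomes the average implied by its matches.
        for c, neighbours in links.items():
            offset[c] = sum(offset[o] + r for o, r in neighbours) / len(neighbours)

    # Only differences between offsets matter, so centre them on zero.
    mean = sum(offset.values()) / len(offset)
    return {c: v - mean for c, v in offset.items()}


# Hypothetical teams, continents and ratings, for illustration only.
team_rating = {"A1": 1.0, "B1": 0.0}
continent_of = {"A1": "X", "B1": "Y"}
offsets = continent_offsets([("A1", "B1", 2)], team_rating, continent_of)
print(offsets)
```

Because only a handful of continent offsets are being estimated, a small number of intercontinental matches can constrain them far better than they could constrain every individual team.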

Figure 2: Predictions and actual scores for every match in the tournament. For each plot, the circles represent possible final scores after 90 minutes, with the number of goals scored by each team plotted on the axes. Each circle has been colour coded to indicate the probability of that result occurring, with the most likely outcome marked with a black cross. The solid and dashed orange lines represent the mean and 1σ uncertainty range respectively for the predicted goal difference, and the dashed blue lines indicate a goal difference of zero. The actual result after 90 minutes has been shaded green.

November 20 2022, Qatar vs. Ecuador. Central normal time prediction: 0 - 2. Confidence range for goal difference (Qatar minus Ecuador): -3 to 0. Actual normal time result: 0 - 2.

November 21 2022, Senegal vs. Netherlands. Central normal time prediction: 0 - 4. Confidence range for goal difference (Senegal minus Netherlands): -6 to -3. Actual normal time result: 0 - 2.

Quintessa is not affiliated in any way with FIFA or the BBC. Its application of the N-Estimates algorithm to the FIFA World Cup 2022 competition is an independent and non-commercial endeavour.
