
How xG models differ (and why their numbers rarely match)

By Andre Schlaepfer · Updated April 2026 · ~15 min read

If you have ever compared the xG for the same match across two different providers and found numbers that disagree by as much as half a goal, you are not imagining things. Every xG model makes its own choices about which features of a shot to observe and how to combine them, and those choices produce very different estimates. This article walks through the four academic models that power this calculator — Rathke’s zone model, the Kos Angle model from Karim & Marwane, the CNN-based model from Matteotti & Sotudeh, and the combined model based on Anzer & Bauer, Hewitt & Karakuş and Narayanan & Pifer — and explains where the numbers come from and why they disagree. The review below is a condensed version of the literature survey in my 2024 undergraduate project at UFRJ.

1. Rathke (2017) — zone-based xG

The oldest and simplest model in the calculator. Rathke divides the attacking half into eight zones and reports the historical percentage of shots that became goals in each. Zone 1, a tight band in front of the six-yard box, converts at roughly 39.8%; Zone 2, the heart of the penalty area just beyond it, at 18.8%. The zones outside the box drop into single-digit percentages, and Zone 6, the corridor out near the byline outside the penalty area, converts at less than 3%.

The model needs only the shot’s x/y coordinate to produce a value. That is its strength and its weakness. It gives a quick, intuitive number anyone can sanity-check on a pitch diagram, but it cannot distinguish a free header at the penalty spot from a shot wedged behind three defenders at the same coordinates. Use it as a baseline.
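The whole model can be sketched as a lookup table plus a counting loop. Only the 39.8% and 18.8% figures come from the article; the 0.029 entry for Zone 6 is a stand-in for "less than 3%", and the zone numbering itself is just an index here:

```python
# Rathke-style zone xG: a historical conversion rate per zone.
# Values for zones 1 and 2 are from the article; 0.029 is a
# placeholder for Zone 6 ("less than 3%").
ZONE_XG = {
    1: 0.398,  # tight band in front of the six-yard box
    2: 0.188,  # heart of the penalty area
    6: 0.029,  # byline corridor outside the box (placeholder)
}

def rathke_xg(zone: int) -> float:
    """Return the historical conversion rate for a zone (0.0 if unknown)."""
    return ZONE_XG.get(zone, 0.0)

def fit_zone_rates(shots):
    """Rebuild the table from (zone, scored) shot records."""
    counts, goals = {}, {}
    for zone, scored in shots:
        counts[zone] = counts.get(zone, 0) + 1
        goals[zone] = goals.get(zone, 0) + int(scored)
    return {z: goals[z] / counts[z] for z in counts}
```

The `fit_zone_rates` helper is the entire "training" procedure: count shots and goals per zone, divide.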

2. Karim & Marwane (2023) — the Kos Angle

Karim and Marwane introduced an angular feature they call the Kos Angle: start with the angle of the triangle formed by the ball and the two goalposts, then subtract the angles occupied by defenders standing inside that triangle. What is left is the “open” angle actually available for a shot to reach the net. Intuitively, a 40° open angle with nothing in front is very different from a 40° opening with a defender bisecting it.
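A minimal sketch of that geometry, assuming a 105×68 m pitch with a 7.32 m goal centred at y = 34 and an effective defender blocking radius of 0.3 m (the pitch layout and blocking radius are my assumptions, not values from the paper):

```python
import math

POSTS = ((105.0, 30.34), (105.0, 37.66))  # assumed post coordinates
BODY_RADIUS = 0.3                          # assumed blocking radius, metres

def _angle(frm, to):
    return math.atan2(to[1] - frm[1], to[0] - frm[0])

def kos_angle(ball, defenders):
    """Open shooting angle in radians: the post-to-post angle at the
    ball, minus the arcs blocked by defenders inside the shot triangle."""
    a1, a2 = sorted((_angle(ball, POSTS[0]), _angle(ball, POSTS[1])))
    blocked = []
    for d in defenders:
        dist = math.hypot(d[0] - ball[0], d[1] - ball[1])
        if dist <= BODY_RADIUS:
            return 0.0  # defender on top of the ball blocks everything
        half = math.asin(BODY_RADIUS / dist)  # half-width of the blocked arc
        c = _angle(ball, d)
        lo, hi = max(a1, c - half), min(a2, c + half)
        if lo < hi:  # keep only the part inside the shot triangle
            blocked.append((lo, hi))
    # merge overlapping blocked arcs so they are not double-counted
    blocked.sort()
    covered, cur_lo, cur_hi = 0.0, None, None
    for lo, hi in blocked:
        if cur_hi is None or lo > cur_hi:
            if cur_hi is not None:
                covered += cur_hi - cur_lo
            cur_lo, cur_hi = lo, hi
        else:
            cur_hi = max(cur_hi, hi)
    if cur_hi is not None:
        covered += cur_hi - cur_lo
    return (a2 - a1) - covered
```

From the penalty spot with no defenders this returns the full post-to-post angle; drop a defender on the shot line and the open angle shrinks by the arc that defender's body subtends from the ball.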

The authors found that adding the Kos Angle as a feature to standard models improved performance markedly, and in the tree-based models it was consistently the highest-SHAP feature — that is, the variable with the biggest influence on the final xG. In the simulator, this is the model that reacts most dramatically when you add a defender between the ball and the goal: an apparently good location can lose most of its xG once the lane is closed.

3. Matteotti & Sotudeh — CNN xG from pixels

Matteotti and Sotudeh take a different route. Instead of handcrafting features, they turn each shot into a two-channel image on a 30×40 grid. Channel one has a single pixel for the ball. Channel two has a pixel for each defender. The image is then blurred with a Gaussian filter (σ = 1.25) so the model sees a heat-map of defenders rather than dots, and fed into a convolutional neural network that outputs the probability of a goal.
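The input construction can be sketched with NumPy and SciPy. The 30×40 grid, the two channels and σ = 1.25 come from the paper; the mapping from pitch coordinates to grid cells (and the 105×68 m pitch) is my assumption:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

GRID_H, GRID_W = 30, 40          # grid size from the paper
PITCH_L, PITCH_W = 105.0, 68.0   # assumed pitch dimensions, metres

def to_cell(x, y):
    """Map pitch coordinates to a grid cell (assumed mapping)."""
    r = min(GRID_H - 1, int(y / PITCH_W * GRID_H))
    c = min(GRID_W - 1, int(x / PITCH_L * GRID_W))
    return r, c

def shot_image(ball, defenders, sigma=1.25):
    """Build the 2-channel input image for one shot."""
    img = np.zeros((2, GRID_H, GRID_W), dtype=np.float32)
    img[0][to_cell(*ball)] = 1.0        # channel 0: one pixel for the ball
    for d in defenders:
        img[1][to_cell(*d)] += 1.0      # channel 1: one pixel per defender
    # blur both channels so the CNN sees heat-maps rather than dots
    img[0] = gaussian_filter(img[0], sigma=sigma)
    img[1] = gaussian_filter(img[1], sigma=sigma)
    return img
```

The resulting tensor is what a downstream CNN would consume; the network itself (convolution stack, training loop) is out of scope here.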

This model is conservative: it rarely produces very high xG values, even on chances human scouts would call clear-cut, because the training target (whether the shot actually became a goal) is extremely skewed toward zero. In the simulator the CNN model is a useful counterweight: when the simpler zone or angle models say a chance is huge, the CNN’s more measured value reminds you that even point-blank shots get saved often.

4. Aggregated xG (Anzer & Bauer, Hewitt & Karakuş, Narayanan & Pifer)

The fourth model is a combined feature set drawn from three recent papers. It includes x/y coordinates, distance and angle, goalkeeper position, the number of players inside the shot triangle, player density, the distances of the two closest defenders from the shooter, a flag for first-time shots, a flag for shots under pressure, the body part used and the technique. Under the hood, the deployed model is a gradient-boosted tree, selected from among logistic regression, AdaBoost and gradient boosting after comparing precision, recall, Brier score and ROC-AUC on StatsBomb data.
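A hedged sketch of that selection step, assuming scikit-learn. The feature names echo the list above, but the toy data is synthetic and the pipeline illustrative; a real comparison would score on held-out data rather than the training set:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss, roc_auc_score

# Feature vector mirroring the combined set described in the text.
FEATURES = [
    "x", "y", "distance", "angle", "gk_x", "gk_y",
    "players_in_triangle", "player_density",
    "closest_def_dist", "second_def_dist",
    "first_time", "under_pressure", "body_part", "technique",
]

rng = np.random.default_rng(0)
X = rng.normal(size=(400, len(FEATURES)))    # stand-in shot data
y = (rng.random(400) < 0.1).astype(int)      # ~10% goals: a skewed target

candidates = {
    "logreg": LogisticRegression(max_iter=1000),
    "adaboost": AdaBoostClassifier(),
    "gboost": GradientBoostingClassifier(),
}
for name, model in candidates.items():
    model.fit(X, y)
    p = model.predict_proba(X)[:, 1]         # predicted goal probabilities
    print(name, round(brier_score_loss(y, p), 3), round(roc_auc_score(y, p), 3))
```

The shape of the comparison, not the numbers, is the point: fit each candidate, score it on Brier and ROC-AUC, keep the winner.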

This is the richest and typically the most accurate model in the calculator. If you care about the single value that best reflects what modern xG providers produce, this one is a sensible primary reference.

Why the numbers disagree

Put the four models side by side and it becomes obvious why public xG figures from different providers rarely match.

  • Different input features. Rathke sees only position. The CNN sees position plus defender heat-maps. The aggregated model sees position, defenders, goalkeeper, technique and body part. More features means more sensitivity, for better or worse.
  • Different training data. Every model is trained on a different mix of leagues and seasons. The literature reviewed here spans StatsBomb data from the Bundesliga, Premier League, Serie A, Ligue 1, the UEFA Champions League, Euro 2020 and the 2022 FIFA World Cup — plus, in the Matteotti & Sotudeh paper, non-European leagues such as the Indian Super League.
  • Different algorithmic assumptions. Logistic regression, random forests, gradient boosting, AdaBoost and CNNs all learn non-linearities differently. The same feature vector can produce noticeably different outputs.
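The third point is easy to demonstrate: two learners fitted to identical data still assign the same shot different probabilities. Everything below (the synthetic data, the four stand-in features, the choice of scikit-learn models) is illustrative:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))  # stand-ins for x, y, distance, angle
y = (X[:, 3] + rng.normal(scale=0.5, size=300) > 1.0).astype(int)

# Same data, two different algorithms.
lr = LogisticRegression().fit(X, y)
gb = GradientBoostingClassifier().fit(X, y)

shot = X[:1]                                  # one shot, one feature vector
p_lr = lr.predict_proba(shot)[0, 1]
p_gb = gb.predict_proba(shot)[0, 1]
print(f"logistic: {p_lr:.3f}  gradient boosting: {p_gb:.3f}")
```

The two probabilities differ even though the inputs are byte-for-byte identical, which is the situation public xG providers are in all the time.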

Treat the disagreement as a feature

When the four models agree, you can be fairly confident a chance is as good or as bad as it looks. When they disagree, the size of the disagreement tells you something too: you are probably looking at a shot whose quality depends heavily on context the simpler models cannot see — defender positioning, goalkeeper placement, a narrow open angle. That is when it becomes useful to build intuition by tweaking the scenario in the simulator and watching the four values move together or apart.