xG glossary: every football analytics term explained
Football analytics has its own dialect. Once you step past xG, the acronyms pile up fast: xA, xT, PSxG, SHAP, ROC-AUC. This glossary collects the terms you will see in the simulator, in this site’s articles, and in most academic papers on expected goals. Each entry is kept deliberately short, with pointers to longer explanations when they help.
Core metrics
- xG — Expected Goals
- Probability a shot becomes a goal, from 0 to 1. See the full guide.
- xA — Expected Assists
- Expected number of assists a player generates, usually computed as the sum of the xG of every shot set up by that player’s passes.
- xT — Expected Threat
- Value assigned to every ball-moving action (pass, carry, dribble) based on how much it increases the probability of scoring over the next few actions. Built from a grid of pitch zones, each carrying its own threat value, rather than from the xG of individual shots.
- PSxG — Post-Shot xG
- Expected goals conditional on the shot being on target. It only uses on-target shots (those that were saved or scored), which makes it useful for evaluating goalkeepers (actual goals conceded vs PSxG faced).
- Shot map
- A visualisation that plots every shot a team or player took, typically sized by xG. Instantly reveals shot selection patterns: who takes high-quality chances, who speculates from distance.
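The xA and PSxG entries above both reduce to simple sums over shots. The snippet below is a minimal sketch of that arithmetic, not any provider's implementation: the `Shot` fields, the player name, and the numbers are all made up for illustration, and it assumes every on-target shot carries a PSxG value.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Shot:
    """One shot, reduced to the fields this sketch needs (hypothetical schema)."""
    xg: float                          # pre-shot xG
    assisted_by: Optional[str] = None  # name of the passer, None if unassisted
    psxg: Optional[float] = None       # post-shot xG, None if the shot was off target
    is_goal: bool = False


def expected_assists(shots: List[Shot], player: str) -> float:
    """xA as the summed xG of every shot set up by `player`."""
    return sum(s.xg for s in shots if s.assisted_by == player)


def goals_prevented(shots_faced: List[Shot]) -> float:
    """PSxG faced minus goals conceded; positive means the keeper beat the model."""
    on_target = [s for s in shots_faced if s.psxg is not None]
    psxg_faced = sum(s.psxg for s in on_target)
    goals_conceded = sum(1 for s in on_target if s.is_goal)
    return psxg_faced - goals_conceded


# Invented example data: four shots, three of them on target.
shots = [
    Shot(xg=0.10, assisted_by="Player P", psxg=0.30, is_goal=False),
    Shot(xg=0.45, assisted_by="Player P", psxg=0.80, is_goal=True),
    Shot(xg=0.05),  # unassisted and off target
    Shot(xg=0.20, assisted_by="Player Q", psxg=0.15, is_goal=False),
]

xa_player_p = expected_assists(shots, "Player P")  # 0.10 + 0.45
keeper_delta = goals_prevented(shots)              # (0.30 + 0.80 + 0.15) - 1 goal
```

Treating the same list as both "shots assisted" and "shots faced" is only a convenience here; in practice the two sums would run over different slices of the data.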
Shot-level features
- Distance
- Straight-line distance from the ball to the centre of the goal line at the moment of the shot.
- Angle
- The angle formed at the shooter’s position by the lines to the two posts. Small angle = tight chance (far out or close to the byline); wide angle = close, central chance.
- Kos Angle
- Introduced by Karim & Marwane (2023). The shooter’s angle minus the angles blocked by defenders inside the ball-to-posts triangle. Captures how much of the goal is actually exposed.
- First time / first touch
- Indicates the shot was taken without controlling the ball first. Tends to surprise the goalkeeper and correlates with slightly higher xG, all else equal.
- Under pressure
- Whether a defender was within close range of the shooter at the moment of contact.
- Body part
- Right foot, left foot, head, or other (shoulder, knee, chest). Headers historically convert at lower rates than feet from the same location.
- Technique
- Normal, volley, half-volley, overhead kick, diving header, backheel, lob. Each technique has a different empirical conversion rate.
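Distance and angle, as defined above, are plain geometry. The sketch below assumes a StatsBomb-style coordinate system (pitch 120 by 80, attacking goal on the line x = 120, posts at y = 36 and y = 44); those constants and the function names are assumptions of this example, so adjust them for other data sources.

```python
import math

# Assumed pitch geometry (StatsBomb-style units); not taken from the glossary itself.
GOAL_X = 120.0
LEFT_POST_Y = 36.0
RIGHT_POST_Y = 44.0
GOAL_CENTRE_Y = (LEFT_POST_Y + RIGHT_POST_Y) / 2


def shot_distance(x: float, y: float) -> float:
    """Straight-line distance from the ball to the centre of the goal line."""
    return math.hypot(GOAL_X - x, GOAL_CENTRE_Y - y)


def shot_angle(x: float, y: float) -> float:
    """Angle in radians at the ball between the lines to the two posts."""
    # Vectors from the ball to each post.
    ax, ay = GOAL_X - x, LEFT_POST_Y - y
    bx, by = GOAL_X - x, RIGHT_POST_Y - y
    # atan2(cross, dot) gives the signed angle between the vectors; abs() drops the sign.
    cross = ax * by - ay * bx
    dot = ax * bx + ay * by
    return abs(math.atan2(cross, dot))


central = (108.0, 40.0)  # roughly the penalty spot in these units
wide = (108.0, 10.0)     # same depth, but far out towards the touchline

central_angle = shot_angle(*central)
wide_angle = shot_angle(*wide)
```

With these coordinates the central shot sees a much wider opening than the wide one, which matches the "small angle = tight chance" reading. The Kos Angle is not attempted here because it additionally needs defender positions.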
Machine learning vocabulary
- Logistic regression
- The simplest classifier used for xG: a weighted sum of features squeezed through a sigmoid to produce a probability.
- Decision tree / Random Forest
- A tree asks a series of yes/no questions about the features to reach a prediction. A random forest averages many such trees trained on random subsets of the data, which reduces overfitting.
- Gradient Boosting / AdaBoost
- Ensemble methods that train trees sequentially, each one correcting the errors of its predecessors. Gradient-boosted trees are the workhorse of modern xG models.
- CNN — Convolutional Neural Network
- A deep-learning architecture designed for images. In xG, it lets the model read a tensor representation of the pitch (ball in one channel, defenders in another) and learn spatial patterns directly, without handcrafted features.
- SHAP value
- A number expressing how much one feature pushed a single prediction up or down relative to the model’s average output. In xG research, SHAP values are aggregated across shots to rank features by importance; the Kos Angle, for example, ranks at or near the top in tree-based xG models.
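The logistic regression entry above ("a weighted sum of features squeezed through a sigmoid") can be written out in a few lines. This is only a sketch: the feature set, the weights, and the bias are invented for illustration and were not fitted to any data.

```python
import math
from typing import Dict


def sigmoid(z: float) -> float:
    """Map any real number to the (0, 1) interval."""
    return 1.0 / (1.0 + math.exp(-z))


def logistic_xg(distance: float, angle: float, is_header: int,
                weights: Dict[str, float], bias: float) -> float:
    """xG as sigmoid(bias + weighted sum of features)."""
    z = (bias
         + weights["distance"] * distance
         + weights["angle"] * angle
         + weights["is_header"] * is_header)
    return sigmoid(z)


# Hypothetical coefficients: further out lowers xG, a wider angle raises it,
# and a header lowers it. Real values would come from fitting on shot data.
WEIGHTS = {"distance": -0.1, "angle": 1.5, "is_header": -0.8}
BIAS = -1.0

close_foot = logistic_xg(distance=8.0, angle=0.9, is_header=0, weights=WEIGHTS, bias=BIAS)
close_head = logistic_xg(distance=8.0, angle=0.9, is_header=1, weights=WEIGHTS, bias=BIAS)
long_range = logistic_xg(distance=25.0, angle=0.3, is_header=0, weights=WEIGHTS, bias=BIAS)
```

With these made-up numbers the close footed shot gets the highest probability, the header from the same spot a lower one, and the long-range effort the lowest. Tree ensembles and CNNs replace the weighted sum with something more flexible, but the output is still read as a probability.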
Evaluation metrics
- Precision / Recall
- Classification metrics. Precision measures what fraction of predicted positives were correct; Recall measures what fraction of actual positives the model caught.
- ROC-AUC
- Area under the Receiver Operating Characteristic curve. A single number from 0 to 1 summarising how well the model separates goals from non-goals across all thresholds: 0.5 is no better than random, 1.0 is perfect, and values below 0.5 mean the model ranks shots worse than chance.
- Brier score
- The mean squared error between predicted probability and the actual outcome (0 or 1). Lower is better. Rewards well-calibrated probabilities, not just correct rankings.
- MAE & RMSE
- Mean Absolute Error and Root Mean Squared Error between the model’s xG and a reference xG such as StatsBomb’s. RMSE penalises big individual errors more harshly.
- Pearson correlation
- A value between -1 and 1 measuring the strength of the linear relationship between two xG series. Used to compare a local model against a reference model over the same shots.
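The evaluation metrics above are short formulas, shown here in plain Python so each definition is visible. This is a sketch on four invented shots; in practice a library such as scikit-learn would normally be used, and the `roc_auc` function below relies on the pairwise-ranking interpretation of AUC rather than tracing the curve.

```python
import math
from typing import List, Tuple


def brier_score(probs: List[float], outcomes: List[int]) -> float:
    """Mean squared error between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def mae(a: List[float], b: List[float]) -> float:
    """Mean absolute difference between two xG series."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def rmse(a: List[float], b: List[float]) -> float:
    """Root mean squared difference; large individual gaps weigh more."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation coefficient between two series."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def roc_auc(probs: List[float], outcomes: List[int]) -> float:
    """Share of (goal, non-goal) pairs where the goal got the higher score; ties count half."""
    goals = [p for p, y in zip(probs, outcomes) if y == 1]
    misses = [p for p, y in zip(probs, outcomes) if y == 0]
    wins = sum(1.0 if g > m else 0.5 if g == m else 0.0 for g in goals for m in misses)
    return wins / (len(goals) * len(misses))


def precision_recall(probs: List[float], outcomes: List[int],
                     threshold: float = 0.5) -> Tuple[float, float]:
    """Precision and recall after labelling shots with prob >= threshold as goals."""
    tp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 0)
    fn = sum(1 for p, y in zip(probs, outcomes) if p < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented data: four shots scored by a local model and by a reference model.
model_xg = [0.8, 0.1, 0.4, 0.3]
reference_xg = [0.7, 0.2, 0.3, 0.4]
is_goal = [1, 0, 0, 1]

brier = brier_score(model_xg, is_goal)
auc = roc_auc(model_xg, is_goal)
precision, recall = precision_recall(model_xg, is_goal)
```

Brier, ROC-AUC, precision and recall compare the model against what actually happened, while MAE, RMSE and Pearson compare it against another model's xG for the same shots.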
Data providers and tools you will see cited
- StatsBomb
- A widely used football data provider. Publishes a free sample of detailed event data, which is the training set behind many of the academic xG models implemented here.
- mplsoccer
- An open-source Python library that loads StatsBomb data and simplifies pitch plotting. Commonly cited in xG research code.