Research Note

Beyond Expected Goals: A Possession-Aware View of Chance Creation in Soccer

Authors:

Jonathan Pipping, Ph.D. Student, Wharton Sports Analytics and Business Initiative Research Team
Tianshu Feng, MSE Data Science, University of Pennsylvania ’26
Paul Sabin, Senior Sports Analytics Fellow, Wharton Sports Analytics and Business Initiative

Published: November 13, 2025

Limitations of Expected Goals (xG)

Expected goals (xG) tells us how likely a recorded shot is to become a goal. It’s a useful statistic, but it skips the step that often matters most: creating the shot. Crosses that are a toe away, cutbacks that are nicked at the last instant, or a striker who creates space and then takes a half-second too long to shoot – standard xG gives those near-shots a value of zero because no shot occurred. It can also over-count during rebound scrambles where only one goal could ever be scored. These blind spots are significant and ripple into team ratings and player evaluation that lean on xG as a measuring stick.

Motivating Example

The sequence below highlights this difference clearly. Here, Kylian Mbappé first has a guarded look, then shakes his marker and turns it into a clean shot. Traditional xG gives a big number (0.5 per FBRef) to the final shot, but completely neglects the most skillful part: creating it.

Our framework credits both the chance of a shot happening and the chance it goes in once it does, so the value rises as the opportunity is created, not just when the trigger is finally pulled.

A soccer player in white is preparing to shoot at goal, facing a goalkeeper in orange, with several defenders in dark jerseys nearby.
(a) Before the move: a potential shot with pressure arriving.
A soccer player in white prepares to score with a goalkeeper and three defenders in maroon attempting to block the shot.
(b) After the move: space created, shot quality spikes.

Figure 1: This move showcases the value of shot creation, not just conversion. Our metric credits both steps, not just the final shot.

Introducing xG+

We split the problem into two bite-sized probabilities at each instant t. Let xS_t represent the probability that a shot occurs in the next second and xG_t represent the probability that a shot is scored given that it’s taken at time t. Symbolically,

\[
\mathrm{xG}^{+}_{t} = \mathrm{xS}_{t} \cdot \mathrm{xG}_{t},
\qquad
\mathrm{xG}^{+}_{\text{possession}} = 1 - \prod_{t \in \text{possession}} \left(1 - \mathrm{xS}_{t} \cdot \mathrm{xG}_{t}\right)
\]

which respects that a maximum of one goal can result from each possession. This effectively adds up danger over a possession without double-counting rebounds or omitting near-chances that never manifest as observed shots.
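As a rough illustration of how a possession that never produces a shot can still earn credit, the sketch below multiplies the two per-step probabilities and combines them into the chance of at least one goal. It assumes the time steps can be treated as independent, and the possession values are made up for illustration; neither is taken from the authors' implementation.

```python
from math import prod

def frame_xg_plus(xs_t: float, xg_t: float) -> float:
    """Per-step goal probability: P(shot in the next second) * P(goal | shot)."""
    return xs_t * xg_t

def possession_xg_plus(frames) -> float:
    """P(at least one goal) over a possession.

    Assumption: time steps are treated as independent events.
    """
    return 1.0 - prod(1.0 - frame_xg_plus(xs, xg) for xs, xg in frames)

# Hypothetical possession as (xS_t, xG_t) pairs: a cutback that is nicked
# away at the last instant, so no shot is ever recorded.
near_chance = [(0.05, 0.10), (0.30, 0.35), (0.60, 0.45), (0.10, 0.20)]

value = possession_xg_plus(near_chance)
print(f"possession xG+ = {value:.3f}")
```

Standard xG would score this possession as zero because no shot occurred, while the possession-level value stays strictly between 0 and 1.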

To estimate xS and xG, we train two gradient-boosted trees on tracking and event data (from Gradient Sports) for three EPL seasons (2022–25). Features cover the ball (distance, angle, height, speed), goalkeeper position, and compact summaries of nearby attackers and defenders (distances/bearings to the closest five). Distance dominates both tasks. An “open goal” proxy raises finishing probability, while higher ball speed generally triggers shots (good for xS) but makes them a bit harder to convert (tougher for xG), which aligns with intuition about rushed chances.
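To make the feature description concrete, here is a minimal sketch of how the distance, goal angle, and closest-player summaries might be computed from one tracking frame. The pitch geometry (goal centred at x = 52.5 m, 7.32 m wide), the function names, and the sample coordinates are assumptions for illustration; the note does not specify the coordinate system or the exact feature definitions.

```python
import math

# Assumed geometry (not from the note): coordinates in metres, attacking
# goal centred at (52.5, 0.0) with a 7.32 m goal mouth.
GOAL_X, GOAL_Y, GOAL_WIDTH = 52.5, 0.0, 7.32

def ball_features(x: float, y: float) -> dict:
    """Distance to goal centre ("r") and the angle the goal mouth subtends."""
    r = math.hypot(GOAL_X - x, GOAL_Y - y)
    to_left_post = math.atan2(GOAL_Y + GOAL_WIDTH / 2 - y, GOAL_X - x)
    to_right_post = math.atan2(GOAL_Y - GOAL_WIDTH / 2 - y, GOAL_X - x)
    return {"r": r, "angle": abs(to_left_post - to_right_post)}

def nearest_k(ball, players, k=5):
    """(distance, bearing) from the ball to the k closest players, nearest first."""
    bx, by = ball
    polar = [
        (math.hypot(px - bx, py - by), math.atan2(py - by, px - bx))
        for px, py in players
    ]
    return sorted(polar)[:k]

# Hypothetical frame: ball 11 m from goal, three defenders nearby.
ball = (41.5, 0.0)
defenders = [(45.0, 0.0), (41.5, 3.0), (30.0, 10.0)]

features = ball_features(*ball)
closest = nearest_k(ball, defenders, k=2)
print(features, closest)
```

Feature vectors like these would then feed the two classifiers, one for whether a shot follows within a second and one for whether a taken shot scores.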

Bar chart showing xS model feature importance. Top features: 'r,' 'speed,' and 'OffDist1.' 'r' has the highest importance.
Figure 2: Drivers of shot creation (xS).
Bar chart titled "xG Model Feature Importance (Gain)" showing various soccer-related features ranked by xG importance. "r" is the most important.
Figure 3: Drivers of finishing (xG).

Does this help predict games?

To answer that at the team level, we run a cross-validation study where each fold is a matchday. Within each fold we translate each metric (xG, xS, xG+) into season/team/opponent effects with a simple mixed-effects Poisson model, then predict goals out of sample.
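The fold structure can be sketched as follows. To keep the example self-contained, it swaps the mixed-effects Poisson for a one-parameter Poisson rate model (goals ~ Poisson(beta * metric)), so it only illustrates the hold-out-one-matchday loop and the error calculation, not the authors' model. The row layout and the `xg_plus` column name are hypothetical.

```python
from collections import defaultdict

def matchday_cv_mse(rows, metric: str) -> float:
    """Leave-one-matchday-out mean squared error for predicting goals.

    Stand-in model (assumption): goals ~ Poisson(beta * metric), whose
    maximum-likelihood estimate is beta = sum(goals) / sum(metric).
    """
    by_day = defaultdict(list)
    for row in rows:
        by_day[row["matchday"]].append(row)

    squared_errors = []
    for held_out, test_rows in by_day.items():
        train = [r for day, rs in by_day.items() if day != held_out for r in rs]
        beta = sum(r["goals"] for r in train) / sum(r[metric] for r in train)
        squared_errors += [(r["goals"] - beta * r[metric]) ** 2 for r in test_rows]
    return sum(squared_errors) / len(squared_errors)

# Hypothetical team-match rows, one per team per matchday.
rows = [
    {"matchday": 1, "goals": 2, "xg_plus": 1.4},
    {"matchday": 1, "goals": 0, "xg_plus": 0.6},
    {"matchday": 2, "goals": 1, "xg_plus": 1.1},
    {"matchday": 2, "goals": 3, "xg_plus": 2.0},
    {"matchday": 3, "goals": 1, "xg_plus": 0.9},
]

mse = matchday_cv_mse(rows, "xg_plus")
print(f"out-of-sample MSE = {mse:.3f}")
```

Running the same loop once per metric and aggregation method gives a like-for-like comparison of out-of-sample error.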

Possession-aware xG+ reduces error versus standard xG (including naive sums) and xS alone. Practically, using either the possession “at-least-one” aggregator or a simple max-per-possession proxy beats independent shot summation because it matches how attacks actually unfold.
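The difference between the aggregators is easiest to see on a rebound scramble. The sketch below applies independent summation, max-per-possession, and an at-least-one rule to the same per-chance goal probabilities. The three values are invented for illustration, and the at-least-one rule assumes the chances are independent.

```python
from math import prod

def naive_sum(probs):
    """Independent summation: can exceed one goal per possession."""
    return sum(probs)

def max_per_possession(probs):
    """Simple proxy: keep only the best chance in the possession."""
    return max(probs)

def at_least_one(probs):
    """P(at least one chance scores), assuming independent chances."""
    return 1.0 - prod(1.0 - p for p in probs)

# Hypothetical scramble: a saved shot and two rebounds in one possession.
rebound_scramble = [0.6, 0.5, 0.4]

summed = naive_sum(rebound_scramble)
best = max_per_possession(rebound_scramble)
capped = at_least_one(rebound_scramble)
print(summed, best, capped)
```

Only the summed value can exceed 1, which is the rebound inflation the possession-level aggregators avoid.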

Table showing Mean Squared Error (MSE) by metric and aggregation method, including xG+, xS, and xG metrics with different possession aggregations.

Does this predict future player performance?

For players, the repeatable edge lives in shot creation. “Shots over expected” (how often a player gets a shot off relative to context) is much more stable year-to-year than goals over expected.

The scatter below compares both shots vs xS and goals vs xG for players in our dataset. Movement to the right (consistent creation) sticks; big vertical pops (finishing more than expected) come and go from one year to the next. By combining creation and conversion, xG+ captures both sides of attacking skill in one number.

Table: Year-to-year correlation of performance metrics.

Metric    Year-to-year correlation
xG        0.12
xS        0.63
xG+       0.35
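A stability figure of this kind can be computed by correlating each player's value in one season with the same player's value in the next. The sketch below uses a Pearson correlation over players who appear in both seasons. The player labels and numbers are hypothetical, and the note does not say which correlation measure or minimum-minutes filter the authors used.

```python
from math import sqrt

def pearson(x, y) -> float:
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def year_to_year(season_a: dict, season_b: dict) -> float:
    """Correlate a per-player metric across two seasons, shared players only."""
    shared = sorted(set(season_a) & set(season_b))
    return pearson([season_a[p] for p in shared], [season_b[p] for p in shared])

# Hypothetical shots-over-expected values keyed by player label.
shots_oe_2022_23 = {"A": 5.0, "B": -2.0, "C": 1.0, "D": 0.5}
shots_oe_2023_24 = {"A": 4.0, "B": -1.0, "C": 2.0, "E": 3.0}

stability = year_to_year(shots_oe_2022_23, shots_oe_2023_24)
print(f"year-to-year correlation = {stability:.2f}")
```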

Discussion

Bottom line: what happens before the shot captures even more information about goalscoring than what happens after the shot. These estimates are also more predictive of both team and individual performance than standard xG, capturing value that players like Mbappé and Haaland consistently provide. Aggregating over possessions ensures that dangerous moments that never became shots are credited, while avoiding rebound inflation and maintaining a clean interpretation of goal expectancy over a game.

Limitations & Future Work

Some limits remain (such as tracking noise, a one-second xS window, exposure differences by team), but the pieces are modular and ready for extensions like sequence models for xS, hierarchical player/team effects, and defensive mirrors that value suppression as well as creation.

Scatter plot of xSoe vs. xGoe for English Premier League (2022-2025), highlighting a specific player in red for the 2022-23 season.
Figure 4: Shots over expected (x-axis) vs. goals over expected (y-axis), EPL 2022–25. Creation is the steady signal; finishing is higher variance.

About

Wharton Sports Analytics and Business Initiative Research Notes connect cutting-edge research with practical insights in sports analytics, in real time.