Compromised numbers: Why the statistic you see may not be actual possession

Apr 23, 2012, 1:36 PM EDT

One of the amazing statistics to come out of last Wednesday’s UEFA Champions League match was the possession number. Barcelona was reported by UEFA as having held the ball 72 percent of the time, an amazing figure against a club of Chelsea’s caliber. For those who have tried to find significance in correlations between possession and victories, the number must have been both remarkable and beguiling. After all, Barcelona lost, giving more credence to the hypothesis’ main qualm: What if one team doesn’t care about holding the ball?

The next day, the possession story got even more confusing. Supreme stat overlords Opta reported that Chelsea had only managed 20 percent of the ball. What? Even less time in possession? How freakish is this data point going to get?

That, however, is not the story. At least, it’s not the story in light of what Graham MacAree notes at Chelsea fan site We Ain’t Got No History. As he’s found out, Opta seems to be miscalculating possession; or, better put, Opta is not reporting a number consistent with the normal expectation for a possession stat.

The normal expectation: When one team has the ball, they’re in possession. I think we can all agree on this, right? This still leaves a lot of gray area. For example, who gets credit for possession when midfield chaos leaves neither side in control? Does one team get possession on a goal kick, when most goal kicks lead to 50-50 midfield challenges? And more broadly, what happens when play is dead but the game clock is running?

I’ve always assumed this is like a chess clock. When one team controls the ball, you hit a button that sends their dials turning. When the other fully regains possession, you hit a button. One clock stops. The other starts running. Those in-between moments? They’re governed by one rule: Until possession changes, don’t touch anything.

That, apparently, has nothing to do with Opta’s calculations. In fact, Graham’s research suggests Opta doesn’t even run a clock, which may be why they never report possession in terms of time. Instead, the relation between reported possession and total passes suggests Opta just uses passes. As Graham found out, if you take a team’s pass attempts and divide them by the game’s total attempted passes, you have Opta’s possession stat.
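
For anyone who wants the arithmetic spelled out, here is a minimal sketch in Python of the calculation Graham describes. The function name and the pass counts are made up for illustration; this is not Opta’s code, just the division his finding implies.

```python
def pass_share(team_passes: int, opponent_passes: int) -> float:
    """Return a team's share of all attempted passes, as a percentage."""
    total = team_passes + opponent_passes
    if total == 0:
        raise ValueError("no passes attempted")
    return 100 * team_passes / total


# Hypothetical pass counts, for illustration only (not from a real match).
print(pass_share(600, 200))  # 75.0
```

Note that no clock appears anywhere in the calculation: the only inputs are the two pass counts.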

What does this mean? Let’s take a totally fake scenario. Barcelona plays three quick passes before trying a through ball that rolls to Petr Cech. It all takes four seconds, while Petr Cech keeps the ball at his feet for eight seconds before picking it up, holding it for five seconds, then putting it out for a throw in, which takes eight more seconds to put back into play.

Despite Barcelona having possession for only four of those 25 fake seconds, they’d have 80 percent of Opta’s possession (three good passes plus one bad, while Chelsea had only Cech’s unsuccessful pass). A logical expectation of a zero-sum possession figure would have that as either 16 percent or (if you credit the time out of play as Barça’s, since they’d have the ensuing throw) 48 percent Barcelona’s. Or, if you do a three-stage model (that’s sometimes reported in Serie A matches), you’d have 16 percent Barcelona, 52 percent Chelsea, and 32 percent limbo/irrelevant.
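
The fake scenario is small enough to check by hand, but a short Python sketch makes the three readings easier to compare side by side. The durations and pass counts come straight from the scenario above; the function names and the way the timeline is laid out are my own assumptions, not anybody’s official method.

```python
# Scenario timeline: (description, seconds, team in control; None = ball out of play).
SEGMENTS = [
    ("Barcelona: three passes and a through ball", 4, "Barcelona"),
    ("Cech with the ball at his feet", 8, "Chelsea"),
    ("Cech holding the ball", 5, "Chelsea"),
    ("Ball out of play before the throw-in", 8, None),
]
# Attempted passes in the scenario: Barcelona's four, Cech's one.
PASSES = {"Barcelona": 4, "Chelsea": 1}


def pass_share(team):
    """Pass-based figure: the team's attempted passes over all attempted passes."""
    return 100 * PASSES[team] / sum(PASSES.values())


def clock_share(team, dead_ball_owner):
    """Two-way clock: dead-ball time is credited to dead_ball_owner."""
    total = sum(seconds for _, seconds, _ in SEGMENTS)
    owned = sum(
        seconds
        for _, seconds, holder in SEGMENTS
        if (holder or dead_ball_owner) == team
    )
    return 100 * owned / total


def three_stage():
    """Three-way split: each team's time plus a separate dead-ball bucket."""
    total = sum(seconds for _, seconds, _ in SEGMENTS)
    buckets = {"Barcelona": 0, "Chelsea": 0, "dead ball": 0}
    for _, seconds, holder in SEGMENTS:
        buckets[holder or "dead ball"] += seconds
    return {name: 100 * secs / total for name, secs in buckets.items()}


print(pass_share("Barcelona"))                # 80.0
print(clock_share("Barcelona", "Chelsea"))    # 16.0
print(clock_share("Barcelona", "Barcelona"))  # 48.0
print(three_stage())  # {'Barcelona': 16.0, 'Chelsea': 52.0, 'dead ball': 32.0}
```

The same 25 seconds produce 80, 16, or 48 percent for Barcelona depending only on which definition you pick.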

Of the three methods of reporting possession, Opta’s bears the least resemblance to reality; or, it’s the one that deviates furthest from what we expect from a possession stat.

Ironies being a thing these days, there are two here. First, Opta is the unquestioned leader in soccer data management. How could this happen?

Second, Opta isn’t trying to hide their methods. In fact, they’ve published a post on their site detailing not only their practices but their motivations and research, an investigation that found their approach “came up with exactly the same figures (as time-based methods) on almost every occasion.”

You would think two curmudgeons like Graham and myself would have found this, right? Graham had a reader point it out to him, while a representative from Opta magnanimously pointed me to the piece without the seemingly necessary indignation of explaining how a Google search works. After all Graham’s work and head-scratching – after my lack of work and similar head-scratching – we could have just gone to Opta’s site.

“We try to be as transparent as possible with this stuff,” Opta said when I asked them about it. Certainly, they should be commended for being so up front about their methods. After all, they’re a business that makes money off their work. They don’t need to give away their secrets.

But that’s a secondary issue. The main one: Why is a data house like Opta, reputed as the industry standard, taking this short cut? Or, why haven’t they renamed their measure? Granted, the perception that it is a shortcut may have more to do with our expectations than their intent, though based on their defense in the post, it’s clear they do see this as an accurate way of describing possession.

Still, the number they publish is completely redundant to the raw passing numbers also distributed. Why put the measure out at all if not to check a “possession stat” box on a list of deliverables?

Opta’s possession stat shouldn’t be cited in reporting, and if it is, the word “possession” shouldn’t be used to describe it. Reader expectations for anything labeled “possession” are drastically different than what Opta’s producing. The number is confusing to the point of being misleading. It’s becoming counter-information because of its poor packaging.

Even though Opta’s post on the topic is 14 months old, most will be surprised to hear this “news.” It’s disconcerting for anybody who is hoping a SABR-esque revolution’s on the horizon. Almost all of the huge volume of data to which we have access has been useful, but where people are expecting something akin to linear weights to be published tomorrow, we can’t even agree on the terms (let alone the significance of them).

Graham probably puts it better:

I’m completely fine with keeping track of passing volume – I’ve done it before myself. What’s frustrating, from an analyst’s point of view, is that we’re being sold a dud. A statistic that ostensibly measures possession measures something that is not possession, and gets repeated as authoritative anyway.

And people wonder why football statistics don’t get taken very seriously.

  1. moonty - Apr 23, 2012 at 3:08 PM

    I’d be interested to see how Opta possession stats compare, match-by-match, to a clock-based possession stat. If, like they claim, the difference is, at best, negligible, then I see no particular problem using the Opta stat. It may not seem particularly intuitive, of course, but it does seem to account for some of the grey area properly. (Grey area, to my mind, being as you discussed: Moments where possession is at best vague.)

    It would be particularly interesting to see how those numbers compare (across an entire match) across different ‘types’ of matches. The Chelsea-Barcelona example, for one, may produce more outliers than, say, Aston Villa-West Brom or some such.

    Maybe the bigger problem is the importance placed on the possession stat on its own. It’s not particularly meaningful when placed next to passing, passing zones, pass directions, pass distance, distance run — all of which, I think, say much more about a match than a simple ’60% possession’ stat does. It’s not as easy to pop up on a screen, of course, but it tells more of the story.

    And perhaps people have trouble taking football statistics seriously because only one statistic dictates a result: Goals. Of course, we all know it’s more than that, but I would hesitate to blame the possession stat for that.

  2. Ed Farnsworth - Apr 23, 2012 at 3:39 PM

    Putting aside the larger issue of how to best record possession, when it comes to passing numbers, two different sets are regularly reported at MLSsoccer.com’s MatchCenter. For example, the total number of passes recorded for Chivas USA and the Philadelphia Union in last Saturday’s match, as reported on the MatchCenter Stat page, is 536 and 274, respectively (66.2 and 33.8 percent “possession”). But when you look at the Opta powered Chalkboard page, the numbers become 592 and 308, respectively (65.8 and 34.2 percent). It’s a small difference in terms of percentages but a large difference in terms of the number of passes (some 90 passes) and, anyway, why is there any difference at all? So far as which one is to be considered more reliable, your guess is as good as mine. At least the passes can be visually tallied on the Chalkboard if you’re actually mad enough to count all of those little circles.

    • joeyt360 - Apr 24, 2012 at 11:27 AM

      Thanks for letting me know that. I tried it myself, figuring that there was probably something Chalkboard was double-counting, but none of the categories lead to that exact discrepancy of 56 (I got one that equaled 54 and seemed to make sense).

      I like using the chalkboards because they allow me to use the two ‘adjustments’ I usually apply to the possession stat. I supplement it by:
      * counting the passes in the opponent half of the field, and
      * the overall (whole-field) passes only for the part of the game when the score was level.

      I believe these are two things that tend to bedevil the possession stat (overly conservative possession vs attacking, and the natural tendency of teams that already have a lead to step back a bit) and weaken its correlation to victory. There are probably others I haven’t thought of yet that one could use the Opta chalkboards to look into (I have tried ‘final third’ and am happy with the results except for not having an agreed-upon definition of that, where the half line is easy).

  3. wesbadia - Apr 23, 2012 at 4:11 PM

    I’ll preface my comment by saying that I am a passionate believer that possession either directly or indirectly leads to more victories in this sport.

    That being said, I initially thought this was another hit piece on the possession game in the same vein as MLSsoccer.com’s Central Winger articles have been the past few weeks. It seems that it’s the fad to try to demolish this statistic nowadays. But I was both relieved as well as disturbed by what this article was actually about. It’s not a hit piece on possession, but instead a glaring investigative report into the biggest soccer analytics database in the world.

    If Opta is conducting its data collection in this way, I see no reason to have confidence in the quality of analysis they are giving. Because possession IS such a key aspect of this sport, seemingly cutting corners by doubling up on a data collection method is unacceptable given the reputation they have.

    It’s my belief that possession should be recorded and tabulated in as realistic a format as possible. That doesn’t mean a chess clock (even though that’s how I commonly picture it, too). It means having no less than three categories (Team 1, Team 2, dead ball). It means providing time stats along with percentage stats. It means aligning a team’s possession stats in a way that corresponds to events during a game (ie, goal, free kick, PK attempt, throw-in, injury, other stoppages, etc). The ONLY way to provide true possession statistics is giving a real-time feel to the category and splitting the numbers up in meaningful ways. Giving us a double stat for passing volume is cheap and lazy, and it degrades the analytical soccer fan by trying to pull the fleece over our eyes. As an analytical soccer fan, I demand better quality content.

    Thanks for bringing this up, Mr. Farley. I’ll be propagating this article as much as possible.

  4. baseballbarrister - Apr 24, 2012 at 1:25 AM

    Great article, this is fascinating!

    I also naively assumed the possession statistic was time of possession. My only guess as to why they use this method is that the manpower required to track time that closely might make it impractical.

  5. hjworton46 - Apr 24, 2012 at 10:57 AM

    Ridiculous. Stats mean nothing in soccer, apart from who has most goals at the end of the game.

  6. banouby - Apr 24, 2012 at 11:21 AM

    Hi – I left the following comment on the Chelsea blog, thought it may be useful to repost it here (obviously it references comments on the other blog):

    Hi Everyone,

    Been following this discussion with interest and I thought I would comment.

    Firstly, at Opta we appreciate the feedback and encourage the discussion. The inference that we were in some way “hiding” our methods is incorrect – the post explaining our approach to possession has been available on our website for over a year. Google “Opta and Possession” and it is the top link. Or at least it was…..

    One of the interesting things about the comments above is that there was a bit of a side discussion kicking off about exactly when a team was in possession. Is a team in possession when the ball is in the air? Or out of play? Or when someone is injured?

    This discussion shows pretty clearly there are differing opinions, and with this comes a level of subjectivity. This is something we try to avoid, given that we are collecting and supplying data from thousands of matches every year.

    One of the main issues we’ve found with the chess clock approach is that it is very, very difficult to implement without putting a dedicated man on it – this is simply unrealistic. Don’t be fooled into thinking that the “chess clock” figures you see quoted are accurate – it is likely that they are hugely flawed. We’ve tried it.

    To suggest we don’t think about this stuff, or care deeply about the accuracy of our output, is so wrong. You’d know that within seconds of walking into our office. It’s all we think about. Without the accuracy, and the passion for getting it right time and time again, it is unlikely that the world’s biggest broadcasters, betting companies, publishers, football clubs and governing bodies would work with us.

    It is clear that some people don’t agree with how we define this particular stat. That is absolutely fair enough. But as we mention in our website post, we’ve tested it against the other method and the results are very, very similar. And the way we do it ensures we are consistent across matches and across leagues.

    If you are interested in any of our other football definitions, you can see our glossary here – http://www.optasports.com/about/news/feature-opta’s-football-action-definitions.html

    Anyway, I guess I’m glad you all care so much about stats to post.

    All the best
    Simon Banoub

    • wesbadia - Apr 24, 2012 at 11:59 AM

      That’s all well and good, but the issue is not whether OPTA is “hiding” anything; it’s that OPTA is telling us that a stat which is commonly interpreted as one thing is not equal to what is actually being collected. You are not collecting “possession”, you are collecting passing volume as it correlates to either team. And while there may be some apparent correlation in the numbers you get from both “methods”, the fact remains that the true definition of possession is NOT passing volume.

      In any other profession, making this glaring change would be unacceptable. In my industry (transportation), if I were to substitute one piece of my traffic data collection for another piece simply because the two numbers were “very, very similar”, my clients would question it infinitely, and my employer would be asking me what the hell I was doing. I don’t know how else to get my point across besides saying you’re comparing apples and oranges… or, rather, apples and pears — in the same family, but not exactly the “same”.

      As the leading collector of sports statistics, I’d expect that OPTA would be striving to maintain that reputation by providing the more accurate and the clearest data they can. Using an excuse such as “impracticality” is unacceptable. Again, in my profession, I’ve needed to do many impractical things in order to collect the data necessary to provide meaningful and precise figures. I’ve sat at a single intersection days on end counting every single turning movement of every single vehicle, just so we know how traffic flow is actually happening. If that’s not impractical, I don’t know what is. Look at it this way: at least your employees that are collecting these data are able to watch a professional sports event during their task. Crowd source the task if need-be. Hell, if you’re hard-pressed and need someone to collect data for you, give me a call and I’d gladly do it. But calling a pear an apple is NOT the way to go about things.

      • moonty - Apr 25, 2012 at 1:25 AM

        Perhaps the issue is that expectations for possession are wide of the mark. It is, at best, an ambiguous statistic even when using a strictly timed method — this much is clear.

        But I struggle to see why this is so unacceptable. If another statistic-collecting group wants to come about and compile stats with a clocked possession statistic at an undoubtedly higher cost, then I don’t particularly see what the problem is. Transportation? Sure, it’s very important that you collect even impractical statistics. People’s lives may depend on that stuff. Football statistics, though? They’re less vital, to say the least.

        But if it’s a question of having fine-grained passing statistics including direction and distance, or having time-clocked possession statistic, I’d take the former any day of the week. If they decide they can afford to add a dedicated person for timing, then, well, good for them. An extra 50-100% in personnel cost would certainly be passed on to the purchasers of said data, and we’ve got a problem there, too.

        And here’s the thing: There would seem to be a very, very strong correlation between total passes and possession time. If they’ve got an explanatory mechanism that follows logically, we should probably accept that, shouldn’t we?

        If they find a good way to optimize it, then I’d welcome it. But I’d hardly call this unacceptable.

      • wesbadia - Apr 25, 2012 at 9:26 AM

        I think you fail to see the direction at which I’m approaching this issue.

        A further glimpse into my career: even though my clients and my employer expect me and my co-workers to deliver accurate data, the industry and the safety of people does not depend on a numerical figure that is slightly off of reality. If my “percent trucks” number is off by 2-3% (even 5-10%) due to me calculating it in an unapproved method, the result will NOT be a drastically changed design; especially if that percentage is already high to start.

        But my client and my employer still expect me to deliver the best possible product. Why? Because that’s what’s demanded of us, and our (and my) reputation depends upon it.

        And here’s the crux of the issue for me: OPTA is delivering a product that is sub-par given what the consumers of said product are demanding. It’s apparent that the majority of soccer fans around the world want their statistics to be as accurate and as truthful as possible. People want in-depth analysis, and that depends upon the relevant data being at an acceptable level. What OPTA is delivering is NOT acceptable to many people… and their reputation and their livelihoods depend upon meeting the demand of their constituents.

        The trend towards highly-analytical soccer reporting is gaining momentum, even with the casual fan, and especially in the US. I would assume the European fans are the same, judging by the way the Chelsea boards are heating up. And this is proof that the “world leader in soccer statistics” is not providing us fans with the best possible product. When overwhelming numbers of MLS fans are shelling out $60 for a subscription to MLSLIVE every season to gain access to not only games but statistical analysis, it tells me that there is a high demand for what OPTA is providing. Accompanying that high demand is an expectation of accurate, precise, and truthful reporting.

        Contrary to what you suggest, possession is not some abstract term encompassing a broad spectrum of reality. It’s a stat that can be compiled very accurately. The Italian Serie A has been doing a three-tier method for possession for as long as I can remember (as the article says). Adding other tiers of possession for things like “dead ball”, “injury”, “contested possession”, etc would further define what possession is… other than just passing volume. And this is how OPTA can economize putting personnel on this task. It is far more profitable to have that person collect as much data as possible about possession instead of JUST hitting the “chess clock”. Stretching the amount of data being collected should make it worth collecting that data because you’re providing even MORE information in the same amount of time. I fail to see how optimizing the collection methods in this way is “impractical”.
