Stealing Signals Week 11, Part 1
Quick data talk, Signal and Noise for TNF and early Sunday games
The big thing I’m thinking about this week is data: what we use, and how we use it. I read a good thread on Twitter last week that I agreed with heavily, even though it argued against a type of analysis I do enjoy using, the “Over Expected” metrics.
Toward the end of that thread, there’s a reference to an earlier thread from July, which features this tweet that sums things up well.
I talk a lot about the fantasy opportunity and efficiency metrics that use the “OE” framework, and it’s not that these can’t be accurate, but that the “expected” part of the equation is so important. That’s a modeled figure that’s supposed to represent what it says — the expectation for a normal situation, controlling for a ton of variables — but what the model spits out in an individual instance needs to be understood. When a RB gets three carries inside the 5-yard line in the same sequence and winds up with more rushing Expected Points than he ever could have possibly scored in that sequence, that’s a tricky element that can’t be handled well. (You can cap it, but the stat is also trying to capture the RB’s inefficiency, and in that instance the inefficiency stacks up with each successive failed try.)
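To put rough numbers on that goal-line example, here’s a toy sketch in Python. Every figure in it is invented for illustration; real expected-points models are far more involved:

```python
# Toy illustration with invented numbers, not any real model's output.
# Three straight goal-line carries from the 3-, 2-, and 1-yard lines.
# Expected fantasy points per carry ~ P(TD) * 6 + a small yardage term.
carries = [
    {"yardline": 3, "td_prob": 0.45},  # hypothetical TD probabilities
    {"yardline": 2, "td_prob": 0.52},
    {"yardline": 1, "td_prob": 0.60},
]

summed_expected_fp = sum(c["td_prob"] * 6 + 0.2 for c in carries)

# The most the back could actually score in the sequence is one TD plus
# 3 rushing yards, because a TD on any carry ends the sequence.
max_actual_fp = 6 + 3 * 0.1

print(f"Summed expected FP: {summed_expected_fp:.1f}")  # ~10.0
print(f"Max possible FP:    {max_actual_fp:.1f}")       # 6.3
```

You can cap that sum, but as noted, part of what it’s measuring is real: each failed attempt is the back leaving points on the field, so the stacking isn’t purely a bug.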
I obviously like to talk about Pass Rate Over Expected as well, and I’m often talking through the specifics of that figure, because a negative PROE on a very low expected pass rate is fundamentally different from a negative PROE on a very high one. They just can’t be viewed as identical outcomes. The key is to also look at the actual pass rates: in the case of a negative PROE on a low expected rate, the actual pass rate is extremely run-heavy, while in the case of a high expected rate, the actual pass rate is somewhere around league average.
The goal of PROE is to take that actual pass rate and add context, and it does, and it’s very valuable. But we can’t just replace our understanding of the actual stat with an OE stat and pretend all factors are created equal. That puts way too much faith in the model. I can’t even begin to count the number of times I’ve seen two OE figures compared in a way that puts an extreme amount of weight on the model itself, e.g. commenting that a team with a 65% pass rate on a 72% expected rate for a -7 PROE is “more run heavy” than a team with a 45% pass rate on a 50% expected rate for a -5 PROE. This gets back to the exact point at the top that these things are probably directionally accurate — both of these teams are more run-heavy than expectation — but comparing the nuance of the two figures, -5 and -7, puts an awful lot of faith in the outputs of the model to be exactly correct, which isn’t especially easy when you start to talk about the extreme outcomes.
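To put that comparison’s arithmetic in one place, here’s a minimal sketch. The formula is just actual pass rate minus expected pass rate, and the two teams are the hypothetical ones from the paragraph above:

```python
# PROE = actual pass rate minus expected pass rate, in percentage points.
def proe(actual_pass_rate: float, expected_pass_rate: float) -> float:
    return (actual_pass_rate - expected_pass_rate) * 100

# The hypothetical comparison from above.
teams = {
    "Team A": {"actual": 0.65, "expected": 0.72},  # -7 PROE
    "Team B": {"actual": 0.45, "expected": 0.50},  # -5 PROE
}

for name, t in teams.items():
    print(f"{name}: actual {t['actual']:.0%}, "
          f"PROE {proe(t['actual'], t['expected']):+.0f}")

# Team A: actual 65%, PROE -7
# Team B: actual 45%, PROE -5
# Calling Team A "more run heavy" because -7 < -5 leans entirely on the
# model's expected rates being exactly right; the raw pass rates say the
# opposite about how these teams actually played.
```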
To be clear, I’m not criticizing these stats, or the EP models the stats are based on, because they are tools, and frankly good ones when used appropriately. This is where I may slightly disagree with the arguments in the tweet thread referenced above, because they seem to assume the only way these OE stats are used is by applying “all the residual” — which is the nerd way of saying everything that’s not in the model, and could account for a ton of factors we’re not really sure how to capture — to a single factor. For example, when I look at PROE in a really windy game, weather isn’t in the expected pass rate side of the equation, so both teams can show as extremely run-heavy for reasons that are different from our standard application of PROE figures.
But that’s on the analyst, and my criticism of these stats does come more on the analyst side. I use them quite often, and I believe they do provide helpful context that gives us directional information, meaning the OE figure may not be exactly precise in all instances, but outside extreme circumstances, it’s probably within some error bar of capturing what it’s trying to capture. So a -10 PROE is still relevant in that we can be pretty dang sure that even if there are some serious mitigating circumstances, that was a run-heavy situation. I chose -10 because that rate is extreme enough that we can be pretty sure the “true” figure is at least, I don’t know, -5. This is a big reason I don’t cite the exact numbers nearly as often when the number is, say, plus or minus 2 from neutral (e.g. I’m not out here saying “Team X had a 1.5 PROE” so much as emphasizing the spots that are well off the average).
Anyway, these stats do have limitations, and I agree with the tweets I mentioned at the top that far too often I see commentary that takes OE stats as fact or as a known piece of information, seemingly without contextual factors, in a way that displays a fundamental misunderstanding of how these stats are compiled, and how models work more generally. We always need to consider what other residual factors might influence those numbers, just like we need to be discerning with any data we use. I’ve harped on this every way I know how.
One last example before we get to Week 11. In the comments of that second tweet, the author cites a past example of their own analysis that they call a “disaster.” And in that thread, this tweet lives.
It’s a really great example of the point, because it’s a really cool idea: showing whether a QB was able to win a lot of games when the in-game win probability model thought he had little chance, since we love to attribute so much of what it means to come from behind to whether the quarterback can lead that charge. But it’s fascinating because while Tom Brady looks like the all-time comeback kid, far better at turning expected losses into wins than even these other two all-time great quarterbacks, my immediate thought (granted, working off the idea that this individual had already called the analysis a “disaster,” so I was looking for something wrong with it) was that Bill Belichick could play a role in this, right?
I mean, that’s the old debate, which of those two was more integral to the Patriots’ dynasty, and I think most would put it on Brady at this point. Or somewhere in the middle, which is probably accurate. But setting aside this exact example and Belichick specifically, you’d expect a generic quarterback playing for a coach who understands how to teach the little things, and situational execution, and football IQ (to the extent that can be taught), to perform better than most in comeback situations. Because to come back in football, you really need to not miss chances. Your defense needs to play situationally, your game management and timeout usage need to account for game context, you have to understand when and how to dial up the aggressive plays you’re going to need, and the players also have to be good enough to execute, which Brady has always been amazing at, no question. But if Belichick is a great coach, I’ve always argued his biggest strength is in the details, and in how prepared his teams always seem to be to take advantage of any opportunity. It’s hard to believe, but NFL teams leave so many chances on the table each week, and capitalizing on them is how you win some of these games.
For a long time, one of my strongest takes in the Peyton Manning/Brady debates was that Brady had some awesome comeback victories, but they frequently paired with key defensive stops and other positive outcomes outside Brady’s contributions, whereas it often felt like Manning had to work twice as hard. I mean, there’s a massive difference between a QB failing to lead the key fourth-quarter drive, getting bailed out by a defensive stop, and then succeeding on the second chance, versus scenarios where another QB does lead that successful drive with like five minutes left, but then the defense gives the lead back up and he has to do it again, or maybe doesn’t even get the ball back with enough time to do anything. (And to be clear, that’s my recollection of early opinions I absolutely don’t want to stand behind or argue about now: that Brady often experienced situations like the former, while Manning experienced situations like the latter more regularly.)
The point isn’t about Brady and Manning, but about how you can see there would be other factors impacting how a QB looks in that Win Percentage Over Expected stat. There’s a ton of context in the residual there; the most obvious is probably the defense stuff, but coaching and the offensive talent around the player all factor in. The analytics community often talks about how wins aren’t a QB stat, and while that chart is really nice to look at and might be directionally accurate in a way that’s still fun to think about, there’s additional context to consider before we apply all of it to the quarterbacks in question.
One final thing I want to mention from those threads: there’s a point where they argue that because Completion Percentage Over Expected (CPOE) is stable, it might be more trustworthy, whereas a stat like Rushing Yards Over Expected (RYOE) is not particularly stable, and thus definitely shouldn’t be used as a measure of RB skill. And here I actually disagree with the threads again, because I’m not sure the stability of the metrics argues for or against those points. One of my biggest issues with CPOE is how lower-aDOT passers often look really good, which is probably the result of situations like having one guy pretty open underneath and another pretty open downfield, and whether you’re willing to pull the trigger on the downfield pass (like you should be, because of the potential impact) or just take the small gain (which in many cases represents a missed opportunity, even if it’s a positive play in a vacuum).
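For reference, CPOE itself is simple arithmetic once a model has assigned each throw an expected completion probability. Here’s a minimal sketch with invented numbers, which also shows why the aDOT issue lives outside the math:

```python
# Minimal CPOE sketch: completion minus modeled completion probability,
# averaged over throws. All xcomp values are invented for illustration.
throws = [
    # (air yards, completed, modeled completion probability)
    (2,  1, 0.78),
    (4,  1, 0.74),
    (3,  0, 0.76),
    (18, 1, 0.42),
]

cpoe = sum(completed - xcomp for _, completed, xcomp in throws) / len(throws)
print(f"CPOE: {cpoe:+.1%}")  # +7.5%
```

Nothing in that calculation penalizes a passer for always choosing the easy underneath throw over the open downfield one; he just keeps beating friendly expected completion rates, which is the critique.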
Meanwhile, for RYOE, I’d argue that RB careers are so short, and their performance fluctuates so much based on health — which is extremely fickle at that position, and there are always RBs playing through stuff — that we shouldn’t expect stability over a long timeline. There are so few RBs that are consistently what they are; I always talk about how by the time we know a RB is good, the dropoff is probably right around the corner, and we can’t use past production to predict the future for exactly that reason. There’s also an element of how long runs heavily impact stats like YPC, and they do the same for RYOE. That’s a big reason success rate is used in certain contexts, and even the combination of the two doesn’t always capture what a RB truly is, because again, it’s maybe never stable, and it’s just very difficult to isolate RB skill from blocking and all of the other ways rushing yards are compiled.
But one of the reasons I do think RYOE is worth looking at is that, as I understand it, the model was built using non-RB information, because using RB information did not improve the predictiveness of the model (contrary to a note in the first thread linked above, though I may be the misinformed party here). That is to say, the most important factors to the model were things like defenders in the box, offensive line information (skill, blocking win rate, etc.), and the positional configuration of the players during the play, which (if accurate) would mean this is one OE stat where a pretty large percentage of the residual really can be chalked up to one main factor: the RB. That’s really the point here: how much of the residual actually comes down to the variable we’re trying to measure with these OE stats. And again, I thought that was a point in favor of RYOE compared to some of the others that are more hastily thrown together.
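If that description of the model is accurate (and again, I may be the misinformed party here), the structure looks something like this sketch, where expected yards come only from non-RB inputs and the residual is what gets attributed to the back. The expected_yards function is a stand-in I made up, not the real model:

```python
# Sketch of the RYOE structure described above: the expected-yards model
# sees only non-RB inputs, so the leftover (the residual) is attributed
# to the back. This placeholder relationship is invented for illustration.
def expected_yards(box_defenders: int, blocking_win_rate: float) -> float:
    return 6.0 - 0.5 * box_defenders + 4.0 * blocking_win_rate

def ryoe(actual_yards: float, box_defenders: int,
         blocking_win_rate: float) -> float:
    return actual_yards - expected_yards(box_defenders, blocking_win_rate)

# An 8-yard carry against a 7-man box with average blocking:
print(f"{ryoe(8, 7, 0.55):+.1f}")  # +3.3 yards over expected
```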
Anyway, when you really peel back the curtain, I don’t think CPOE and RYOE are that much different, both in terms of having context that needs to be applied (not in every single case, but in more cases than you’d like to see) and in still providing some directional information. For CPOE, I’d expect that information to have a longer timeline of usefulness, while for RYOE, I think it can be used to articulate how a RB is performing at a given moment, with the understanding it can change quickly, and we shouldn’t expect it to be particularly predictive of itself or of future rushing performance on a longer timeline.
To give an anecdotal example of how I think this can be applied: after James Robinson had the two early long TD runs, I argued in Stealing Signals that he did not look like a player who would keep up the elevated efficiency. I actually used RYOE at one point, but crucially for Robinson, I avoided the cumulative stat, which was heavily influenced by those two plays in a to-that-point small sample. Instead, I looked at the percentage of his runs that went over expectation, a “consistency” metric where he ranked very low, and which did back up that he wasn’t looking great. It’s not just that two plays made him look very efficient; it’s that those two plays had an outsized impact on the sample, we know long runs are not stable, and I was happy to make the further argument that I was betting against him continuing to hit on plays like that. Given those nuances, I believe the RYOE data supported the idea that if he didn’t keep hitting stuff like that, his rushing efficiency would probably be negative moving forward. We saw that play out, but obviously one outcome doesn’t prove anything, ever. (Robinson could also get healthier and play better, which I’ve discussed as a possibility recently. It’s just an example of my thought process in using that stat, as sketched below.)
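Here’s a stylized sketch of that cumulative-versus-consistency distinction, with made-up per-carry numbers where two long runs carry the entire cumulative figure:

```python
# Stylized example with invented (actual yards, expected yards) pairs.
# Two long runs dominate an otherwise below-expectation sample.
carries = [(3, 4.2), (2, 4.0), (64, 4.5), (1, 3.8), (3, 4.1),
           (2, 4.3), (48, 4.4), (3, 4.0), (2, 4.2), (1, 3.9)]

cumulative_ryoe = sum(actual - exp for actual, exp in carries)
pct_over_expected = sum(actual > exp for actual, exp in carries) / len(carries)

print(f"Cumulative RYOE:       {cumulative_ryoe:+.1f}")  # +87.6
print(f"Carries over expected: {pct_over_expected:.0%}")  # 20%
```

The cumulative number screams elite efficiency, but this back was under expectation on eight of ten carries, and that was the shape of the Robinson argument.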
Again, it’s all in the application of the stats and data, and some of that requires domain knowledge and an understanding of how the stats are compiled. That’s just football, as far as I see it, and it’s why I write this column every week, though that’s not to argue I don’t make mistakes, because of course I do. But so much time is spent arguing the finer points of these metrics when most of them do have their purposes, appreciated for what they are and what they measure. Most can be misapplied pretty easily, too, with too much weight put on takeaways that probably shouldn’t be carrying it. This is true of all sorts of stats that are heavily referenced — EPA is a great one here, because it simply looks at the result of the play (the change in expected points from before the snap to after the play), with no other context, and that simplicity is very refreshing in some respects, but it should then come with context for highly variable plays that were particularly impactful to a sample. It’s an incredibly unsatisfying conclusion, of course, but it is what it is, and more people seem to ignore these finer points than respect them.
Let’s get to the Week 11 games, which were mostly not good for fantasy. Data is typically courtesy of nflfastR via the awesome Sam Hoppen, but I also pull from RotoViz apps, Pro Football Reference, PFF, RotoGrinders, and Add More Funds, and I get my PROE numbers from the great Michael Leone of Establish The Run. Part 1 of Week 1 included a glossary of important statistics to know for Stealing Signals.
Titans 27, Packers 17
WR Snap Notes: Treylon Burks: 50% (-6 vs. Week 10 return), Christian Watson: 80% (-4 vs. W10 high), Randall Cobb: 56% (return)
Key Stat: Treylon Burks — 0.71 WOPR
The Titans went into Lambeau on a short week for a cold night game and pretty easily won, scoring a touchdown on the game’s first possession and never relinquishing the lead. They led by double digits for more of the second half than not, including the final 14:55 of the fourth quarter. And they were frankly just the better team.
Derrick Henry (28-87-1, 2-2-45) had a really nice night, while Dontrell Hilliard (1-4, 1-1-14-1) caught a touchdown late but was mostly uninvolved in the run-heavy game plan.
Treylon Burks (8-7-111) was the big story for Tennessee, hauling in a 43-yard deep shot on the game’s third play, then later a 51-yard pass with about two minutes remaining, when the Titans more or less could have kept the ball on the ground to run out the game. Outside those two deep plays, he had just 17 receiving yards on five receptions, but the situations made it pretty clear he was a focal point they wanted to get going, both with the early designed shot play and especially the late one, which more or less read like a “Go make a play, young fella” confidence-building type of moment. Burks’ 0.71 WOPR was a season high by a good margin, and was the second-highest figure for any Titan this season, behind Robert Woods’ mark in Week 3. This is not typically an offense that concentrates volume, but Burks notably got to this mark while running routes on just 66% of dropbacks. Woods (7-6-69) and Nick Westbrook-Ikhine (2-2-28) were both at 84%, but I’d expect Burks to climb in the coming weeks, which elevates his potential. He’s a startable option as the focal point of their downfield passing game, though the floor in the passing game is going to be low at times.
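For anyone who skipped the Week 1 glossary, WOPR combines a player’s share of team targets with his share of team air yards; the standard weighting is 1.5 times target share plus 0.7 times air yards share. The shares below are hypothetical, just to show how a 0.71 can come together:

```python
# WOPR = 1.5 * target share + 0.7 * air yards share (standard weighting).
def wopr(target_share: float, air_yards_share: float) -> float:
    return 1.5 * target_share + 0.7 * air_yards_share

# Hypothetical shares for illustration, not Burks' actual splits:
print(f"{wopr(0.30, 0.37):.2f}")  # 0.71
```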
Austin Hooper (4-4-36-2) caught two touchdowns, but he ran routes on just 44% of dropbacks. That’s low even against his typical rates, but he hasn’t hit 75% in any game this year anyway, as Tennessee employs a TE rotation. Chigoziem Okonkwo (2-1-31) was right with him at 44%, and Geoff Swaim (1-1-3) was at 31%, so don’t expect Hooper’s production to lead to more big fantasy days, given the offense and his lack of a hold on the TE routes.
Burks’ fellow rookie Christian Watson (6-4-48-2) scored twice more in this one, including on a free play where the defense jumped offside and Aaron Rodgers threw up a 50/50 ball that the DB made a poor effort on, and Watson did well to win in the air. His two touchdowns were his only catches of the game until two more on the final drive, down 10 with under five minutes to play, with the defense seemingly playing a little softer. Allen Lazard (11-5-57) and Randall Cobb (6-6-73) were more consistently involved earlier, and I more or less stand by what I wrote on Watson last week, that I don’t really expect him to be a phenom. He now has five touchdowns on just eight catches over the past two weeks, and obviously touchdowns are good, but without them, his receiving lines the past two weeks wouldn’t even draw a second look. I’m also still not sure much is actionable here; it’s more just a point about his long-term ceiling, and if you have him, you ride the wave unless you get some fantastic trade offers.
Aaron Jones (12-40, 7-6-20) continued to run well ahead of AJ Dillon (6-13, 1-1-10), and Jones’ routes were back to 56% of dropbacks, which is in line with his typical rates (he was down a bit in Week 10 after leaving Week 9 early).
Signal: Treylon Burks — 0.71 WOPR on just 66% routes, with designed shot plays early and late that signify the team sees him as a focal point
Noise: Austin Hooper — 2 TDs (only 44% routes, splits with two other TEs, can’t be trusted in this passing game); Christian Watson — 5 TDs on just 8 receptions over past two weeks, has not been nearly as dominant as the TDs suggest (while still obviously being good)