DVOA is a method of evaluating teams, units, or players. It takes every single play during the NFL season and compares each one to a league-average baseline based on situation. DVOA measures not just yardage, but yardage towards a first down: five yards on third-and-4 are worth more than five yards on first-and-10 and much more than five yards on third-and-12. Red zone plays are worth more than other plays. Performance is also adjusted for the quality of the opponent. DVOA is a percentage, so a team with a DVOA of 10.0% is 10 percent better than the average team, and a quarterback with a DVOA of -20.0% is 20 percent worse than the average quarterback. Because DVOA measures scoring, defenses are better when they are negative. For more detail, read below.
Please feel free to contact us with questions and comments about our original statistics using the contact form.
The majority of the ratings featured on FootballOutsiders.com are based on DVOA, or Defense-adjusted Value Over Average. DVOA breaks down every single play of the NFL season to see how much success offensive players achieved in each specific situation compared to the league average in that situation, adjusted for the strength of the opponent.
The NFL determines the best players by adding up all their yards no matter what situations they came in or how many plays it took to get them. Now why would they do that? Football has one objective -- to get to the end zone -- and two ways to achieve that, by gaining yards and getting first downs. These two goals need to be balanced to determine a player's value or a team's performance. All the yards in the world aren't useful if they all come in eight-yard chunks on third-and-10.
The popularity of fantasy football only exaggerates the problem. Fans have gotten used to judging players based on how much they help fantasy teams win and lose, not how much they help real teams win and lose. But fantasy scoring skews things by counting the yard between the one and the goal line as 61 times more important than all the other yards on the field. Let's say, for example, that Anquan Boldin catches a pass on third-and-15 and goes 50 yards but gets tackled two yards from the goal line, and then Tim Hightower takes the ball on first-and-goal from the two-yard line and plunges in for the score. Or, let's say that the Cardinals are playing the Falcons. The Falcons take a touchback on the opening kickoff, and the Arizona defense stuffs the Falcons running game twice, and on third-and-10 Matt Ryan throws the ball into the arms of Adrian Wilson, who gets taken down by Michael Turner at the two-yard line. Then on the ensuing first-and-goal, Hightower scores a touchdown.
Has Hightower done something special? Not really. When an offense gets the ball on first-and-goal at the two-yard line, they are going to score a touchdown five out of six times. In the first situation, Hightower is getting the credit that primarily belongs to the passing game. In the second situation, Hightower is getting the credit that primarily belongs to the defense.
DVOA does a better job of distributing credit for scoring points and winning games. It uses a value based on both total yards and yards towards a first down, based on work done by Pete Palmer, Bob Carroll, and John Thorn in their seminal book, The Hidden Game of Football. On first down, a play is considered a success if it gains 45 percent of needed yards; on second down, a play needs to gain 60 percent of needed yards; on third or fourth down, only gaining a new first down is considered success.
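As a rough illustration of the basic success rule described above, the sketch below checks a single play against the 45/60/100 percent thresholds. The function name and its arguments are illustrative assumptions, and it covers only the binary success test, not the fuller "success points" system.

```python
def is_successful_play(down: int, yards_to_go: int, yards_gained: int) -> bool:
    """Return True if the play meets the basic success threshold for its down."""
    # Percent of needed yards required for success, per the thresholds in the text.
    required_pct = {1: 45, 2: 60, 3: 100, 4: 100}
    if down not in required_pct:
        raise ValueError("down must be 1, 2, 3, or 4")
    # Integer arithmetic avoids floating-point rounding right at the threshold.
    return yards_gained * 100 >= required_pct[down] * yards_to_go


print(is_successful_play(1, 10, 5))   # five yards on first-and-10 -> True
print(is_successful_play(3, 4, 5))    # five yards on third-and-4  -> True
print(is_successful_play(3, 12, 5))   # five yards on third-and-12 -> False
```

The same five-yard gain counts as a success in the first two situations and a failure in the third, which is the distinction raw yardage totals miss.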
We then expand upon that basic idea with a more complicated system of "success points." A successful play is worth one point, an unsuccessful play zero points. Extra points are awarded for big plays, gradually increasing to three points for 10 yards, four points for 20 yards, and five points for 40 yards or more. There are fractional points in between. (For example, eight yards on third-and-10 is worth 0.63 "success points.") Losing four yards is -1 point, while losing 12 yards is -1.8 points. Interceptions average -6 points, with an adjustment for the length of the pass and the location of the interception (since an interception tipped at the line is more likely to produce a long return than an interception on a 40-yard pass). A fumble is worth anywhere from -1.70 to -3.98 points depending on how often a fumble in that situation is lost to the defense -- no matter who actually recovers the fumble. Red zone plays are worth 25 percent more for teams (and 10 percent more for players), and there is a bonus given for a touchdown.
(The system is a bit more complex than the one in Hidden Game thanks to a number of improvements since we launched the site in 2003.)
Every single play run in the NFL gets a "success value" based on this system, and then that number gets compared to the average success values of plays in similar situations for all players, adjusted for a number of variables. These include down and distance, field location, time remaining in game, and current scoring lead or deficit. Teams are always compared to one standard, as the team made its own choice whether to pass or rush. However, when it comes to individual players, rushing plays are compared to other rushing plays, passing plays to other passing plays, tight ends get compared to tight ends and wideouts to wideouts.
Imagine two running backs who each gain three yards. If Player A gains three yards under a set of circumstances where the average NFL running back gains only two yards (for example, third-and-1), it can be argued that Player A has a certain amount of value above others at his position. Likewise, if Player B gains three yards on a play where, under similar circumstances, an average NFL back would be expected to gain five yards (for example, second-and-15), it can be argued that Player B has negative value relative to others at his position.
Once we have all our adjustments, we can find the difference between this player's success and the expected success of an average running back in the same situation (or between this defense and the average defense in the same situation, etc.). Add up every play by a certain team or player, divide by the total baseline for success in all those situations, and you get VOA, or Value Over Average.
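The arithmetic in that last step can be sketched in a few lines. This is a simplified illustration, not the actual implementation: the `Play` type, its field names, and the sample numbers are all hypothetical, and the situational baselines are assumed to be supplied from elsewhere.

```python
from dataclasses import dataclass


@dataclass
class Play:
    success_value: float  # success points earned on this play
    baseline: float       # average success value for plays in the same situation


def value_over_average(plays: list[Play]) -> float:
    """Sum of (actual - baseline) divided by the sum of baselines."""
    total_baseline = sum(p.baseline for p in plays)
    if total_baseline == 0:
        raise ValueError("total baseline must be non-zero")
    total_difference = sum(p.success_value - p.baseline for p in plays)
    return total_difference / total_baseline


# Made-up sample: three plays with invented success values and baselines.
sample = [Play(1.0, 0.5), Play(0.0, 0.5), Play(2.0, 1.0)]
print(f"VOA: {value_over_average(sample):.1%}")  # prints "VOA: 50.0%"
```

Dividing by the total baseline rather than by the number of plays is what turns the result into a percentage above or below average.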
Of course, the biggest variable in football is the fact that each team plays a different schedule. By adjusting each play based on the defense's average success in stopping that type of play over the course of a season, we get DVOA, or Defense-adjusted Value Over Average. Rushing and passing plays are adjusted based on down and location on the field; receiving plays are also adjusted based on how the defense performs against passes to running backs, tight ends, and wide receivers. Defenses are adjusted based on the average success of the offenses they are facing. (Yes, this is still called DVOA, for the sake of simplicity.)
The biggest advantage of DVOA is the ability to break teams and players down to find strengths and weaknesses in a variety of situations. In the aggregate, DVOA may not be quite as accurate as some of the other, similar "power ratings" formulas based on comparing drives rather than individual plays, but, unlike those other ratings, DVOA can be separated not only by player but also by down, or by week, or by distance needed for first down. This can give us a better idea of not just which team is better but why, and what a team has to do in order to improve itself in the future. You will find DVOA used by Football Outsiders in a lot of different ways. Because it takes every single play into account, it can be used to measure a player or a team's performance in any situation. All Minnesota third downs can be compared to how an average team does on third down. JaMarcus Russell or David Garrard can each be compared to how an average quarterback performs in the red zone, or with a lead, or in the second half of the game.
Since it compares each play only to plays with similar circumstances, it gives a more accurate picture of how much better a team really is compared to the league as a whole. The list of top DVOA offenses on third down, for example, is more accurate than the conventional NFL conversion statistic because it takes into account that converting third-and-long is more difficult than converting third-and-short, and that a turnover is worse than an incomplete pass because it doesn't provide the opportunity to move the other team back with a punt on fourth down.
One of the hardest parts of understanding a new statistic is grasping the idea of what numbers represent good performance or bad performance. We try to make that easy with DVOA, because it gets compared to average. Therefore, 0% always represents league-average. A positive DVOA represents that the offense is more likely to score, and a negative DVOA represents that the defense is more likely to stop them. This is why the best offenses have positive DVOA ratings and the best defenses have negative DVOA ratings.
Ratings for teams generally follow that scale, with the best being around 30% and the worst being around -30% (opposite for defense). Players are generally rated between -45% and +45%. However, because the baseline is determined across multiple years of play, no season will average exactly 0%. This gives DVOA the added benefit of being able to show us how the scoring environment has fluctuated from year to year. In 2008, the total league DVOA on offense was 4.8%, the highest season on record and the third straight year where the league DVOA increased. It was below 0% in both 2003 and 2005.
Team DVOA totals combine offense and defense, and the team total is given by offense minus defense to take into account that better defenses are more negative. (Special teams performance is also added, as described below.)
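A minimal sketch of that sign convention follows. The function name and the sample ratings are hypothetical; the special teams figure is assumed to have already been converted to DVOA.

```python
def total_team_dvoa(offense: float, defense: float, special_teams: float = 0.0) -> float:
    """Combine unit ratings (all in percent) into a team total.

    Defense is subtracted because a better defense has a more negative rating.
    """
    return offense - defense + special_teams


# Invented example: a good offense (+10.0%), a good defense (-5.0%),
# and slightly above-average special teams (+2.0%).
print(total_team_dvoa(10.0, -5.0, 2.0))  # prints 17.0
```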
After using DVOA for a few months, we came across a strange phenomenon: well-regarded players, particularly those known for their durability, had DVOA ratings that came out around average. The reason is that DVOA, by virtue of being a percentage or rate statistic, doesn’t take into account the cumulative value of having a player producing at a league-average level over the course of an above-average number of plays. By definition, an average level of performance is better than that provided by half of the league and the ability to maintain that level of performance while carrying a heavy work load is very valuable indeed. In addition, a player who is involved in a high number of plays can draw the defense’s attention away from other parts of the offense, and, if that player is a running back, he can take time off the clock with repeated runs.
Let’s say you have a running back who carries the ball 300 times in a season. What would happen if you were to remove this player from his team’s offense? What would happen to those 300 plays? Those plays don’t disappear with the player, though some might be lost to the defense because of the associated loss of first downs. Rather those plays would have to be distributed among the remaining players in the offense, with the bulk of them being given to a replacement running back. This is where we arrive at the concept of replacement level, borrowed from our partners at Baseball Prospectus. When a player is removed from an offense, he is usually not replaced by a player of similar ability. Nearly every starting player in the NFL is a starter because he is better than the alternative. Those 300 plays will typically be given to a significantly worse player, someone who is the backup because he doesn’t have as much experience and/or talent. A player’s true value can then be measured by the level of performance he provides above that replacement level baseline, totaled over all of his run or pass attempts.
Of course, the real replacement player is different for each team in the NFL. Over the past two years, the second-string running back in Jacksonville (Maurice Jones-Drew) had a much higher DVOA than the first-string back (Fred Taylor). In 2007, Ryan Grant started the year as the fifth-string running back for the Giants and ended the year with a 12.4% DVOA for Green Bay. On other teams, the drop from the starter to the backup can be even greater than the general drop to replacement level. Imagine if Peyton Manning broke his leg, for example. The choice to start an inferior player or to employ a sub-replacement level backup, however, falls to the team, not the starter being evaluated. Thus we generalize replacement level for the league as a whole as the ultimate goal is to evaluate players independent of the quality of their teammates.
Our estimates of replacement level were re-done during the 2008 season and are computed differently for each position. For quarterbacks, we analyzed situations where two or more quarterbacks had played meaningful snaps for a team in the same season, then compared the overall DVOA of the original starters to the overall DVOA of the replacements. We did not include situations where the backup was actually a top prospect waiting his turn on the bench, since a first-round pick is by no means a "replacement-level" player.
At other positions, there is no easy way to separate players into "starters" and "replacements," since unlike at quarterback, being the starter doesn't make you the only guy who gets in the game. Instead, we used a simpler method, ranking players at each position in each season by attempts. The players who made up the final 10 percent of passes or runs were split out as "replacement players" and then compared to the players making up the other 90 percent of plays at that position. This took care of the fact that not every non-starter at running back or wide receiver is a freely available talent. (Think of Jerious Norwood or Devery Henderson, for example.)
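One way to read that split is sketched below: rank players by attempts, walk down the list, and treat everyone past the 90 percent cumulative mark as a replacement player. The function name, the handling of a player who straddles the cutoff, and the sample data are assumptions made for illustration.

```python
def split_by_attempts(attempts: dict[str, int], replacement_pct: int = 10):
    """Split players into (regulars, replacements) by cumulative share of attempts."""
    total = sum(attempts.values())
    ranked = sorted(attempts.items(), key=lambda item: item[1], reverse=True)
    regulars, replacements = [], []
    running = 0  # attempts accounted for by higher-ranked players
    for name, count in ranked:
        # Integer comparison: has the top (100 - replacement_pct)% been filled yet?
        if running * 100 < total * (100 - replacement_pct):
            regulars.append(name)
        else:
            replacements.append(name)
        running += count
    return regulars, replacements


# Invented attempt totals for six hypothetical players.
sample = {"A": 300, "B": 250, "C": 200, "D": 150, "E": 60, "F": 40}
print(split_by_attempts(sample))  # (['A', 'B', 'C', 'D'], ['E', 'F'])
```

In this sketch a player whose attempts straddle the cutoff is counted with the regulars; the text does not say how such a case is handled.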
As noted earlier, the challenge of any new stat is to present it on a scale that’s meaningful to those attempting to use it. Saying that DeAngelo Williams' runs were worth 84.3 success value points over replacement in 2008 has very little value without a context to tell us if 84.3 is a good total or a bad one. Therefore, we translate these success values into a number called "Defense-adjusted Yards Above Replacement," or DYAR. For example, DeAngelo Williams led all running backs in 2008 with 385 rushing DYAR.
(Note: Prior to the 2008 season, DYAR was translated in terms of points rather than yardage, and old articles will refer to these stats as "DPAR" instead.)
Football statistics can't be analyzed in the same way baseball statistics are. After all, there are only 16 games in a season. Baseball has ten times more, and even the NBA offers five times more. The more games, the more events to analyze, and the more events to analyze, the more statistical significance.
That is true, but the trick is to consider each play in an NFL game as a separate event. For example, Eli Manning played only 16 games in 2005, but in those 16 games he had 586 passing plays (including sacks) and 29 rushing plays (including scrambles) for a total of 615 events. Manny Ramirez in 2005 played in 152 games and had 650 plate appearances. For the most part, a quarterback who plays a full season will have almost the same number of plays as a baseball hitter who plays in most of his team's games.
A running back will have fewer plays than a quarterback, and wide receivers and tight ends will have even fewer. But there should still be enough plays with most starting running backs and receivers to allow for analysis with some significance. As an example, LaDainian Tomlinson ran the ball 339 times in 2005, and was the target of 77 passes (including incompletes), for a total of 416 plays. In general, a starting running back will have 375-450 plays over 16 games. Receivers are used a bit less, and therefore their stats are likely not as accurate. In general, starting wide receivers have 75-150 pass targets over a full season.
You need to have the entire play-by-play of a season in order to compute it, so it is useless for comparing players of today to players of history. As of this writing, we have processed 15 seasons, 1994-2008.
DVOA is limited by what's included in the official NFL play-by-play, so we can't say which teams have the best offensive DVOA when play-faking, or the best defensive DVOA against three-receiver sets. Since play-by-play lists tackles, sacks, and interceptions, but not attempted tackles, or attempted sacks or interceptions, we don't have individual DVOA or DYAR for defensive players at this point. We're working on these issues with the Football Outsiders game charting project.
DVOA is still far away from the point where we can use it to represent the value of a player separate from the performance of his ten teammates that are also involved in each play. That means that when we say, "Larry Johnson has a DVOA of 27.6%," what we are really saying is "Larry Johnson, playing in the Kansas City offensive system with the Kansas City offensive line blocking for him and Matt Cassel selling the fake when necessary, has a DVOA of 27.6%."
With fewer situations to measure, the numbers spread out a bit more, so you'll see more extreme DVOA ratings for part-time players and for measurements of teams in more specific situations (for example, passing on third downs). The charts listing players in order of DVOA have cut-offs for number of attempts, because players with just a handful of plays end up with absurd VOA and DVOA numbers. (In 2002, for example, Henry Burris had a -103% passing DVOA.)
Passing statistics include sacks as well as fumbles on aborted snaps. Receiving statistics include all passes intended for the receiver in question, including those that are incomplete or intercepted. At some point, we hope to be able to determine just how much impact different receivers have on complete vs. incomplete passes, but various regression analyses make it clear that both quarterback and receiver have an impact on whether a pass is complete or not. The word passes refers to both complete and incomplete pass attempts.
Unless we say otherwise, all references to third down also include the handful of rushing and passing plays that take place on fourth down (primarily fourth-and-1).
The problem with a system based on measuring both yardage and yardage towards a first down, of course, is what to do with plays that don't have the possibility of a first down. Special teams are an important part of football and we needed a way to add that performance to the team DVOA ranking. Our special teams metric includes five separate measurements: field goals (and extra points), net punting, punt returns, net kickoffs, and kick returns.
The foundation of most of these special teams ratings is the concept that each yard line has a different value based on how the likelihood of scoring changes with better field position. In Hidden Game, the authors suggested that the value of field position for the offense existed on a straight line with your own goal line being worth -2 points, the 50-yard line 2 points, and the opposing goal line 6 points. (-2 points isn't just the value of a safety; it also reflects the fact that when you are backed up in your own zone, you are likely going to see your drive stall, and you'll need to punt and give the ball to the other team in good field position. Thus, the defense is more likely to score next.) We use a more refined set of values based on our research, but the idea is the same.
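The straight-line model from Hidden Game is simple enough to write down directly. The sketch below implements only that linear version (-2 points at your own goal line, 2 at midfield, 6 at the opposing goal line), not the more refined values mentioned above; the function name and the yards-from-own-goal convention are assumptions.

```python
def field_position_value(yards_from_own_goal: float) -> float:
    """Point value of having the ball, using the linear Hidden Game model."""
    if not 0 <= yards_from_own_goal <= 100:
        raise ValueError("yards_from_own_goal must be between 0 and 100")
    # The line rises 8 points over 100 yards, starting at -2.
    return -2.0 + 8.0 * yards_from_own_goal / 100.0


print(field_position_value(0))    # own goal line      -> -2.0
print(field_position_value(50))   # midfield           ->  2.0
print(field_position_value(100))  # opposing goal line ->  6.0
```

Under this model every yard of field position is worth 0.08 points, so the value of a kick or return is proportional to the net yards it moves the ball.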
The special teams ratings compare each kick or punt to the league average, based on the point value of field position at the position of each kick, catch, and return. We've determined a league average for how far a kick goes based on the yard line from where the kick occurs (almost always the 30-yard line for kickoffs, variable for punts) and a league average for how far a return goes based on both the yard line where the ball is caught and the distance that it traveled in the air.
The kicking or punting team is rated based on net points compared to average, taking into account both the kick and the return if there is one. Because the average return is always positive, punts that are not returnable (touchbacks, out of bounds, fair catches, and punts downed by the coverage unit) will rate higher than punts of the same distance which are returnable. (This is also true of touchbacks on kickoffs.) There are also separate individual ratings for kickers and punters that are based only on distance and whether the kick is returnable, otherwise assuming an average return in order to judge the kicker separate from the coverage. For the return team, the rating is only based on how many points the return is worth compared to average, based on the location of the catch and the distance the ball traveled in the air. Return teams are not judged on the distance of kicks, nor are they judged on kicks that cannot be returned.
Field goal kicking is measured differently. Measuring kickers by field goal percentage is a bit absurd, as it assumes that all field goals are of equal difficulty. In our metric, each field goal is compared to the average number of points scored on all field goal attempts from that distance. The value of a field goal increases as distance from the goal line increases.
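A minimal sketch of that comparison is below. The function name is hypothetical and the league-average figures are invented purely for illustration; weather and altitude adjustments are not included.

```python
def field_goal_points_over_average(made: bool, avg_points_at_distance: float) -> float:
    """Points scored on one attempt minus the average points from that distance."""
    points_scored = 3.0 if made else 0.0
    return points_scored - avg_points_at_distance


# Hypothetical league averages (points per attempt), NOT real figures.
AVG_POINTS_SHORT_KICK = 2.85
AVG_POINTS_LONG_KICK = 1.5

print(field_goal_points_over_average(True, AVG_POINTS_SHORT_KICK))
print(field_goal_points_over_average(True, AVG_POINTS_LONG_KICK))
print(field_goal_points_over_average(False, AVG_POINTS_SHORT_KICK))
```

With these made-up averages, making the long kick is worth far more than making the short one, and missing the short one costs nearly a full three points.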
Kickoffs, punts, and field goals are then adjusted based on weather and altitude. It will surprise no one to learn that it is easier to kick the ball in Denver or a dome than it is to kick the ball in Buffalo in December. Because we do not yet have enough data to tailor our adjustments specifically to each stadium, each one is assigned to one of four categories: Cold, Warm, Dome, and Denver/Mexico. An additional adjustment drops the value of field goals in Florida and raises the value of punts in San Francisco.
Once we've totaled how many points above or below average can be attributed to special teams, another formula then transforms these numbers from points to DVOA so the ratings can be added to offense and defense to get total team DVOA.
There are three aspects of special teams that don't show up in our numbers because a team has little or no influence on them -- and yet, these plays do have an impact on wins and losses. The first is the length of kickoffs by the opposing team, because no matter how strong your return man is, you can't make the other guy kick it shorter. The other two are field goals against your team, and punt distance against your team. Research shows no indication that teams can influence the accuracy or strength of field-goal kickers and punters, except for blocks. And although blocked field goals and punts are definitely skillful plays, they are so rare that they have no correlation to how well teams have played in the past or will play in the future. The numbers for kickoff length against, punt length against, and field goals against are added up on the special teams stats pages in the column marked HIDDEN.
Special teams ratings also do not include two-point conversions or onside kick attempts, which like blocks are so infrequent as to be statistically insignificant in judging future performance.
One exception to the use of DVOA/DYAR, and the use of "play success" instead of raw yardage, is the rating system for offensive and defensive lines. Actually, these are only measures of running plays, and of course the defensive numbers don't measure just the defensive line, but the whole front seven against the run.
One of the most difficult goals of statistical analysis in football is somehow isolating how much responsibility for a play lies with each of the 22 men on the field. Nowhere is this as obvious as the running game, where one player runs while up to nine other players -- including wideouts, tight ends, and fullback -- block in different directions. None of the statistics we use for measuring rushing -- yards, touchdowns, yards per carry -- differentiate between the contribution of the running back and the contribution of the offensive line. Neither do our advanced metrics DVOA and DYAR.
We have enough data amassed that we can try to separate the effect that the running back has on a particular play from the effect of the offensive line (and other offensive blockers) and the effect of the defense. A team might have two running backs in its stable: RB A, who averages 3.0 yards per carry, and RB B, who averages 3.5 yards per carry. Who is the better back? Imagine that RB A doesn't just average 3.0 yards per carry, but gets exactly 3 yards on every single carry, while RB B has a highly variable yardage output: sometimes 5 yards, sometimes -2 yards, sometimes 20 yards. The difference in variability between the runners can be exploited to not only determine the difference between the runners, but the effect the offensive line has on every running play.
We know that at some point in every long running play, the running back has gotten past all of his offensive line blocks. From here on, the rest of the play is dependent on the runner's own speed and elusiveness, combined with the speed and tackling ability of the defensive players. If Tiki Barber breaks through the line for 50 yards, avoiding tacklers all the way to the goal line, his offensive line has done a great job -- but they aren't responsible for most of that run. How much are they responsible for?
For each running back carry, we calculated the probability that the back involved would run for the specific yardage on that play, based on that back's average yardage per carry and the variability of their yardage on every play. We also calculated the probability that the offense would get the yardage based on the team's rushing average and variability without the back involved in the play, and the probability that the defense would give up the specific amount of yardage based on its average rushing yards allowed per carry and variability. For example, based on his rushing average and variability, the probability in 2004 that Tiki Barber would have a positive carry was 80% while the probability that the Giants would have a positive carry without Barber running was only 73%.
Yardage ends up falling into roughly the following combinations: Losses, 0-4 yards, 5-10 yards, and 11+ yards. In general, the offensive line is 20% more responsible for lost yardage than it is for yardage gained up to four yards, but 50% less responsible for yardage gained from 5-10 yards, and not responsible for yardage past that. Thus, the creation of Adjusted Line Yards.
Adjusted Line Yards take every carry by a running back and apply those percentages. (We don't include carries by receivers, which are usually based on deception rather than straight blocking, or carries by quarterbacks, which are generally busted passing plays except in Atlanta.) Those numbers are then adjusted based on down, distance, and situation as well as opponent (similar to DVOA) and then normalized so that the league average for Adjusted Line Yards per carry is the same as the league average for RB yards per carry (currently, we use 4.08).
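The percentage step can be sketched as follows. This is one reading of the description, assuming the weights are 120% for losses, 100% for yards 0-4, 50% for yards 5-10, and 0% beyond that; the function name is hypothetical, and the situational adjustments, opponent adjustments, and normalization are left out.

```python
def raw_line_yards(yards_gained: int) -> float:
    """Portion of a carry's yardage credited to the line, before any adjustments."""
    if yards_gained < 0:
        # Line is 20% more responsible for losses than for short gains.
        return 1.2 * yards_gained
    full_credit = min(yards_gained, 4)               # yards 0-4 at 100%
    half_credit = min(max(yards_gained - 4, 0), 6)   # yards 5-10 at 50%
    return 1.0 * full_credit + 0.5 * half_credit     # yards 11+ earn nothing


print(raw_line_yards(3))    # 3.0
print(raw_line_yards(8))    # 6.0
print(raw_line_yards(50))   # 7.0
```

Under this reading a 50-yard run credits the line with the same seven yards as a 10-yard run, which matches the idea that the back has left his blockers behind by that point.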
Runs are listed by the NFL in seven different directions: left/right end, left/right tackle, left/right guard, and middle. Further research showed no statistically significant difference between how well a team performed on runs listed middle, left guard, and right guard, so we also list runs separated into five different directions. Note that there may not be a statistically significant difference between right tackle and middle/guard either, but until we can research further (and for the sake of symmetry) we do still split out runs behind the right tackle separately.
The system is far from perfect. We don't know when a guard is pulling and when a guard is blocking straight ahead. We know that some runners are just inherently better going up the middle, and some are better going side to side, and we can't measure how much that impacts these numbers. We have no way of knowing the blocking contribution made by fullbacks, tight ends, or wide receivers.
Other numbers we use to measure the running game:
(Note: The Adjusted Line Yards formula was substantially overhauled prior to the 2005 season. Adjusted Line Yards in articles from 2003 and 2004 are based on a different formula and will look smaller.)
The stats section of our website also features drive stats compiled by Jim Armstrong. These stats are computed from NFL Drive Charts and are not adjusted for strength of schedule or situation. Take-a-knee drives at the end of a half are discarded. Drive stats are generally self-explanatory, giving each team's total number of drives as well as average yards per drive, points per drive, touchdowns per drive, punts per drive, turnovers per drive, interceptions per drive, and fumbles lost per drive. LOS/Drive represents average starting field position (line of scrimmage) per drive from the offensive point of view. Drive stats are given for offense and defense, with NET representing simply offense minus defense.
Our data may differ slightly from official NFL numbers due to discrepancies in different play-by-play reports. In addition, we've adjusted clock plays, with kneels no longer counting as rush attempts and spikes no longer counting as pass attempts. We also count most aborted snaps as passing plays, not rushing plays, unless the play-by-play specifies that the play was an aborted handoff.