Author Devin Pleuler is miles ahead of you when it comes to fantasy “football” preparation…
It may come as no surprise that in the Fantasy Football competition hosted on the English Premier League’s website, a higher player value doesn’t necessarily mean more fantasy points. There are many reasons for this, some obvious and some less so. The obvious ones include injury, suspension, and players falling out of favor or being constantly rotated. While this article offers some small gains by applying mathematics to this complex system, those gains can be negated by falling behind on the latest club news or by your fellow participants’ sheer luck. While I believe that the transfers I have made to my fantasy team are very close to optimal, I am guilty of dropping obvious points for these exact reasons.
With the recent release of the blockbuster Moneyball, based on one of my favorite books of all time, it is fitting that we are taking an intimate look at the game’s statistics and trying to determine which stats are useful and which are misleading.
Perhaps it makes sense to set a goal for this article, since I do not claim to have the mathematical formula that is the holy grail of fantasy football. Instead, we shall aim to create a statistical model that correlates with a player’s total fantasy points much more closely than the player’s fantasy value does.
|Lastname||Team||Fantasy Value||Fantasy Points||Fantasy Points Per Value|
|Van der Vaart||Tottenham||8.9||165||18.5|
*Gerrard only played in 20 EPL games.
As you can see, just listing the top 10 valued midfielders from last year’s competition reveals a very wide array of efficiencies, with Florent Malouda leading the top ten at about 19.2 points per fantasy value unit. However, if we proceed way down the value list to Charlie Adam and his 32 points per fantasy value unit, you begin to realize just how inefficient this player market is.
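The efficiency figure in the last column is simply fantasy points divided by fantasy value. As a small Python sketch, using only the two value and points pairs quoted in this article (Van der Vaart in the table, Gerrard in the discussion of his 20-game season); the function name `points_per_value` is my own label for illustration:

```python
def points_per_value(points: float, value: float) -> float:
    """Fantasy points earned per unit of fantasy value."""
    return points / value


# (fantasy value, fantasy points) pairs quoted in this article.
players = {
    "Van der Vaart": (8.9, 165),
    "Gerrard": (11.3, 88),
}

for name, (value, points) in players.items():
    print(f"{name}: {points_per_value(points, value):.1f} points per value unit")
# Van der Vaart: 18.5 points per value unit
# Gerrard: 7.8 points per value unit
```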
So, let’s do our first regression analysis with fantasy points as the dependent variable and fantasy value as the sole independent variable.
The regression produces a line with a y-intercept (points) of -26.25 and a slope of 1.65 points per unit value. However, the fit is not very strong, with an R-squared value of around 28%. This is the figure that we are going to try to improve upon by finding better statistics to predict a player’s expected fantasy point production.
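For readers who want to run this kind of fit themselves, here is a minimal sketch of a one-variable least-squares regression and its R-squared value in plain Python. The function names `fit_line` and `r_squared` are my own, and the value and points lists are invented placeholders, not the real 2010/2011 data, so the numbers it prints will not match the figures above.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(xs, ys, intercept, slope):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Invented placeholder data: fantasy value and season points
# for six hypothetical midfielders.
values = [4.5, 5.0, 6.0, 7.5, 9.0, 11.0]
points = [40, 95, 70, 150, 110, 160]

intercept, slope = fit_line(values, points)
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
print(f"R-squared={r_squared(values, points, intercept, slope):.2f}")
```

An R-squared close to 1 means the line explains most of the spread in points; a value near the 28% reported above means most of the spread is left unexplained.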
As noted earlier with the asterisk next to Gerrard’s name, he only participated in 20 EPL games during the 2010/2011 season. Therefore, with only 88 fantasy points at the price of 11.3, his efficiency took a tremendous hit. However, it is very easy to include minutes played as another independent variable in the regression to compensate for this.
As shown by this graph, minutes played correlates much more closely with total fantasy points than player value does, with an R-squared value of around 80% (already considerably better than value alone). Interestingly enough, no midfielder who played over 3,000 minutes during the 2010/2011 campaign scored fewer than 106 points (the total posted by Danny Murphy of Fulham).
When we use both value and minutes as independent variables (and points again as the dependent variable) in a regression (which I will not graph due to the extra dimensionality), we get an R-squared value in excess of 87%, a significant gain. However, this is misleading because minutes and value probably are not truly independent variables. Value, in this game, is heavily related to popularity. Since a player who plays more minutes is probably more likely to be popular (and 1 to 2 points are awarded to each player for merely participating in the game), it is clear that these variables are dependent to a certain degree. We need to search for player variables that are more independent of one another, and this is a very hard task.
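One quick way to gauge how independent two candidate variables are is to compute the correlation between them before putting both into a regression. The sketch below does this in plain Python; the function name `pearson` is my own, and the minutes and value figures are invented for six hypothetical players rather than taken from the real data set.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Invented placeholder data for six hypothetical midfielders.
minutes = [900, 1500, 2100, 2600, 3000, 3300]
values = [4.5, 5.5, 5.0, 7.0, 8.5, 9.0]

print(f"correlation(minutes, value) = {pearson(minutes, values):.2f}")
```

A coefficient near 0 suggests the two variables carry largely separate information, while a coefficient near 1 or -1 suggests they overlap heavily, which is the concern raised about minutes and value.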
So, again thinking in terms of midfielders, we need to come up with some statistics that we feel better represent a midfielder’s ability to create fantasy points. Some obvious stats that we can test are shots taken, passes completed, passes received and pass completion ratio. And, not shown in the table below (and slightly less straightforward): passes received in the final third, average position when receiving a pass and average position when passing.
|Lastname||Shots||Passes Completed||Passes Received||Completion Ratio||Fantasy Points|
|Van der Vaart||81||1079||985||.77||165|
A regression on this data set achieves a considerably better R-squared value of 91%. The formula of the regression line gives the expected number of fantasy points given a player’s individual statistics.
|Lastname||Team||Fantasy Value||Fantasy Points||Expected Pts|
|Van der Vaart||Tottenham||8.9||165||154|
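For readers who want to reproduce this kind of fit, the sketch below shows one way to run a regression with several independent variables in plain Python, by solving the normal equations with Gaussian elimination. This is an illustration, not the tool used for this article. All function names are my own, only two predictors (shots and passes completed) are used to keep it short, and the numbers belong to hypothetical players.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_multiple(rows, ys):
    """Least-squares coefficients [intercept, b1, b2, ...]."""
    design = [[1.0] + list(row) for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def expected_points(coeffs, row):
    """Apply the fitted formula to one player's statistics."""
    return coeffs[0] + sum(c * x for c, x in zip(coeffs[1:], row))


# Invented placeholder data: (shots, passes completed) and season points.
stats = [[20, 600], [35, 900], [50, 750], [65, 1200], [80, 1000], [90, 1400]]
points = [70, 100, 105, 140, 150, 170]

coeffs = fit_multiple(stats, points)
print("coefficients:", [round(c, 3) for c in coeffs])
print("expected points for (60 shots, 1000 passes):",
      round(expected_points(coeffs, [60, 1000]), 1))
```

The same functions accept any number of predictors, so passes received and completion ratio could be added as extra columns, provided the columns are not exact linear combinations of one another.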
It’s clearly an imperfect model (as suggested by both the imperfect R-squared value and the dependence of some of the variables), but it certainly provides a much better guideline for judging a player’s fantasy point production than the player’s value alone. (Interestingly, if we add the player’s value to this regression, we see less than a 1% improvement in the R-squared value.)
Now, let’s apply this same model to this year’s (2011/2012) data through Gameweek 11 and try to figure out which players are expected to produce the most fantasy points for their respective value.
|Lastname||Team||Fantasy Value||Fantasy Points||Expected Pts||Points Per Value Unit|
However, in gameweek 12, none of these players scored more than 2 points; Petrov played only 22 minutes and O’Hara accumulated the threshold number of yellow cards for a suspension. The search for the most efficient players seems to be much more of an exercise in finding statistical anomalies. But, looking at the number of actual fantasy points that these particular players have produced, gameweek 12 seems to have been a statistical anomaly in itself. Carrying three of these players, I still managed to score significantly higher than the league-wide average.
For completeness’ sake, here are the top 5 expected points producers through the first 11 gameweeks.
|Lastname||Team||Fantasy Value||Fantasy Points||Expected Pts||Expected Pts Per Value Unit|
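Both rankings in this section, by expected points per value unit and by raw expected points, come down to sorting the same list of players on a different key. Here is a small Python sketch, assuming hypothetical players and made-up numbers; the `Player` type and both function names are my own.

```python
from typing import NamedTuple


class Player(NamedTuple):
    name: str
    value: float            # fantasy value (price)
    expected_points: float  # output of the regression model


def rank_by_efficiency(players):
    """Highest expected points per value unit first."""
    return sorted(players,
                  key=lambda p: p.expected_points / p.value,
                  reverse=True)


def rank_by_expected_points(players):
    """Highest raw expected points first."""
    return sorted(players, key=lambda p: p.expected_points, reverse=True)


# Hypothetical players with invented numbers.
squad = [
    Player("Player A", 5.0, 50.0),
    Player("Player B", 10.0, 80.0),
    Player("Player C", 4.0, 48.0),
]

print([p.name for p in rank_by_efficiency(squad)])
print([p.name for p in rank_by_expected_points(squad)])
```

The two orderings can differ sharply: an expensive player can top the raw list while sitting near the bottom on efficiency.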
A systematic approach to finding these independent variables isn’t within the scope of this article, but machine learning and other methods can be used to find much stronger variables than the ones we hand-picked ourselves. In fantasy football, where participants benefit from luckily picking statistical anomalies and outliers, this kind of expensive variable selection likely wouldn’t yield much of an improvement over hand-picked variables. For real football analysis performed by multimillion-dollar clubs and cutting-edge companies, this kind of analysis is indeed worth the time.