MLB DFS: Projecting Hitter Performance Using Statistics
Baseball is a game that statistics describe incredibly well. Unlike in the NFL and NBA, where statistics are often subpar signals of performance, MLB statistics and projections tend to be very accurate signals of future performance. Because DFS for baseball is less intuitive and more statistically rigorous, it’s important that a DFS player understands how to project performance using statistics. In this article I will focus on how to pick hitters using statistics.
Projections vs. Recent Data
A lot of amateur players get hung up on who the hot guys are, like the guy who has hit 5 home runs in his past 5 games or someone who is on a 20-game hit streak. But these are actually the guys you don’t want to target, because their prices will increase drastically and they will be poor value for their real production.
The reason recent data doesn’t tend to be “real” has to do with a fallacy called the “Texas Sharpshooter Fallacy,” known in statistics as the multiple comparisons problem. The issue is that when you compare a large number of attributes, it becomes incredibly likely that at least one of them will look extremely large or small because of variance alone.
Take this example from Wikipedia, for instance.
A Swedish study in 1992 tried to determine whether or not power lines caused some kind of poor health effects. The researchers surveyed everyone living within 300 meters of high-voltage power lines over a 25-year period and looked for statistically significant increases in rates of over 800 ailments. The study found that the incidence of childhood leukemia was four times higher among those that lived closest to the power lines, and it spurred calls to action by the Swedish government. The problem with the conclusion, however, was that the number of potential ailments, i.e. over 800, was so large that it created a high probability that at least one ailment would exhibit the appearance of a statistically significant difference by chance alone. Subsequent studies failed to show any links between power lines and childhood leukemia, neither in causation nor even in correlation.
Most people in fantasy do the same thing the Swedish researchers did with ailments near power lines, except we do it with fantasy points across hundreds of different players. Think of it like this:
We are looking for who to play today in baseball. We look across 50 different players and find 3 who have been crushing it over the past week. Should we choose those players? Absolutely not. When you look at 50 different players, it is very likely you will find a couple who have great recent performances from variance alone. Unless we have some prior belief that a player has drastically improved (this can sometimes be the case in the NFL and NBA, but rarely so in MLB), it’s unlikely that the recent performance is a true indication of a player’s skill. Instead, it’s very likely a streak of good luck.
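To put a rough number on the 50-player scenario, here is a minimal sketch that treats every hitter as having identical true skill and asks how often luck alone produces a "hot" week. The batting average, at-bat count, and hot-streak threshold are all hypothetical inputs chosen for illustration, and the calculation assumes at-bats and players are independent, which real games only approximate.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# All inputs below are hypothetical, chosen only for illustration.
TRUE_AVG = 0.270  # every hitter's true per-at-bat hit probability
AT_BATS = 25      # at-bats over the "past week"
HOT_HITS = 10     # 10-for-25 (.400) or better counts as "crushing it"
POOL = 50         # number of hitters we scan

# Chance that a single average hitter looks hot purely by luck.
p_hot = prob_at_least(HOT_HITS, AT_BATS, TRUE_AVG)

# Expected number of lucky "hot" hitters in the pool.
expected_hot = POOL * p_hot

# Chance that at least 3 of the 50 look hot with no real skill change.
p_three_or_more = prob_at_least(3, POOL, p_hot)

print(f"P(one average hitter looks hot):  {p_hot:.3f}")
print(f"Expected hot hitters out of {POOL}:  {expected_hot:.1f}")
print(f"P(at least 3 look hot by chance): {p_three_or_more:.3f}")
```

Under these made-up inputs, roughly one average hitter in ten clears the bar by luck, so finding several hot names in a pool of 50 is the expected outcome rather than evidence of improved skill.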
Because statisticians on websites like FanGraphs and Baseball Prospectus do such a good job of projecting performance, and because recent hot and cold performances are so likely to be caused by variance, I strongly advocate using projections to analyze a player’s value.
The first thing you want to do is figure out how many fantasy points a player is expected to score per plate appearance relative to his salary. This should not include RBIs and runs, as those are better projected using Vegas Over/Under lines, as I will describe later.
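As a rough sketch of that calculation, the snippet below turns projected per-plate-appearance event rates into fantasy points per plate appearance, then scales by salary. The scoring weights, the `HitterProjection` class, and both example hitters are hypothetical; substitute your DFS site's real scoring rules and rates from a projection system. Runs and RBIs are left out on purpose, as described above.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical scoring weights; replace with your DFS site's actual rules.
# Runs and RBIs are intentionally omitted.
SCORING: Dict[str, float] = {
    "single": 3.0,
    "double": 5.0,
    "triple": 8.0,
    "home_run": 10.0,
    "walk": 2.0,
    "stolen_base": 5.0,
}


@dataclass
class HitterProjection:
    name: str
    salary: int               # DFS salary in dollars
    rates: Dict[str, float]   # projected events per plate appearance


def fp_per_pa(hitter: HitterProjection) -> float:
    """Projected fantasy points per plate appearance (no runs or RBIs)."""
    return sum(SCORING[event] * rate for event, rate in hitter.rates.items())


def fp_per_pa_per_1k(hitter: HitterProjection) -> float:
    """Fantasy points per plate appearance for each $1,000 of salary."""
    return fp_per_pa(hitter) / (hitter.salary / 1000)


# Two made-up hitters: an expensive slugger and a cheaper contact bat.
slugger = HitterProjection("Hitter A", 5000, {
    "single": 0.15, "double": 0.05, "triple": 0.005,
    "home_run": 0.04, "walk": 0.09, "stolen_base": 0.01,
})
contact = HitterProjection("Hitter B", 3800, {
    "single": 0.16, "double": 0.045, "triple": 0.003,
    "home_run": 0.025, "walk": 0.08, "stolen_base": 0.02,
})

for h in (slugger, contact):
    print(f"{h.name}: {fp_per_pa(h):.3f} FP/PA, "
          f"{fp_per_pa_per_1k(h):.3f} FP/PA per $1k")
```

In this made-up comparison the cheaper hitter projects for fewer points per plate appearance but more points per salary dollar, which is the kind of gap this metric is meant to surface.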
It’s important that we look at FP per plate appearance because the number of plate appearances is better projected from someone’s spot in the lineup. Guidelines for how many plate appearances you can expect from each spot in the order are shown very clearly in this ESPN article.
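One way to combine the two pieces is to multiply a hitter's FP per plate appearance by the plate appearances expected from his lineup spot. The sketch below assumes a lookup table of plate appearances per game by spot; the values in `PA_BY_SPOT` are placeholders invented for illustration, not figures from the ESPN article, and the function name is hypothetical.

```python
from typing import Dict

# Placeholder plate appearances per game by lineup spot (1 = leadoff).
# These values are made up for illustration only; replace them with
# real guidelines before using this for anything.
PA_BY_SPOT: Dict[int, float] = {
    1: 4.6, 2: 4.5, 3: 4.4, 4: 4.3, 5: 4.2,
    6: 4.1, 7: 4.0, 8: 3.9, 9: 3.8,
}


def projected_points(fp_per_pa: float, lineup_spot: int,
                     pa_by_spot: Dict[int, float] = PA_BY_SPOT) -> float:
    """Projected fantasy points for a game, excluding runs and RBIs."""
    if lineup_spot not in pa_by_spot:
        raise ValueError(f"unknown lineup spot: {lineup_spot}")
    return fp_per_pa * pa_by_spot[lineup_spot]


# The same hypothetical hitter batting leadoff versus ninth.
leadoff = projected_points(1.2, 1)
ninth = projected_points(1.2, 9)
print(f"Leadoff: {leadoff:.2f} FP, ninth: {ninth:.2f} FP")
```

Keeping the per-plate-appearance rate separate from the lineup-spot table means a late lineup change only requires looking up a different spot, not re-deriving the hitter's projection.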
View all posts by Max J Steinberg