BvP: When it Matters, When it Doesn’t
Used by almost every DFS player, BvP is one of the most popular stats in Daily Fantasy Baseball. But BvP also is one of the most controversial stats in DFS, because everyone is still arguing whether or not it works.
For those of you who have been living under the proverbial DFS rock, BvP stands for “Batter vs Pitcher”: essentially, how well (or poorly) particular hitters have performed against a particular pitcher historically. Every day, we can look at historical BvPs using sites like dailybaseballdata.com, and we will find an array of historical hitting data. Some hitters will have very good history, some hitters very bad, and some somewhere in the middle. Because most players understand the luck element of baseball, and because no one has published an extensive study on whether BvP is predictive (at least, not that I’m aware of), we’re stuck with an argument on BvP in the DFS community with two groups on completely opposing sides. Either BvP definitely does not work, or BvP definitely does.
So where do I stand on this? For those of you who like everything to be black and white, you will be a bit disappointed in my answer. I think BvP is helpful, but only if we apply it correctly.
The Multiple Comparisons Problem
The Multiple Comparisons Problem is a concept that anyone who is looking at a large amount of data needs to understand. The problem is this: When looking at a set of data, it is likely you will find some sort of pattern simply due to randomness. For example, let’s say we flipped a coin 10 times, measured how many times that coin landed heads, recorded it, and then did it again until we had 10 sets of 10 coin flips. Some of those sets will likely have 7-10 heads in them, while some will likely have 0-3 heads. So even though the coin is completely fair, some of the data sets will make it look like the coin is biased towards heads or tails, even though we know it’s not.
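To put rough numbers on the coin example, here is a minimal sketch that computes the odds exactly from the binomial distribution. The helper name `prob_at_least` is my own, and the only assumption is that each flip is fair and independent. A single set of 10 flips shows 7 or more heads about 17% of the time, which means that across 10 sets the chance of seeing at least one such "biased-looking" set is about 85%.

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """Probability of k or more heads in n flips when each flip lands heads with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Chance that one set of 10 fair flips shows 7 or more heads.
p_lopsided_heads = prob_at_least(7, 10)

# Chance that at least one of 10 independent sets shows 7 or more heads.
p_any_set = 1 - (1 - p_lopsided_heads) ** 10

print(f"one set of 10 flips: {p_lopsided_heads:.3f}")
print(f"at least one of 10 sets: {p_any_set:.3f}")
```

The same logic applies to sets with 0-3 heads, so lopsided-looking results in either direction are even more common than this one-sided number suggests.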
This concept is crucial when talking about BvP. If we look at a bunch of different players’ BvP histories, it’s likely we will find some players who have performed well against a pitcher and some players who have performed poorly, and the numbers will most likely be completely random. For example, Alex Gordon has a career .652 OPS against Jered Weaver, while Kendrys Morales has a career 1.504 OPS. (In the actual game, Gordon doubled while Morales had no hits. Not a very good job by BvP!) In all likelihood, this is as random as 8 of our coin flips landing heads. Simply looking through the data and trying to find good BvPs won’t be very helpful.
But there is a way to look through BvP data that can significantly improve the accuracy of our analysis.
One way to combat the Multiple Comparisons Problem is with Priors. A Prior, simply put, is an idea that we think will influence how well hitters perform. This could be as simple as right-handed batters vs left-handed pitchers, since we know right-handers perform better against lefties. It could be more complex, like looking at pitch type data, or looking at heatmaps and finding hitters who like the part of the zone a pitcher likes to throw in. Whatever it is, you need to have your Priors in mind before you even look at BvP. If, with this knowledge, you find stats that back up your Prior, then you can assume (with some confidence) that the BvP is relevant.
But BvP still involves quite a bit of luck, so sometimes the data can still fool us. Some stats are more valuable to look at than others.
Walks and Strikeouts
In my opinion, the best stats to look at when analyzing BvP are walks and strikeouts. This is for a couple of reasons.
1) It’s the general consensus among the data scientists in the baseball community that walks and strikeouts normalize (just another way of saying the numbers start to “make sense statistically”) more quickly than other stats like HRs or hits. So if we see that a player has had lots of walks and low strikeouts in a small sample, we can be more confident that it’s not just dumb luck.
2) “But walks don’t score us many fantasy points!” Indirectly, they do. Walks are correlated with power, because they mean the hitter is getting into favorable counts. Low strikeouts also play a big role in fantasy points: if a hitter is putting the ball in play every time, then he is much more likely to get base hits. Having flyball and line drive data would be helpful as well, but this is the next best thing.
Because of the above, looking at these numbers, especially when sample sizes are small, will be the most helpful.
Another great stat to look at is BvP SBs. Given that rSB data for pitchers is easily available to us, we have a great Prior for stolen bases. Furthermore, I think it’s a safe assumption that stolen base attempts don’t have a ton of variance. No one is going to attempt to steal 4 times in 4 opportunities against a pitcher at complete random. So if you find an SB matchup you like, make sure to check BvP. More specifically, see how many times the hitter has gotten to 1st base against the pitcher (via a walk or single), and then look at how many steals he has attempted (both SBs and CS matter here; we just want our guy to try). If that number is high, you can be fairly confident our hitter will keep trying to steal in the future.
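The stolen base check above boils down to one ratio: attempts divided by times on 1st base. Here is a minimal sketch of that arithmetic. The function name and the sample numbers are hypothetical, made up purely for illustration and not taken from any real BvP line.

```python
def sb_attempt_rate(stolen_bases, caught_stealing, times_on_first):
    """Share of times on 1st base (via walk or single) that ended in a steal attempt."""
    if times_on_first == 0:
        return None  # no opportunities against this pitcher, so there is no rate to report
    # Successful steals and caught stealings both count as attempts.
    attempts = stolen_bases + caught_stealing
    return attempts / times_on_first

# Hypothetical BvP line: 5 singles and 3 walks, with 3 SB and 1 CS.
rate = sb_attempt_rate(stolen_bases=3, caught_stealing=1, times_on_first=5 + 3)
print(f"attempt rate: {rate:.2f}")
```

Note that this treats every single or walk as a steal opportunity, which is a simplification, since it ignores cases where 2nd base was already occupied.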
One pitfall that we’ve already talked about when looking through BvP is the Multiple Comparisons Problem. Look through a bunch of data with no Prior, and you’ll find players who have done well historically due to randomness.
But you also must understand how to correctly analyze the data with a specific Prior. For example, one Prior I mentioned was right-handed hitters vs left-handed pitchers. This is a well-known, statistically proven matchup that benefits the hitter significantly. But in the subset of righties vs lefties, there’s also going to be some randomness in the results. We’ll find some righties who crushed lefties in the past, and some who have not done well, and it will still be as random as a coin flip. Furthermore, our average performance is raised in this subset, so we’re going to see some incredible performances that are likely within a reasonable standard deviation of our average. Basically, incredible historical performances can be unreliable if most of your subset is likely to do well.
One last thing: if you use a Prior that’s unproven, sometimes you will find it effective for a few days and think that it must be correct, only for it to fail you for an entire week. Sadly, there’s also randomness in a) finding good BvPs with a specific Prior and b) those BvPs working out.
The bottom line is that BvP can be helpful, but it also can be incredibly confusing, misleading, and unhelpful. Regardless, just like everything in DFS Baseball, BvP is just one small piece of the puzzle of making good picks. The secret to beating DFS Baseball is not held in one stat; it’s held in many.
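To see how much spread pure luck creates inside a favorable subset, here is a small simulation sketch. Everything in it is an assumption for illustration: 50 hitters who all share the same true hit probability of .280 (standing in for a raised average within the subset), 20 at-bats each against the pitcher, and a fixed random seed so the run is repeatable. None of these numbers come from real data.

```python
import random

def simulate_observed_averages(n_hitters, at_bats, true_avg, seed=0):
    """Simulate small-sample batting averages for hitters with identical true talent."""
    rng = random.Random(seed)  # seeded so the example is reproducible
    averages = []
    for _ in range(n_hitters):
        # Each at-bat is an independent hit-or-out trial with the same hit probability.
        hits = sum(rng.random() < true_avg for _ in range(at_bats))
        averages.append(hits / at_bats)
    return averages

averages = simulate_observed_averages(n_hitters=50, at_bats=20, true_avg=0.280)
print(f"best observed average:  {max(averages):.3f}")
print(f"worst observed average: {min(averages):.3f}")
```

Every simulated hitter here has exactly the same talent, so any gap between the best and worst observed averages comes from randomness alone. That is the trap with standout BvP lines inside a subset that is already expected to do well.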
View all posts by Max J Steinberg