Fire Jim Tracy

Tuesday, March 15, 2005


This is why I went to Law School instead of getting a Ph.D. I knew they would make me learn statistics, and regression analysis. This is what we pay expert witnesses for, people! Anyway, my anonymous Ph.D. friend, living somewhere in the vicinity of Atlanta, a dedicated UCLA fan who refuses to hold the truth against me, comments on the Derek Lowe piece I referred to below. He didn't really do what I asked him to, which was explain it as you would a child, to quote Galaxy Quest...but he does draw a firm conclusion. It might be true! (A safe stand for a non-tenured academic.) If I had factcheckers and an executive editor, I would forward it on to them, but instead, the blogosphere, I trust, will edit itself, as it always does. So...

The article provides two arguments which both point to the same conclusion: that Derek Lowe’s poor performance in 2004 was largely due to bad luck, and he should rebound in 2005. The convergence of two independent lines of reasoning is supposed to bolster the conclusion’s reliability – a principle which is scientifically sound, and which you pointed to yourself during the whole plagiarism thing a couple of weeks ago. However, the principle only holds if both arguments are themselves valid. After going over it for much longer than I wanted to, my evaluation is that both arguments are suspect, but neither is verifiably specious. Here are the details.

Argument #1: Line Drive %

  1. “Batting average-ball in play” (BABIP) measures batting average for only those plate appearances in which the batter puts the ball in play (i.e., it excludes walks, strike-outs & home runs). Last year, batters facing Lowe had a .327 BABIP.
  2. Over the long run, certain types of balls which are put into play (e.g., pop flies, grounders, etc.) are statistically more likely to turn into hits than other types. Not surprisingly, the type of ball which is most likely to turn into a hit is the line drive. Last year, only .172 (17.2%) of the balls put into play against Lowe were line drives.
  3. The difference between these two numbers (BABIP and LD%) for Derek Lowe was very large last year, the fourth largest of any major-league pitcher.
  4. The author claims that this large difference indicates that Lowe was merely unlucky last year, in that batters got a lot of hits off of balls put into play which statistically should not produce hits (i.e., non-line drives). Chances are, next year Lowe will not be unlucky (i.e., have an inordinate number of weakly-hit balls fall for hits), but will be somewhere in the middle of the distribution of pitchers in this category. (Think regression to the mean.) Thus, he should give up fewer hits overall.
  5. Evaluating the argument. It depends heavily on two additional assumptions:
    1. Lowe will have a similar LD% next year as he did this year. The author addresses this issue, and not in a way which supports his thesis: “We really don't know how persistent the ability is among major league pitchers to manage the number of line drives allowed.” He mentions that what evidence there is points in the other direction, but for the most part the question is still open. My own guess (but it’s just a guess) is that pitchers have somewhat constant LD%. The way a ball is put in play depends to a certain extent on the type of pitch seen (e.g., guys who throw sinkers generally produce a lot of ground balls). So, as long as a pitcher’s overall style remains constant from year to year and talent does not decline, LD% should remain relatively stable.
    2. Differences between BABIP and LD% are due to luck, not some other factor which the pitcher has control over. As Jack Ryan once said, “I cannot evaluate this possibility, sir.”
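
The arithmetic behind Argument #1 can be sketched in a few lines of Python. This is only an illustration, not the article's own method: the `babip` function uses a commonly cited formulation (hits minus home runs, divided by at-bats minus strikeouts minus home runs plus sacrifice flies), which is an assumption here since the article's exact formula isn't quoted above, and the function and parameter names are made up for the example. The only real numbers are Lowe's .327 BABIP and .172 LD% from points 1 and 2.

```python
def babip(hits, home_runs, at_bats, strikeouts, sac_flies=0):
    """Batting average on balls in play.

    Assumed (commonly cited) formulation; walks never count as
    at-bats, so they are already excluded.
    """
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (hits - home_runs) / balls_in_play


def babip_ld_gap(babip_value, ld_rate):
    """Gap between BABIP and line-drive rate, the quantity in point 3."""
    return babip_value - ld_rate


# Lowe's 2004 figures as quoted in points 1 and 2.
lowe_gap = babip_ld_gap(0.327, 0.172)
print(round(lowe_gap, 3))  # 0.155
```

The article's claim is that a gap this large is mostly luck; the sketch only shows what is being measured, not whether that claim holds.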

Argument #2: Fielding-Independent Pitching

  1. FIP is a measure of all the things pitchers are directly responsible for – HR, BB, and K, a.k.a. the three “pure” outcomes. Put the amount of these things that the pitcher allows/produces into a formula along with IP and you get a number which measures pitchers’ “true” performance and is roughly comparable to ERA. It’s an attempt to measure pitching performance independent of the fielding component. Last year, Lowe’s FIP was 4.50.
  2. Last year, Lowe’s ERA (5.42) was substantially higher than his FIP, the 6th highest difference of all major-league pitchers. This indicates that a large proportion of the runs Lowe allowed (the 6th largest of all pitchers) resulted from plate appearances which were not one of the three “pure” outcomes. Rather, these runs were the result of balls put into play and thus were not completely Lowe’s fault, but in part the fault of the fielders behind him.
  3. The author seems to be making a regression to the mean argument here, too, though he is less explicit about it. That is, he seems to be claiming that “true performance” is more accurately predicted by FIP than by ERA, that differences between the two are due to luck, and that luck evens out over time. Thus, chances are that Lowe will give up fewer runs next year than he did this year.
  4. Evaluating the argument. Once again, it depends on additional assumptions:
    1. FIP is actually a “truer” measure of performance than ERA. See 5b above.
    2. Lowe will have a similar FIP next year as he did this year. See 5a above. It’s possible that the evidence is better here, but I don’t know if that’s true.
    3. The difference between ERA and FIP is out of the pitcher’s control. My take is that this difference reflects a combination of two factors, only one of which is out of the pitcher’s control.
      i. The quality of the defense behind the pitcher. FIP filters out the effect of fielding on a pitcher’s success and ERA doesn’t, so if it’s doing its job, the difference should reflect the quality of the defense behind the pitcher. Of course, this is out of the pitcher’s control.
      ii. The number of opportunities the defense has to affect scoring outcomes. That is, the more often the defense has to make a play, the more its quality (or lack thereof) will help (or hurt) a pitcher. The number of opportunities is itself determined by two factors: first, the number of balls put into play; second, the type of balls put in play, which in turn affects how the defense responds. Both of these, it seems to me, are influenced dramatically by the pitcher.
    4. The defense behind Lowe in 2005 will be significantly better than the one behind him in 2004. Without going to the stats, my initial impression is that this is true. However, whether it’s going to be enough to significantly improve Lowe’s ERA is anyone’s guess. If it is, it is only because the Dodgers’ 2005 defense is much better than Boston’s 2004, and/or because quality of defense plays a very large role in the gap between ERA and FIP.
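
The FIP calculation in Argument #2 can be sketched the same way. The weights below (13 for HR, 3 for BB, 2 for K) and the additive constant of 3.2 come from a commonly cited version of the formula and are an assumption here; the article only says that HR, BB, K and IP go into "a formula," and the real constant varies by league and season. The stat line passed to `fip` is hypothetical and is not Lowe's. Only the 5.42 ERA and 4.50 FIP are taken from the text above.

```python
def fip(home_runs, walks, strikeouts, innings_pitched, constant=3.2):
    """Fielding-independent pitching, assumed common formulation.

    The constant rescales the result to look like an ERA; 3.2 is a
    placeholder, not a value taken from the article.
    """
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    raw = 13 * home_runs + 3 * walks - 2 * strikeouts
    return raw / innings_pitched + constant


def era_fip_gap(era, fip_value):
    """How many runs per nine innings ERA sits above FIP."""
    return era - fip_value


# Hypothetical stat line, for illustration only (not Lowe's).
example_fip = fip(home_runs=15, walks=70, strikeouts=105, innings_pitched=180)

# Lowe's 2004 ERA and FIP as quoted in points 1 and 2.
print(round(era_fip_gap(5.42, 4.50), 2))  # 0.92
```

Nothing in the formula refers to balls in play, which is the whole point: the 0.92-run gap is what is left to be explained by defense, luck, or something the pitcher does that FIP doesn't capture.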

My only comment: I understand the bolded sentence, and haven't really seen anybody discuss this point (maybe they have). Regardless of whether Kent or Valentin have iron hands (and please let's not start that again -- at least wait for the first ground ball into right field on Opening Day), I suspect that the 2005 Dodger defense is significantly better than the 2004 Red Sox defense was. Otherwise, you can all fight about it if you want, but leave me out of it.

