Things I learned at JSM 2016

Ed. note – I’m back, and hopefully for a good long while.

This year I found myself with 5 major conventions I was interested in, all within a 2-week span, covering my varied interests in baseball, stats, gaming, and religion. Alas, I could only afford to go to one, so I took advantage of the fact that the Joint Statistical Meetings were being held in Chicago, thus eliminating my need to pay for airfare.

JSM is the largest single gathering of statisticians in the world. The American Statistical Association, the prime organizer among the half dozen statistical societies that co-sponsor the event, always books the host city’s signature convention center in order to hold 6,000 attendees over the course of 6 days. This was only my second time attending, the first being the 2008 conference in Denver.

The conference covers many different subjects, from finance and risk to data modeling and visualization. But, with this being a blog that focuses on baseball, I’ll share 4 things I learned (and 1 thing I already knew) from the sports sessions.

1) The gap between academics and practitioners in statistics is narrowing
The ASA has a well-deserved reputation as an organization that favors academic pursuits in statistics. Its membership is predominantly employed in academia, and membership growth has not kept pace with the growth of the profession. Yet, it seems the explosion of data has led to more cross-over. One paper that was presented listed a FiveThirtyEight writer among its authors.

2) There are more ways to get into baseball than by studying it directly.
One of the presenters I was able to interact with has spent a lot of time looking at SportVU, the NBA’s player tracking data. This person is now slated to start working for a baseball team in the fall because of that work. You can probably guess that the position will focus on understanding Statcast data.

3) Sports teams aren’t looking for subject matter experts with stats knowledge; they want stats experts with subject matter knowledge.
At a panel discussion about stats in sports, both panelists who are currently employed by major sports teams noted that front offices are loaded with SMEs. They want people with stats backgrounds who can analyze the data well. It is highly unlikely that anyone today could get into a front office the way Bill James did.

4) It pays to think about analogs from other fields.
Dan Cervone presented a model trying to value court space in the NBA. He built his model analogous to how real estate valuations are made. It’s the type of thinking that leads me to pay attention to JSM, and it is also, in my opinion, a requirement for getting the most out of the conference.

5) Ideas at JSM are starting points, not end points.
The fact that JSM does tend to emphasize statistical methods over results means many papers aren’t necessarily providing new insights into well-studied issues, but rather new ways of analyzing the questions at the heart of those issues. Take this presentation on predicting outcomes of plate appearances. It uses a type of regression modeling designed to handle structured outcomes like those in baseball. It may not provide any new insights into how baseball is played, but an idea like this could end up in the next great forecasting system.

Sabermetrically Gaming: Bottom of the 9th

In the first entry of this series, I looked at some of the math behind the most popular tabletop simulation of real life baseball, Strat-O-Matic Baseball. Today, I’m going to look at a new and simpler tabletop game, Bottom of the 9th.

The premise of the game is pretty simple in baseball terms. The game is tied. The home team is up to bat. The inning should be evident from the name of the game. The talent gap is wide between the two teams, and so favors the visitors that it is considered a miracle the home team even has a chance to win. It is also presumed the visitors will win should the game go to extras.

The game plays out the inning pitch-by-pitch. Each pitch is a turn in the game, and is resolved with 4 or 5 steps. The heart of the game is in the Stare-Down, where the pitcher picks an area of the zone to throw to while the batter tries to guess where the pitch is going. This is simplified into 2 choices: a red disc for height of the pitch (High or Low) and a white disc for which half of the plate (Inside or Away). The batter gets some benefits for guessing right, while the pitcher gets some benefits when the batter guesses wrong.

After this, the pitcher makes The Pitch by rolling 2 dice: one that determines whether the pitch is outside the zone, inside the zone, or paints the corner, and a standard six-sided die for control, where higher numbers are better. This impacts the swing, for which the batter rolls one standard six-sided die. The benefits granted to each player from the Stare-Down are applied here before comparing the results of the swing and control numbers.

The die rolls are the simple part to break down mathematically. The pitch die shows a pitch outside the zone 1/2 of the time, inside the zone 1/3 of the time, and on the corner 1/6 of the time. A ball is called when the pitch is outside the zone and the Swing result is less than or equal to the Control result. The ball is put in play when the Swing result equals the Control result for a pitch on the corner, or when the Swing result is greater than the Control result for a pitch in the zone. All other combinations result in strikes. Based on these dice alone, a ball in play occurs on 16.7% of pitches (1/6 × 6/36 + 1/3 × 15/36 = 1/6), a ball occurs on 29.2% of pitches (1/2 × 21/36 = 7/24), and a strike occurs the remaining 54.2% of the time.
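If you want to check those percentages yourself, here is a minimal Python sketch that enumerates every pitch-die result and every Swing/Control pair and tallies the exact probabilities. It assumes the resolution rules exactly as summarized above, including that a ball requires a pitch outside the zone, which is what reproduces 29.2%. It ignores any Stare-Down modifiers. The names `LOCATION_PROBS`, `resolve_pitch`, and `pitch_outcome_probs` are my own and do not come from the game.

```python
from fractions import Fraction
from itertools import product

# Pitch-die location probabilities as described above:
# outside the zone 1/2, in the zone 1/3, on the corner 1/6.
LOCATION_PROBS = {
    "outside": Fraction(1, 2),
    "zone": Fraction(1, 3),
    "corner": Fraction(1, 6),
}


def resolve_pitch(location: str, swing: int, control: int) -> str:
    """Resolve one pitch from the raw dice, ignoring any Stare-Down modifiers."""
    if location == "outside" and swing <= control:
        return "ball"
    if location == "corner" and swing == control:
        return "in_play"
    if location == "zone" and swing > control:
        return "in_play"
    return "strike"


def pitch_outcome_probs() -> dict:
    """Exact outcome probabilities over every location and Swing/Control d6 pair."""
    totals = {"ball": Fraction(0), "strike": Fraction(0), "in_play": Fraction(0)}
    for location, p_location in LOCATION_PROBS.items():
        for swing, control in product(range(1, 7), repeat=2):
            # each (swing, control) pair is one of 36 equally likely rolls
            totals[resolve_pitch(location, swing, control)] += p_location / 36
    return totals


for outcome, p in pitch_outcome_probs().items():
    print(f"{outcome:8s} {p} = {float(p):.1%}")
```

Run as-is, it should print ball at 7/24 (29.2%), strike at 13/24 (54.2%), and in_play at 1/6 (16.7%). Using `Fraction` instead of floats keeps the results exact, so they can be compared directly to the hand calculation.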

How does this compare to actual major league rates? It’s a little off. In the 9th inning of games in 2015 (via Baseball Savant), balls occurred on 34.8% of pitches and balls in play occurred on 17.4% of pitches. This is because of the simplified system used in Bottom of the 9th, which values speed of play over simulated accuracy.

Another example of how the game values speed is what happens on a Contact result. The first player to roll a 5 or 6 tips the result in their favor. This requirement can be modified depending on who is at bat and who is pitching, so the outcome depends on the players involved as well as on how fast each person can roll a die.

Clearly, this is not a real simulation. That’s not a bad thing. Bottom of the 9th was designed to appeal to board game fans as well as baseball fans. It represents two of the hotter trends in board gaming: it was funded via Kickstarter, and it’s considered a “micro-game”, a simple game that can be played in around 15-30 minutes. It’s a pretty good game, and definitely recreates the feel of the batter-pitcher confrontation. I picked up a copy from my local game store, but I now wish I had funded the Kickstarter.

The SABR 101 Project

One of the things I missed when I had to skip out of SABR 45 Saturday was the committee meeting for SABR’s largest research committee, Statistical Analysis. Unlike many of the other committees, the Stat Analysis committee didn’t have a group project to work on, in part due to the individual nature of most members’ research. A couple of ideas were bandied about at the meeting during SABR 44, but it took until SABR 45 to get one of those ideas off the ground.

A few weeks back, Phil Birnbaum, the chair of the committee and editor of the By the Numbers newsletter, announced that group project. The idea is to create a crowd-sourced list of key resources that help newcomers to sabermetrics learn what has been done and give them the foundation for additional contributions.

There are plenty of books and articles which I could cite, so I’m going to start with the broad resources that cover multiple topics. That means it does skew towards books. They are listed in the order they came out of my head.

Before I get into my long list, I want to invite you, dear reader, to contribute your recommendations to this project. If you do so in the comments, I’ll be sure to pass them on.

  1. The Numbers Game, by Alan Schwarz. This book came up recently when Graham Womack of Baseball Past & Present and I discussed which title we’d recommend first among it and a few others that will make their way onto this list. We both agreed that this title is where we’d tell others to start. It’s a fantastic history of baseball’s numbers, and understanding how a particular stat like batting average or OBP came to be is key to understanding any analysis built on those measures.
  2. The Hidden Game of Baseball, by John Thorn and Pete Palmer. It’s over 30 years old, and it might be the most important book in sabermetric history. There’s a reason I started my sabermetric research database project with this book: it was The Numbers Game before Schwarz wrote his book with its concise history of baseball statistics AND it introduced the linear weights model to the world, which is much more of the mathematical foundation of modern sabermetrics than anything put out by the most famous name in the field.
  3. The Bill James Abstracts, both the annuals printed from 1977-1988 and the Historical Abstract (first published in 1986, revised and updated in 2001). For the many who grew up before Al Gore’s invention came to the masses, these books were how they were introduced to sabermetrics. Bill isn’t a statistician in the academic sense, but his understanding of baseball endows his analyses with tremendous insight.
  4. Curve Ball, by Jim Albert and Jay Bennett. I have a rare relationship with this book. I read it before I ever read anything by Bill James. It steered me from being a pure mathematics major in college to a statistics major, which is one of the 5 best decisions I have made in my life. So yeah, I hold this title in high esteem for many personal reasons. That being said, it might be the best book for helping aspiring saberists to start understanding mathematical statistics, which is essential to advancing the field.
  5. The Book, by Tango, Lichtman, and Dolphin. For many saberists, this is the modern treatise on the subject. Grounded in an understanding of Palmer’s Linear Weights system, they introduce wOBA and use it to explore every facet of the game.
  6. For online reference guides, the FanGraphs Sabermetric Library is my preferred site, as I consider it the most complete. Neil Weinberg is also authoring weekly posts to explain the ins and outs of various metrics, helping keep the reference guide current with new research.
  7. The Best of Baseball Prospectus: 1996-2011 is a 2-volume set that compiles the most important articles from the first 15 years of that site’s history. This is essentially my proxy for the excellent writing on that website, including Voros McCracken’s article on DIPS Theory and Keith Woolner’s “Baseball’s Hilbert Problems“.
  8. Baseball Hacks, by Joseph Adler. The ability to analyze data is great, but it is useless if you can’t get data to analyze. While the book is somewhat dated, it’s a great introduction to many of the coding skills required to do sabermetrics efficiently in the computing era, and one I still find worthwhile to have on my shelf.
  9. SABR101x, the massive open online course at edX administered by Boston University and designed by Andy Andres et al. If you prefer a class-based method for learning sabermetrics, this is as good as you’ll find. There are tracks on the history of sabermetrics, statistics, the SQL/R skills needed, and a build-up to understanding some key metrics used by saberists.

One thing I want to keep separate from this list is SABR’s own Guide to Sabermetric Research, which was put together by the aforementioned Phil Birnbaum. His involvement spearheading this SABR 101 project is why I leave it out for now. I have a sense that it will be that guide that is updated as a result of this group work.

SABR 45: A Partial Review for a Partial Experience

Almost 2 years ago, I was sitting at my computer scrolling through Twitter when this appeared:

Yeah, I got a little bit excited when I saw that.

It was never a question of whether or not I would be attending the SABR Convention this year. Having a convention in your backyard has some benefits, the biggest of which is cost. Aside from the convention registration, I had my choice as far as how to get to and from the Palmer House and whether I wanted to sleep in a hotel bed or my own. With an infant crawling around my house, I chose to not book a hotel room (at approximately $200/night) and took commuter rail in and out of Chicago each day.

The downsides of my lodging and travel decision were twofold: 1) I didn’t partake in nearly as many hallway and bar conversations as I did last year, depriving me of what many consider the most fun part of the convention experience; and 2) it made it easier for other things to pull me away from the convention activities. Having to catch a train at 7 am to just make it to a day of events running from 8 am to 10 pm meant having to reconcile sleep with the train schedule. Then, family events cropped up on the weekend, making it unfeasible for me to go downtown Saturday or Sunday. While missing Sunday only cost me the Historic Ballpark Site tour, not being able to attend Saturday cost me half of the presentations and panels and most of the committee meetings I was interested in.

However, what I did attend and help with as a volunteer and member of the host chapter was quite fantastic. Wednesday is typically a day for travel and getting acquainted with the city. With minimal travel, I helped as a volunteer with registration and Cubs ticket distribution. As with past conventions, there was a tour of the host city. I skipped this year’s walking tour due to the aforementioned volunteer work, but Jacob Pomrenke put together a fantastic document highlighting the sites with baseball history attached to them as the tour traversed downtown Chicago. (If a KML file gets created for it, I’ll link to it here.) After registration closed down for the night, I sacrificed the welcome reception in order to catch the train and be home.

Thursday was what I presume is a rare day in recent SABR Convention history. At no time did any attendees have to pick between different meetings or presentations, as it was a single program of events for the day. Cubs broadcasters Len Kasper, Jim Deshaies, and Ron Coomer graced the broadcasters panel in the morning, chiming in when moderator Curt Smith would let them do so. Many of Smith’s questions centered on the Cubs, and all three provided the level of insight that I’ve become accustomed to when I do tune in for Cubs broadcasts. This was followed by the annual business meeting, which showed the continued positive growth of the society but, unlike last year, revealed no final verdict on next year’s convention. It seems the society learned its lessons the hard way: Houston had a hotel location near a mall instead of the ballpark due to the latter option’s lack of availability after the 2014 MLB schedule was released; Chicago corrected for that by getting the ideal hotel location early, but ended up victimized by selecting the one weekend BOTH Chicago clubs were on the road. There is a tentative plan for SABR 46’s host next year, but it would be unwise to get excited about seeing a bobblehead museum and the most colorful home run sculpture in MLB quite yet (never mind my own personal ability to attend next year). Thankfully, despite the lack of weekend games, the Cubs were finishing up a series with the Dodgers, so Thursday afternoon’s getaway day contest ended up being the convention game. It was entertaining simply because of Joe Maddon’s tinkering with the line-up every 2 innings or thereabouts.

Thursday night ended up being what I think was the biggest highlight of the convention (and perhaps a way of the national office apologizing for the schedule debacle): a concert in the Palmer House’s Grand Ballroom with the Baseball Project.

[Photo: The Baseball Project Rocks SABR45]

From left to right, Scott McCaughey, Linda Pitmon, Mike Mills, and Steve Wynn rocked the house with their songs about Harvey Haddix, Ted Williams, Larry Yount, Big Ed Delahanty, and many others. Wisely, they opened with “Box Scores” from their album 3rd, which to me is the quintessential SABR song. It was pretty awesome. If you like baseball and rock music (especially R.E.M.), you’ll love this band.

After forgetting my phone at home Friday morning, I still made it in time for the second group of presentations, dropping in on Tara Kreiger’s presentation about Andy Coakley’s labor struggles with organized baseball. It was a fascinating story that I was unfamiliar with, and it exemplified the blackballing many early players went through when they complained about their contracts. This was followed by 2 panels: one titled Pitching Prodigies that featured Steve Trout and Joe Berton a.k.a. “Sidd Finch”, and a presentation by the 4 Letters on an upcoming project. The former was my favorite panel I attended, as Berton told the story of how he got involved in the Sidd Finch hoax perpetrated by George Plimpton and Sports Illustrated. Trout seemed more subdued about his experiences, which I guess is to be expected from an 8th overall pick who did not have the career he expected to have. The latter was a “stealth announcement” about a project entitled “1927: The Diary of Myles Thomas”, which looks to chronicle the 1927 Yankees via “real-time historical fiction” storytelling. I kind of like the concept, but will probably wait and see what ends up being produced by Steve Wulf and Douglas Alden Warshaw. The presentation I saw after the panels was entitled “Aging Fan Base: Using Twitter to Develop a New Generation of Baseball Fans” and given by Allison Levin. Unfortunately, she didn’t get to many suggestions in her slides, as most of the time was spent looking at Twitter usage during the 2014 World Series. But she has a few avenues for further exploration that will hopefully yield some results, though I have a sense that MLB might be ahead of her on this.

The morning block was followed by a tribute-filled awards luncheon. I skipped this last year, since my meal times were spent with my wife, who graciously traveled to Houston with me. I’m glad I went this year, because I got a better sense of what this organization means to so many people. Tom Hufford couldn’t avoid breaking down as he eulogized Ray Nemec and Joe Semenick, two of his fellow members of the Cooperstown 16 who founded SABR. Phil Rogers had it a bit easier in terms of emotions, but still had to encapsulate what Ernie Banks and Minnie Minoso meant to their adopted hometown. He did so, and did it well. After the luncheon I took time to peruse the vendor room, which is a dangerous endeavor given the number of baseball books that are available for sale. My wallet came away only somewhat dented. The only committee meeting I attended was for the Business of Baseball committee, which gave an update on the Winter Meetings project (all years are being researched by someone!), the Team Ownership bios (4 of 30 done or in progress), and a reminder from chair Michael Haupert about the importance of examining the source of data in research, using examples from the pre-1983 salary database to show how what’s printed isn’t always accurate.

I then attended 5 more presentations between the committee meeting and heading home. In order:

  • David Kaiser asked “What Makes a Dynasty?” He counted teams that played postseason baseball in 3 of 6 seasons as dynasties, splitting the analysis into 3 eras based on the postseason structure in place. He noted which ones were dominated by pitching and which ones weren’t. Most of the teams you would expect showed up. The only bone I’d pick is that, based on the average winning percentage by era for the dynastic teams in the study, he said mediocrity was more prevalent today than it used to be. I think that’s just a function of his definition of dynasty.
  • David W. Smith, the Retrosheet president, updated his look at run scoring in the 1st inning, asserting that travel doesn’t seem to have an effect but that the number of runs the visiting team scores in the top of the 1st is highly correlated with the number of runs they allow in the bottom of the 1st. You can find his paper on Retrosheet’s site.
  • Zach Moser gave an oral presentation on how Cap Anson’s views on colored players in professional baseball were portrayed over time. While revered in his time, Anson’s racism became a hot topic while he was among the early players considered for induction into Cooperstown’s most noted museum. Anson’s racism was revisited as many of his team records for the Cubs were eclipsed by the aforementioned Ernie Banks, and Moser suggests that most modern apologists for Anson are deficient in their criticism.
  • John Burbridge examined “The Increasing Importance of Quality Starts” mostly by doing an x-ray on the definition of a quality start. He ultimately came to the conclusion that 6 IP with 3 or fewer runs allowed is reasonable, and claims that it is increasingly relevant as bullpens are used more and more.
  • Finally, Bruce Allardice talked about how pro baseball became a big part of Chicago in the mid 1800s. Baseball grew in popularity in Chicago, paralleling the game’s growth in popularity nationwide. By 1870, the city’s elite coveted the status of being the nation’s pork capital, vying against a river town called Cincinnati. Because of this rivalry with the 2015 All-Star Game host city, Chicago’s wealthy pooled funds to found the first professional club in the city. The White Stockings did manage to beat Cincinnati twice late in that season, and would go on to claim the championship based on a disputed victory over the New York Mutuals, who also claimed the title. Unfortunately, baseball took a 2-year hiatus after a cow tipped a lantern and ignited a magnificent blaze that required years of rebuilding.

I’d love to say more about SABR 45, but (1) I’m already at 1,750 words if you’ve read to this point and (2) the downside of a local convention is that you can be pulled to do other things since you aren’t travelling. That’s what happened to me on the weekend, as family events popped up and hindered my ability to get in and out of the city. I don’t know if I’ll get to go to another convention for a while at this point, and next year looks doubtful regardless of location. When I do go again, I’m going to make sure of 2 things: that I stay at the hotel, and that I hang out at the bars and talk baseball over beers. That’s the convention experience I missed, and it’s why those who go to one convention try to make it an annual trip.

Statcasting Expectations

The next level of public baseball data has arrived. MLB Advanced Media’s Statcast made a hyped television debut, although it had made cameos in online replay videos last year. With the system installed in all 30 ballparks to track all movement on the field, hopes are high for discovering many things about the game via data that previously could only be imprecisely discerned by watching a lot of baseball.

However, while MLBAM has stated that Statcast data will be made public, it is still unclear what types of data, and how much of it, will be available for public use. Bits and pieces of the data have slowly appeared since the 2015 season started. Among the first pieces have been the velocity and angle of the ball off the bat, which savvy scrapers of the Gameday files, such as Daren Willman of Baseball Savant fame, have captured and published. But whether the public will have access to the raw data remains to be seen.

It seems unlikely to me that there will be public access to the raw Statcast data anytime soon. The first challenge is the sheer size of the data set, which is already measured in petabytes. This is unlike the pitchF/X data, which can be scraped and saved on a home PC. Raw Statcast data is best stored on a cloud server. While MLBAM is certainly using “the cloud” as the method for allowing the 30 teams to access the data, it would be a massive security risk to open that server up to the public domain. Setting up a public server would be an additional cost, and it’s hard to argue that there would be any significant return on that investment for MLBAM. However, Statcast is already sponsored by Amazon Web Services, so the possibility is there for the raw data to be made public via the AWS platform. That possibility seems very remote at this time.

A more likely scenario (at least in my mind) for the release of Statcast data is something like what the NBA did with its SportVU data. SportVU, the player tracking system developed by a subsidiary company of STATS, Inc., is akin to Statcast in that it tracks player and ball movement. The Stats section of the NBA’s website (linked above) shows various measures and animations gleaned from the SportVU data, but does not provide fans access to the raw data. This is the path I expect MLBAM to take. The batted ball data that has already shown up in Gameday is like this, and many of the other metrics that have been teased via broadcast, such as route efficiency and perceived velocity, could also be distributed in this manner.

Releasing the data in a summarized or snapshot form isn’t as risky to the teams, who were not all that happy when pitchF/X data made its way into the open world. Once public researchers could publish insights from that data, those insights became available to all teams, taking away an opportunity to gain a competitive advantage. This is why the other Sportvision products, like hitF/X, which also provided batted ball information, and commandF/X, which tracked where the catcher’s glove was positioned, have been available to teams but not the public.

Regardless of what form the data takes when it is released, Statcast data should enable saberists to use more granular data to show what it takes to succeed in the game of baseball. Some of these data-driven discoveries may merely affirm what scouts and those in the game have been taught and believed for years and decades, but I’m sure some will not. Like many others, I can’t wait to get my hands on it.

Sabermetrically Gaming: Strat-O-Matic Baseball

I happen to be a man of many interests. Besides baseball, one of my other primary interests is gaming, especially tabletop gaming. My interest in games is rooted in my love of competition and my explorations of the world via mathematics and statistics. It’s no coincidence those are traits inherent to following baseball as well.

Today, I’m starting a series exploring various games that attempt to simulate baseball. My focus will be more on the math that underlies each game and how closely it replicates the on-field experience, though I’m sure some gameplay commentary will filter in. Leading off is perhaps the most well-known of the baseball tabletop simulations, Strat-O-Matic Baseball.

In Curve Ball, Jim Albert and Jay Bennett open with a dissection of how various baseball tabletop games model the actual action of a baseball game. Naturally, Strat-O-Matic Baseball was covered, and they explain some of the math behind the model and how it assigns credit to the batter, pitcher, and defense. I want to focus more on the game design and the probabilities involved.

The basic mechanics of the game are relatively simple, though there are optional levels of complexity that can be added to the game now that were not a part of the original edition. There are batter cards and pitcher cards, and each card contains a table of possible results that are determined by the roll of 3 six-sided dice. One die, typically white, determines which card and which column the result comes from, with the result corresponding to the sum of the 2 other dice, typically red, in the designated column. In many instances, a result then requires the roll of an additional 20-sided die. This provides 6 × 36 × 20 = 4,320 different possible outcomes.
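To make that count concrete, here is a small Python sketch that enumerates the dice. It assumes that white-die results 1-3 point to a batter-card column and 4-6 to a pitcher-card column. That is my understanding of the basic game and is labeled as an assumption in the code. The helper names are hypothetical. The sketch also shows where 108 results per card come from: 3 columns times 36 red-dice combinations.

```python
from collections import Counter
from itertools import product

D6 = range(1, 7)
D20 = range(1, 21)

# Ways each red-dice sum (2..12) can come up among the 36 ordered rolls.
RED_SUM_WAYS = Counter(a + b for a, b in product(D6, D6))


def card_for(white: int) -> str:
    # Assumption: white 1-3 reads a batter-card column, 4-6 a pitcher-card column.
    return "batter" if white <= 3 else "pitcher"


def outcome_space():
    """Every equally likely (white die, red sum, d20) combination."""
    return [
        (white, red1 + red2, d20)
        for white, red1, red2, d20 in product(D6, D6, D6, D20)
    ]


rolls = outcome_space()
chances_per_card = 3 * sum(RED_SUM_WAYS.values())  # 3 columns x 36 red rolls

print(len(rolls))                                  # 4320
print(Counter(card_for(w) for w, _, _ in rolls))   # 2160 batter, 2160 pitcher
print(RED_SUM_WAYS[7], RED_SUM_WAYS[2])            # 6 1
print(chances_per_card)                            # 108
```

The uneven red-dice sums (a 7 comes up 6 ways, a 2 only 1 way) are what let a card designer weight the most likely results toward the middle of each column.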

[Figure: Strat-O-Matic pitcher card and batter card]

Unfortunately, there is no master database of SOM player cards that is available to fully analyze this model. However, a massive Strat-O-Matic Baseball fan by the name of Bruce Bundy put together a bunch of formulas to forecast how a player’s card would be created. My impression is that he created these formulas by looking at a bunch of player card sheets. I’ll use it here because it’s the best publicly available information about the game model that I can find.

Looking at the formulas provides insights into a number of assumptions made about baseball by Strat-O-Matic. Player cards are customized based on their statistics, but this customization is achieved using some assumptions about the probabilities of certain events occurring that are built into the game model.

Consider the old-fashioned base on balls, the least sexy of the Three True Outcomes. The Walk formulas for Batters and Pitchers are both adjusted by a constant of 9. In terms of SOM, this means that the batter and pitcher cards are each designed with the assumption that 9 of the 108 results (3 columns × 36 red-dice combinations) on the other card will be walks. Thus, the game implies an unintentional walk occurs about 8.3% of the time in baseball (18 of 216 results across both cards, or 1/12), with the credit being split equally between the pitcher and the batter. While the latter claim can’t be investigated for seasons before pitch-by-pitch data became available, the former can. Here’s the overall major league non-intentional walk rate year-by-year since 1952, using the event logs courtesy of Retrosheet.

[Figure: MLB non-intentional walk rate by season]

You see that for most seasons here, the actual MLB non-intentional walk rate (in red) is slightly less than the estimated rate modeled by SOM (in blue). The average across these seasons is that non-IBB walks occur in 7.81% of plate appearances, which is about 17/216. Since 17 is an odd number, it can’t be divided equally between the batter and pitcher, a key component of the Strat-O-Matic model. Thus, it seems that the walk rate implied by Bundy’s formulas is reasonable, though a bit high.
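The walk arithmetic can be sketched the same way. This minimal Python version assumes each plate appearance is read from the batter card or the pitcher card with equal probability, which follows from the white die splitting its faces evenly between the two cards. `implied_walk_rate` and `chances_out_of_216` are hypothetical helper names.

```python
from fractions import Fraction

CHANCES_PER_CARD = 108  # 3 columns x 36 red-dice combinations on each card


def implied_walk_rate(batter_chances: int, pitcher_chances: int) -> Fraction:
    """Overall walk rate if each card is consulted on half of all plate appearances."""
    half = Fraction(1, 2)
    return (half * Fraction(batter_chances, CHANCES_PER_CARD)
            + half * Fraction(pitcher_chances, CHANCES_PER_CARD))


def chances_out_of_216(rate: float) -> int:
    """Nearest whole number of chances, across both cards' 216, matching a rate."""
    return round(rate * 2 * CHANCES_PER_CARD)


som_rate = implied_walk_rate(9, 9)
print(som_rate, f"{float(som_rate):.2%}")   # 1/12 8.33%
print(chances_out_of_216(0.0781))           # 17, an odd number of chances
```

With 9 walks on each card the model lands exactly on 1/12. The observed 7.81% rounds to 17 of 216 chances, which can't be split evenly across the two cards, so 18 (9 + 9) is the nearest even split.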

Here are the implied rates from Bundy’s formulas and their actual instance rates from the same Retrosheet data for a few other events:

  • Doubles – SOM rate of 180/4320 = 4.2%, Actual = 4.1%
  • Triples – SOM rate of 30/4320 = 0.7%, Actual = 0.6%
  • HRs – SOM rate of 100/4320 = 2.3%, Actual = 2.3%
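Here is the same conversion from chances to rates for the list above, as a minimal Python sketch. The counts and actual rates are copied from the list. `som_rate_pct` is a hypothetical helper name.

```python
OUTCOMES = 4320  # 6 white-die faces x 36 red-dice rolls x 20 d20 faces

# SOM chances (out of 4,320) implied by Bundy's formulas, and the actual
# Retrosheet rate in percent, both copied from the list above.
EVENTS = {
    "Doubles": (180, 4.1),
    "Triples": (30, 0.6),
    "HRs": (100, 2.3),
}


def som_rate_pct(chances: int) -> float:
    """Percentage of all equally likely dice outcomes that produce the event."""
    return 100 * chances / OUTCOMES


for event, (chances, actual_pct) in EVENTS.items():
    som_pct = som_rate_pct(chances)
    print(f"{event}: SOM {som_pct:.1f}% vs. actual {actual_pct:.1f}% "
          f"({som_pct - actual_pct:+.1f} points)")
```

Each gap comes out to about a tenth of a percentage point or less.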

This replication of a generic baseball reality is why Strat-O-Matic has been so beloved for over 50 years. Hal Richman, the game’s inventor and mastermind, has created a game model that is flexible enough to work across eras. This enables SOM to sell new sets based on every season and specially designed sets, all of which can be mixed and matched as the gamer sees fit. If you’ve never played the game, find a way to do so at least once.

2015 SABR Analytics Conference Research Awards

Voting closed on President’s Day for this year’s SABR Analytics Conference Research Awards, and like last year, I have taken a great interest in seeing which articles were nominated. Although the voting is closed, I once again am sharing which articles I voted for and runners up in each category.

Contemporary Baseball Analysis: Harry Pavlidis and Dan Brooks, “Framing and Blocking Pitches: A Regressed, Probabilistic Model,” Baseball Prospectus, March 3, 2014.
This category was stacked. I could have reasonably voted for 4 of the 5 articles. But Pavlidis and Brooks managed to stand out above the rest by a hair. Like Max Marchi’s winning article from last year, this is another landmark addition to our statistical understanding of catcher framing, possibly the hottest topic in sabermetric research until the Statcast data sees the public light of day. While Jonathan Judge and this duo have already updated and improved on their work, its importance to quantifying catcher framing was without equal in 2014.
Runner up: Jon Roegele, “The Effects of Pitch Sequencing,” The Hardball Times, November 24, 2014.
Pitch sequencing is my current favorite topic in sabermetric research. It’s not quite as popular as catcher framing because sequencing is largely dependent on the pitcher’s arsenal and the techniques needed to study sequencing tend to go beyond basic data mining. Jon’s work is the best on the topic that doesn’t require an understanding of Markov chains and/or the mathematical mechanics of game theory.
The other 2 articles I almost voted for were:

  • Russell Carleton, “N=1,” Baseball Prospectus 2014: The Essential Guide to the 2014 Season, January 2014. Pizza asks what we really know about an individual player, and explores swing rates for individual players using regression. (Yes, I’m one of those who instantly started mouthing GLM, HLM, and MLM at the words “gory math” and “regression” in the article.)
  • Jeff Sullivan, “Alex Gordon Barely Had a Chance,” FanGraphs, October 30, 2014. The best breakdown of the most scrutinized play of this year’s World Series.

Historical Analysis/Commentary: Steve Treder, “The Strikeout Ascendant (and What Should Be Done About It),” The Hardball Times Baseball Annual 2014.
A tough category to pick, but Steve’s breakdown of strikeout eras in baseball history was an exploration reminiscent of a Bill James essay from his 1980s Abstracts. He explores strikeout rates through history, citing the increase as part of a natural rise of the power game in baseball, both at the plate and on the mound. Nothing, not even a proposal to lop off the bottom three inches of the strike zone, will change the minds of batters sacrificing discipline for power or pitchers trying to keep that power in check by throwing hard at the expense of in-game longevity.
Runner Up: Bryan Soderholm-Difatte, “The 1914 Stallings Platoon: Assessing Execution, Impact, and Strategic Philosophy,” SABR Baseball Research Journal, Fall 2014.
While platoons aren’t anything new, I always find it interesting when someone looks at a season in the distant past using modern tools. Bryan’s analysis of the 1914 Stallings platoon was well thought out and about as comprehensive as such an analysis is capable of being.

Contemporary Baseball Commentary: Lewie Pollis, “If You Build It: Rethinking the Market for Major League Baseball Front Office Personnel,” Brown University, senior honors thesis, Spring 2014.
Most senior theses don’t make it beyond the adviser’s desk. If you happen to read one, it’s probably because you know the person who wrote it or you were in the same graduating class and major while they wrote it. Lewie’s thesis is clearly more public than that. It’s also an extremely articulate breakdown as to why wages for lower-level front office personnel should be higher. It won my vote in a rout.
Runner Up: Eno Sarris, “Learning the Language of the Clubhouse,” The Hardball Times, March 13, 2014.
Eno’s article was full of wonderful anecdotes and personal reflections on speaking the ballplayer’s language. It’s the runner up almost by default, as the other 3 articles rehashed (or completely missed) ideas I have previously seen explored.