
Updating Baseball Hacks, Chapter I

In 2006, O’Reilly released a book that became one of the best guides for those new to sabermetrics: Baseball Hacks, by Joseph Adler. The book lays out 75 different tasks that can take someone from baseball novice to competent analyst of baseball data. It’s still a good book, but it definitely shows its age. Sadly, the author has stated there are no plans to update the book. Thankfully, the book is still easy to find through booksellers of choice, and it’s also available on Safari, O’Reilly’s online service.

Given how many things have changed in the last 12 years in sabermetrics in terms of data, technologies, and metrics, it’s time to look at updating this book. That’s what I’ll be doing in this series of blog posts. The plan is to go through each of Adler’s hacks, updating those that should be updated, and then adding some new hacks that aren’t covered in the book but will touch on key components of sabermetrics today.

Baseball Hacks organizes the hacks into chapters. For this post, let’s review Chapter 1 of the book, which includes Hacks #1-#7.

Hack #1: Score a Baseball Game
The only update to what Adler wrote for this hack is to highlight the presence of mobile apps for scoring baseball games, because the smartphone as a consumer product didn’t really take off until the launch of the iPhone a year after this book was published. My personal favorite app for scorekeeping is iScore.

Hack #2: Make a Box Score from a Scoresheet
No updates here. Apps will do this for you if you score electronically.

Hack #3: Keep Score, Project Scoresheet-Style
Almost 35 years after Project Scoresheet became a reality, this system is still in use by Retrosheet. They’ve made a few minor modifications to the system such as accounting for replay reviews, but it’s by and large the same as when Adler published. The current state of the system can be found here.

Hack #4: Follow Pitches During a Game
Confession time: I’m not very good at identifying pitches. I have worn glasses for a long time and have never really been good at picking up spin. Thankfully, the principles Adler lists for identifying pitches are still good while you’re watching a live game. Watching at home gives the added benefit of on-screen strike zones and sites you can bring up on a second screen. Speaking of which…

Hack #5: Follow the Game Online
This is the first hack where the book shows its age. Adler lists 11 sites, 4 for player statistics and 7 for commentary. It was a good list for when it was published, but it omits a few key sites in the current landscape that didn’t exist at the time and it also includes some now-defunct commentary blogs. The original list and my changes are combined in the table below, with the new sites marked (new) and the removed sites marked (removed):

Statistics                  Commentary
MLB.com                     Baseball Prospectus
ESPN                        Baseball Graphs (removed)
Baseball Reference          Baseball Musings
Retrosheet                  Thorn Pricks (removed)
Brooks Baseball (new)       Most Valuable Network (removed)
Baseball Savant (new)       Baseball Think Factory
The Baseball Gauge (new)    Tango on Baseball

The changes, explained:

  • Baseball Graphs has not been updated since 2004. Dave Studenmund, its proprietor, moved his work over to The Hardball Times (now a part of Fangraphs).
  • Thorn Pricks stopped being updated when John Thorn became the Official Historian of Major League Baseball. His writing moved to his blog for that role, Our Game. It’s a great blog for the history of the game, but not really a site for statistically-based baseball commentary.
  • Most Valuable Network went defunct in 2010. It had one blog with statistically-driven baseball commentary that was worth reading, Stat Speak. My favorite of their authors, Russell Carleton a.k.a. Pizza Cutter, is now writing for Baseball Prospectus. The StatSpeak archive can be found here.
  • Brooks Baseball became the go-to site for pitchF/X data when that became available in 2006. It’s still a valuable resource, thanks to the manual pitch classification done by Harry Pavlidis.
  • Baseball Savant is a part of the MLB.com family after MLB hired Daren Willman to help find ways to present Statcast data. It’s the prime source for the new metrics being created from Statcast, and thus merits its own mention separate from MLB.com’s own stats page.
  • The Baseball Gauge, created by Dan Hirsch, is similar to Baseball-Reference in how it presents stats. It’s affiliated with and powers the Negro League Database and the Ballparks Database.
  • Fangraphs launched in 2005, and gained my notice in 2006 with the addition of live Win Probability charts during games. It has expanded steadily since then to include some of the best commentary on baseball through its own daily writers and the acquisition of The Hardball Times in 2012. It also features the most accessible presentation of traditional and advanced metrics outside of Baseball-Reference.

Hack #6: Add Baseball Searches to Firefox (replaced with: Follow the Game on Your Phone)
Technically, it’s still possible to do this. A couple of things have changed in the world that make the original hack obsolete. First, Chrome became available and currently holds 60% market share for internet browsers. Second, the smartphone became a consumer device. So, instead of adding search tools to a browser you don’t use, here are some of my favorite baseball apps for your phone. All are available for the 2 major phone operating systems:

  • At Bat is the must-have app from the league. It’s the easiest way to access information about any team and the only way to watch MLB.tv, their excellent streaming service for out-of-market games.
  • MiLB First Pitch is Minor League Baseball’s equivalent of At Bat. With increased attention on teams’ farm systems, this is the ideal way to track prospects for your favorite MLB team.
  • An app to buy tickets on the secondary market. StubHub and SeatGeek are my go-to apps when I want to do this.
  • I mentioned iScore earlier when talking about scorekeeping in Hack #1, and will repeat that recommendation here. It’s free to download and use. If you want to spend money, you can buy a file with rosters of every major league team for the season that will update throughout the season for $20, which is handy if you use it on a daily basis.

Hack #7: Find Images of Stadiums
This hack has held up surprisingly well. Sure, corporate mergers have turned Pac Bell Park into AT&T Park, but such is life. The Pro version of Google Earth is now free, something that wasn’t true when Baseball Hacks was published.

Next time, we’ll start digging into Chapter 2’s hacks.


A SABR Writer’s Day

3 members of the baseball media took a couple hours out of their Labor Day weekend to talk baseball and media with 15 members of the Emil Rothe chapter.

UPDATED: Now with videos!

Batting lead-off was Sun-Times beat writer Gordon Wittenmyer. I can’t comment on any opening remarks he made, because I was a little late for an unusually prompt meeting start time. He covers the Cubs, so much of the Q+A I did hear centered on the struggles of the club this year in the wake of last year’s championship. A few highlights:

  • Having covered 4 different clubs (Seattle, LAnaheim, and Minnesota previously), he sees the language barrier with Spanish-speakers from Latin America as a big issue in baseball that should have been solved 20 years ago. Too many organizations, even a few years ago, had the mindset that “baseball is the only language that matters”. Yet the experiences of Dennis, Ramon, and Pedro Martinez are a clear example that it isn’t. Clubs are coming around to realize this, but still make mistakes. The Cubs exemplified this with the fiasco of a press conference that occurred after they acquired Aroldis Chapman.
  • The closest the Cubs came to falling out of contention seemed to be the last week of June. In case you’re not inundated with Cubs talk as often as I am, that was the week the team took a 2nd visit to the White House and Montero was made an example of and released for speaking his mind. Good thing the NL Central never ran away from their talent level.
  • David Ross is missed somewhat in the clubhouse, but his absence is minor compared to the absence of the massively unifying goal that breaking the drought was.
  • He lauded the organization on what it has done with the ballpark improvements, but was less keen on how the team has driven down property values to take over the rooftops and neighborhood around the ballpark.

Batting second, and for me the highlight of the afternoon, was Peabody award winner Julie DiCaro. I’m fairly certain that if you know her for one thing, it’s this video that won her said award. A former lawyer, she meandered her way into sports media through the explosion of the blogosphere. The now-radio host ignited a lively discussion on the usefulness of stats, discussing both her use as a member of the media and how the public consumes the information explosion that sabermetrics and now Statcast are producing. She also talked about the efforts she’s made to help women in or interested in sports media to network with each other, and opined on the possibility there will be a female GM in the next 15 years.

Wait, that’s it? My notes are way shorter for Julie than they were for Gordon, yet I said she was the highlight of my afternoon. Why? She talked less. Why did she talk less? With Julie at the podium, that lively discussion was very much an open discussion, with multiple people (myself included at a couple points) chiming in on particular stats and their usefulness. With Gordon, most everyone held to the Q + A protocol: someone asks, the speaker answers. Thinking back, I wonder what she thought of that difference in the dynamic and whether it was as she intended. Julie left before the meeting ended and I could ask her. You can watch the videos once they’re posted and opine on your own.

Anyway, back to the meeting, where one more speaker took the audience on a trip through the minor leagues. Emily Waldon writes about Tigers prospects for 20/80 Baseball and The Athletic Detroit. Her interest in baseball started with having 4 brothers and took off when the West Michigan Whitecaps moved to town in 1994. She started covering the Tigers minor league affiliates as part of the Bless You Boys blog, moving on to her current posts subsequently. Her visit became much more timely with the waiver deadline deal of Justin Verlander to Houston, allowing her to talk about the new acquisitions, the other prospects in the system, and Avila’s philosophy for the rebuild and player development. She also noted that the parents of minor leaguers greatly appreciate her coverage, whether they’re local, across the country, or from one of the Latin American hotbeds of baseball.

Things I learned at JSM 2016

Ed. note – I’m back, and hopefully for a good long while.

This year I found myself with 5 major conventions I was interested in going to within a 2 week span, covering my varied interests in baseball, stats, gaming, and religion. Alas, I could only afford to go to one, so I took advantage of the fact that the Joint Statistical Meetings were being held in Chicago, thus eliminating my need to pay for airfare.

JSM is the largest single gathering of statisticians in the world. The American Statistical Association, the prime organizer among the half dozen statistical societies that co-sponsor the event, always books the host city’s signature convention center in order to hold 6,000 attendees over the course of 6 days. This was only my second time attending, having previously attended the 2008 conference in Denver.

The conference has many different subjects being analyzed, from finance and risk to data modeling and visualization. But, with this being a blog that focuses on baseball, I’ll share 4 things I learned (and 1 thing I already knew) from the sports sessions.

1) The gap between academics and practitioners in statistics is narrowing
The ASA has a well-deserved reputation as an organization that favors academic pursuits in statistics. Its membership is predominantly employed in academia, and membership growth has not kept pace with the growth of the profession. Yet, it seems the explosion of data has led to more cross-over. One paper that was presented listed a FiveThirtyEight writer among its authors.

2) There are more ways to get into baseball than by studying it directly.
One of the presenters I was able to interact with has spent a lot of time looking at SportVu, the NBA’s player tracking data. This person is now slated to start working for a baseball team in the fall because of that work. You can probably guess that the position will focus on understanding Statcast data.

3) Sports teams aren’t looking for subject matter experts with stats knowledge; they want stats experts with subject matter knowledge.
At a panel discussion about stats in sports, both panelists who are currently employed by major sports teams noted that front offices are loaded with SMEs. They want people with stats backgrounds who can analyze the data well. The odds of anyone getting into a front office today in the same manner as Bill James are very low.

4) It pays to think about analogs from other fields.
Dan Cervone presented a model trying to value court space in the NBA. He built his model analogous to how real estate valuations are made. It’s the type of thinking that leads me to pay attention to JSM, and it also is, in my opinion, a requirement to getting the most out of the conference.

5) Ideas at JSM are starting points, not end points.
The fact that JSM does tend to emphasize statistical methods over results means many papers aren’t necessarily providing new insights into well-studied issues, but new ways of analyzing the questions at the heart of the issue. Take this presentation on predicting outcomes of plate appearances. It uses a type of regression modeling designed to handle structured outcomes like in baseball. It may not provide any new insights into how baseball is played, but an idea like this could end up in the next great forecasting system.

The SABR 101 Project

One of the things I missed when I had to skip out of SABR 45 Saturday was the committee meeting for SABR’s largest research committee, Statistical Analysis. Unlike many of the other committees, the Stat Analysis committee didn’t have a group project to work on, in part due to the individual nature of most members’ research. A couple ideas were bandied about the meeting during SABR 44, but it took until SABR 45 to get one of those ideas off the ground.

A few weeks back, Phil Birnbaum, the chair of the committee and editor of the By the Numbers newsletter, announced that group project. The idea is to create a crowd-sourced list of key resources for helping newcomers to sabermetrics learn what has been done and provide to him or her the foundation for additional contributions.

There are plenty of books and articles which I could cite, so I’m going to start with the broad resources that cover multiple topics. That means it does skew towards books. They are listed in the order they came out of my head.

Before I get into my long list, I want to invite you, dear reader, to contribute your recommendations to this project. If you do so in the comments, I’ll be sure to pass them on.

  1. The Numbers Game, by Alan Schwarz. This book came up recently when Graham Womack of Baseball Past & Present and I discussed its importance, along with that of a few other titles that will make their way onto this list, and which one we’d recommend first. We both agreed that this title is where we’d tell others to start. A fantastic history of baseball’s numbers, and the understanding of how a particular stat like batting average or OBP came to be is key to understanding any analysis with those measures.
  2. The Hidden Game of Baseball, by John Thorn and Pete Palmer. It’s over 30 years old, and it might be the most important book in sabermetric history. There’s a reason I started my sabermetric research database project with this book: it was The Numbers Game before Schwarz wrote his book with its concise history of baseball statistics AND it introduced the linear weights model to the world, which is much more of the mathematical foundation of modern sabermetrics than anything put out by the most famous name in the field.
  3. The Bill James Abstracts, both the annuals printed from 1977-1988 and the Historical Abstract (first published in 1986, revised and updated in 2001). For the many who grew up before Al Gore’s invention came to the masses, these books were how they were introduced to sabermetrics. Bill isn’t a statistician in the academic sense, but his understanding of baseball endows his analyses with tremendous insight.
  4. Curve Ball, by Jim Albert and Jay Bennett. I have a rare relationship with this book. I read it before I ever read anything by Bill James. It steered me from being a pure mathematics major in college to a statistics major, which is one of the 5 best decisions I have made in my life. So yeah, I hold this title in high esteem for many personal reasons. That being said, it might be the best book for helping aspiring saberists to start understanding mathematical statistics, which is essential to advancing the field.
  5. The Book, by Tango, Lichtman, and Dolphin. For many saberists, this is the modern treatise on the subject. Grounded in an understanding of Palmer’s Linear Weights system, they introduce wOBA and use it to explore every facet of the game.
  6. For online reference guides, the FanGraphs Sabermetric Library is my preferred site, as I consider it to be the most complete. Neil Weinberg is also authoring weekly posts to explain the ins and outs of various metrics, helping keep the reference guide current with new research.
  7. The Best of Baseball Prospectus: 1996-2011 is a 2 volume set that is a compilation of the most important articles from the first 15 years of that site’s history. This is essentially my proxy for the excellent writing on that website, including Voros McCracken’s article on DIPS Theory and Keith Woolner’s “Baseball’s Hilbert Problems”.
  8. Baseball Hacks, by Joseph Adler. The ability to analyze data is great, but it is useless if you can’t get data to analyze. While the book is somewhat dated, it’s a great introduction to many of the coding skills required to do sabermetrics efficiently in the computing era, and one I still find worthwhile to have on my shelf.
  9. SABR101x, the massive open online course at edX administered by Boston University and designed by Andy Andres et al. If you prefer a class-based method for learning sabermetrics, this is as good as you’ll find. There are tracks on the history of sabermetrics, statistics, SQL/R skills needed, and a build up to understanding some key metrics used by saberists.

One thing I want to keep separate from this list is SABR’s own Guide to Sabermetric Research, which was put together by the aforementioned Phil Birnbaum. His involvement spearheading this SABR 101 project is why I leave it out for now. I have a sense that it will be that guide that is updated as a result of this group work.

SABR 45: A Partial Review for a Partial Experience

Almost 2 years ago, I was sitting at my computer scrolling through Twitter when this appeared:

Yeah, I got a little bit excited when I saw that.

It was never a question of whether or not I would be attending the SABR Convention this year. Having a convention in your backyard has some benefits, the biggest of which is cost. Aside from the convention registration, I had my choice as far as how to get to and from the Palmer House and whether I wanted to sleep in a hotel bed or my own. With an infant crawling around my house, I chose to not book a hotel room (at approximately $200/night) and took commuter rail in and out of Chicago each day.

The downsides of my lodging and travel decision were twofold: 1) I didn’t partake in nearly as many hallway and bar conversations as I did last year, depriving me of what many consider the most fun part of the convention experience; and 2) it made it easier for other things to pull me away from the convention activities. Having to catch a train at 7 am to just make it to a day of events running from 8 am to 10 pm meant having to reconcile sleep with the train schedule. Then, family events cropped up on the weekend, making it unfeasible for me to go downtown Saturday or Sunday. While missing Sunday only cost me the Historic Ballpark Site tour, not being able to attend Saturday cost me half of the presentations and panels and most of the committee meetings I was interested in.

However, what I did attend and help with as a volunteer and member of the host chapter was quite fantastic. Wednesday is typically a travel and get acquainted with the city day. With minimal travel, I helped as a volunteer with registration and Cubs ticket distribution. As with past conventions, there was a tour of the host city. I skipped this year’s walking tour due to the aforementioned volunteer work, but Jacob Pomrenke put together a fantastic document highlighting the sites with baseball history attached to them as the tour traversed downtown Chicago. (If a KML file gets created for it, I’ll link to it here). After registration closed down for the night, I sacrificed the welcome reception in order to catch the train and be home.

Thursday was what I presume is a rare day in recent SABR Convention history. At no time did any attendees have to pick between different meetings or presentations, as it was a single program of events for the day. Cubs broadcasters Len Kasper, Jim Deshaies, and Ron Coomer graced the broadcasters panel in the morning, chiming in when moderator Curt Smith would let them do so. Many of Smith’s questions centered on the Cubs, and all three provided the level of insight that I’ve become accustomed to when I do tune in for Cubs broadcasts. This was followed by the annual business meeting, which showed the continued positive growth of the society but, unlike last year, revealed no final verdict on next year’s convention. It seems the society learned its lessons the hard way: Houston had a hotel location near a mall instead of the ballpark due to the latter option’s lack of availability after the 2014 MLB schedule was released; Chicago corrected for that by getting the ideal hotel location early, but ended up victimized by selecting the one weekend BOTH Chicago clubs were on the road. There is a tentative plan for SABR 46’s host next year, but it would be unwise to get excited for seeing a bobble head museum and the most colorful home run sculpture in MLB quite yet (never mind my own personal ability to attend next year). Thankfully, despite the lack of weekend games, the Cubs were finishing up a series with the Dodgers, so Thursday afternoon’s getaway day contest ended up being the convention game. It was entertaining simply because of Joe Maddon’s tinkering with the line-up every 2 innings or thereabouts.
Thursday night ended up being what I think was the biggest highlight of the convention (and perhaps a way of the national office apologizing for the schedule debacle): a concert in the Palmer House’s Grand Ballroom with the Baseball Project. (Photo: The Baseball Project Rocks SABR45.) From left to right, Scott McCaughey, Linda Pitmon, Mike Mills, and Steve Wynn rocked the house with their songs about Harvey Haddix, Ted Williams, Larry Yount, Big Ed Delahanty, and many others. Wisely, they opened with “Box Scores” off their album 3rd, which to me is the quintessential SABR song. It was pretty awesome. If you like baseball and rock music (especially R.E.M.), you’ll love this band.

After forgetting my phone at home Friday morning, I made it in time for the second group of presentations, dropping in on Tara Krieger’s presentation about Andy Coakley’s labor struggles with organized baseball. It was a fascinating story that I was unfamiliar with, but it exemplified the blackballing many early players went through when they complained about their contracts. This was followed by 2 panels: one titled Pitching Prodigies that featured Steve Trout and Joe Berton a.k.a. “Sidd Finch”, and a presentation by the 4 Letters on an upcoming project. The former was my favorite panel I attended, as Berton told the story of how he got involved in the Sidd Finch hoax perpetrated by George Plimpton and Sports Illustrated. Trout seemed more subdued about his experiences, which I guess is to be expected from an 8th overall pick who did not have the career he expected to have. The latter was a “stealth announcement” about a project entitled “1927: The Diary of Myles Thomas”, which looks to chronicle the 1927 Yankees via “real-time historical fiction” storytelling. I kind of like the concept, but will probably wait and see what ends up being produced by Steve Wulf and Douglas Alden Warshaw. The presentation I saw after the panels was entitled “Aging Fan Base: Using Twitter to Develop a New Generation of Baseball Fans” and given by Allison Levin. Unfortunately, she didn’t get to many suggestions in her slides, as most of the time was spent looking at Twitter usage during the 2014 World Series. But she has a few avenues for further exploration that will hopefully yield some results, though I have a sense that MLB might be ahead of her on doing this.

The morning block was followed by a tribute-filled awards luncheon. I skipped this last year, since my meal times were spent with my wife who graciously traveled to Houston with me. I’m glad I went this year, because I got a better sense of what this organization means to so many people. Tom Hufford couldn’t avoid breaking down as he eulogized two of his fellow Cooperstown 16 that founded SABR, Ray Nemec and Joe Simenic. Phil Rogers had it a bit easier in terms of emotions, but still had to encapsulate what Ernie Banks and Minnie Minoso meant to their adopted hometown. He did so, and did it well. After the banquet I took time to peruse the vendor room, which is a dangerous endeavor given the number of baseball books that are available for sale. My wallet came away only somewhat dented. The only committee meeting I attended was for the Business of Baseball, which gave an update on the Winter Meetings project (all years are being researched by someone!), the Team Ownership bios (4 of 30 done or in progress), and a reminder from chair Michael Haupert about the importance of examining the source of data in research, using examples from the pre-1983 salary database to show how what’s printed isn’t always accurate.

I then attended 5 more presentations between the committee meeting and heading home. In order:

  • David Kaiser questioned “What Makes a Dynasty?” He counted teams who played postseason baseball in 3 of 6 seasons as a dynasty, splitting the analysis into 3 eras based on the postseason structure in place. He noted which ones were dominated by pitching and which ones weren’t. Most of the expected teams showed up where you would expect. The only bone I pick is that, based on the average winning percentage by era for the dynastic teams in the study, he said mediocrity was more prevalent today than it used to be. I think that’s just a function of his definition of dynasty.
  • David W. Smith, the Retrosheet president, updated his look at run scoring in the 1st inning, asserting that travel doesn’t seem to have an effect but that the number of runs the visiting team scores in the top of the 1st is highly correlated with the number of runs they allow in the bottom of the 1st. You can find his paper on Retrosheet’s site.
  • Zach Moser gave an oral presentation on how Cap Anson’s views on colored players in professional baseball were portrayed over time. While revered in his time, Anson’s racism became a hot topic while he was among the early players considered for induction into Cooperstown’s most noted museum. Anson’s racism was revisited as many of his team records for the Cubs were eclipsed by the aforementioned Ernie Banks, and Moser suggests that most modern apologists for Anson are deficient in their criticism.
  • John Burbridge examined “The Increasing Importance of Quality Starts” by mostly just doing an x-ray on the definition of a quality start. He ultimately came to the conclusion that 6 IP with 3 or fewer runs allowed is reasonable, and claims that it is increasingly relevant as bullpens are utilized more and more.
  • Finally, Bruce Allardice talked about how pro baseball became a big part of Chicago in the mid 1800s. Baseball grew in popularity in Chicago, paralleling the game’s growth in popularity nationwide. By 1870, the city’s elite coveted the status of being the nation’s pork capital, vying against a river town called Cincinnati. Because of this rivalry with the 2015 All Star Game host city, Chicago’s wealthy pooled funds to found the first professional club in the City. The White Stockings did manage to beat Cincinnati twice late in that season, and would go on to claim the championship based on a disputed victory over the New York Mutuals, who also claimed the title. Unfortunately, baseball took a 2 year hiatus after a cow tipped a lantern and ignited a magnificent blaze that required years of rebuilding.
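
For readers who like to see definitions as code, here is a minimal Python sketch of two definitions from the list above: Kaiser’s dynasty criterion and the quality start that Burbridge examined. It rests on my own assumptions rather than anything shown in the presentations: I read “3 of 6 seasons” as at least 3 postseason appearances within any 6 consecutive seasons, I treat “6 IP” as at least 18 outs recorded, and the function names and sample years are hypothetical.

```python
from typing import Iterable, List, Tuple


def is_quality_start(outs_recorded: int, runs_allowed: int) -> bool:
    """Quality start as described above: 6 IP with 3 or fewer runs allowed.

    Assumption: 6 IP is treated as a minimum, i.e. at least 18 outs.
    """
    return outs_recorded >= 18 and runs_allowed <= 3


def dynasty_windows(
    postseason_years: Iterable[int],
    window: int = 6,
    min_appearances: int = 3,
) -> List[Tuple[int, int]]:
    """Return (first_year, last_year) spans that meet the dynasty criterion.

    Assumption: a span qualifies when the team reached the postseason at
    least `min_appearances` times within `window` consecutive seasons,
    with each candidate span starting in a postseason year.
    """
    years = sorted(set(postseason_years))
    spans = []
    for start in years:
        # Count postseason trips inside the window that begins at `start`.
        appearances = sum(1 for y in years if start <= y < start + window)
        if appearances >= min_appearances:
            spans.append((start, start + window - 1))
    return spans


# Hypothetical team with postseason trips in 2001, 2003, and 2006.
print(dynasty_windows([2001, 2003, 2006]))  # [(2001, 2006)]
print(is_quality_start(outs_recorded=18, runs_allowed=3))  # True
```

Changing `window` or `min_appearances` is a quick way to see how sensitive the list of dynasties is to the definition chosen.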

I’d love to say more about SABR 45, but (1) I’m already at 1,750 words if you’ve read to this point and (2) the downside of a local convention is that you can be pulled to do other things since you aren’t travelling. That’s what happened to me on the weekend, as a family event popped up and hindered my ability to get in and out of the city. I don’t know if I’ll get to go to another convention for a while at this point, and next year looks doubtful regardless of location. When I do go again, I’m going to make sure of 2 things: I’m staying at the hotel, and I’m going to hang at the bars and talk baseball over beers. That’s the convention experience that I missed, and why those who go to one convention try to make it an annual trip.

Statcasting Expectations

The next level of public baseball data has arrived. MLB Advanced Media’s Statcast made a hyped television debut, although it had made cameos in online replay videos last year. With the system installed in all 30 ballparks to track all movement on the field, hopes are high for discovering many things about the game via data that previously could only be imprecisely discerned by watching a lot of baseball.

However, while MLBAM has stated that Statcast data will be made public, it is still unclear what types of data and how much of it will be available for public use. Bits and pieces of the data have slowly appeared as the 2015 season started. Among the first pieces have been the velocity and angle of the ball off the bat, which savvy scrapers of the Gameday files, such as Daren Willman of Baseball Savant fame, have captured and published. But whether the public will have access to the raw data remains to be seen.

It seems unlikely to me that there will be public access to the raw Statcast data anytime soon. The first challenge is the sheer size of the data set, which is already measured in petabytes. This is unlike the pitchF/X data, which can be scraped and saved on a home PC. Raw Statcast data is best stored on a cloud server. While MLBAM is certainly using “the cloud” as the method for allowing the 30 teams to access the data, it would be a massive security risk to open that server up to the public domain. Setting up a public server would be an additional cost, and it’s hard to argue that there would be any significant return on that investment for MLBAM. However, Statcast is already sponsored by Amazon Web Services, so the possibility is there for the raw data to be made public via the AWS platform. That possibility seems very remote at this time.

A more likely scenario (at least in my mind) for the release of Statcast data is something like what the NBA did with its SportVU data. SportVU, the player tracking system developed by a subsidiary company of STATS, Inc., is akin to Statcast in that it tracks player and ball movement. The Stats section of NBA.com (linked above) shows various measures and animations gleaned from the SportVU data, but does not provide fans access to the raw data. This is the path I expect MLBAM to take. The batted ball data that has already shown up in Gameday is like this, and many of the other metrics that have been teased via broadcast, such as route efficiency and perceived velocity, could also be distributed in this manner.

Releasing the data in a summarized or snapshot form isn’t as risky to the teams, who were not all that happy when pitchF/X data made its way into the open world. Allowing public researchers to publish insights based on that data, which were then available to all teams, took away an opportunity to gain a competitive advantage. This is why the other Sportvision products, like hitF/X, which also provided batted ball information, and commandF/X, which tracked where the catcher’s glove was positioned, have been available to teams but not the public.

Regardless of what form the data takes when it is released, Statcast data should enable saberists to use more granular data to show what it takes to succeed in the game of baseball. Some of these data-driven discoveries may merely affirm what scouts and those in the game have been taught and believed for years and decades, but I’m sure some will not. Like many others, I can’t wait to get my hands on it.

2015 SABR Analytics Conference Research Awards

Voting closed on President’s Day for this year’s SABR Analytics Conference Research Awards, and like last year, I have taken a great interest in seeing which articles were nominated. Although the voting is closed, I once again am sharing which articles I voted for and runners up in each category.

Contemporary Baseball Analysis: Harry Pavlidis and Dan Brooks, “Framing and Blocking Pitches: A Regressed, Probabilistic Model,” Baseball Prospectus, March 3, 2014.
This category was stacked. I could have reasonably voted for 4 of the 5 articles. But Pavlidis and Brooks managed to stand out above the rest by a hair. Like Max Marchi’s winning article from last year, this is another landmark addition to our statistical understanding of catcher framing, possibly the hottest topic in sabermetric research until the Statcast data sees the public light of day. While Jonathan Judge and this duo have already updated and improved on their work, its importance to quantifying catcher framing was without equal in 2014.
Runner Up: Jon Roegele, “The Effects of Pitch Sequencing,” The Hardball Times, November 24, 2014.
Pitch sequencing is my current favorite topic in sabermetric research. It’s not quite as popular as catcher framing because sequencing is largely dependent on the pitcher’s arsenal and the techniques needed to study sequencing tend to go beyond basic data mining. Jon’s work is the best on the topic that doesn’t require an understanding of Markov chains and/or the mathematical mechanics of game theory.
The other 2 articles I almost voted for were:

  • Russell Carleton, “N=1,” Baseball Prospectus 2014: The Essential Guide to the 2014 Season, January 2014. Pizza asks what we really know about an individual player, and explores swing rates for individual players using regression. (Yes, I’m one of those who instantly started mouthing GLM, HLM, and MLM at the words “gory math” and “regression” in the article.)
  • Jeff Sullivan, “Alex Gordon Barely Had a Chance,” FanGraphs, October 30, 2014. The best breakdown of the most scrutinized play of this year’s World Series.

Historical Analysis/Commentary: Steve Treder, “The Strikeout Ascendant (and What Should Be Done About It),” The Hardball Times Baseball Annual 2014.
A tough category to pick, but Steve’s breakdown of strikeout eras in baseball history was an exploration reminiscent of the essays Bill James wrote in his 1980s Abstracts. He explores strikeout rates through history, arguing that the increase is part of a natural rise of the power game in baseball, both at the plate and on the mound. Nothing, not even a proposal to lop off the bottom three inches of the strike zone, will change the minds of batters sacrificing discipline for power or pitchers trying to keep that power in check by throwing hard at the expense of in-game longevity.
Runner Up: Bryan Soderholm-Difatte, “The 1914 Stallings Platoon: Assessing Execution, Impact, and Strategic Philosophy,” SABR Baseball Research Journal, Fall 2014.
While platoons aren’t anything new, I always find it interesting when someone looks at a season in the distant past using modern tools. Bryan’s analysis of the 1914 Stallings platoon was well thought out and about as comprehensive as such an analysis can be.

Contemporary Baseball Commentary: Lewie Pollis, “If You Build It: Rethinking the Market for Major League Baseball Front Office Personnel,” Brown University, senior honors thesis, Spring 2014.
Most senior theses don’t make it beyond the adviser’s desk. If you happen to read one, it’s probably because you know the person who wrote it or you were in the person’s graduating class and major while they wrote it. Lewie’s thesis is clearly more public than that. It’s also an extremely articulate breakdown as to why wages for lower-level front office personnel should be higher. It won my vote in a rout.
Runner Up: Eno Sarris, “Learning the Language of the Clubhouse,” The Hardball Times, March 13, 2014.
Eno’s article was full of wonderful anecdotes and personal reflections on speaking the ballplayer’s language. It’s the runner up almost by default, as the other 3 articles rehashed (or completely missed) ideas I have previously seen explored.