Chicago Goes Solo for SABR Day 2018

When SABR Day was instituted in 2010, a few chapters sought to make their event distinct from their other meetings. In Denver, they play catch, rain or shine. Some chapters get outstanding ballpark events in their regions. And in the Midwest, Chicago and Milwaukee met halfway for a joint meeting.

For years, this caused a conflict for Chicago members whose rooting interests lay with the club residing at 35th and Shields. Go to SABR Day in Kenosha or to SoxFest in the city? For the avid Sox fan and baseball historian, this is no simple calculation. The powers that be felt it was time to avoid this conflict.

Thus, a week before most of the society met for the official SABR Day, the Emil Rothe chapter gathered at the Nichols Library in Naperville to celebrate on the weekend in between the local teams’ offseason fan fests. Despite the challenge of parking, 35 seamheads found their way there to enjoy an afternoon of baseball chatter.

Batting leadoff was Aaron Nieckula of the Oakland Athletics. The Berwyn native discussed the types of things he tries to do in his current role as the minor league field coordinator for the organization. His other role as manager of the Vermont Lake Monsters (NY-Penn League) helps to ensure he gets to meet with most of the players that come into the A’s organization. After talking about these roles and the upgrades the A’s made to HoHoKam Park, Aaron spent the bulk of his time discussing how the A’s develop players both on and off the field. I found the off-field aspects just as fascinating. The A’s have players create a personal leadership model to assess their own strengths and weaknesses, much like the programs offered by large companies. They also emphasize SMART (Structured, Measurable, Appropriate, Reasonable, Timely) goal setting for the players. He also shared one of the sheets they use to measure on-field success in playing the game “The A’s Way”. It was a fascinating look at how one organization does player development, especially since the A’s depend on this pipeline of talent for their major league success.

After going behind the on-field scenes, the next presentation peeled back the curtain a small bit on the White Sox gameday production with Dave Marren, a producer of scoreboard trivia for the Sox and the proprietor of the Sox Nerd blog. The Kenosha native walked through some of his favorite tidbits that made it to the scoreboard, described the various elements that appear there, and explained what he looks for in the trivia and information he selects. Dave showed the early versions from the old scoreboard at the old Comiskey Park (Red for the away side, Blue for the Pale Hose) as well as the current look for the player facts, game notes, trivia, and other scoreboard features that occur each game. Some key themes of his work: he tries to stay positive even when the team isn’t doing well; many of the in-game features, like 6th Inning Trivia, Game Notes, and the Sox Almanac, are scripted before the game; and it helps to have a good rapport with the guys in the television production truck, as they end up helping each other out with on-the-fly information related to what occurs in-game.

At this point, it was time for a break in the proceedings. After a few announcements about upcoming meetings for the chapter, some historical trivia questions were asked of the group. Here’s one of the questions that you can try and find the answer to:

Closing out the afternoon was Josh Nelson, host of the newly independent Sox Machine podcast, with his preview of the 2018 Chicago White Sox. Using Dan Szymborski’s ZiPS projections, he showed the poor projections for the 2018 lineup, which features Eloy Jimenez as the 3rd most productive hitter in a lineup he’s not expected to be a part of until the All-Star Break at the earliest. Comparing the projections to the results of the last 5 years showed the White Sox regularly falling short of expectations, even if only by a game as in 2014 and 2015. 2018’s projection isn’t much better, but the hope for the future with the Sox is real. Josh suggests that the core 4 for a playoff run are Moncada, Jimenez, Tim Anderson, and Luis Robert, the last of whom has impressed with his contact in offseason camps and winter ball. The key for team success is for these players, and hopefully some others, to produce at least 3 WAR apiece, as each of the last 5 playoff teams for the South Siders featured 3 players with at least that much production.

This year’s event was also broadcast live via Facebook, and you can find those videos and more information on coming events, including a July date to see the Kane County Cougars, on the chapter’s Facebook page.


A SABR Writer’s Day

3 members of the baseball media took a couple hours out of their Labor Day weekend to talk baseball and media with 15 members of the Emil Rothe chapter.

UPDATED: Now with videos!

Batting lead-off was Sun-Times beat writer Gordon Wittenmyer. I can’t comment on any opening remarks he made, because I was a little late for an unusually prompt meeting start time. He covers the Cubs, so much of the Q+A I did hear centered on the struggles of the club this year in the wake of last year’s championship. A few highlights:

  • Having covered 4 different clubs (Seattle, LAnaheim, and Minnesota previously), he sees the language barrier with Spanish-speakers from Latin America as a big issue in baseball that should have been solved 20 years ago. Too many organizations, even a few years ago, had the mindset that “baseball is the only language that matters”. Yet the experiences of Dennis, Ramon, and Pedro Martinez are a clear example that it isn’t. Clubs are coming around to realize this, but still make mistakes. The Cubs exemplified this with the fiasco of a press conference that occurred after they acquired Aroldis Chapman.
  • The closest the Cubs came to falling out of contention seemed to be the last week of June. In case you’re not inundated with Cubs talk as often as I am, that was the week the team took a 2nd visit to the White House and Montero was made an example of and released for speaking his mind. Good thing the NL Central never ran away from their talent level.
  • David Ross is missed somewhat in the clubhouse, but his absence is minor compared to the absence of the massively unifying goal that breaking the drought was.
  • He lauded the organization on what it has done with the ballpark improvements, but was less keen on how the team has driven down property values to take over the rooftops and neighborhood around the ballpark.

Batting second, and for me the highlight of the afternoon, was Peabody award winner Julie DiCaro. I’m fairly certain that if you know her for one thing, it’s this video that won her said award. A former lawyer, she meandered her way into sports media through the explosion of the blogosphere. The now-radio host ignited a lively discussion on the usefulness of stats, discussing both her use as a member of the media and how the public consumes the information explosion that sabermetrics and now Statcast are producing. She also talked about the efforts she’s made to help women in or interested in sports media to network with each other, and opined on the possibility there will be a female GM in the next 15 years.

Wait, that’s it? My notes are way shorter for Julie than they were for Gordon, yet I said she was the highlight of my afternoon. Why? She talked less. Why did she talk less? With Julie at the podium, that lively discussion was very much an open discussion, with multiple people (myself included at a couple points) chiming in on particular stats and their usefulness. With Gordon, most everyone held to the Q + A protocol: someone asks, the speaker answers. Thinking back, I wonder what she thought of that difference in the dynamic and whether it was as she intended. Julie left before the meeting ended, so I couldn’t ask her. You can watch the videos once they’re posted and opine on your own.

Anyway, back to the meeting, where one more speaker took the audience on a trip through the minor leagues. Emily Waldon writes about Tigers prospects for 20/80 Baseball and The Athletic Detroit. Her interest in baseball started with having 4 brothers and took off when the West Michigan Whitecaps moved to town in 1994. She started covering the Tigers minor league affiliates as part of the Bless You Boys blog, moving on to her current posts subsequently. Her visit became much more timely with the waiver deadline deal of Justin Verlander to Houston, allowing her to talk about the new acquisitions, the other prospects in the system, and Avila’s philosophy for the rebuild and player development. She also noted that the parents of minor leaguers greatly appreciate her coverage, whether they’re local, across the country, or from one of the Latin American hotbeds of baseball.

Things I learned at JSM 2016

Ed. note – I’m back, and hopefully for a good long while.

This year I found myself with 5 major conventions I was interested in attending within a 2 week span, covering my varied interests in baseball, stats, gaming, and religion. Alas, I could only afford to go to one, so I took advantage of the fact that the Joint Statistical Meetings were being held in Chicago, thus eliminating my need to pay for airfare.

JSM is the largest single gathering of statisticians in the world. The American Statistical Association, the prime organizer among the half dozen statistical societies that co-sponsor the event, always books the host city’s signature convention center in order to hold 6,000 attendees over the course of 6 days. This was only my second time attending, having previously attended the 2008 conference in Denver.

The conference has many different subjects being analyzed, from finance and risk to data modeling and visualization. But, with this being a blog that focuses on baseball, I’ll share 4 things I learned (and 1 thing I already knew) from the sports sessions.

1) The gap between academics and practitioners in statistics is narrowing.
The ASA has a well-deserved reputation as an organization that favors academic pursuits in statistics. Its membership is predominantly employed in academia, and membership growth has not kept pace with the growth of the profession. Yet, it seems the explosion of data has led to more cross-over. One paper that was presented listed a FiveThirtyEight writer among its authors.

2) There are more ways to get into baseball than by studying it directly.
One of the presenters I was able to interact with has spent a lot of time looking at SportVU, the NBA’s player tracking data. This person is now slated to start working for a baseball team in the fall because of that work. You can probably guess that the position will focus on understanding Statcast data.

3) Sports teams aren’t looking for subject matter experts with stats knowledge; they want stats experts with subject matter knowledge.
At a panel discussion about stats in sports, both panelists who are currently employed by major sports teams noted that front offices are loaded with SMEs. They want people with stats backgrounds who can analyze the data well. It is highly unlikely that anyone will get into a front office today in the same manner as Bill James.

4) It pays to think about analogs from other fields.
Dan Cervone presented a model trying to value court space in the NBA. He built his model by analogy to how real estate valuations are made. It’s the type of thinking that leads me to pay attention to JSM, and it also is, in my opinion, a requirement for getting the most out of the conference.

5) Ideas at JSM are starting points, not end points.
The fact that JSM does tend to emphasize statistical methods over results means many papers aren’t necessarily providing new insights into well-studied issues, but new ways of analyzing the questions at the heart of the issue. Take this presentation on predicting outcomes of plate appearances. It uses a type of regression modeling designed to handle structured outcomes like those in baseball. It may not provide any new insights into how baseball is played, but an idea like this could end up in the next great forecasting system.

Sabermetrically Gaming: Bottom of the 9th

In the first entry of this series, I looked at some of the math behind the most popular tabletop simulation of real life baseball, Strat-O-Matic Baseball. Today, I’m going to look at a new and simpler tabletop game, Bottom of the 9th.

The premise of the game is pretty simple in baseball terms. The game is tied. The home team is up to bat. The inning should be evident from the name of the game. The talent gap is wide between the two teams, and so favors the visitors that it is considered a miracle the home team even has a chance to win. It is also presumed the visitors will win should the game go to extras.

The game plays out the inning pitch-by-pitch. Each pitch is a turn in the game, and is resolved with 4 or 5 steps. The heart of the game is in the Stare-Down, where the pitcher picks an area of the zone to throw to while the batter tries to guess where the pitch is going. This is simplified into 2 choices: a red disc for height of the pitch (High or Low) and a white disc for which half of the plate (Inside or Away). The batter gets some benefits for guessing right, while the pitcher gets some benefits when the batter guesses wrong.

After this, the pitcher makes The Pitch by rolling 2 dice, one which determines whether the pitch is outside the zone, inside the zone, or paints the corner, and a standard six-sided die for control, where higher numbers are better. This impacts the swing, where the batter rolls one standard six-sided die. The benefits granted to each player from the Stare-Down are applied here before comparing the results of the swing and control numbers.

The die rolls are the simple part to break down mathematically. The pitch die shows a pitch outside the zone 1/2 of the time, inside the zone 1/3 of the time, and on the corner 1/6 of the time. A ball is called when the pitch is outside the zone and the Swing result is less than or equal to the Control result. The ball is put in play when the Swing result equals the Control result for a pitch on the corner or when the Swing result is greater than the Control result for a pitch in the zone. All other combinations result in strikes. Based on these dice alone, a ball in play occurs on 16.7% of pitches, a ball occurs on 29.2% of pitches, and a strike occurs 54.2% of the time.
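
As a rough check on those percentages, here is a minimal Python sketch that enumerates every combination of the three dice. It assumes the pitch die has three outside faces, two in-zone faces, and one corner face (inferred from the 1/2, 1/3, 1/6 split), that a ball can only be called on a pitch outside the zone, and it ignores the Stare-Down modifiers entirely. The names `PITCH_DIE`, `resolve`, and `outcome_rates` are my own labels for illustration, not anything from the rulebook.

```python
from fractions import Fraction
from itertools import product

# Assumed face layout of the pitch die: 3 outside, 2 in the zone, 1 on the corner.
PITCH_DIE = ["outside"] * 3 + ["zone"] * 2 + ["corner"]


def resolve(location, control, swing):
    """Return the result of one pitch from the raw dice, with no Stare-Down bonuses."""
    if location == "outside" and swing <= control:
        return "ball"
    if location == "corner" and swing == control:
        return "in_play"
    if location == "zone" and swing > control:
        return "in_play"
    return "strike"  # every other combination


def outcome_rates():
    """Enumerate all 6 x 6 x 6 equally likely rolls and return exact probabilities."""
    counts = {"ball": 0, "in_play": 0, "strike": 0}
    rolls = list(product(PITCH_DIE, range(1, 7), range(1, 7)))
    for location, control, swing in rolls:
        counts[resolve(location, control, swing)] += 1
    return {name: Fraction(n, len(rolls)) for name, n in counts.items()}


rates = outcome_rates()
for name, p in rates.items():
    print(f"{name}: {p} = {float(p):.1%}")
```

Under those assumptions the enumeration gives 7/24 for balls, 1/6 for balls in play, and 13/24 for strikes, which lines up with the figures quoted above. Working in exact fractions rather than simulating random rolls avoids any sampling noise in such a small outcome space.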

How does this compare to actual major league rates? It’s a little off. In the 9th inning of games in 2015 (via Baseball Savant), balls occurred on 34.8% of pitches and balls in play occurred on 17.4% of pitches. This is because of the simplified system used in Bottom of the 9th, which values speed of play over simulated accuracy.

Another example of how the game values speed is what happens on a Contact result. The first player to roll a 5 or 6 tips the result in their favor. This requirement can be modified by the player at bat or pitching, and it depends on speed of rolling a die as well as the players involved.

Clearly, this is not a real simulation. That’s not a bad thing. Bottom of the 9th was designed to appeal to board game fans as well as baseball fans. It represents two of the hotter trends in board gaming: it was funded via Kickstarter, and it’s considered a “micro-game”, a simple game that can be played in around 15-30 minutes. It’s a pretty good game, and definitely recreates the feel of the batter-pitcher confrontation. I picked up a copy from my local game store, but I now wish I had funded the Kickstarter.

The SABR 101 Project

One of the things I missed when I had to skip out of SABR 45 Saturday was the committee meeting for SABR’s largest research committee, Statistical Analysis. Unlike many of the other committees, the Stat Analysis committee didn’t have a group project to work on, in part due to the individual nature of most members’ research. A couple ideas were bandied about at the meeting during SABR 44, but it took until SABR 45 to get one of those ideas off the ground.

A few weeks back, Phil Birnbaum, the chair of the committee and editor of the By the Numbers newsletter, announced that group project. The idea is to create a crowd-sourced list of key resources that help newcomers to sabermetrics learn what has been done and give them the foundation for additional contributions.

There are plenty of books and articles which I could cite, so I’m going to start with the broad resources that cover multiple topics. That means it does skew towards books. They are listed in the order they came out of my head.

Before I get into my long list, I want to invite you, dear reader, to contribute your recommendations to this project. If you do so in the comments, I’ll be sure to pass them on.

  1. The Numbers Game, by Alan Schwarz. This book came up recently when Graham Womack of Baseball Past & Present and I discussed which title we’d recommend first, weighing it against a few others that will make their way onto this list. We both agreed that this title is where we’d tell others to start. It’s a fantastic history of baseball’s numbers, and understanding how a particular stat like batting average or OBP came to be is key to understanding any analysis with those measures.
  2. The Hidden Game of Baseball, by John Thorn and Pete Palmer. It’s over 30 years old, and it might be the most important book in sabermetric history. There’s a reason I started my sabermetric research database project with this book: it was The Numbers Game before Schwarz wrote his book with its concise history of baseball statistics AND it introduced the linear weights model to the world, which is much more of the mathematical foundation of modern sabermetrics than anything put out by the most famous name in the field.
  3. The Bill James Abstracts, both the annuals printed from 1977-1988 and the Historical Abstract (first published in 1986, revised and updated in 2001). For the many who grew up before Al Gore’s invention came to the masses, these books were how they were introduced to sabermetrics. Bill isn’t a statistician in the academic sense, but his understanding of baseball endows his analyses with tremendous insight.
  4. Curve Ball, by Jim Albert and Jay Bennett. I have a rare relationship with this book. I read it before I ever read anything by Bill James. It steered me from being a pure mathematics major in college to a statistics major, which is one of the 5 best decisions I have made in my life. So yeah, I hold this title in high esteem for many personal reasons. That being said, it might be the best book for helping aspiring saberists to start understanding mathematical statistics, which is essential to advancing the field.
  5. The Book, by Tango, Lichtman, and Dolphin. For many saberists, this is the modern treatise on the subject. Grounded in an understanding of Palmer’s Linear Weights system, they introduce wOBA and use it to explore every facet of the game.
  6. For online reference guides, the FanGraphs Sabermetric Library is my preferred site, as I consider it to be the most complete. Neil Weinberg is also authoring weekly posts to explain the ins and outs of various metrics, helping keep the reference guide current with new research.
  7. The Best of Baseball Prospectus: 1996-2011 is a 2 volume set that is a compilation of the most important articles from the first 15 years of that site’s history. This is essentially my proxy for the excellent writing on that website, including Voros McCracken’s article on DIPS Theory and Keith Woolner’s “Baseball’s Hilbert Problems”.
  8. Baseball Hacks, by Joseph Adler. The ability to analyze data is great, but it is useless if you can’t get data to analyze. While the book is somewhat dated, it’s a great introduction to many of the coding skills required to do sabermetrics efficiently in the computing era, and one I still find worthwhile to have on my shelf.
  9. SABR101x, the massive open online course at edX administered by Boston University and designed by Andy Andres et al. If you prefer a class-based method for learning sabermetrics, this is as good as you’ll find. There are tracks on the history of sabermetrics, statistics, the SQL/R skills needed, and a build-up to understanding some key metrics used by saberists.

One thing I want to keep separate from this list is SABR’s own Guide to Sabermetric Research, which was put together by the aforementioned Phil Birnbaum. His involvement spearheading this SABR 101 project is why I leave it out for now. I have a sense that it will be that guide that is updated as a result of this group work.

SABR 45: A Partial Review for a Partial Experience

Almost 2 years ago, I was sitting at my computer scrolling through Twitter when this appeared:

Yeah, I got a little bit excited when I saw that.

It was never a question of whether or not I would be attending the SABR Convention this year. Having a convention in your backyard has some benefits, the biggest of which is cost. Aside from the convention registration, I had my choice as far as how to get to and from the Palmer House and whether I wanted to sleep in a hotel bed or my own. With an infant crawling around my house, I chose to not book a hotel room (at approximately $200/night) and took commuter rail in and out of Chicago each day.

The downsides of my lodging and travel decision were twofold: 1) I didn’t partake in nearly as many hallway and bar conversations as I did last year, depriving me of what many consider the most fun part of the convention experience; and 2) it made it easier for other things to pull me away from the convention activities. Having to catch a train at 7 am to just make it to a day of events running from 8 am to 10 pm meant having to reconcile sleep with the train schedule. Then, family events cropped up on the weekend, making it unfeasible for me to go downtown Saturday or Sunday. While missing Sunday only cost me the Historic Ballpark Site tour, not being able to attend Saturday cost me half of the presentations and panels and most of the committee meetings I was interested in.

However, what I did attend and help with as a volunteer and member of the host chapter was quite fantastic. Wednesday is typically a day for travel and getting acquainted with the city. With minimal travel, I helped as a volunteer with registration and Cubs ticket distribution. As with past conventions, there was a tour of the host city. I skipped this year’s walking tour due to the aforementioned volunteer work, but Jacob Pomrenke put together a fantastic document highlighting the sites with baseball history attached to them as the tour traversed downtown Chicago. (If a KML file gets created for it, I’ll link to it here). After registration closed down for the night, I sacrificed the welcome reception in order to catch the train and be home.

Thursday was what I presume is a rare day in recent SABR Convention history. At no time did any attendees have to pick between different meetings or presentations, as it was a single program of events for the day. Cubs broadcasters Len Kasper, Jim Deshaies, and Ron Coomer graced the broadcasters panel in the morning, chiming in when moderator Curt Smith would let them do so. Many of Smith’s questions centered on the Cubs, and all three provided the level of insight that I’ve become accustomed to when I do tune in for Cubs broadcasts. This was followed by the annual business meeting, which showed the continued positive growth of the society but, unlike last year, revealed no final verdict on next year’s convention. It seems the society learned its lessons the hard way: Houston had a hotel location near a mall instead of the ballpark due to the latter option’s lack of availability after the 2014 MLB schedule was released; Chicago corrected for that by getting the ideal hotel location early, but ended up victimized by selecting the one weekend BOTH Chicago clubs were on the road. There is a tentative plan for SABR 46’s host next year, but it would be unwise to get excited for seeing a bobblehead museum and the most colorful home run sculpture in MLB quite yet (never mind my own personal ability to attend next year). Thankfully, despite the lack of weekend games, the Cubs were finishing up a series with the Dodgers, so Thursday afternoon’s getaway day contest ended up being the convention game. It was entertaining simply because of Joe Maddon’s tinkering with the line-up every 2 innings or thereabouts.

Thursday night ended up being what I think was the biggest highlight of the convention (and perhaps a way of the national office apologizing for the schedule debacle): a concert in the Palmer House’s Grand Ballroom with the Baseball Project. From left to right, Scott McCaughey, Linda Pitmon, Mike Mills, and Steve Wynn rocked the house with their songs about Harvey Haddix, Ted Williams, Larry Yount, Big Ed Delahanty, and many others. Wisely, they opened with “Box Scores” off their album 3rd, which to me is the quintessential SABR song. It was pretty awesome. If you like baseball and rock music (especially R.E.M.), you’ll love this band.

After forgetting my phone at home Friday morning, I made it in time for the second group of presentations, dropping in on Tara Krieger’s presentation about Andy Coakley’s labor struggles with organized baseball. It was a fascinating story that I was unfamiliar with, but it exemplified the blackballing many early players went through when they complained about their contract. This was followed by 2 panels: one titled Pitching Prodigies that featured Steve Trout and Joe Berton a.k.a. “Sidd Finch”, and a presentation by the 4 Letters on an upcoming project. The former was my favorite panel I attended, as Berton told the story of how he got involved in the Sidd Finch hoax perpetrated by George Plimpton and Sports Illustrated. Trout seemed more subdued about his experiences, which I guess is to be expected from an 8th overall pick who did not have the career he expected to have. The latter was a “stealth announcement” about a project entitled “1927: The Diary of Myles Thomas”, which looks to chronicle the 1927 Yankees via “real-time historical fiction” storytelling. I kind of like the concept, but will probably wait and see what ends up being produced by Steve Wulf and Douglas Alden Warshaw. The presentation I saw after the panels was entitled “Aging Fan Base: Using Twitter to Develop a New Generation of Baseball Fans” and given by Allison Levin. Unfortunately, she didn’t get to many suggestions in her slides, as most of the time was spent looking at Twitter usage during the 2014 World Series. But she has a few avenues for further exploration that will hopefully yield some results, though I have a sense that MLB might be ahead of her on doing this.

The morning block was followed by a tribute-filled awards luncheon. I skipped this last year, since my meal times were spent with my wife who graciously traveled to Houston with me. I’m glad I went this year, because I got a better sense of what this organization means to so many people. Tom Hufford couldn’t avoid breaking down as he eulogized two of his fellow Cooperstown 16 that founded SABR, Ray Nemec and Joe Simenic. Phil Rogers had it a bit easier in terms of emotions, but still had to encapsulate what Ernie Banks and Minnie Minoso meant to their adopted hometown. He did so, and did it well. After the banquet I took time to peruse the vendor room, which is a dangerous endeavor given the number of baseball books that are available for sale. My wallet came away only somewhat dented. The only committee meeting I attended was for the Business of Baseball, which gave an update on the Winter Meetings project (all years are being researched by someone!), the Team Ownership bios (4 of 30 done or in progress), and a reminder from chair Michael Haupert about the importance of examining the source of data in research, using examples from the pre-1983 salary database to show how what’s printed isn’t always accurate.

I then attended 5 more presentations between the committee meeting and heading home. In order:

  • David Kaiser questioned “What Makes a Dynasty?” He counted teams that played postseason baseball in 3 of 6 seasons as a dynasty, splitting the analysis into 3 eras based on the postseason structure in place. He noted which ones were dominated by pitching and which ones weren’t. Most of the expected teams showed up where you would expect. The only bone I pick is that, based on the average winning percentage by era for the dynastic teams in the study, he said mediocrity was more prevalent today than it used to be. I think that’s just a function of his definition of dynasty.
  • David W. Smith, the Retrosheet president, updated his look at run scoring in the 1st inning, asserting that travel doesn’t seem to have an effect but that the number of runs the visiting team scores in the top of the 1st is highly correlated with the number of runs they allow in the bottom of the 1st. You can find his paper on Retrosheet’s site.
  • Zach Moser gave an oral presentation on how Cap Anson’s views on colored players in professional baseball were portrayed over time. While revered in his time, Anson’s racism became a hot topic while he was among the early players considered for induction into Cooperstown’s most noted museum. Anson’s racism was revisited as many of his team records for the Cubs were eclipsed by the aforementioned Ernie Banks, and Moser suggests that most modern apologists for Anson are deficient in their criticism.
  • John Burbridge examined “The Increasing Importance of Quality Starts” by mostly just doing an x-ray on the definition of a quality start. He ultimately came to the conclusion that 6 IP with 3 or fewer runs allowed is reasonable, and claims that it is increasingly relevant as bullpens are utilized more and more.
  • Finally, Bruce Allardice talked about how pro baseball became a big part of Chicago in the mid 1800s. Baseball grew in popularity in Chicago, paralleling the game’s growth in popularity nationwide. By 1870, the city’s elite coveted the status of being the nation’s pork capital, vying against a river town called Cincinnati. Because of this rivalry with the 2015 All Star Game host city, Chicago’s wealthy pooled funds to found the first professional club in the city. The White Stockings did manage to beat Cincinnati twice late in that season, and would go on to claim the championship based on a disputed victory over the New York Mutuals, who also claimed the title. Unfortunately, baseball took a 2 year hiatus after a cow tipped a lantern and ignited a magnificent blaze that required years of rebuilding.

I’d love to say more about SABR 45, but (1) I’m already at 1,750 words if you’ve read to this point and (2) the downside of a local convention is that you can be pulled to do other things since you aren’t travelling. That’s what happened to me on the weekend, as family events popped up and hindered my ability to get in and out of the city. I don’t know if I’ll get to go to another convention for a while at this point, and next year looks doubtful regardless of location. When I do go again, I’m going to make sure of 2 things: that I’m staying at the hotel, and that I go hang at the bars and talk baseball over beers. That’s the convention experience that I missed, and why those who go to one convention try to make it an annual trip.

Statcasting Expectations

The next level of public baseball data has arrived. MLB Advanced Media’s Statcast made a hyped television debut, although it had made cameos in online replay videos last year. With the system installed in all 30 ballparks to track all movement on the field, hopes are high for discovering many things about the game via data that previously could only be imprecisely discerned by watching a lot of baseball.

However, while MLBAM has stated that Statcast data will be made public, it is still unclear what types of data and how much of it will be available for public use. Bits and pieces of the data have slowly appeared as the 2015 season started. Among the first pieces have been the velocity and angle of the ball off the bat, which savvy scrapers of the Gameday files, such as Daren Willman of Baseball Savant fame, have captured and published. But whether the public will have access to the raw data remains to be seen.

It seems unlikely to me that there will be public access to the raw Statcast data anytime soon. The first challenge is the sheer size of the data set, which is already measured in petabytes. This is unlike the pitchF/X data, which can be scraped and saved on a home PC. Raw Statcast data is best stored on a cloud server. While MLBAM is certainly using “the cloud” as the method for allowing the 30 teams to access the data, it would be a massive security risk to open that server up to the public domain. Setting up a public server would be an additional cost, and it’s hard to argue that there would be any significant return on that investment for MLBAM. However, Statcast is already sponsored by Amazon Web Services, so the possibility is there for the raw data to be made public via the AWS platform. That possibility seems very remote at this time.

A more likely scenario (at least in my mind) for the release of Statcast data is something like what the NBA did with its SportVU data. SportVU, the player tracking system developed by a subsidiary company of STATS, Inc., is akin to Statcast in that it tracks player and ball movement. The Stats section of the NBA’s website shows various measures and animations gleaned from the SportVU data, but does not provide fans access to the raw data. This is the path I expect MLBAM to take. The batted ball data that has already shown up in Gameday is like this, and many of the other metrics that have been teased via broadcast, such as route efficiency and perceived velocity, could also be distributed in this manner.

Releasing the data in a summarized or snapshot form isn’t as risky to the teams, who were not all that happy when pitchF/X data made its way into the open world. Allowing public researchers to produce insights that were then available to all teams took away an opportunity to gain a competitive advantage. This is why the other Sportvision products, like hitF/X, which also provided batted ball information, and commandF/X, which tracked where the catcher’s glove was positioned, have been available to teams but not the public.

Regardless of what form the data takes when it is released, Statcast data should enable saberists to use more granular data to show what it takes to succeed in the game of baseball. Some of these data-driven discoveries may merely affirm what scouts and those in the game have been taught and believed for years and decades, but I’m sure some will not. Like many others, I can’t wait to get my hands on it.