Organizing the World’s Sabermetric Research, Part 4 – Designing a Database

Here’s an idea of how much other stuff has gone on in my life: I talked about building a sabermetric research database 11 months ago. Version 1 has yet to be published. Much like my postings here at this blog, the time to work on this project has been sporadic. That inconsistency made designing the database challenging.

While I did pick up a minor in computer science while an undergrad, I’ve been primarily a user, rather than a designer, of databases ever since. I know the basic principles of database design, but designing one with minimal experience from years ago is not easy. So I looked for examples.

I started with the best example of a database I knew of for recording information on various printed and recorded materials: the digital library catalog. I couldn’t get access to the database schema that a real library uses, but did manage to find an example. Granted, this example of an entity-relationship diagram covers only books, but it was a start. It affirmed 3 different base tables that were pretty obvious to me based on what I wanted when I first talked about the design: author, book, and category. The intermediate link tables between book and author and between book and category were something I didn’t have in mind at first, but incorporating those kinds of link tables is actually a key element of a third normal form relational database: each one records a many-to-many relationship (a book can have several authors, and an author several books) without repeating data in the base tables.

I also found the schema for a database that served as an inspiration for this idea. As a statistician with a slight academic bent working in industry, one of my resources is the Current Index to Statistics. While their schema wasn’t displayed in a nice entity-relationship diagram, it is available in code form. Of course, there are many things the CIS is interested in that I am not, but the schema follows the core idea of third normal form: each element of the data needs its own table.

All this matters because I want to make sure I record all the information for the DB with a few passes through the material, and knowing which pieces of information to collect is critical to that process. A few of the elements I want to collect are universal to all of the material types I discussed in Part 2. Here they are, with a few notes on the columns:

  • Author – first and last name, with a key built using the same logic as the Retrosheet player ID
  • Publisher – name and city. The name could be the key, but I think creating a shortened version of the name will be a better key and make queries easier.
  • Citations – The heart of this project. Just a listing of two publication IDs, one being the piece of research cited in the other. At one point early on, I considered including page numbers, but that seems to be more effort than it’s worth at this point and could be added later.
  • Subject – The subject list needs to be uniform across all media types. The subject table will be like the Citations table, with a publication ID and a column identifying the subject. I’m thinking that the subject list will be coded to help conserve disk space as this database grows. Players and teams can be included as subjects, and I’ll use the same codes as Retrosheet.
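As a rough sketch of how these shared tables might look, here is a version using Python's built-in sqlite3 module. The table and column names, the separate subject_code lookup table, and the make_author_id helper are all assumptions for illustration. The key format (four letters of the last name, first initial, three-digit sequence number) only approximates the Retrosheet style and is not a final rule.

```python
import sqlite3


def make_author_id(first_name, last_name, seq):
    """Hypothetical author key: 4 letters of last name + first initial + 3 digits.

    Short surnames are padded with '-' (an assumption about the format).
    """
    letters = "".join(c for c in last_name.lower() if c.isalpha())
    stem = letters[:4].ljust(4, "-")
    return f"{stem}{first_name[0].lower()}{seq:03d}"


# Assumed layout: one table per element, with coded subjects to save space.
SCHEMA = """
CREATE TABLE author (
    author_id  TEXT PRIMARY KEY,
    first_name TEXT NOT NULL,
    last_name  TEXT NOT NULL
);
CREATE TABLE publisher (
    publisher_id TEXT PRIMARY KEY,  -- shortened name used as the key
    name         TEXT NOT NULL,
    city         TEXT
);
CREATE TABLE subject_code (
    subject_code TEXT PRIMARY KEY,
    description  TEXT NOT NULL
);
CREATE TABLE subject (
    publication_id TEXT NOT NULL,
    subject_code   TEXT NOT NULL REFERENCES subject_code(subject_code),
    PRIMARY KEY (publication_id, subject_code)
);
CREATE TABLE citation (
    citing_id TEXT NOT NULL,  -- the work doing the citing
    cited_id  TEXT NOT NULL,  -- the work being cited
    PRIMARY KEY (citing_id, cited_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

author_id = make_author_id("Bill", "James", 1)
conn.execute("INSERT INTO author VALUES (?, ?, ?)", (author_id, "Bill", "James"))
print(author_id)
```

The composite primary keys on subject and citation keep the same pair from being entered twice, which matters when the material is recorded over several passes.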

The other tables are specific to different media types:

  • Book – publication ID, title, author IDs, publisher ID, publication year, ISBN. Books only published electronically will be treated the same as printed books. Publication IDs will start with “b” to denote book. ISBN would be the key for this table if it weren’t for the need for a unified key across the other media types that can’t be identified that way.
  • Article – publication ID, title, author IDs, journal ID, publication date, start page, end page, URL. This should work for both journals and magazines. Publication IDs will start with “a” to denote article. I’m also including URL since so much of what’s in print is migrating to or simultaneously published online nowadays.
  • Journal – journal ID, journal name, publisher ID, domain URL. Magazines are included here as well; journal ID is just so that the field name is distinct from other fields in the database.
  • Presentations – publication ID, title, author IDs, speaker IDs, presentation date, conference ID. I separate out the speaker and the author IDs only because not all authors will present and, in rare instances, someone presents who didn’t author the presentation. Publication IDs will start with “p”.
  • Conference – conference ID, conference name. I’m not going to list each annual conference separately, as that can be inferred by the presentation date when these two tables are linked. This will just be to identify different conferences and conventions (e.g. SABR, JSM, NESSIS, SaberSeminar).
  • Web articles – publication ID, title, author IDs, website ID, publication date, URL. The web is a nebulous place, and the article I read today may be different than what I read tomorrow, but reputable online sites will note original publication date and edits if they occur, so I’m not worried about that as an issue. Publication IDs will start with “w”.
  • Websites – website ID, website name, domain URL. Pretty straightforward. I’m keeping this separate from the Journal table so that it needs only a few columns.
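To show how the prefixed publication IDs, a link table, and the citation table could fit together, here is a small sketch in Python with sqlite3. Everything in it is an assumption for illustration: the publication_author link table (standing in for the "author IDs" columns listed above), the six-digit ID width, the make_publication_id helper, and the placeholder rows, which are not real entries.

```python
import sqlite3

# Prefixes taken from the list above; the zero-padded width is an assumption.
PREFIXES = {"book": "b", "article": "a", "presentation": "p", "web": "w"}


def make_publication_id(media_type, seq):
    """Build a hypothetical unified publication ID such as 'b000001'."""
    return f"{PREFIXES[media_type]}{seq:06d}"


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE book (
    publication_id   TEXT PRIMARY KEY,
    title            TEXT NOT NULL,
    publisher_id     TEXT,
    publication_year INTEGER,
    isbn             TEXT
);
-- Link table: one row per (publication, author) pair.
CREATE TABLE publication_author (
    publication_id TEXT NOT NULL,
    author_id      TEXT NOT NULL,
    author_order   INTEGER NOT NULL,
    PRIMARY KEY (publication_id, author_id)
);
CREATE TABLE citation (
    citing_id TEXT NOT NULL,
    cited_id  TEXT NOT NULL,
    PRIMARY KEY (citing_id, cited_id)
);
""")

# Placeholder data only.
book_id = make_publication_id("book", 1)
article_id = make_publication_id("article", 1)
conn.execute(
    "INSERT INTO book VALUES (?, ?, ?, ?, ?)",
    (book_id, "Placeholder Book", "examplepub", 1984, None),
)
conn.executemany(
    "INSERT INTO publication_author VALUES (?, ?, ?)",
    [(book_id, "auth0001", 1), (book_id, "auth0002", 2)],
)
conn.execute("INSERT INTO citation VALUES (?, ?)", (article_id, book_id))

# Which works cite the book, whatever their media type?
cited_by = [row[0] for row in conn.execute(
    "SELECT citing_id FROM citation WHERE cited_id = ? ORDER BY citing_id",
    (book_id,),
)]
# Who wrote the book, in byline order?
authors = [row[0] for row in conn.execute(
    "SELECT author_id FROM publication_author "
    "WHERE publication_id = ? ORDER BY author_order",
    (book_id,),
)]
print(cited_by, authors)
```

Because every media type shares one ID format, the citation query never needs to know whether the citing work is a book, an article, a presentation, or a web article.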

If you’ve stuck with me this far, I’m going to add one last note about how I’m building this database. I’m breaking up my exploration into 3 eras to help with identifying and finding sabermetric research. The first era ends with the publication of The Hidden Game of Baseball. That’s my starting point for this project and that book marks a pretty significant milestone in sabermetric history. I also feel that going backwards in time from 1984 will be of more value to the sabermetric community, and it will allow me to focus on printed material initially. The second era is the period between 1984 and 1996, which is mostly printed material. 1996 is the year Baseball Prospectus was founded, and it serves as a proxy for the start of the explosion of sabermetric research courtesy of the Internet. The Internet era (1996-present day) will be handled last.

Version 1 will hopefully be ready in the next few months, and if it isn’t published by the start of SABR 45 at the end of June, this project will have been abandoned.

SABR Day 2015

It might have been a week later than the official date, but avoiding fan fest date conflicts proved to be a wise decision for the Ken Keltner Badger State and Emil Rothe chapters of the Society for American Baseball Research. 48 baseball fans made their way to and from Kenosha’s “world famous” Brat Stop for what has become an annual Hot Stove tradition.

This year’s meeting opened in the sadness that could only be brought on by the death of a beloved ballplayer. Not only was Ernie Banks’ funeral playing on the TVs as baseball fans arrived, but the meeting also took place on what would have been Mr. Cub’s 84th birthday. Rich Schabowski (dressed in football attire of teams not from the state of Wisconsin or Illinois) opened the meeting and led all in attendance in a moment of silence for the Cubs legend, which ended with a shout of “Let’s Play Two!” That call ended up symbolizing the day: each chapter organized half of the meeting, with a lunch break in between. It really was like playing a doubleheader.

The portion of the meeting organized by the Chicago chapter took the top half of the program. Leading off was guest speaker Ozzie Guillen, Jr. Currently employed as a financial adviser, he worked in the clubhouses while his father coached in Atlanta and Florida and also while Ozzie, Sr. managed the Pale Hose to their first championship in 88 years. Holding high expectations for both of Chicago’s clubs in 2015, he spoke his mind on two issues in baseball today. The first is that the pendulum has swung too far in favor of analytics and sabermetrics within some organizations. The second is that today’s players make too much money, exacerbating the disconnect between the players and the fans and mirroring the current stratification of American society. He then took many questions from the audience, discussing everything from his favorite player as a clubhouse manager (“The best tipper”) and observations on the aforementioned World Series champion 2005 White Sox to pitch counts and broadcasting the team his father managed in Florida.

Batting second was a man whom his boss has called the “Ben Zobrist of Baseball Prospectus”, prospect writer Mauricio Rubio, Jr. With a deep love of baseball inherited from his family and dreams of being a pro scout, Mauricio started working for the fantasy side of BP before his constant pestering finally landed him a chance to write on prospects. With a focus on the Midwest League, he commented on how his writing tends to focus on melding stats with scouting, in line with BP’s brand as a leading sabermetric site. He also remarked about how mechanical analysis has become big with saber-scouts, but cautioned that mechanical analysis might be overemphasized, concurring with some of the commentary on the importance of a prospect’s character from Ozzie Guillen, Jr. The Q+A revealed his typical day at the park starts with a focus on 2 pitchers (typically the starters) and 2 hitters, moving around from the bullpen to behind home plate to a side view for the hitters and a rear view to better analyze arm action.

The final speaker before lunch was Merle Branner, who shared a paper from a leadership course she took as part of her studies in Library and Information Science. The paper examines the leadership dynamic between Branch Rickey and Jackie Robinson using the Servant Leadership model proposed by Robert Greenleaf. She examines all 10 aspects of the model in relation to Rickey’s signing of Robinson and integration of the major leagues. Once she was done, it was undoubtedly time for lunch.

While lunch was delicious (the cajun bratwurst is highly recommended if you’re ever able to stop at the Brat Stop), there was more baseball to be discussed, and the Badger State portion of the meeting commenced. Jim Nitz told the story of the Milwaukee Chicks, the 1944 champions of the All American Girls Professional Baseball League made famous by the film A League of Their Own. Their only year in Milwaukee was a turbulent one despite the on-field success. Media coverage for the team was poor in Milwaukee, failing to replicate the success of teams like the Rockford Peaches and leading to multiple nicknames used in the papers (primarily Schnitts and Brewerettes). The Chicks cohabited in Milwaukee’s Borchert Field with the Brewers (the minor league club), leading to a cavernous stadium that was sparsely inhabited. Nitz noted their success was largely due to some fantastic ballplayers like Connie Wisniewski and Hall of Famer Max Carey’s well-regarded management of the team, and also shared anecdotes on each of the players. His Q+A was enhanced by some women who play in an AAGPBL re-enactment league.

Afterwards, it was time to close the silent auction (a Chicago chapter fundraiser) and draw the winner of the 50/50 raffle (a Badger State chapter fundraiser). Once the silent auction items had been claimed, a presentation on Ginger Beaumont was up next. Unfortunately, it was at this point that I had to leave, so I can’t comment on the rest of the meeting.

Thank goodness Spring Training was only 2 weeks away.

For those who failed to make it, Emil Rothe chapter secretary David Malamut took photos and even video of the day’s events. The photos can be seen on Twitter @sabrchicago, and links to the videos can be found here.

Mr. Cub

Somehow, thanks to my slower-than-a-tortoise pace in getting some research articles written for posting here, this is post #14 for this blog. The previous post focused on the last #14 for Chicago’s South Siders. Yet, for many of the North Side partisans, #14 will always be associated with Ernest Banks.

Needless to say, his death was a surprise.

Perhaps the defining characteristic of a Cubs fan is his or her boundless optimism that, some day, some way, some how, their beloved nine will find a way to win the last game played in October. It comes as no surprise that their most beloved players share this trait, and the moniker “Mr. Cub” was bestowed on the man who radiated that hope each and every day since September 17, 1953.

For someone born well after Ernie Banks stopped playing, most memories of the man come from replays and interactions with him as an ambassador for his beloved Cubs. Perhaps it is fitting, then, that a song at a concert epitomizes the man for me.

I’m a White Sox fan. I still played it twice.

Rest in peace, Ernie.


There once was a baseball. This, however, was not just any old baseball. This baseball had participated on the biggest stage it could. It was hurled by a large man at 97 MPH. It did not make a lot of contact with the refined sticks of ash used by those who attempted to hit it. It never left the infield, until it vanished.

For 3 days it went missing. Many speculated on where it may have disappeared to. How could a ball that significant be unaccounted for? Surely it wasn’t left in a room 925 miles away, soaked by alcohol. Someone had it, for this baseball was too valuable for someone not to have.

Those who watched the baseball’s last known appearance knew in their hearts where the baseball was. Many of them focused on one man, the last known possessor of the ball, a player. And, after 3 days, he made sure the man who paid his wages had that baseball in his hands.

9 years later, that player decided his time to step out of the spotlight had come. He didn’t get, want, or need an elaborate farewell tour. But the man who paid his wages made sure that, when it was time for the player’s team to honor him, all the stops would be pulled out.

And that is how Paul Konerko got a statue in left field, his World Series grand slam baseball, and a retired number from Jerry Reinsdorf.

Rebalancing the Schedule

Ed. note – First post in a long time due to many a thing happening in my personal life. Thanks for coming back!

One of the more challenging aspects of modern sabermetrics is the unbalanced schedule. This imbalance began in 1997 with the introduction of Interleague play during the regular season, and the balance was tilted further when MLB decided to put an additional focus on divisional play and have 19 games a year between teams within each division. An additional wrinkle was added last year when the Astros, as a condition of their sale to Jim Crane, switched leagues and caused the AL and NL to have an odd number of teams.

Let’s try to rebalance the schedule. I’m going to make a few assumptions:

  • The teams will stay in their leagues
  • The possibility of expansion or contraction of the leagues will be ignored
  • The schedule will remain at 162 games
  • No 2 game “series” will be allowed

It’s fairly simple to see that, under these restrictions, a truly balanced schedule across both leagues is impossible. 162 is not divisible by 29, and with 15 teams in each league, interleague play is required. The closest possibilities to a balanced schedule would each violate at least one of my assumptions: 162 divided by 29 is approximately 5.6, so having one team play each other team 5 or 6 times would result in 145 or 174 game schedules, respectively.
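The arithmetic behind that claim can be checked in a few lines of Python. This is only a sanity check of the numbers above; the variable names are illustrative.

```python
GAMES = 162
OPPONENTS = 29  # every other team in both leagues

# How many times could a team play each opponent, and what is left over?
per_opponent, leftover = divmod(GAMES, OPPONENTS)

# Season lengths for the two nearest fully balanced schedules.
balanced_lengths = [OPPONENTS * n for n in (per_opponent, per_opponent + 1)]

print(per_opponent, leftover)  # 5 games each, with 17 games left over
print(balanced_lengths)        # [145, 174]
```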

The next closest thing to a completely balanced schedule across both leagues is to try and keep the number of games played against each other team as close as possible. Additionally, it makes sense, for both logical and historic reasons, to make sure that a team plays more games within its league than outside of it. Let’s look at a scenario where a team plays each team from the other league 4 times. This leaves 102 games against the other teams in the same league. The schedule could then be completed with an almost balanced intraleague schedule: 7 games against the 10 teams in the other 2 divisions and 8 games against the other 4 teams within the same division.
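As a quick check that 8 and 7 really is the most even intraleague split, here is a small Python sketch. It assumes 4 divisional opponents and 10 other league opponents, requires every opponent to be played at least 3 times (my reading of the "no 2 game series" rule), and requires at least as many games against divisional opponents as against the rest of the league. The function name and parameters are hypothetical.

```python
def intraleague_splits(interleague_games_each, total=162, interleague_opp=15,
                       division_opp=4, other_opp=10, min_games=3):
    """Return (games left for league play, list of (division, other) splits).

    Each split gives games per divisional opponent and games per opponent
    in the other two divisions that exactly fill the remaining schedule.
    """
    remaining = total - interleague_games_each * interleague_opp
    splits = []
    for other in range(min_games, remaining + 1):
        rest = remaining - other_opp * other
        if rest < 0:
            break
        # The leftover games must divide evenly among divisional opponents.
        if rest % division_opp == 0:
            division = rest // division_opp
            if division >= other:
                splits.append((division, other))
    return remaining, splits


remaining, splits = intraleague_splits(4)
# The most balanced option is the one with the smallest gap.
closest = min(splits, key=lambda s: s[0] - s[1])

print(remaining)  # 102
print(splits)     # [(18, 3), (13, 5), (8, 7)]
print(closest)    # (8, 7)
```

Under these assumptions, only three splits fill the 102 games exactly, and 8 divisional games with 7 against everyone else is the closest to balanced.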

That schedule actually could work out pretty well. With 15 series against teams from the other league each year, each team could alternate home and away each year, playing 8 interleague series at home one season and 7 the next. This impacts how the 7 games against the teams in the other 2 divisions within the same league would be split. You can’t have the intraleague ideal of playing 5 teams 4 games at home and 3 games on the road and 5 teams 4 games on the road and 3 at home. In the season with 8 home interleague series, a team already has 32 interleague home games and, assuming the 8 divisional games split 4 and 4, 16 divisional home games. Reaching 81 home games leaves room for only 33 more, so only 3 of these 10 opponents would be played with 4 games at home and 3 on the road. This would get reversed in the season with 7 home interleague series, when 7 of the 10 would get the extra home game. The 2 teams shifted away from the 5 and 5 ideal could be changed every 2 years, resulting in a 10 year cycle for this schedule scheme.

There’s still a bit of imbalance in the schedule, but I feel it’s more balanced than what MLB currently uses. It also makes sure fans who follow their teams would have a chance to see all of baseball’s stars, making the players, and the sport by extension, more marketable. An additional bonus is that the hype for those storied intradivisional rivalries becomes more justifiable (looking at you, Entertainment and Sports Programming Network).

Mr. Manfred, my phone line is open if you’d like to discuss.

Organizing the World’s Sabermetric Research, Part 3 – Plugging into SABR

It’s been just about a month since SABR 44. As I teased in my very verbose recap, a few things came out of the committee meetings that relate to this possibly quixotic quest I have to catalog all the world’s sabermetric research.

The first thing to note is that SABR already has an effort to catalog all baseball writing: The Baseball Index. Clicking through that link will show a functional, but dated and incomplete, reference of baseball documents, recordings, and other materials that any baseball researcher could want to know of. It does include many sabermetric works already, but finding these works isn’t all that easy. This is mostly because of a fickle search function that doesn’t work quite as well as a modern internet user would like, but also because the tags on the articles are designed around indexing all baseball research, not just sabermetrics. Most sabermetric entries are listed with the tag of “statistical analysis” and nothing deeper, unlike the articles on Saber Archive. Another thing of note is its current state of incompleteness, which is due to a broken data entry system. I will note here that the committee did mention at SABR 44 that someone is working on an upgrade to this system (hint: he runs a very popular website).

Secondly, the Statistical Analysis committee, in its 7 AM Friday morning meeting at SABR 44, brought up the idea of a group project to create a centralized reference list for sabermetric research. Many members of the committee had various ideas about what such a resource should look like: a list of the most recommended articles, a full literature review of one area of sabermetrics (e.g. defensive metrics), working with the Baseball Index Project committee, and a wiki were all suggested. Phil Birnbaum, who chairs the Stat Analysis committee, is currently collecting names of those interested in helping with this committee project, even if you’re not a member of SABR.

How do these two things affect what I had in mind? The Baseball Index, when upgraded and if designed better than its current state, would contain a lot of the features I am working to include in my database. There are a few features I plan to have that are not in TBI, most notably a citation link between works and topical tags that are more like Saber Archive’s. Thus, my database and the Baseball Index should be able to co-exist. I’ll be using TBI as an additional source for locating sabermetric research works to be included. I’ll also be contributing to the Baseball Index, focusing on articles in academic journals, which appears to be a major gap in their listing at the present time.

The Statistical Analysis committee project is too new to really know how things will shake out. I’m already on board to help with this committee project, and there’s a non-zero chance I take a leadership role with it. However, there are just too many unknowns to really know how my work will fit in with what this project turns into. All I can do is keep on keeping on.

SABR 44: 3,000+ words on my first convention

In the 8 years since I joined SABR, I’ve been to a few dozen chapter meetings, at times driving more than 2.5 hours just to get to the meeting locale. I served on a local chapter board, partially redrawing the chapter map for SABR. I’ve consumed baseball material since I was 6. Many times in the past, I had been asked if I was going to the upcoming SABR convention, and for many years, my answer would always be a disappointed “no”.

That answer changed about 8 months ago. With ample vacation time from my day job and not needing that time for other purposes, I was finally able to go to a SABR convention. I’m pretty sure I wasn’t the first person to register, but I definitely made sure to register as soon as I saw the notice that I could do so in the weekly SABR notes e-mail. And so it was that last week, I made my way to Houston for SABR 44.

A typical SABR convention is a mix of research presentations, player panels, research committee meetings, a ballgame or two, and seeing how many attendees can shut down the hotel bar. With a jam-packed schedule, the baseball chatter starts early (around 7 AM on the earliest days) and goes well past midnight. Since I haven’t been blessed with the ability to be in two places at once, I’ll go over what I was able to attend, ranking things in order of my favorites as I go, and close with some general thoughts on the experience.


The biggest draw of any convention is the player and media panels. Thanks to SABR’s improving relationship with MLB and the Larry Dierker Chapter’s close relationship with the Houston Astros and its chapter namesake, the panels here had a special twist: the last 2 were held Saturday afternoon at Minute Maid Park before that night’s game between Toronto and Houston. I’ll note those two in the ranks and comments below. I’m also including Reid Ryan’s opening keynote here in this section, since it is set aside like a panel in the schedule. For reference, I’m using the names of the panels as listed in the convention schedule, and have included links to audio/video where available:

  1. College Baseball Panel (audio/video) – The recordings don’t do justice to the hush that came over the room as soon as Roger Clemens entered and everyone noticed. I enjoyed the wide range of topics covered here (recruiting, bats, experiences) and the perspectives that Clemens, Mike Gustafson, and Lamar head coach Jim Gilligan provided.
  2. From Playing Field to Front Office – This ended up being more entertaining than informative, because at least 5 different Yogi Berra stories were shared. Dr. Bobby Brown is still sharp as a tack, Bob Watson was defiant when asked about his “struggles” against Don Sutton, and Eddie Robinson talked about his friendship with Joe DiMaggio and Marilyn Monroe.
  3. Reid Ryan keynote (audio/video) – Like most people who get a podium to themselves at a SABR meeting, Ryan told his story of involvement in the game and how he got to where he is today. What made him extra fascinating was hearing the perspective of a player’s son who has been involved at all levels of baseball.
  4. Decision Sciences Panel (at MMP) – For me, the most anticipated session. Moderated by and featuring Astros GM Jeff Luhnow, along with AGM David Stearns and Sig Mejdal (official title: Director of Decision Sciences. Real title: guy who has one of my 30 dream jobs). Lots of discussion about how the Astros front office works, and a little bit of insight into how they do things. Baseball Prospectus referred to as the “minor leagues” by Mejdal.
  5. Astros Player Panel (at MMP) – Larry Dierker, Alan Ashby, and Art Howe featured on this panel, and started swapping tales and interjecting into each other’s stories. A joy to hear multiple perspectives on the same story, especially the 1980 NLCS. Howe also commented on how he would have been involved in Steven Soderbergh’s version of Moneyball, and Howe talked to Philip Seymour Hoffman about his portrayal only after the movie premiered because of how Bennett Miller directed Hoffman to play the role.
  6. Colt .45s Panel (audio/video) – 4 players and a beat writer discussing the early seasons of Houston’s MLB franchise, with a lot of references to the heat, humidity, and mosquitos that made Houston the most interesting addition with the 1962 expansion. Featured Bob Aspromonte, Hal Smith, Carl Warwick, Jimmy Wynn, and Mickey Herskowitz. You can guess which one was the writer.
  7. Media Panel – This was the one panel that included someone without ties to Houston, and he is probably the most well known of the four panelists: Buck Martinez, currently working for the Toronto Blue Jays broadcasts. Writers Evan Drellich (Houston Chronicle) and Alyson Footer discussed the print side, while Martinez and Bill Brown (Astros TV play-by-play) discussed the TV side. It’s tough to rate this one seventh, which goes to show just how good the majority of these panels were.
  8. Women in Baseball panel (audio) – In a week where the role of women in sports was getting plenty of play in national media, this panel ended up with more of a media slant than what seemed to be intended. Marie “Red” Mahoney was the headliner, as the only woman from Houston to play in the AAGPBL. Jana Howser talked about things from her perspective as the head of development for the College Baseball Hall of Fame. Alyson Footer and Laila Rahimi addressed the media issues.
  9. 1980 Houston Astros – What should have been a more interesting panel ended up as 10 minutes of panelist introductions and Tal Smith talking for almost 15 minutes with all the details of how that 1980 Astros team came together. The players and coach from that 1980 team, Enos Cabell, Deacon Jones, and Jose Cruz (Sr.), weren’t left a lot of time to tell their sides of the story. Jose Cruz looks like he could still swing a bat today.

One additional note: many veteran convention attendees remarked that this was the first convention where question cards were used as opposed to open mics. This did expedite the asking of questions, as some SABR members tend to take a long time to ask a question at a mic. However, it did seem to allow for the questions to be screened, which kept the questions on the panel topic but also likely enabled some of the potentially thorny questions for the panelists to be avoided. This process will likely be adopted for SABR 45, especially if it turns out that using the cards was a condition for some of the panelists to appear. (Here’s looking at you, Roger.)


The heart of the convention is the research presentations. Only 32 are given, each supposed to run only 15-20 minutes, with time for Q+A towards the end of each 25-minute session. 2 presentations are given during each time slot on the convention schedule, meaning there’s no way to attend all of them. Below are the ones I attended, again ranked from my favorite to my least liked, with a brief recap of their findings as best as I could take notes on them:

  1. RP21: An Expanded Game-Theoretic Model of a Batter-Pitcher Confrontation in Baseball – Yes, it was as academic as it sounds. But game theory is something that fascinates me, and seeing Anton Dahbura’s model was interesting to me, as it was the first time I had seen a game-theory model account for the number of balls and strikes and the rate at which the umpire misses the call. A few assumptions made here to simplify things, such as pitchers being able to throw strikes on command, that aren’t practical in real life. In the 3-2 count example he went over, Dahbura advocated the pitcher throw a strike around 91% of the time, while the batter swing only 75% of the time.
  2. RP16: The Ballpark Sportscape: Outfield Advertising and the Branding Issue – Ed Mayo presented on behalf of co-authors Dobb Mayo and John Weitzel, looking at outfield wall ads in 8 major league parks with a panel of 6 interior designers, 2 ad industry pros, and 2 marketing consultants. Their most important take-away: the fan experience, not the baseball game, is the core product.
  3. RP25: Why Does the Home Team Score So Much in the First Inning? – Retrosheet founder David W. Smith noticed an uptick in 1st inning runs compared to all other innings, and just kept asking questions of the data. Based on what he showed, it appears to be some combination of the best hitters tending to be at the top of lineups, how long the away starting pitcher has to wait to throw in the bottom of the 1st, and travel impacts, though no single metric that was used to investigate these impacts was found to be a strong correlating factor. Interaction effects were not investigated.
  4. RP20: William Hulbert and the Birth of the Business of Professional Baseball – Business of Baseball committee chair, and UW-LaCrosse economist, Mike Haupert discussed William Hulbert’s influence on the formation of the National League in the late 1800s. Many of the ideas he used to sell the owners on the league are now hallmarks of the modern American professional sports landscape: territorial exclusivity, fixed schedules (a problem at the time), and no admittance of teams from “small towns”. This was also the winner of the award for best presentation at the convention.
  5. RP27: The Strike Zone Squeeze – Richard Thurston explored the jump in BB in the AL between 1948 and 1950 (inclusive). He created a metric called WALA, Walks Above League Average, which uses a bit of a WOWY methodology to calculate how much better or worse a player is at drawing walks. (The formula was too complicated to copy in notes in the time it was displayed.) Thurston theorizes it was an attempt by the AL owners to pressure umpires into calling a strike zone that would cause run scoring to go up and lead to better attendance.
  6. RP32: Was Mantle’s Peak Value Really Greater than Mays’? – David Kaiser revisited Bill James’ articles in the Historical Baseball Abstract and the New Historical Baseball Abstract. Instead of using Win Shares, as James did in his updated comparison in the latter book, Kaiser used Wins Above Average from Baseball-Reference, but substituted their fielding wins estimates with those created by Michael Humphreys in Wizardry. His results? Mantle and Mays are similar in their best seasons when you account for league difficulty, but Mays had more great years than Mantle did. And I have another book to add to the reading list.
  7. RP10: “The Biker Boys Beat the Boy Scouts”: Facial Hair and the 1972 World Series – Maxwell Kates, a very colorful Canadian, examined MLB teams’ facial hair policies over time, using the 1972 World Series between the mustachioed Oakland A’s and the baby-faced Cincinnati Reds as a springboard. Kates comes back around to cite that year as a turning point for facial hair in the game, which seems plausible just from looking at players’ photos from the Topps card sets in those years. I especially enjoyed the point he made that MLB was marketing to families from 1962 in 1972. (As a postscript, it gave me great joy to see the mustache featured on the Reds’ 2015 All Star Game logo).
  8. RP23: An In-Depth Study of Team Chemistry in Baseball – SABR president Vince Gennaro presented some early findings from interviewing players and front office personnel, then tied it into teamwork studies in the business and military realms. Most of the saberists you meet will question whether chemistry matters at all, yet all those in the game continue to insist that how players get along and interact matters. I agree with Vince that something is there, and it seems as though we’re just getting our arms around how to study it.
  9. RP04: Just a Little Bit Outside…: Drs. Nick Miceli and Tom Bertoncino attempted to use pitchF/X data to see if pitcher injuries could be better predicted. Their results show that prior injuries and mix of pitches thrown are the biggest keys. I’m a little skeptical, because despite consulting with many knowledgeable people, including Harry Pavlidis and Alan Nathan, they still decided to use the pitchF/X pitch type tags to classify pitches. Those tags are notably inaccurate in many cases.
  10. Poster presentations – It’s a shame they only had these up for an hour on Friday evening. The posters should have been up sooner, and hopefully will be in Chicago. Kudos to Evan Wassman for basically creating his own linear weights system without having read any of the previous work (he’s still in high school, so there’s time) and to Matthew Crownover and Dr. Jimmy Sanderson on being awarded best poster presentation for their work looking at roster construction.
  11. RP17: The Cuban Baseball “Defectors”: An Insider’s Full Revelation – Peter Bjarkman is one of the authorities on Cuban baseball, and gave a pretty solid overview of the origins of the current wave of Cubans in MLB, what the change in Cuban regulations means for players on the island, and an outlook on the potential for future talent. The main takeaway for me: the talent levels in Cuba have dropped significantly in recent years, and opening up the borders too much would likely turn the Serie Nacional into a low-level minor league. Its ranking is more a sign of the quality of the other presentations.
  12. RP14: Lead Me Out to the Ballgame: A Study Investigating the Leadership of MLB Managers – I take lots of notes at SABR meetings during all presentations. So it says something when a presentation only has 3 lines filled in my papers. It’s nice that Dr. Howard Fero and Dr. Rebecca Herman have a leadership manual based on their interviews with big league managers, but all of the leadership traits they cite can be found in dozens of other leadership books. This would have been more interesting if they had tried to find a trait that was unique to managing a baseball team that isn’t required in other industries.
  13. RP07: Let Them Play! The Houston Astrodome, the 1970s, and America’s Golden Age of Popular Culture – There’s a certain style of presentation that’s common when exploring baseball history. I like to think of it as show and tell: show a picture and tell a story with the picture on the screen. David Krell just tried to tell a story, positing the Astrodome’s central role in helping turn sporting contest coverage into the event model that Fox has used to excess. Led Zeppelin’s “Ramble On” played in my head as I left early to make it to another presentation.

This doesn’t even cover a couple of the presentations I wish I could have attended, which I either skipped so I could eat a midday meal or missed because of conversations going for a half hour after the previous presentation/committee meeting.


I attended 4 committee meetings, each with a slightly different structure and format. The Business of Baseball committee reviewed the status of its current and recently completed committee projects. The Retrosheet meeting had a few presentations on investigations into data discrepancies. The Bibliography committee discussed updates to The Baseball Index system and the need to track a number of publications for baseball-related material. The Statistical Analysis committee discussed starting a committee project to create a centralized reference or bibliography. Given what I’ve already discussed here and here, there will be more on this last project to come.

Other Events

The biggest event of any convention is the trip to the local Major League stadium. You can always tell where the SABR group is seated at these games; just look for every other seat in a group of 10 rows or so keeping score of the game. For me the game was a first trip to some hallowed ground for White Sox fans, as Minute Maid Park is the site where the first World Series since 1917 was clinched for next year’s host city. The game itself had its share of interesting events as well: the first time in 10 years with the roof open for an August game, an inside-the-park HR that was confirmed by replay after Jon Singleton was called out at home, R.A. Dickey’s knuckleball, and a robbed HR that those of us in the right field mezzanine could only see by video replay.

The trivia contest is the nerdiest part of the convention. What’s impressive isn’t just the depth of knowledge that those who compete have, but how quickly the contestants can answer some of these questions. The most entertaining category had contestants pantomime various batting stances and incidents in baseball history. My favorite was the best call to the bullpen by any manager in history: Ozzie Guillen’s signal for Bobby Jenks to come into Game 2 of the 2005 World Series.

The city history tour is a prelude to each convention. While its focus is on exposing attendees to the area’s historical (and non-baseball) highlights, it does occasionally point out locations tied to baseball. Our tour in Houston highlighted the early years of the city’s history, wandering through downtown, the ritzy River Oaks neighborhood, the museum campus, the massive medical campus, and past the sadly neglected stadium in the Harris County Sports Center complex.

I didn’t attend the outing for the Sugar Land Skeeters game, the Awards Banquet, or the Historical Ballpark Sites Tour. Two of these I wish I had attended. I’ll let you guess which ones.


SABR 44 was 4 days of almost total immersion into baseball. Even more than a week later, I still wish I was there. Thankfully, the Internet has made the world a smaller place, allowing me to keep in touch with some of the fantastic people I met there. The baseball chatter that I engaged in with the likes of Graham Womack, Phil Birnbaum, Anthony Rescan, Andy McCue, Sean Lahman, Maxwell Kates, Chip Atkinson, Tara Kreiger and countless others is really the heart of any SABR convention. It’s no wonder the hotel lounge was hopping most of the weekend, even though many reasoned the drink prices to be too high.

That being said, I didn’t spend as much time chatting there as I might have under different circumstances. My wife made the trip to Houston with me so she could get some R&R before her school year starts up again. Thus, I ended up spending most of my time after each day’s sessions were done with her instead of hanging around and talking to whoever happened to be lounging in the lobby. This will probably not be the case for me next year at SABR 45.

The schedule is jam packed. I didn’t get to attend everything I wanted to, but making it to 95% of what I wanted to do was still pretty good. That being said, it is the first convention or conference I’ve attended where events started before 8 AM. An extra day could help with this, but the organization doesn’t want to extend the convention a day longer for a variety of reasons, so that is not likely to change soon.

Since SABR 45 will be in my homeland of Chicago, this time constraint could be extra challenging. This is a rare occasion where the convention dates have been announced ahead of the 2015 MLB schedule being released, and the games for the MLB team(s) are centerpiece events. We on the planning committee will hope to have 2 games to schedule, along with the 32 research presentations, 8-10 panels, and other regular convention events. Houston was so well organized that it will be a tough act to follow.

If you made it this far, thanks for reading. Hope to see you at the Palmer House Hilton in Chicago, June 24-28, 2015.