Baseball Hall of Fame voting can drive the average fan crazy. Year after year, great players fail to be voted in unanimously, mediocre players receive votes, and long-retired players experience wild swings in the percentage of voters who name them on their ballot, despite not having played an inning, thrown a pitch, or swung a bat during the previous five to twenty years. How can we explain this nonsense?
The answer is that, for the most part, we can't. All a fan can do is accept that Baseball Hall of Fame voting, conducted by the Baseball Writers' Association of America (BBWAA), is a phenomenon unto itself. If we can't understand Baseball Hall of Fame voting, though, maybe the next best thing is visualizing the data behind it. The set of interactive plots on this webpage is our attempt to do that. We were especially interested in two things: (1) viewing the trajectories of BBWAA vote percentage by year for different players throughout history, and (2) simultaneously viewing the career statistics of these players, to help find patterns and explain their trajectories (or to reassure ourselves that the writers really are crazy).
The main figure above is a plot of BBWAA Hall of Fame voting by year for all 1,070 players who have appeared on the ballot since Hall of Fame voting began in 1936. The circular points represent each player's vote percentage in his final year on the BBWAA ballot, and the lines represent his vote percentage in prior years if he appeared on the ballot multiple times. Recall that a player needs to be listed on at least 75% of the ballots in a given year to be inducted. If he receives less than 5% of the vote, he is removed from future ballots; if his vote percentage is between 5% and 75%, he stays on the ballot for at least one more year, up to a maximum of 15 years. (See Baseball-reference.com for a full description of the BBWAA voting rules.)
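The thresholds above can be written down as a small decision function. This is only a sketch of the rules as summarized in the previous paragraph, not the full BBWAA rulebook; the function name `ballotOutcome` and its return labels are made up for illustration.

```javascript
// Sketch of the ballot rules as summarized in the text above.
// ballotOutcome and its labels are hypothetical names, not part of the site's code.
function ballotOutcome(votePct, yearOnBallot, maxYears = 15) {
  if (votePct >= 75) {
    return "inducted"; // named on at least 75% of ballots
  }
  if (votePct < 5) {
    return "removed"; // below 5%: dropped from future ballots
  }
  // Between 5% and 75%: back next year unless the maximum has been reached.
  return yearOnBallot < maxYears ? "returns next year" : "eligibility exhausted";
}

console.log(ballotOutcome(82.3, 1));
console.log(ballotOutcome(3.1, 1));
console.log(ballotOutcome(48.0, 7));
console.log(ballotOutcome(48.0, 15));
```

Each trajectory in the main plot is, in effect, a sequence of these outcomes: a line continues for as long as the result is "returns next year" and ends at the circular point.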
A number of additional interactive tools help you select subsets of players to view in the main plot, including (1) the "Player Name" search box, (2) the two legends to the right of the main plot that encode each player's method of induction and primary position, and (3) the histograms below the plot, which allow subsets of players to be selected by their career statistics. There is also a scatterplot with dropdown menus for each axis that allow users to explore the relationship between any two statistics. We hope the interactive tools are intuitive; full instructions for interacting with the plots are available below, including instructions for how to share the URL of a version of the visualization that you created. We also describe the raw data and software used for the plots, and a few interesting footnotes and examples.
We're interested in seeing what you find with this visualization. If you tweet about it, we'd love it if you used the hashtag #hofvis.
To browse the player voting trajectories, use the following:
As a general principle, clicking or brushing a region in any element of this visualization creates a query that restricts the data in all elements of the visualization to lie within the selected region. For example, if you brush a region in the WAR histogram between values 50 and 100, and you select "SS" from the position legend, the other histograms and the main plot will be updated to reflect only the shortstops with career WAR between 50 and 100. When combined, these regions form logical intersections: we only show trajectories and histogram counts for players that satisfy all selection criteria.
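The intersection behavior can be illustrated in a few lines of plain JavaScript. This is a minimal sketch, not the page's Crossfilter code: the player records, field names, and the `applySelections` helper are all hypothetical.

```javascript
// Hypothetical player records; names and numbers are invented for illustration.
const players = [
  { name: "Player A", position: "SS", war: 70.1 },
  { name: "Player B", position: "SS", war: 30.2 },
  { name: "Player C", position: "CF", war: 95.0 },
  { name: "Player D", position: "SS", war: 55.4 },
];

// Each active selection (a brushed range or a clicked legend entry)
// becomes a predicate over a player record.
const selections = {
  war: (p) => p.war >= 50 && p.war <= 100, // brushed WAR histogram
  position: (p) => p.position === "SS",    // clicked "SS" in the legend
};

// Logical intersection: keep a record only if every active predicate accepts it.
function applySelections(records, activeSelections) {
  const predicates = Object.values(activeSelections);
  return records.filter((record) => predicates.every((test) => test(record)));
}

const visible = applySelections(players, selections).map((p) => p.name);
console.log(visible); // ["Player A", "Player D"]
```

Adding another brush simply adds another predicate, so each new selection can only narrow the set of visible players.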
Last but not least, the state of the visualization is encoded in the address bar. This means you can click back and forward like you would when navigating different webpages; if you like a particular state, you can copy the URL and share it over email, Facebook, Twitter, etc.
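As a rough illustration of how selections can round-trip through a URL, here is a sketch that uses the standard `URLSearchParams` API instead of the jQuery-based code the page itself relies on. The state shape, the use of the URL fragment, and the `encodeState`/`decodeState` names are assumptions made for this example.

```javascript
// Sketch only: the real page uses jQuery's param and jquery-bbq's deparam.
// Here each value is stored as JSON so that arrays and numbers survive the trip.
function encodeState(state) {
  const params = new URLSearchParams();
  for (const [key, value] of Object.entries(state)) {
    params.set(key, JSON.stringify(value));
  }
  return "#" + params.toString();
}

function decodeState(hash) {
  const params = new URLSearchParams(hash.replace(/^#/, ""));
  const state = {};
  for (const [key, value] of params) {
    state[key] = JSON.parse(value);
  }
  return state;
}

// Hypothetical state: shortstops with career WAR between 50 and 100.
const state = { position: ["SS"], war: [50, 100] };
const hash = encodeState(state);
const restored = decodeState(hash);

console.log(hash);
console.log(restored);
```

Because the whole state lives in the URL, the browser's back and forward buttons and any copied link restore the same selections without extra bookkeeping.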
First, here are a few obvious selections, just for practice:
Here are a few more obscure sets of players:
Last, here are a few footnotes about the data and other random things we noticed while playing with and visualizing this data:
If you're having trouble seeing the visualization, try Google Chrome (we develop on Safari and Chrome).
The histogram rendering is a trivial modification of the demonstration code in Crossfilter; the selectable legend was written from scratch (it is also trivial). The history management code was written on top of jQuery's param and jquery-bbq's deparam, although the code has more than a few hacks in it. Bizarrely, these two functions do not form a bijection: $.param is not an injection from JSON objects to strings. WAT. We also use Underscore.js.
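To see how a query-string serializer can fail to be injective, consider the toy bracket-notation serializer below. It is an illustrative stand-in written for this example, not jQuery's implementation, and the `flatten` and `toQuery` names are hypothetical; it only shows the general kind of collision, where two different objects produce the same string and therefore cannot both be recovered on the way back.

```javascript
// Toy serializer: nested keys become name[sub]=value, and every value becomes a string.
// This is NOT jQuery's code, just a small model of bracket-style serialization.
function flatten(obj, prefix = "") {
  const pairs = [];
  for (const [key, value] of Object.entries(obj)) {
    const name = prefix ? `${prefix}[${key}]` : key;
    if (value !== null && typeof value === "object") {
      pairs.push(...flatten(value, name)); // recurse into nested objects and arrays
    } else {
      pairs.push([name, String(value)]); // type information is lost here
    }
  }
  return pairs;
}

function toQuery(obj) {
  return flatten(obj)
    .map(([k, v]) => `${encodeURIComponent(k)}=${encodeURIComponent(v)}`)
    .join("&");
}

// Two different objects...
const nested = toQuery({ war: { min: 50 } });
const flatKey = toQuery({ "war[min]": "50" });

// ...serialize to the same string, so no decoder can tell them apart.
console.log(nested);
console.log(nested === flatKey);
```

One way around this kind of ambiguity is to serialize each value as JSON before placing it in the URL, at the cost of less readable links.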
The BBWAA ballot data and player career statistics were downloaded from baseball-reference.com. We gathered additional data regarding Veterans Committee elections, Negro Leagues elections, and some player position data from baseball-almanac.com and baseballhall.org.