Messing with Hall of Famers’ Similarity Scores (Part I)

What to do when there are no entertaining sports on TV in February? Mess around with the highly addictive similarity score tool, that’s what.

If you’re not familiar with this particular Bill James creation, it’s pretty simple. The actual methodology is at the link above, but basically two players start with a similarity score of 1000 and points are subtracted based on how dissimilar they are. The end result is a list of the ten most similar players to any given player.

A similarity score over 900 means that two players are very similar; the farther you get away from 900, the more tenuous a relationship is. Obviously many Hall of Fame players are unique, making the similarity score not all that valuable. For example, the extremely unique Rickey Henderson’s most comparable player is Craig Biggio, with a minuscule 713 similarity score.
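For anyone who wants to see the mechanics, here is a rough Python sketch of the "start at 1000 and subtract" idea. The stat names and the point steps in `POINT_STEPS` are my own illustrative assumptions, not the published methodology (which also includes things like a positional adjustment that I skip here), and both players are made up.

```python
# Hypothetical point steps: one point comes off for every full "step" of
# difference in a stat. These values are assumptions for illustration only.
POINT_STEPS = {
    "games": 20,
    "hits": 15,
    "home_runs": 2,
    "rbi": 10,
    "stolen_bases": 20,
}


def similarity_score(player_a, player_b, steps=POINT_STEPS):
    """Start at 1000 and subtract points for each stat where the players differ."""
    score = 1000
    for stat, step in steps.items():
        diff = abs(player_a[stat] - player_b[stat])
        score -= diff // step  # whole points only
    return score


def describe(score):
    """Rule of thumb from the post: over 900 means very similar."""
    return "very similar" if score > 900 else "more tenuous"


# Two made-up career lines, not real players.
player_a = {"games": 2000, "hits": 2000, "home_runs": 300, "rbi": 1000, "stolen_bases": 100}
player_b = {"games": 2100, "hits": 2150, "home_runs": 320, "rbi": 1050, "stolen_bases": 300}

score = similarity_score(player_a, player_b)
print(score, describe(score))
```

The bigger the gaps between two careers, the more points come off, which is why a one-of-a-kind player can end up with a top comp far below 900.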

So sometimes these scores don’t really tell us that much. That doesn’t make them any less fun as an arguing point. In today’s post, I look at recent inductees and players currently on the ballot. Tomorrow I will look at recently retired players on the next few ballots.

I have no doubt that this post will be an incoherent list of thoughts…but it should still be fun.


Roberto Alomar (elected this season): #1 comp – Derek Jeter (869). Alomar was a fairly unique player – it’s fairly surprising for a player to not have at least one comp with a similarity score over 900. But even with the low score, Jeter isn’t bad company as a #1 comparison. Five of Alomar’s top ten comps are in the Hall of Fame, including Hall of Fame second basemen Frankie Frisch, Charlie Gehringer, Joe Morgan, and Ryne Sandberg. Interestingly enough, the most comparable second baseman to Roberto Alomar is Lou Whitaker (857). Which leads me to…

Ryne Sandberg (elected in 2005 on his third year on the ballot)/Lou Whitaker (15 votes in 2001): #1 comps – each other (901). Whitaker’s lack of support (he didn’t even come close to the 26 votes needed to make it to a second year on the ballot) doesn’t make much sense given the fact that Sandberg cruised in on his third ballot. They were remarkably similar players. The rest of Sandberg’s top five? Joe Torre, future Hall of Famer Barry Larkin, Alan Trammell, and Ray Durham. For Whitaker? Trammell, Alomar, Buddy Bell, and Hall of Famer Joe Morgan.

Maybe Whitaker shouldn’t be in the Hall of Fame, but surely he deserved more than one year on the ballot and 15 votes. I feel like the fans should be able to vote to give the writers a mulligan when they do something stupid like that.

Bert Blyleven (elected in 2011 on 14th year on ballot): #1 comp – Don Sutton (914). Eight of the ten pitchers on Blyleven’s list are in the Hall of Fame. Only Jim Kaat and Tommy John are not. One of the many fun quirks of the Baseball Hall of Fame: the love for the 300-game winner. Six of the ten pitchers on Blyleven’s list won 300 games – they all cruised into the Hall in five years or less. Sub-300-game winner Blyleven took 14 years to make it, and Kaat and John were two of the rare players good enough to hang on the ballot for 15 years but not good enough to make it into the Hall. Strangely, Robin Roberts and Ferguson Jenkins both made it in fairly quickly despite finishing with fewer than 300 wins.

Tommy John (fell off ballot after 15 years in 2010): #1 comp – Jim Kaat (923). Funny how these comps tend to work out well. John and Kaat get grouped in together as pitchers that lasted a long time but didn’t have one outstanding quality. That they are each other’s #1 comp is perfect. Seven of John’s other nine comparisons are Hall members (not including future Hall member Tom Glavine). Seems like those comps plus having the most important surgery in baseball named after him should be enough to get him in.

Andre Dawson (elected in 2010 on ninth ballot): #1 comp – Hall of Famer Billy Williams (886). Somewhat unsurprisingly, fellow suspect Hall of Famers Williams and Tony Perez are Dawson’s #1 and #2 comparisons. That fits in with the consensus that Dawson isn’t a great Hall of Fame selection but not a catastrophic one either. Also on Dawson’s list? Dave Parker and Harold Baines, who were both recently kicked off the ballot. Which leads us to:

Dave Parker (lasted 15 years on ballot): #1 comp – Luis Gonzalez (907) and Harold Baines (lasted five years on ballot): #1 comp – Hall of Famer Tony Perez (943). Three of Parker’s and five of Baines’ top ten comps are Hall of Famers. Baines’ 943 similarity score with Perez is extremely high for Hall of Famers. Perez was a marginal Hall of Famer in the first place but got in because he was a member of the Big Red Machine; Baines didn’t have any great teams to fall back on.

The most interesting thing about these comps is Luis Gonzalez. Gonzo pops up on all three of these top ten lists. Maybe Gonzalez is too high because of his hitter-friendly era, but he matches up well with players that lasted multiple years on the ballot. I don’t see Gonzo making a second ballot, but maybe his numbers will look better with five years of retrospection.

Rickey Henderson (elected in 2009 on first ballot): #1 comp – Craig Biggio (713). Perhaps the most unique player of all-time. Even Henderson’s contemporary Tim Raines only scores a 648.

Jim Rice (elected in 2009 on 15th ballot): #1 comp – Hall of Famer Orlando Cepeda (912). Ironic that Rice’s #1 comp is Cepeda. Like Rice, Cepeda lingered on the ballot for 15 years. Rice was selected on 76.4% of the ballots in his final season to get elected; Cepeda’s 73.5% meant he came up short and had to wait five more years until the Veterans Committee elected him. Rice was a controversial selection and his comps make that clear: his top five consists of Cepeda; Andres Galarraga (22 votes in 2010); Hall of Famer Duke Snider; Ellis Burks (2 votes in 2010); and Joe Carter (19 votes in 2004). Which of course brings us to…

Andres Galarraga (22 votes in only year on ballot in 2010): #1 comp – Hall of Famer Orlando Cepeda (940). Amazing how eras change things. Three of Galarraga’s top ten comps are Hall of Famers, including the quite similar Cepeda and Rice. Rice made it in because we romanticized his career over time – he was elected a full 23 years after his last season. Perhaps Galarraga could have gotten there if we had 23 years to think about his career. Do I think Galarraga was a Hall of Famer? Absolutely not. I’m only noting that Rice jumped a whopping 275 votes over 15 years to make it in. Even though Cepeda came up short, he picked up a ridiculous 287 votes (from 48 to 335) that indirectly led to his election by the Veterans’ Committee. Galarraga came up three votes short of the 5% needed in his first year and now he’ll never be able to pick up the momentum needed to make the Hall.

Goose Gossage (elected in 2008 on ninth ballot): #1 comp – Hall of Famer Rollie Fingers (918). #2 is Hoyt Wilhelm. Certainly not surprising. But this leads us to…

Bruce Sutter (elected in 2006 on 13th ballot): #1 comp – Doug Jones (934). Gossage and Sutter are often grouped together. According to the similarity scores, that couldn’t be farther off. Gossage’s top two comps are the two most famous early relievers. Sutter’s comps are the decidedly less impressive Jones, Tom Henke, Jeff Montgomery, John Wetteland, Jeff Reardon, and Robb Nen. How did Sutter get elected when none of those other six got any consideration whatsoever? You got me. Maybe Sutter just beat them to the punch by being slightly above average at a new position years before the rest of those closers were slightly above average.

Wade Boggs, Tony Gwynn, and Cal Ripken (all first ballot): Nothing to argue about with these three. Boggs’ top seven comps are Hall of Famers; nine of Gwynn’s top ten are Hall of Famers; and six of Ripken’s top ten are Hall of Famers. Interestingly enough, Ripken’s highest comp is Dave Winfield (789) of all players.

Barry Larkin (62.1% last year) and Alan Trammell (24.3% last year): #1 comp – each other (914). Larkin received 62.1% of the vote this year, his second on the ballot and seems primed to go in next year. Trammell received only 24.3% of the vote in his tenth year on the ballot and will linger for five more years before falling off completely. Poor Trammell and Whitaker dominated Detroit for over a decade and got no Hall support.

Interestingly enough, Larkin’s #2 and #3 comps are Edgar Renteria and Ray Durham. Renteria seems washed up at only 33, but it’s easy to forget how solid he once was. From 22 to 29, his top comp was Hall of Famer Robin Yount; four years later, it is Tony Fernandez. Durham has gone from Joe Morgan to Craig Biggio to Jay Bell – quite the career arc.

One of the many downfalls of the five-year waiting period is revisionist history. In today’s game, shortstops (and second basemen, to a lesser extent) put up ridiculous numbers. Derek Jeter, Nomar Garciaparra, and Miguel Tejada ushered in the era of the shortstop as a slugger. And that’s fine…but I think we subconsciously compare guys like Larkin and Trammell to shortstops of today. We forget just how great Trammell was in the 1980s and Larkin was in the early 1990s before this generation of great middle infielders came along.

Jack Morris (53.5% of vote last year): #1 comp – Dennis Martinez (903). Morris is the most polarizing player on the ballot. Statistically, Morris doesn’t have the numbers to make the Hall. But he has a reputation as a big-game pitcher, the “you had to be there” argument, and Game 7 of the 1991 World Series to fall back on. Morris’s supporters get the ugly Dennis Martinez as a number one comp. However, five of his top ten comps are Hall of Famers, led by number two Bob Gibson. Not bad company to keep.

Morris is the antithesis of the problem I mentioned above. Much of Morris’s appeal is based on emotion. He is the gamer: the guy who completed every game and fought for every pitch. Because of pitch counts, we just don’t have starters like that any more. Or so the story goes. We give Morris more credit than he is due because he was one of the last of this breed of pitchers. Whereas Trammell limps along on the Hall ballot partly because shortstops are so good in today’s game, Morris gains momentum partly because pitchers are not the workhorses they once were.

Lee Smith (45.3% of vote last year): #1 comp – Jeff Reardon and Trevor Hoffman (896). Smith’s #3 comp is John Franco (891). I have written about this before, but I don’t quite understand the lack of Hall support for Franco. Smith has 54 more saves; other than that, their careers are very similar. Smith was the best right-handed closer of his generation and Franco was the best left-handed closer. I don’t think Smith will get there, but eventually he may beat the voters into submission, like Blyleven and Rice. Franco received 4.6% of the vote and was gone after one year.

Jeff Bagwell (41.7% last year): #1 comp – Chipper Jones (887). Two of Bagwell’s top ten comps are Hall of Famers and his #1 is future Hall of Famer Chipper Jones. Of course none of this matters to the hopefully less than 25% of voters that will never vote for anyone that lifted a barbell in the steroid era. Which brings us to the similarly underrated…

Fred McGriff (17.9% last year): #1 comp – Willie McCovey (887). McGriff’s top four comps: McCovey, Willie Stargell, Bagwell, and Frank Thomas. Yeah, that’s good company. Poor McGriff will never get serious discussion even though he should. Had he hit seven more home runs (he finished with 493), he almost certainly would be a viable Hall candidate. As it is, he will undoubtedly linger somewhere between 15 and 35 percent until he falls off the ballot in 2024. This is stupid.

Tim Raines (37.5% last year): #1 comp – Johnny Damon (887). The comically underrated Raines is slowly gaining Hall momentum. The only knock on Raines’ resume is that he played in the same era as Rickey Henderson. I already pointed out that Henderson is utterly unique in baseball history. Raines’ #2 comp is Lou Brock, who cruised in on his first ballot. Kenny Lofton rounds out the top three. Goes to show you that speedy leadoff men are the forgotten players of the steroid era – Damon is a Hall longshot and Lofton has almost no chance.

Edgar Martinez (32.9% last year): #1 comp – Todd Helton (912). Martinez’s Hall of Fame case gets no real help from his comparables, mostly because not many great hitters started their career at age 27. Helton, Will Clark, and John Olerud are his top three comps; like Martinez, all three are marginal Hall candidates.

Larry Walker (20.3% last year): #1 comp – Vladimir Guerrero (891). Thanks to a combination of the steroid era and Coors Field, Walker’s career is underrated. Four of Walker’s top ten comps are Hall of Famers, led by #6 Joe DiMaggio (871). Three more are potential Hall of Famers, including solid candidate Guerrero and borderline candidates Jim Edmonds (877) and Helton (859). But steroids and the stadium are two huge drawbacks to Walker’s candidacy. It would be one thing if Coors Field was located in a major metropolitan area, but the fact that he played in relative obscurity in Denver makes him entirely too easy to dismiss.

Mark McGwire (19.8% last year): #1 comp – Jose Canseco (801). Hall of Fame first basemen Harmon Killebrew and Willie McCovey are also on McGwire’s top ten list, but really no one is similar to McGwire. Of the unique players I could think of off the top of my head, only Rickey Henderson and Barry Bonds had a #1 comp with a lower similarity score than McGwire. Which is strange because the knock on McGwire was that he hit home runs and nothing else. If that were true, then he’d be a lot more similar to any of the other dozens of players that hit home runs and nothing else. His comp list would look like Dave Kingman’s (which includes Greg Vaughn, Frank Howard, Rocky Colavito, and Norm Cash).

This is simply because McGwire was so much better than a typical home run hitter. He twice led the league in on-base percentage. His career OPS+ was 162. The last two first basemen elected on the first ballot were Willie McCovey (147 OPS+) and Eddie Murray (129 OPS+). Yet at this point it is pretty clear that McGwire won’t be voted in – his 19.8% in his fifth year on the ballot was his lowest total so far. But I suppose as long as we’re protecting the nebulous “sanctity of the game” standard, we better keep him out…not like anyone enjoyed the 1998 home run chase or anything, right?

Don Mattingly (13.6% last year): #1 comp – Cecil Cooper (933). Want to know a good way to piss off a Yankee fan? Tell him that Cecil Cooper is Don Mattingly’s most comparable player. If that doesn’t do the trick, tell him that Wally Joyner and Hal McRae are #2 and #3. That should do it. Although in fairness, all three of those guys probably would have stayed on the ballot for fifteen years if they were fortunate enough to play in New York.

Dale Murphy (12.6% last year): #1 comp – Andruw Jones (920). Talk about a perfect comp. Both Atlanta Braves center fielders. Both strong Hall candidates by the time they turned 30. Both longshot Hall candidates by the time they turned 34. Murphy had a secretary named Jones and Jones had a secretary named Murphy. The comparisons go on.

For the record, I personally would like it if Murphy and Jones were both in the Hall of Fame. I tend to prefer short greatness over extended very-goodness. Jones might be more of a stretch, but for a time in the 1980s Murphy was the best hitter in the league. Perhaps I just like Murphy because my dad still has several of his rookie cards in storage.

Kevin Brown (2.1% last year, off future ballots): #1 comp – Bob Welch (945). Orel Hershiser (935) and Hall of Famers Catfish Hunter and Don Drysdale (both 928) round out Brown’s top four. The pitching version of Dale Murphy only received 12 votes in his first year on the ballot. Granted, Brown’s ridiculous undeserved contract meant that he was never making the Hall of Fame, but the 12 votes are symptomatic of a weird catch-22 in today’s pitching culture. With Tommy John surgery, low pitch counts, and general health awareness, we expect pitchers to last forever. When they don’t, they are not considered viable Hall candidates, no matter how dominant they were over a short period of time.

In the past, a ten-year prime was sufficient for pitchers (Koufax, Drysdale, etc.). It was a given that pitchers’ careers could be short with how little was known about pitcher health. Hall of Famer Dazzy Vance is a perfect example. For ten years in between 1922 and 1931, he was a great pitcher. Before and after, he wasn’t even an average pitcher. That was good enough to make the Hall of Fame (though it took him fifteen elections). Three of his top six comps? Hershiser (925), Brown (924), and Welch (921). Hershiser lasted only two years on the ballot and Brown and Welch were both off after one year.

Rafael Palmeiro (11% this year): #1 comp – Frank Robinson (887). Palmeiro is another weird revisionist history case. Sure, he wasn’t going to get elected because of the steroid thing. But there was a lot of rhetoric before the election about how he was a borderline Hall case in the first place. That’s just preposterous. Six of Palmeiro’s top ten comps are Hall of Famers. The other four are Ken Griffey Jr., Gary Sheffield, Manny Ramirez, and Fred McGriff. If Palmeiro hadn’t been suspended for a positive drug test, he would have been a no-brainer Hall of Famer, plain and simple.

Juan Gonzalez (5.2% this year): #1 comp – Albert Belle (897). Another perfect comparison. Both Gonzalez and Belle were in the discussion for best hitter in the league for a time in the 1990s and neither aged well at all. Like I said above with Murphy, I’m a sucker for these types of players. Of course, with Belle and Gonzalez, it’s a moot point since neither will be able to escape steroid speculation.

Tomorrow: Part II, where I look at recently retired players that will come on the ballot in the next five years.


3 Responses to Messing with Hall of Famers’ Similarity Scores (Part I)

  1. […] I looked at the similarity scores of a few select recent Baseball Hall of Famers and candidates currently on the ballot. Today I look […]

  2. Jimmy G says:

    I love that Edgar Martinez is comped with Will Clark and John Olerud. Those are three of the most pure swings I’ve ever seen in baseball.
