Alright, so today I wanted to talk about something I was messing around with – trying to figure out how Billy Beane, you know, the Moneyball guy, might think about player salaries. It’s just a for-fun project, but I actually learned a few things along the way.
First off, I started by grabbing some baseball stats. I’m talking batting averages, on-base percentages, all that jazz. I found a decent dataset online. It was kinda messy, gotta be honest, so I spent a good chunk of time cleaning it up in Excel. You know, deleting the rows that were missing key data, standardizing the column names, the usual.
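I did the cleanup by hand in Excel, but if you'd rather script it, here's a rough pandas equivalent of those two steps. It's a sketch under some assumptions: the `clean_stats` helper, its `key_cols` argument, and the sample column names are all made up for illustration and don't come from any particular dataset.

```python
import numpy as np
import pandas as pd


def clean_stats(df: pd.DataFrame, key_cols: list[str]) -> pd.DataFrame:
    """Standardize column names, then drop rows missing any key column."""
    out = df.copy()
    # e.g. " OBP " -> "obp", "Salary ($)" -> "salary", "Player Name" -> "player_name"
    out.columns = (
        out.columns.str.strip()
        .str.lower()
        .str.replace(r"[^a-z0-9]+", "_", regex=True)
        .str.strip("_")
    )
    # Keep only rows where every key column has a value.
    return out.dropna(subset=key_cols).reset_index(drop=True)


# Hypothetical messy input: inconsistent headers and missing values.
raw = pd.DataFrame({
    "Player Name": ["A", "B", "C"],
    " OBP ": [0.350, np.nan, 0.310],
    "Salary ($)": [5_000_000, 2_000_000, np.nan],
})
clean = clean_stats(raw, key_cols=["obp", "salary"])
print(clean)
```

Only player A survives here, since B is missing OBP and C is missing salary.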
Then, I jumped into Python. I’m no coding wizard, but I can fumble my way through a script or two. I used pandas to load the data, and then the fun began. I wanted to see if I could find some correlations between different stats and player salaries. Like, does a higher on-base percentage actually mean a bigger paycheck? I was trying to recreate that Moneyball magic, you know?
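Here's roughly what that first pass can look like. This is a minimal sketch: the tiny inline CSV and its column names (`obp`, `avg`, `war`, `salary`) are hypothetical stand-ins for a real dataset. `DataFrame.corrwith` computes a Pearson correlation by default, so each number lands between -1 and 1.

```python
import io

import pandas as pd

# Hypothetical CSV snippet standing in for the real dataset.
CSV = """player,obp,avg,war,salary
A,0.380,0.290,5.1,12000000
B,0.330,0.300,2.0,6000000
C,0.300,0.250,0.5,1000000
D,0.360,0.270,4.0,9000000
"""

df = pd.read_csv(io.StringIO(CSV))

# Correlation of each stat with salary (Pearson by default).
stats = ["obp", "avg", "war"]
corr_with_salary = df[stats].corrwith(df["salary"])
print(corr_with_salary.sort_values(ascending=False))
```

With only four made-up rows these numbers mean nothing on their own; the point is just the mechanics.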
I tried a few different approaches. Initially, I just calculated simple correlation coefficients. It was interesting, but nothing really jumped out. So, I started playing with linear regression models. I figured, hey, maybe if I can predict a player’s salary based on their stats, I’m onto something.
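For the regression step, a plain ordinary-least-squares fit is enough to get started. The sketch below uses NumPy's `np.linalg.lstsq` rather than any particular stats library. The helper name `fit_salary_model` and the toy numbers are hypothetical, chosen so that salary is exactly 1,000,000 + 2,000,000 × WAR and the fit should recover those two numbers.

```python
import numpy as np
import pandas as pd


def fit_salary_model(df: pd.DataFrame, features: list[str], target: str = "salary"):
    """Ordinary least squares: target ~ intercept + features.

    Returns (coefficients, predictions); coefficients[0] is the intercept.
    """
    X = df[features].to_numpy(dtype=float)
    X = np.column_stack([np.ones(len(X)), X])  # column of ones = intercept term
    y = df[target].to_numpy(dtype=float)
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs, X @ coefs


# Toy data (hypothetical): salary = 1,000,000 + 2,000,000 * war exactly.
toy = pd.DataFrame({
    "war": [0.5, 2.0, 4.0, 5.1],
    "salary": [2.0e6, 5.0e6, 9.0e6, 11.2e6],
})
coefs, preds = fit_salary_model(toy, features=["war"])
print(coefs)  # intercept ≈ 1,000,000, slope ≈ 2,000,000
```

Passing a longer or shorter `features` list is how you'd compare the "throw everything in" model against one built on a handful of key stats.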
The first model was… not great. I threw in all the stats I had, and the results were all over the place. It was pretty clear I needed to be smarter about it, so I started looking into which stats were actually meaningful. Some of the advanced metrics, like WAR (Wins Above Replacement), seemed to have a bigger impact than batting average alone.

So, I re-ran the models, focusing on a smaller set of key stats. And you know what? Things started to look a little better. The model’s predictions were still far from perfect, but the R-squared value improved noticeably. R-squared basically tells you what share of the variation in salaries the model explains: 1.0 is a perfect fit, and 0 means it does no better than just guessing the average salary. It wasn’t earth-shattering, but it was progress.
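For reference, R-squared compares the model's squared errors to the squared errors you'd get by always guessing the average: R² = 1 - SS_res / SS_tot. Here's a small sketch of that formula in NumPy. The function name is my own, not a library API, and the sample numbers are made up.

```python
import numpy as np


def r_squared(y_true, y_pred) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # model's squared errors
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # errors from always guessing the mean
    return float(1.0 - ss_res / ss_tot)


actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(r_squared(actual, predicted))  # close to 0.98
```

If you'd rather not roll your own, scikit-learn's `sklearn.metrics.r2_score(y_true, y_pred)` computes the same quantity.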
One of the biggest challenges was dealing with the different eras of baseball. A .300 batting average in the 1920s isn’t the same as a .300 average today. So, I tried to normalize the stats a bit, comparing players to their peers in the same season. That helped refine the results even more.
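One common way to "compare players to their peers in the same season" is a z-score within each season: how many standard deviations a player sits above or below that year's average. This is a sketch of that idea, not necessarily the exact normalization used here; the `season` column name, the `normalize_within_season` helper, and the batting averages are all hypothetical.

```python
import pandas as pd


def normalize_within_season(df: pd.DataFrame, stat_cols: list[str],
                            season_col: str = "season") -> pd.DataFrame:
    """Replace each stat with its z-score relative to players in the same season."""
    out = df.copy()
    by_season = out.groupby(season_col)[stat_cols]
    means = by_season.transform("mean")
    stds = by_season.transform("std")  # sample std (ddof=1); NaN for one-player seasons
    out[stat_cols] = (out[stat_cols] - means) / stds
    return out


# Hypothetical numbers: a high-average era and a low-average era.
players = pd.DataFrame({
    "player": ["P1", "P2", "P3", "P4", "P5", "P6"],
    "season": [1925, 1925, 1925, 2023, 2023, 2023],
    "avg": [0.300, 0.340, 0.380, 0.230, 0.250, 0.270],
})
print(normalize_within_season(players, ["avg"]))
```

In this toy data the .300 hitter from 1925 ends up around -1 (below his peers), while the .270 hitter from 2023 ends up around +1, which is exactly the "a .300 average isn't always a .300 average" point.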
What did I learn? Well, a few things. First, baseball salaries are complicated. There are a million factors that go into a player’s value – age, position, marketability, even clubhouse chemistry. You can’t just boil it down to a few stats. But second, and this is what I was hoping for, stats do matter. You can get a reasonable idea of a player’s worth by looking at the numbers. Billy Beane was definitely onto something.
Would this model make me a good GM? Probably not. But it was a fun exercise, and it gave me a deeper appreciation for the data analysis that goes on behind the scenes in professional sports.