Okay, so today I decided to mess around with some MLB data, specifically looking at rain delays. I’ve always been curious about how often games get delayed and if certain teams or ballparks are more prone to it. So, I dove in!

First, I needed to find some data. After some time googling around, I finally found a website that provides pretty comprehensive historical MLB game logs. I downloaded a CSV file; it's pretty huge and covers several years of baseball games, which should be enough, I think.
I opened the file in Google Sheets just to take a quick look. Yep, it's got all the basic stuff: date, teams, scores, and… a column for "delay"! Bingo! That's what I needed!
I created a new sheet and started importing the data. It took a little while to load, since the file was about 50 MB!
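If a 50 MB file feels sluggish in a spreadsheet, a short script can read it row by row instead. Here's a minimal Python sketch. It assumes the file has a header row with a column literally named `delay` that holds whole minutes. The other column names (`home_team`, `away_team`, `ballpark`) and the choice to treat a blank delay cell as 0 are my assumptions, not something the data source promises.

```python
import csv
import io
from typing import Iterator, TextIO

# Assumed header name for the delay column; check the real file's header row.
DELAY_COLUMN = "delay"


def read_games(f: TextIO) -> Iterator[dict]:
    """Yield one dict per game, streaming rows instead of loading the whole file."""
    for row in csv.DictReader(f):
        raw = (row.get(DELAY_COLUMN) or "").strip()
        # Assumption: a blank delay cell means the game was not delayed.
        row[DELAY_COLUMN] = int(raw) if raw else 0
        yield row


# Made-up rows in the assumed layout, just to exercise the function.
sample = io.StringIO(
    "date,home_team,away_team,ballpark,delay\n"
    "2019-04-01,NYY,BOS,Yankee Stadium,45\n"
    "2019-04-02,CHC,STL,Wrigley Field,\n"
)
games = list(read_games(sample))
print(games[0]["delay"], games[1]["delay"])
```

For the real file you'd swap the sample for an open file handle, e.g. `with open("game_logs.csv", newline="", encoding="utf-8") as f: games = list(read_games(f))` (the file name is a placeholder). Since `read_games` is a generator, consume it inside the `with` block.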
Start Playing With the Data
Once the data was loaded, I filtered on the delay column, keeping only the entries where the delay was greater than 0 minutes.
- First, I checked how many games were delayed in total. A simple count did the trick!
- Then, I thought, “I wonder which team had the most delays?” So I grouped the data by team and counted the delays for each.
- I did the same thing for ballparks, grouping by the ballpark name and counting delays.
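The filter, count, and group-by steps above can be sketched in Python with `csv` and `collections.Counter`. This assumes a hypothetical layout with one row per game and `home_team`, `away_team`, `ballpark`, and `delay` columns. Counting each delay against both the home and the away team is also my assumption; the spreadsheet version could just as well have grouped by home team only.

```python
import csv
import io
from collections import Counter

# Made-up sample in the assumed layout; use an open file handle for real data.
SAMPLE = """date,home_team,away_team,ballpark,delay
2019-04-01,NYY,BOS,Yankee Stadium,45
2019-04-02,CHC,STL,Wrigley Field,0
2019-04-03,CHC,MIL,Wrigley Field,30
2019-04-04,NYY,TB,Yankee Stadium,20
"""


def summarize_delays(rows):
    """Return (number of delayed games, delays per team, delays per ballpark)."""
    # Keep only games whose delay is greater than 0 minutes (blank counts as 0).
    delayed = [r for r in rows if int(r["delay"] or 0) > 0]

    by_team = Counter()
    for r in delayed:
        # Assumption: a delay counts against both the home and the away team.
        by_team[r["home_team"]] += 1
        by_team[r["away_team"]] += 1

    by_ballpark = Counter(r["ballpark"] for r in delayed)
    return len(delayed), by_team, by_ballpark


total, by_team, by_ballpark = summarize_delays(csv.DictReader(io.StringIO(SAMPLE)))
print(total)
print(by_team.most_common(3))
print(by_ballpark.most_common())
```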
I also calculated the average delay time. I just used the `AVERAGE` formula on the "delay" column. It turns out the average delay is longer than I expected!
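One thing worth double-checking with `AVERAGE`: whether the undelayed games (delay of 0) are inside the averaged range changes the answer a lot. In Google Sheets, `AVERAGEIF(range, ">0")` averages only the delayed games. Here's a minimal Python sketch of both versions, using made-up delay values:

```python
from statistics import mean


def average_delay(delays, delayed_only=True):
    """Average delay in minutes.

    delayed_only=True averages only games with a delay > 0, similar to
    =AVERAGEIF(range, ">0") in Google Sheets. delayed_only=False averages
    every game, like =AVERAGE over a column where undelayed games hold 0.
    """
    values = [d for d in delays if d > 0] if delayed_only else list(delays)
    return mean(values) if values else 0.0


# Made-up delay minutes, one entry per game.
delays = [45, 0, 30, 20, 0]
print(average_delay(delays))                      # delayed games only
print(average_delay(delays, delayed_only=False))  # all games
```

With these numbers the two versions differ by more than 10 minutes, so it's worth knowing which one a spreadsheet formula is actually computing.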

I visualized the results with a couple of simple charts: a bar chart showing the top 10 teams with the most delays, and another for the top 10 ballparks. The charts made it super easy to see the differences.
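The top-10 ranking behind those charts is just `Counter.most_common(10)` in Python. As a stdlib-only stand-in for a spreadsheet chart, this sketch prints a plain-text horizontal bar chart. The ballpark counts are made up for illustration; for an actual graphic, something like matplotlib's `barh` would be the usual choice.

```python
from collections import Counter


def top_n_bar_chart(counts: Counter, n: int = 10, width: int = 40) -> str:
    """Render the n largest counts as a plain-text horizontal bar chart."""
    top = counts.most_common(n)
    if not top or top[0][1] <= 0:
        return ""
    biggest = top[0][1]
    label_width = max(len(name) for name, _ in top)
    lines = []
    for name, count in top:
        # Scale each bar relative to the largest count, with at least one '#'.
        bar = "#" * max(1, round(count / biggest * width))
        lines.append(f"{name.ljust(label_width)} | {bar} {count}")
    return "\n".join(lines)


# Made-up delay counts per ballpark.
delays_by_park = Counter({"Wrigley Field": 12, "Yankee Stadium": 9, "Fenway Park": 7})
chart = top_n_bar_chart(delays_by_park, n=10, width=20)
print(chart)
```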
My simple conclusions
It was a fun little project! I learned a bit about how to grab data, clean it up, and get some basic answers. Now I know which teams and ballparks to watch out for if I don't want to sit around waiting for a game to start!