Alright folks, lemme tell you about this little side project I got myself into, messing around with data about Magic Johnson and the LA Dodgers. Sounds random, right? Well, stick with me.
It all started when I was trying to learn more about data visualization, and I wanted a dataset that was both interesting and manageable. I’m a big sports fan, especially of the Dodgers, so I thought, why not combine my love for baseball with data analysis? Then I remembered Magic Johnson’s involvement with the team, and a light bulb went off.
First thing I did was scrape the data. I spent a solid afternoon digging through *, pulling down team stats, player info, all that jazz. It was a total pain: lots of manual copy-pasting at first, until I wised up and found a Python library to automate the scraping. Should've done that from the start. Lesson learned!
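For the curious, here's a minimal sketch of the parsing half of automated scraping. It uses only Python's built-in `html.parser` rather than any particular third-party library, and it assumes the stats live in a plain HTML `<table>`. `SAMPLE_HTML` is a made-up stand-in for a downloaded page, and `TableParser` and `parse_table` are hypothetical names for this example, not part of any library.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of each <th>/<td> cell, one list per <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Ignore whitespace between tags; only keep text inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table(html):
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up stand-in for a downloaded stats page (NOT real Dodgers numbers).
SAMPLE_HTML = """
<table>
  <tr><th>Year</th><th>W</th><th>L</th></tr>
  <tr><td>2011</td><td>80</td><td>82</td></tr>
  <tr><td>2012</td><td>90</td><td>72</td></tr>
</table>
"""

# First row is the header; zip it with each data row to get one dict per season.
header, *body = parse_table(SAMPLE_HTML)
records = [dict(zip(header, row)) for row in body]
print(records)
```

To point it at a live page, you'd swap `SAMPLE_HTML` for the decoded response body, e.g. `urllib.request.urlopen(url).read().decode()`, where `url` is whatever page you're scraping.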
Next up was cleaning the data. Oh man, the data was MESSY. Missing values all over the place, inconsistent formatting, just a general nightmare. I used Pandas in Python to wrangle everything into shape. Filled in missing data with averages where it made sense, standardized the date formats, and made sure all the column types were correct. This took way longer than I thought it would, probably a good week of evenings just grinding through it.
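Here's a rough sketch of those three cleaning steps (mean-fill the gaps, standardize dates, fix column types) in Pandas. It's an illustration, not necessarily how the original cleaning was done: the raw rows, the column names `date`, `attendance` and `runs`, and the two date formats in `DATE_FORMATS` are all assumptions made up for the example.

```python
from datetime import datetime

import numpy as np
import pandas as pd

# Made-up rows imitating the mess: mixed date formats, thousands separators, gaps.
raw = pd.DataFrame({
    "date": ["2012-04-05", "04/06/2012", "2012-04-07", None],
    "attendance": ["56000", None, "48,500", "51000"],
    "runs": [3, np.nan, 5, 2],
})

DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")  # assumed formats; extend as needed


def parse_date(value):
    """Try each known format; return None if nothing matches or value is missing."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except (TypeError, ValueError):
            continue
    return None


def clean(df):
    out = df.copy()
    # Standardize dates into one datetime64 column; unparseable/missing -> NaT.
    out["date"] = pd.to_datetime(out["date"].map(parse_date))
    # Strip thousands separators, coerce to numbers, fill gaps with the column
    # mean, then store as whole numbers since attendance is a head count.
    attendance = pd.to_numeric(
        out["attendance"].str.replace(",", "", regex=False), errors="coerce"
    )
    out["attendance"] = attendance.fillna(attendance.mean()).round().astype("int64")
    # Mean-fill a numeric stat where an average is a reasonable stand-in.
    out["runs"] = out["runs"].fillna(out["runs"].mean())
    return out


cleaned = clean(raw)
print(cleaned)
print(cleaned.dtypes)
```

Mean-filling is the crudest option; for anything you'll model later, it's worth checking whether a median or simply dropping the row makes more sense for that column.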
Once the data was clean, I started exploring. I wanted to see if there were any correlations between Magic Johnson’s ownership and the Dodgers’ performance. Did they win more championships? Did attendance go up? I plotted some simple graphs using Matplotlib and Seaborn. Lots of line graphs showing wins over time, bar charts comparing attendance figures before and after Magic’s involvement, that kind of thing.
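A minimal sketch of that before/after comparison, using plain Matplotlib (Seaborn would work just as well). The season numbers are placeholders, not real Dodgers stats. The cutoff assumes 2012, the year the ownership group that included Magic Johnson bought the team. `summarize_by_era` and `plot_before_after` are hypothetical helper names.

```python
import matplotlib

matplotlib.use("Agg")  # draw to a file; no display needed
import matplotlib.pyplot as plt
import pandas as pd

OWNERSHIP_START_YEAR = 2012  # assumed cutoff: the year the team changed hands

# Placeholder numbers for illustration only; these are NOT real Dodgers figures.
seasons = pd.DataFrame({
    "year": list(range(2008, 2016)),
    "wins": [80, 84, 78, 82, 86, 90, 94, 90],
    "avg_attendance": [40000, 41000, 39000, 40000, 42000, 43000, 41000, 42000],
})


def summarize_by_era(df, start_year=OWNERSHIP_START_YEAR):
    """Mean wins and attendance for seasons before vs. from start_year on."""
    era = df["year"].ge(start_year).map({False: "before", True: "after"})
    summary = df.assign(era=era).groupby("era")[["wins", "avg_attendance"]].mean()
    return summary.reindex(["before", "after"])


def plot_before_after(df, summary, path="before_after.png"):
    fig, (ax_wins, ax_att) = plt.subplots(1, 2, figsize=(9, 3.5))
    # Line chart of wins per season, with a dashed line at the ownership change.
    ax_wins.plot(df["year"], df["wins"], marker="o")
    ax_wins.axvline(OWNERSHIP_START_YEAR, linestyle="--", color="gray")
    ax_wins.set(title="Wins per season", xlabel="Season", ylabel="Wins")
    # Bar chart comparing mean attendance in the two eras.
    ax_att.bar(summary.index, summary["avg_attendance"])
    ax_att.set(title="Mean attendance by era", ylabel="Avg. attendance per game")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)


summary = summarize_by_era(seasons)
print(summary)
plot_before_after(seasons, summary)
```

Averaging each era into one number keeps the bar chart honest about what's being compared, but it hides year-to-year swings, which is why the per-season line chart sits next to it.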
The results were… inconclusive. The Dodgers were already a pretty successful team, and it's hard to say definitively that Magic's ownership had a direct impact on their win rate. But attendance did seem to increase slightly after he joined the ownership group, which could be attributed to his star power.
I also messed around with trying to predict ticket prices based on factors like the opponent, the day of the week, and team performance. I built a simple linear regression model using scikit-learn. The model wasn't amazing, but it was a fun exercise, and it helped me understand the limitations of the data I had. I'd need way more data to make any real predictions: stuff like weather conditions, promotions, etc.
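Here's roughly what a simple scikit-learn setup for that could look like. It one-hot encodes the categorical features (opponent, day of week), passes a numeric team-performance column through unchanged, and fits `LinearRegression` inside a `Pipeline`. Everything here is synthetic: the column names, the opponent premiums and the noise level are made up. That means the fit will look far better than anything real ticket data would give.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
n = 200
opponents = ["Giants", "Padres", "Yankees"]
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Synthetic games: made-up premiums plus Gaussian noise, purely for illustration.
games = pd.DataFrame({
    "opponent": rng.choice(opponents, n),
    "day_of_week": rng.choice(days, n),
    "team_win_pct": rng.uniform(0.4, 0.65, n),
})
opp_premium = {"Giants": 25.0, "Padres": 5.0, "Yankees": 40.0}
weekend = games["day_of_week"].isin(["Fri", "Sat", "Sun"])
games["price"] = (
    30.0
    + games["opponent"].map(opp_premium)
    + 15.0 * weekend
    + 100.0 * games["team_win_pct"]
    + rng.normal(0, 5, n)
)

features = ["opponent", "day_of_week", "team_win_pct"]
# One-hot encode the categorical columns; pass the numeric column through as-is.
preprocess = ColumnTransformer(
    [("categorical", OneHotEncoder(handle_unknown="ignore"), ["opponent", "day_of_week"])],
    remainder="passthrough",
    sparse_threshold=0,  # always return a dense array
)
model = Pipeline([("prep", preprocess), ("reg", LinearRegression())])

# Score on held-out games, not the rows the model was trained on.
X_train, X_test, y_train, y_test = train_test_split(
    games[features], games["price"], test_size=0.25, random_state=0
)
model.fit(X_train, y_train)
r2 = model.score(X_test, y_test)
print(f"Held-out R^2: {r2:.2f}")

new_game = pd.DataFrame(
    [{"opponent": "Yankees", "day_of_week": "Sat", "team_win_pct": 0.6}]
)
predicted_price = float(model.predict(new_game)[0])
print(f"Predicted price: ${predicted_price:.2f}")
```

Wrapping the encoder and the regression in one `Pipeline` means the same preprocessing is applied at fit and predict time. The held-out score is the quickest way to see how little a model like this really knows once the data is real and thin.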
The most interesting thing I found was just how much the Dodgers’ payroll has changed over the years. Seeing those numbers visualized really puts into perspective how much money is in baseball. It’s insane!
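If you're plotting payroll yourself, a custom tick formatter makes a big difference, since raw dollar values turn the y-axis into a wall of zeros. Here's a small sketch using Matplotlib's `FuncFormatter`. The payroll numbers are placeholders, not real Dodgers figures, and `dollars_in_millions` is a hypothetical helper name.

```python
import matplotlib

matplotlib.use("Agg")  # draw to a file; no display needed
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter


def dollars_in_millions(value, _pos):
    """Tick label formatter: 250_000_000 -> '$250M'."""
    return f"${value / 1e6:,.0f}M"


# Placeholder values only; swap in the real payroll series you scraped.
years = [2000, 2004, 2008, 2012, 2016, 2020]
payroll = [90e6, 95e6, 120e6, 110e6, 250e6, 200e6]

fig, ax = plt.subplots(figsize=(7, 3.5))
ax.plot(years, payroll, marker="o")
# Matplotlib calls the formatter with (tick value, tick position) for each label.
ax.yaxis.set_major_formatter(FuncFormatter(dollars_in_millions))
ax.set(title="Team payroll over time (placeholder data)", xlabel="Season", ylabel="Payroll")
fig.tight_layout()
fig.savefig("payroll.png")
plt.close(fig)
```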
Overall, it was a fun little project. I learned a ton about data scraping, cleaning, and visualization. Plus, I got to combine my love for baseball with some practical data skills. Not bad for a few weeks of tinkering! Quick recap:
- Scraped data from *
- Cleaned and transformed the data using Pandas
- Explored the data and created visualizations using Matplotlib and Seaborn
- Built a simple linear regression model using scikit-learn
Would I do it again? Probably. But next time, I’m starting with the automated scraping library FIRST!