Alright, let me tell you about this thing I did today. I was messing around with some tennis data, specifically trying to figure out what was up with the Monfils vs. Etcheverry match. You know, just typical Sunday stuff.

First things first: Getting the Data
So, the first thing I did was hunt down some data. I wasn’t trying to build a sophisticated model or anything, just wanted to poke around. I ended up finding some match stats online, nothing fancy. I copied and pasted the stuff into a CSV file. Pretty basic, I know.
Cleaning Up the Mess
Man, you wouldn’t believe the state of this data! It was all over the place. Dates in different formats, weird player names, just a total headache. I started by loading it into pandas in Python – my go-to for this kind of thing. Then the real fun began…
- Fixing Dates: Some dates were like ‘2024.05.26’ and others were ‘May 26, 2024’. I used pandas’ to_datetime to make them all consistent.
- Standardizing Names: The names were inconsistent too, like “Gael Monfils” vs. “Monfils, G.”. A bit of string manipulation fixed that.
- Handling Missing Values: There were a few missing values, which I filled with ‘0’ just to keep moving and not get bogged down.
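If you want to try the same three cleanup steps yourself, here is a rough sketch of how they might look. It is not my exact script: the column names (Date, Player, Aces), the alias table, and the two sample rows are all made up for illustration, and I'm assuming only the two date formats mentioned above show up.

```python
import pandas as pd

# Assumed: only these two date formats occur in the file.
DATE_FORMATS = ("%Y.%m.%d", "%B %d, %Y")

# Hypothetical alias table mapping name variants to one canonical spelling.
NAME_ALIASES = {"Monfils, G.": "Gael Monfils"}


def parse_date(text):
    """Try each known format in turn; return NaT if none of them match."""
    for fmt in DATE_FORMATS:
        try:
            return pd.to_datetime(text, format=fmt)
        except ValueError:
            continue
    return pd.NaT


def clean_stats(df):
    """Return a cleaned copy: parsed dates, canonical names, zero-filled gaps."""
    out = df.copy()
    out["Date"] = out["Date"].apply(parse_date)
    out["Player"] = out["Player"].str.strip().replace(NAME_ALIASES)
    out["Aces"] = out["Aces"].fillna(0)  # quick fix; hides that data was missing
    return out


# Made-up rows standing in for the pasted stats.
raw = pd.DataFrame({
    "Date": ["2024.05.26", "May 26, 2024"],
    "Player": ["Gael Monfils", "Monfils, G."],
    "Aces": [3, None],
})
clean = clean_stats(raw)
print(clean)
```

Filling with 0 is fine for a quick look, but it treats "not recorded" the same as "none", so I'd be careful about reading too much into any stat that had gaps.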
Honestly, cleaning the data took way longer than I thought it would. But hey, that’s data science, right?
Diving Into the Match
Once the data was (mostly) clean, I started to look at the match itself. I wanted to see a simple side-by-side. I grouped by Player, and then summed up the stats I was interested in – Aces, Double Faults, First Serve Percentage, Winners, and Unforced Errors.
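For anyone curious, the grouping step could look something like the sketch below. The column names and the one-row-per-set layout are assumptions, and the numbers are placeholders, not the real match stats. One deliberate difference from what I described: the counts are summed, but the first serve percentage is averaged here, because adding percentages together doesn't give a meaningful number (and even a plain average of per-set percentages is only a rough stand-in for the match figure).

```python
import pandas as pd

# Placeholder values only; one row per player per set is an assumed layout.
stats = pd.DataFrame({
    "Player": ["Monfils", "Monfils", "Etcheverry", "Etcheverry"],
    "Aces": [1, 2, 3, 4],
    "DoubleFaults": [1, 1, 2, 0],
    "FirstServePct": [60.0, 70.0, 50.0, 60.0],
    "Winners": [10, 12, 9, 11],
    "UnforcedErrors": [8, 9, 7, 6],
})

# Sum the counting stats, average the percentage.
summary = stats.groupby("Player").agg(
    Aces=("Aces", "sum"),
    DoubleFaults=("DoubleFaults", "sum"),
    FirstServePct=("FirstServePct", "mean"),
    Winners=("Winners", "sum"),
    UnforcedErrors=("UnforcedErrors", "sum"),
)
print(summary)
```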
I quickly visualized the results using matplotlib. Nothing too fancy, just some bar charts to digest what I had in front of me.
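A side-by-side bar chart along those lines can be sketched like this. The totals dictionary holds placeholder numbers rather than the actual match figures, and plot_comparison is just a helper name I picked for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Placeholder totals per player; swap in the real summed stats.
totals = {
    "Monfils": {"Aces": 3, "Double Faults": 2, "Unforced Errors": 17},
    "Etcheverry": {"Aces": 7, "Double Faults": 2, "Unforced Errors": 13},
}


def plot_comparison(totals):
    """Draw one group of bars per stat, with one bar per player in each group."""
    stat_names = list(next(iter(totals.values())).keys())
    x = np.arange(len(stat_names))
    width = 0.8 / len(totals)  # split each group's width between the players

    fig, ax = plt.subplots()
    for i, (player, values) in enumerate(totals.items()):
        heights = [values[name] for name in stat_names]
        ax.bar(x + i * width, heights, width, label=player)

    # Center the tick labels under each group of bars.
    ax.set_xticks(x + width * (len(totals) - 1) / 2)
    ax.set_xticklabels(stat_names)
    ax.set_ylabel("Count")
    ax.legend()
    return fig, ax


fig, ax = plot_comparison(totals)
```

From there, fig.savefig("comparison.png") writes the chart to disk. If you're working in a notebook or an interactive session, drop the matplotlib.use("Agg") line and call plt.show() instead.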

What Did I Learn?
I mean, it was just a quick dive. But I could see right away that Monfils had quite a few more unforced errors, which really cost him, while Etcheverry was more consistent on his first serves.
Final Thoughts
It was a fun little project. Nothing groundbreaking, but it’s always good to practice. Plus, I learned a bit more about the importance of clean data. Until next time, happy coding!