Journal 45

 June 10 - June 16

I can't believe we are at the end of this course! These last eight weeks have flown by because I was genuinely interested in the topics and enjoyed learning everything. This week was especially rewarding because it felt like the culmination of everything we have learned throughout the course, particularly with our lab homework being a sort of free-for-all coding assignment. I enjoyed the responsibility of designing and implementing my own machine learning workflow from start to finish.

One of the most important things I learned this week was that building a machine learning model involves much more than just selecting an algorithm and fitting it to the data. I had to think carefully about each step of the process. I explored the data to understand the target variable and investigate relationships between predictors and churn. Then, I encoded categorical variables, split the data into training and test sets, scaled the predictor variables appropriately, tuned hyperparameters using cross-validation, and evaluated model performance using various metrics. This helped me realize that the success of a machine learning project often depends just as much on the preprocessing and evaluation stages as it does on the choice of algorithm itself.
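The workflow above can be sketched roughly as follows. This is a minimal illustration, not the code from the assignment: it assumes scikit-learn is available, uses a synthetic dataset from `make_classification` as a stand-in for the churn data, and picks an arbitrary grid of `k` values. Categorical encoding is omitted because the synthetic features are already numeric.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the churn data (assumption: the real dataset is not shown here).
X, y = make_classification(
    n_samples=500, n_features=8, weights=[0.7, 0.3], random_state=0
)

# Hold out a test set before any scaling or tuning so it stays unseen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Putting the scaler inside the pipeline means it is refit on the training
# folds only during cross-validation, which avoids leaking test information.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# Hypothetical grid of neighbor counts, tuned with 5-fold cross-validation.
k_grid = [3, 5, 7, 9, 11]
search = GridSearchCV(pipe, {"knn__n_neighbors": k_grid}, cv=5, scoring="recall")
search.fit(X_train, y_train)

# Evaluate the tuned model once on the held-out test set.
y_pred = search.predict(X_test)
best_k = search.best_params_["knn__n_neighbors"]
test_accuracy = accuracy_score(y_test, y_pred)
test_recall = recall_score(y_test, y_pred)
print(f"best k: {best_k}, accuracy: {test_accuracy:.3f}, recall: {test_recall:.3f}")
```

Scoring the search on recall is only one possible choice; accuracy or another metric could be substituted depending on the goal of the problem.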

I also gained a deeper understanding of the strengths and weaknesses of different classification methods. I used both KNN and logistic regression to predict customer churn. KNN required scaling and hyperparameter tuning because its predictions depend on distances between observations. Logistic regression was simpler to train and provided a useful baseline model for comparison. I found it interesting that the two models ultimately produced very similar results, which reinforced the idea that there is not always a single best algorithm and that several reasonable approaches can lead to similar answers. It was informative to evaluate multiple approaches and determine which model best fits the goals of the problem.
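To make the point about distances concrete, here is a small sketch using only the standard library. The customers and feature names (`total_charges`, `support_calls`) are hypothetical and are not taken from the assignment data. It shows how the feature with the larger numeric range can dominate Euclidean distance until the features are standardized.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def standardize(rows):
    """Rescale each column to mean 0 and (population) standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
        for c, m in zip(cols, means)
    ]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def nearest_neighbor(query_index, rows):
    """Return the index of the row closest to rows[query_index]."""
    query = rows[query_index]
    others = [i for i in range(len(rows)) if i != query_index]
    return min(others, key=lambda i: euclidean(query, rows[i]))


# Hypothetical customers: [total_charges in dollars, support_calls]
customers = [
    [2000.0, 1.0],  # index 0: the customer we want a neighbor for
    [2100.0, 9.0],  # index 1: similar charges, many more support calls
    [2300.0, 2.0],  # index 2: somewhat higher charges, similar call count
    [500.0, 5.0],   # index 3
    [4000.0, 0.0],  # index 4
]

nearest_raw = nearest_neighbor(0, customers)
nearest_scaled = nearest_neighbor(0, standardize(customers))
print(nearest_raw, nearest_scaled)  # prints "1 2"
```

On the raw values, the dollar amounts swamp the call counts, so customer 1 looks closest. After standardizing, the large gap in support calls counts for more and customer 2 becomes the nearest neighbor.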

One challenge I faced this week involved deciding how to preprocess the data and justify those decisions. Since there were no instructions, I had to rely on the concepts and best practices we had covered in class. Questions such as whether certain variables should be dummy encoded, when scaling should occur, and which evaluation metrics should be emphasized required careful thought and planning. Although this uncertainty was initially uncomfortable, it ultimately helped me become more confident in making data-driven decisions independently and made me a stronger data scientist.

Another area that required critical thinking was interpreting the results of the models. My KNN model achieved slightly higher accuracy and recall than logistic regression, but the differences between the models were relatively small. This led me to think more deeply about which metric should matter most in a real-world prediction problem. If the goal is to identify as many customers at risk of leaving as possible, recall may be especially important. This experience showed me that evaluating machine learning models is not just about obtaining the highest score but about understanding what those scores mean within the context of the business problem. I am certain I will face questions like this in the future.
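A small worked example helps show why accuracy alone can hide the difference that matters. The labels and predictions below are made up for illustration and are not the results from the assignment. Both hypothetical models get the same accuracy, but only one of them catches every customer who actually churns.

```python
def accuracy_and_recall(y_true, y_pred, positive=1):
    """Compute accuracy and recall for the positive class from label lists."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == positive and p == positive)
    false_neg = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    actual_pos = true_pos + false_neg
    recall = true_pos / actual_pos if actual_pos else 0.0
    return accuracy, recall


# Hypothetical data: 10 customers, 3 of whom churn (label 1).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

# Model A misses two churners but raises no false alarms.
model_a = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
# Model B catches every churner at the cost of two false alarms.
model_b = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

acc_a, rec_a = accuracy_and_recall(y_true, model_a)
acc_b, rec_b = accuracy_and_recall(y_true, model_b)
print(f"Model A: accuracy={acc_a:.2f}, recall={rec_a:.2f}")
print(f"Model B: accuracy={acc_b:.2f}, recall={rec_b:.2f}")
```

Both models score 0.80 on accuracy, yet Model A's recall is about 0.33 while Model B's is 1.00. If missing an at-risk customer is the more expensive mistake, Model B would be the better fit despite the identical accuracy.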

I feel that this week's work relates directly to my future career aspirations. In baseball organizations, data scientists are often asked to make decisions under uncertainty using incomplete information. The same principles I applied in this homework, such as careful preprocessing, thoughtful model selection, cross-validation, hyperparameter tuning, and interpreting evaluation metrics, are essential skills for baseball analytics professionals. Just as a telecommunications company wants to predict customer churn, MLB teams want to predict outcomes that can improve roster construction, player development, and long-term organizational success. Even beyond the player side of the major leagues, I imagine front offices are evaluating similar data and models for 'churn' of fans in order to maximize viewership and attendance.

This week's assignment definitely challenged me to think more independently than previous assignments and reinforced the importance of applying sound judgment throughout the machine learning process. It was satisfying to see how the concepts from the first seven weeks of the course came together into a complete machine learning project. I feel much more confident in my ability to approach open-ended data science problems and make informed decisions about how to solve them. I am excited to take all that I have learned in this course and apply it to my future education and career. Thank you for a fun and informative course. It was one of my favorites in the entire program.
