Posts

Featured

Journal 43

  May 27 - June 2: I had been looking forward to this week's machine learning topic. We focused on data preprocessing, KNN classification, train/test splits, cross-validation, and methods for evaluating machine learning models. One of the biggest lessons I learned was that data cleaning is one of the most important steps in a data science project. Before this week, I tended to think of machine learning as mostly about building models, but the lectures and homework showed that poor-quality data can cause problems well before a model is trained. I learned how to identify missing values using functions such as isna(), how to count missing values by row and by column, and how to decide whether missing data should be removed or imputed. I also learned that missing data can be disguised as unusual values such as zeros or special strings, which made the campaign contribution and diabetes homework assignments particularly interesting and interactive. One topic I found challengin...
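The missing-value steps mentioned above can be sketched in pandas roughly as follows. This is a minimal illustration, not the homework's actual code: the toy data, the column names, and the choice to treat zeros in `glucose` and `bmi` as disguised missing values are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "glucose": [148.0, 0.0, 183.0, np.nan],
    "bmi": [33.6, 26.6, 0.0, 28.1],
    "age": [50, 31, 32, 21],
})

# Assumption: a zero in these columns is a disguised missing value.
disguised_cols = ["glucose", "bmi"]
cleaned = df.copy()
cleaned[disguised_cols] = cleaned[disguised_cols].replace(0, np.nan)

# Count missing values per column and per row with isna().
missing_per_column = cleaned.isna().sum()
missing_per_row = cleaned.isna().sum(axis=1)

# Option 1: remove every row that has at least one missing value.
dropped = cleaned.dropna()

# Option 2: impute each missing value with its column's median.
imputed = cleaned.fillna(cleaned.median(numeric_only=True))

print(missing_per_column)
print(missing_per_row)
```

Whether to drop or impute depends on how much data is missing and why. Median imputation is only one simple choice among several.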

Latest Posts

Journal 42

Journal 41

Journal 40

Journal 39

Journal 38

Journal 37

Journal 36

Journal 35

Journal 34

Journal 33