An Only One Step Ahead Guide for Machine Learning Projects
By Chang Hsin Lee

There is a lot of hype around machine learning, but completing a project at work seems to be much harder than online tutorials advertise. In this talk, I will share a few tips for the different stages of a machine learning project, such as how to recognize pitfalls, that I wish I had known when I first navigated my projects as a junior in the field.

Saturday 4:30 p.m.–5 p.m. in Cartoon 1


What does a data scientist's day look like? On the one hand, people say that a data scientist's day is 5% modeling and 95% cleaning data and other stuff. On the other hand, when I search online, I find many more machine learning tutorials and blog posts on modeling than on the "other stuff." There seems to be a lack of guidance for junior data scientists who are entering the field and trying to complete their first few projects.

In the last few years, I have worked on several data science projects where the path to success was unclear and the journey was full of pitfalls. In this talk, I will provide practical tips on machine learning projects that I learned the hard way. I will give you 2-3 tips with examples for each stage of a machine learning project (before, during, and after) that will help junior data scientists, or anyone working on a machine learning project, navigate the muddy data waters better.


There are a few stages of a machine learning (ML) project, and I will give a few tips for each.

Before ML (7-8 minutes)

What kind of questions should I ask to get the most out of the preparation stage of a machine learning project?

Starting ML (8-10 minutes)

How do I define success and how do I get there? What kind of model should I pick? What are some Python tools that can help me work through a project?

Pitfalls (8-10 minutes)

I will share examples and stories of pitfalls that I saw or fell into in my past projects and wasn't aware of at the time.

Chang Hsin Lee

Chang is a data scientist at Lowe's Home Improvement, where he has worked on data science projects in fraud detection and supply chain. He recently started his journey in machine learning as a baseball research intern for the Tampa Bay Rays. Chang likes to write blog posts about his thoughts on data and is a co-organizer of the Davidson Machine Learning meetup in Charlotte, NC. Besides math and coding, Chang is a soul food advocate and survives on barbecue.