26

Realized I was feeding my AI models garbage data for 6 months

I was running a small side project building a chatbot for local restaurant recommendations in Austin. For about 6 months I kept throwing in scraped Yelp reviews and menu text, thinking more data was better. The bot kept giving people weird suggestions like 'try the fried chicken at a pizza place' and I could not figure out why. Then a buddy who works in data science glanced at my training set and pointed out that half of it was from 2018 and full of closed restaurants and outdated menus. He just laughed and said 'your model is learning the past, not the present.' I felt so dumb lol. Now I manually date and filter my sources, and the bot actually works half decent. Has anyone else had a moment where you realized you were overloading your model with junk and it ruined everything?
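
A rough sketch of what that kind of date filtering could look like in Python. The field names (`reviewed_on`, `is_open`), the sample records, and the one-year cutoff are all assumptions for illustration, not details from the project.

```python
from datetime import date

# Made-up sample records. The field names are hypothetical; a real scrape
# would need its own way of getting a review date and an open/closed flag.
reviews = [
    {"restaurant": "Example Pizza", "reviewed_on": date(2018, 3, 14), "is_open": False},
    {"restaurant": "Example Taqueria", "reviewed_on": date(2024, 2, 10), "is_open": True},
    {"restaurant": "Example BBQ", "reviewed_on": date(2019, 7, 1), "is_open": True},
    {"restaurant": "Example Diner", "reviewed_on": date(2024, 5, 1), "is_open": False},
]


def filter_fresh(records, today, max_age_days=365):
    """Keep records that are recent enough and refer to an open restaurant."""
    kept = []
    for record in records:
        age_days = (today - record["reviewed_on"]).days
        # Drop closed places, stale reviews, and anything dated in the future.
        if record["is_open"] and 0 <= age_days <= max_age_days:
            kept.append(record)
    return kept


# A fixed "today" keeps the example reproducible; date.today() would be the
# usual choice in practice.
fresh = filter_fresh(reviews, today=date(2024, 6, 1))
for record in fresh:
    print(record["restaurant"], record["reviewed_on"])
```

The cutoff is the knob to tune: menus and closures change fast, so a shorter window may suit restaurants, while other domains could tolerate older data.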
2 comments

lee_bailey65
Oh man, that's rough... I get where you're coming from, but honestly I see it a little differently. Training on old data isn't always junk, it's just incomplete. You still learned something valuable from that mistake and your bot works now. If anything, it shows how easy it is to skip the boring parts like checking dates and just dump everything in. My own little projects have taught me that a small clean set beats a giant mess every time.
2
ramirez.sage
Hear me out though, doesn't that kinda depend on what you're training for? If your bot is supposed to give stock advice or news summaries, training on data from 2019 is worse than useless; it's actually misleading. A small clean set beats a giant mess, sure, but a small clean set of old data is still just a snapshot of the past. I've seen people waste weeks trying to fix a model that kept recommending Blackberry stock because all their training data was from 2008.
1