
Update: I was feeding my model way too much data for a simple task

I was building a tool to sort support tickets for a small shop in Boise, and I kept adding more training examples, thinking it would get smarter. After three weeks, my friend asked me why it needed 10,000 examples just to tell whether a ticket was about shipping or a broken item. That question made me stop and check the results, and it turned out the model was basically just memorizing the examples instead of learning the simple rule. Has anyone else found that a smaller, cleaner dataset actually works better for these basic classification jobs?
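One way to check for memorization like this is to hold out some tickets the model never trains on, then compare accuracy on the training tickets with accuracy on the held-out ones. Near-perfect training accuracy paired with noticeably worse held-out accuracy is the usual sign of overfitting. Below is a minimal, stdlib-only sketch. The post doesn't say which model was used, so a hand-rolled multinomial Naive Bayes classifier stands in for illustration. `train_nb`, `memorization_check`, and the sample tickets are all hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and split on whitespace; a real pipeline would also strip punctuation.
    return text.lower().split()


def train_nb(examples):
    """Train a multinomial Naive Bayes model on (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log posterior score."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        # Log prior plus Laplace-smoothed log likelihood of each word.
        score = math.log(count / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, examples):
    correct = sum(predict(model, text) == label for text, label in examples)
    return correct / len(examples)


def memorization_check(examples, holdout_fraction=0.2, seed=0):
    """Return (train_accuracy, holdout_accuracy); a large gap suggests memorization."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - holdout_fraction))
    train, holdout = shuffled[:cut], shuffled[cut:]
    if not train or not holdout:
        raise ValueError("need enough examples for both a training and a holdout split")
    model = train_nb(train)
    return accuracy(model, train), accuracy(model, holdout)


if __name__ == "__main__":
    # Hypothetical tickets, for illustration only.
    tickets = [
        ("where is my package", "shipping"),
        ("tracking number is not updating", "shipping"),
        ("order has not arrived yet", "shipping"),
        ("the mug arrived cracked", "broken"),
        ("lamp is broken out of the box", "broken"),
        ("screen was shattered on arrival", "broken"),
    ]
    train_acc, holdout_acc = memorization_check(tickets)
    print(f"train={train_acc:.2f} holdout={holdout_acc:.2f}")
```

With a real dataset, the held-out set should be much larger than two tickets. A single shuffled split is the simplest version; repeating it with different seeds gives a less noisy picture.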
2 comments

sarah198 · 1mo ago
Totally agree with your friend. That's the classic "more data is better" trap: your model was just memorizing tickets instead of learning the simple pattern. For a basic job like sorting tickets into two categories, a small set of clear, consistently labeled examples is often all you need. It forces the model to actually find the rule. I've seen this happen so many times.
clark.iris · 1mo ago
I read a blog post last year about a guy who trained a model to spot spam emails. He started with 50,000 messages and the thing was a mess, but when he cut it down to just 500 really clear examples, it worked almost perfectly. It's easy to think more data is always better, but for a simple yes-or-no job, the model can get lost in the noise. You probably only needed a few hundred good tickets to teach it the difference between shipping and broken items.
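Cutting a dataset down to "really clear examples" can start with something as simple as dropping duplicate tickets and tickets whose text shows up with conflicting labels. Here is a minimal sketch, assuming tickets are stored as (text, label) pairs. `clean_tickets` is a hypothetical helper, and the blog post doesn't say how its 500 examples were actually picked.

```python
from collections import defaultdict


def normalize(text):
    # Collapse case and whitespace so trivial variants count as duplicates.
    return " ".join(text.lower().split())


def clean_tickets(examples):
    """Drop duplicate tickets and any ticket text that appears with conflicting labels.

    `examples` is a list of (text, label) pairs; the first occurrence of each
    unambiguous ticket is kept, in its original order.
    """
    labels_by_text = defaultdict(set)
    for text, label in examples:
        labels_by_text[normalize(text)].add(label)

    cleaned, seen = [], set()
    for text, label in examples:
        key = normalize(text)
        if len(labels_by_text[key]) > 1:
            continue  # ambiguous: the same text was labeled both ways
        if key in seen:
            continue  # duplicate of a ticket already kept
        seen.add(key)
        cleaned.append((text, label))
    return cleaned
```

This only catches exact (normalized) duplicates and direct label conflicts. Near-duplicates and tickets that are simply mislabeled still need a human pass.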