“And it was fun looking at the comments, because you’d see things like ‘well, I’m throwing in this naive Bayes now, but I’m gonna come back and fix it up and come up with something better later.’ And the comment would be from 2006. [laughter] And I think what that says is, when you have enough data, sometimes, you don’t have to be too clever about coming up with the best algorithm.”
I think about this insight more often than I’d like to admit, usually in those frequent situations where you end up spending more time on a clever, graceful solution because you look down on the banal work of finding and gathering data (or, in the classical pre-computer world, fact-finding and research).
But I also think about it in the context of people who are clever, but don’t have enough data to justify a “big data” solution. There’s an unfortunate tendency among these non-tech-savvy types to think that, once someone tells them how to use a magical computer program, they’ll be able to finish their work.
The flaw here is, well, if you don’t have enough data (i.e. at least a few thousand data points or observations), then no computer program will help you find any worthwhile insight. But the bigger tragedy is that, since the datasets involved here are small, these clever people could’ve done their work just fine without waiting for the computerized solution.
So yes, having lots of data can make up for a lack of cleverness, because computers are great at data processing. But if you’re in the opposite situation – a clever person with not a lot of data – don’t overlook your cleverness.