Good news for data nerds everywhere: version 2.0 of Google’s fantastic data-cleaning tool, Google Refine (formerly Gridworks), has been released. And they were nice enough to feature ProPublica’s Dollars for Docs as an example use case. I talked briefly to BusinessJournalism.org about how I used Refine to put together the pharma top earners list.
It’s possible I could’ve done it using only SQL queries and Ruby libraries, but I definitely would’ve missed a lot of matches and probably overdosed on over-the-counter pharma painkillers.