Dummy Data, Drugs, and Check-lists

Using dummy data – and forgetting to remove it – is a pretty common and unfortunate occurrence in software development…and in journalism (check out this headline). If you haven’t made yourself a pre-publish/pre-production checklist that covers even the most basic things (“do a full-text search for George Carlin’s seven dirty words”), now’s a good time to start.
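Here’s a minimal sketch of what that kind of pre-publish check might look like in Python: scan the files you’re about to publish for placeholder strings that should never appear in production, and fail the job if any turn up. The word list, file names, and script name below are purely illustrative; adapt them to whatever your own pipeline uses.

```python
import re
import sys
from pathlib import Path

# Strings that should never appear in published output.
# (Illustrative list only; add your own dummy/test markers and,
# yes, George Carlin's seven dirty words.)
FORBIDDEN = [
    "lorem ipsum",
    "dummy data",
    "TKTK",            # journalism's classic "to come" placeholder
    "FIXME",
    "DO NOT PUBLISH",
]

PATTERN = re.compile("|".join(re.escape(w) for w in FORBIDDEN), re.IGNORECASE)


def check_file(path):
    """Return (line_number, line) pairs containing forbidden strings."""
    hits = []
    for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), start=1):
        if PATTERN.search(line):
            hits.append((lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # Usage: python prepublish_check.py story.html newsletter.txt ...
    found_any = False
    for name in sys.argv[1:]:
        for lineno, line in check_file(Path(name)):
            print(f"{name}:{lineno}: {line}")
            found_any = True
    sys.exit(1 if found_any else 0)  # non-zero exit halts the publish step
```

Run something like this as the last gate before anything goes out the door; a non-zero exit code is enough for most build or publishing systems to halt the job.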

These catastrophic mistakes can happen even when billions of dollars are on the line.

In his book, New Drugs: An Insider’s Guide to the FDA’s New Drug Approval Process, author Lawrence Friedhoff says he’s seen this kind of thing happen “many” times in the drug research phase. He describes an incident in which his company was awaiting the statistical review of two major studies that would determine if a drug could move into the approval phase:

The computer programs to do the analysis were all written. To be sure that the programs worked properly, the statisticians had tested the programs by making up treatment assignments for each patient without regard to what the patients had actually received, and verifying that the programs worked properly with these “dummy” treatment codes.

The statisticians told us it would take about half an hour to do the analysis of each study and that they would be done sequentially. We waited in a conference room. The air was electric. Tens of millions of dollars of investment hung in the balance. The treatment options of millions of patients would or would not be expanded. Perhaps, billions of dollars of revenue and the future reputation of the development team would be made or lost based on the results of the statistical analyses.

The minutes ticked by. About 20 minutes after we authorized the code break, the door opened and the statisticians walked in. I knew immediately by the looks on their faces that the news was good…One down, one to go (since both studies had to be successful to support marketing approval).

The statisticians left the room to analyze the second study…which could still theoretically fail just by chance, so nothing was guaranteed. Finally, after 45 minutes, the door swung open, and the statisticians walked in. I could tell by the looks on their faces that there was a problem. They started presenting the results of the second study, trying to put a good face on a devastatingly bad outcome. The drug looked a little better than control here but worse there… I couldn’t believe it. How could one study have worked so well and the other be a complete disaster? The people in the room later told me I looked so horrified that they thought I would just have a heart attack and die on the spot.

The positive results of the first study were very strong, making it exceedingly unlikely that they occurred because of a mistake, and there was no scientific reason why the two studies should have given such disparate results.

After about a minute, I decided it was not possible for the studies to be so inconsistent, and that the statisticians must have made a mistake with the analysis of the second study…Ultimately they said they would check again, but I knew by their tone of voice that they viewed me with pity, a clinical development person who just couldn’t accept the reality of his failure.

An eternity later, the statisticians re-entered the room with hangdog looks on their faces. They had used the “dummy” treatment randomization for the analysis of the second study. The one they used had been made up to test the analysis programs, and had nothing to do with the actual treatments the patients had received during the study.

From: Friedhoff, Lawrence T. (2009-06-04). New Drugs: An Insider’s Guide to the FDA’s New Drug Approval Process for Scientists, Investors and Patients (Kindle Locations 2112-2118). PSPG Publishing. Kindle Edition.

So basically, Friedhoff’s team did the equivalent of what a newspaper does when laying out the page before the articles have been written: put in some filler text to be replaced later. Except that the filler text doesn’t get replaced at publication time…again, see this Romenesko link for the disastrous/hilarious results.

Here’s an example of it happening in the tech startup world.

What’s interesting about Friedhoff’s case, though, is that validation of study results is a relatively rare – and expensive – occurrence…whereas publishing a newspaper happens every day, as does pushing out code and test emails. But Friedhoff says the described incident is “only one of many similar ones I could write about”…which goes to show that the rarity and magnitude of a task won’t stop you from making easy-to-prevent, yet devastating, mistakes.

Relevant: Atul Gawande’s article about the check-list – how a simple list of five steps, as basic as “Wash your hands”, prevented thousands of surgical disasters.
