Monthly Archives: February 2011

NICAR 2011 wrapup

Just came back from an inspiring week at the National Institute for Computer-Assisted Reporting in Raleigh, NC. Of all the journalism conferences I’ve been to, this one had the most to learn from and the most attendees excited to learn. There was real discussion about news apps being their own form of storytelling and art, and not just a bunch of numbers uploaded as HTML.

Chrys Wu has a compilation of the tipsheets and the highly technical tutorials. It’s a great trove for anyone – journalists or not – wanting to learn how to collect and process data and build powerful news applications. Some of my favorites, for their step-by-step nature: Jacob Fenton’s R tutorial, David Huynh’s detailed guide on his Google Refine, Andy Boyle’s on setting up Varnish, and Timothy Barmann’s walkthrough of JavaScript mapping. My colleague Jeff Larson shows off his own JavaScript skills with this MVC framework.

I led a couple of sessions. One boiled down to, basically, “use Firebug,” which you can pretty much glean from a tutorial I wrote for ProPublica on how I grabbed the data from drugmaker Cephalon’s Flash site. For the other, I wrote another Ruby tutorial, going from “Hello World” to building a Foursquare/Google Maps mashup…which would have been almost doable in an hour-long session had I been better prepared with presentation materials.

One reason to try learning how to code now is that teaching resources have never been more abundant. The NICAR resources collected on Chrys’s blog are more proof of this.

The free list of free New York museums:

Last Wednesday, in my haste to get it over with before I forgot about it after a weekend at NICAR, I threw up a hand-compiled chart of New York museums and other cultural attractions, focused primarily on when they were open and free. This was in response to a NY reddit user who asked just the right question to hit my “hey-maybe-*I*-can-do-something” buttons:

Does something like this exist? A chart? It seems like every museum has a day or two that it isn’t open and then one day that it’s open late (ideal for me) but they’re all different. Today, for example, I thought “I’d like to go to a museum but it’s going to be 5 soon and I have no idea if any are open late.” If somebody has an idea how this could be most logically put together, I wouldn’t mind doing it. I just can’t even imagine what form this would take other than some dry list or spreadsheet.

Well, I’m not much of a designer but I like making stuff that uses simple color bars and graphics to represent data, ever since my boss made me attend an Edward Tufte lecture. I also am a big fan of the special nights that museums have; a friend took me to the MOMA on one of the Target Free Fridays and I became a member afterward; I can’t count the times I’ve been since or the number of friends I’ve brought in, at the $5 member discount rate. Considering my tendency to sit around at home, I may have never gone without that first free night.

I got interview requests from writers at the Village Voice and the WSJ the day the map went up, so hopefully this chart gets out to the people who need one more reminder to check out all that’s great in this city.

The site’s a pretty lame technical feat; I looked at lists of museums from Wikipedia and Yelp and then hit up each website to fill out a spreadsheet, which I converted to a webpage that’s way too big of a file for being mostly simple HTML. I guess I could’ve run a scraper on each site, but I wanted to acquaint myself with each place so I could get inspired to check out some new places. The info-gathering was by far the most painful and time-consuming aspect of this (my humble explanation for why it would take 7 days to make a sloppy HTML page with a Google map on top). It reminded me of the many restaurants that make you click through bouncy Flash graphics just to find their business hours. In defense of the museums though, their site-design M.O. is probably to wow people enough with images so that they won’t mind digging through to find the pertinent visitor and admission info. Still, it’s kind of annoying for those of us who just want to get down to some art-seeing business.

Now that I’ve got the basic info down, along with a lot of the museums’ social media links, the next step will be to…well, make this a real site from a framework rather than a Ruby script that reads from a Google spreadsheet. Then, to make a newsfeed of exhibits and events and put everything in a standard hcard format. I’ll probably tackify the site up with photos I’ve taken, too. As someone who needs Google to find what direction I’m walking in, I’m always kind of reluctant to do what the Great Indexers, including Wikipedia contributors, have already done. But then again, those broad informational frameworks don’t always show you enough specific details up front (such as the existence of free hours) to encourage you to go beyond the first search results. And since working on the Dollars for Docs project, I’ve learned there’s always a way to make already-easily available information much more useful.

Check it out here.

dataist blog: An inspiring case for journalists learning to code

About a year ago I threw up a long, rambling guide hoping to teach non-programming journalists some practical code. Looking back at it, it seems inadequate. Actually, I misspoke, I haven’t looked back at it because I’m sure I’ll just spend the next few hours cringing. For example, what a dumb idea it was to put everything from “What is HTML” to actual Ruby scraping code all in a gigantic, badly formatted post.

The series of articles has gotten a fair number of hits, but I don’t know how many people were able to stumble through it. Last week, though, I noticed this recent trackback from dataist, a new “blog about data exploration” by Finnish journo Jens Finnäs. He writes that he has “almost no prior programming experience” but, after going through my tutorials and checking out Scraperwiki, was able to produce this cool network graph of the Ratata blog network after about “two days of trial and error”:

Mapping of Ratata blogging network by Jens Finnäs of

I hope other non-coders who are still intimidated by the thought of learning programming are inspired by Finnäs’s example. Becoming good at coding is not a trivial task. But even the first steps can teach a non-coder some profound lessons about data that are valuable enough on their own. And if you’re a curious type with a question you want to answer, you’ll soon figure out a way to put something together, as in Finnäs’s case.

ProPublica’s Dollars for Docs project originated in part from this Pfizer-scraping lesson I added on to my programming tutorial: I needed a timely example of public data that wasn’t as useful as it should be.

My colleagues Charles Ornstein and Tracy Weber may not be programmers (yet), but they are experienced enough with data to know its worth as an investigative resource, and turned an exercise in transparency into a focused and effective investigation. It’s not trivial to find a story in data. Besides being able to do Access queries themselves, C&T knew both the limitations of the data (for example, it’s difficult to make comparisons between the companies because of different reporting periods) and its possibilities, such as the cross-checking of names en masse from the payment lists with state and federal doctor databases.

Their investigation into the poor regulation of California nurses – a collaboration with the LA Times that was a Pulitzer finalist in the Public Service category – was similarly data-oriented. They (and the LA Times’ Maloy Moore and Doug Smith) had been diligently building a database of thousands of nurses – including their disciplinary records and the time it took for the nursing board to act – which made my part in building a site to graphically represent the data extremely simple.

The point of all this is: don’t put off your personal data-training because you think it requires a computer science degree, or that you have to become great at it in order for it to be useful. Even if, after a week of learning, you can barely put together a script to alphabetize your tweets, you’ll likely gain enough insight into how data is structured and made useful, which will aid in just about every other aspect of your reporting repertoire.

In fact, just knowing to avoid taking notes like this:

Colonel Mustard used the revolver in the library? (not library)
Miss Scarlet used the Candlestick in the dining room? (not Scarlet)
“Mrs. Peacock, in the dining room, with the revolver? “
“Colonel Mustard, rope, conservatory?”
Mustard? Dining room? Rope (nope)?
“Was it Mrs. Peacock with the candlestick, inside the dining room?”

And instead, recording them like this:

Who/What?      Role?     Ruled out?
Mustard        Suspect   N
Scarlet        Suspect   Y
Peacock        Suspect   N
Revolver       Weapon    Y
Candlestick    Weapon    Y
Rope           Weapon    Y
Conservatory   Place     Y
Dining Room    Place     N
Library        Place     Y

…will make you a significantly more effective reporter, and will leave your reporting and research much better prepared for thorough analysis and online projects.
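To make the payoff concrete, here is a minimal sketch of what structured notes allow, assuming the nine rows of the table have been typed into a program as records. The `Note` class and the `still_possible` helper are names invented for this illustration; they are not from any tutorial mentioned in this post.

```python
from dataclasses import dataclass


@dataclass
class Note:
    name: str        # who or what the note is about
    role: str        # "Suspect", "Weapon", or "Place"
    ruled_out: bool  # True for "Y" in the table, False for "N"


# The nine rows of the table, copied as-is.
NOTES = [
    Note("Mustard", "Suspect", False),
    Note("Scarlet", "Suspect", True),
    Note("Peacock", "Suspect", False),
    Note("Revolver", "Weapon", True),
    Note("Candlestick", "Weapon", True),
    Note("Rope", "Weapon", True),
    Note("Conservatory", "Place", True),
    Note("Dining Room", "Place", False),
    Note("Library", "Place", True),
]


def still_possible(notes, role):
    """Return the names with the given role that have not been ruled out."""
    return [n.name for n in notes if n.role == role and not n.ruled_out]


for role in ("Suspect", "Weapon", "Place"):
    print(role, still_possible(NOTES, role))
```

Answering “which suspects are left?” becomes a one-line filter, and the empty result for weapons stands out immediately, a gap that is much harder to spot in the free-form version of the notes.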

There’s a motherlode of programming resources available through a single Google search. My high school journalism teacher told us that if you want to do journalism, don’t major in it, just do it. I think the same can be said for programming. I’m glad I chose a computer field as an undergraduate so that I’m familiar with the theory. But if you have a career in reporting or research, you have real-world data-needs that most undergrads don’t. I’ve found that having those goals and needing to accomplish them has pushed my coding expertise along far more quickly than any coursework did.

If you aren’t set on learning to program, but want to get a better grasp of data, I recommend learning:

  • Regular expressions – a set of character patterns, easily printable on a cheat-sheet for memorization, that you use in a text-editor’s Find and Replace dialog to turn a chunk of text into something you can put into a spreadsheet, as well as clean up the data entries themselves. is the most complete resource I’ve found. A cheat-sheet can be found here. Wikipedia has a list of some simple use cases.
  • Google Refine – A spreadsheet-like program that makes easy the task of cleaning and normalizing messy data. Ever go through campaign contribution records and wish you could easily group together and count as one, all the variations of “Jon J. Doe”, “Jonathan J. Doe”, “Jon Johnson Doe”, “JON J DOE”, etc.? Refine will do that. Refine developer David Huynh has an excellent screencast demonstrating Refine’s power. I wrote a guide as part of the Dollars for Docs tutorials. Even if you know Excel like a pro – which I do not – Refine may make your data-life much more enjoyable.
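As a rough sketch of both ideas, the Python below uses one regular expression to turn free-text lines into spreadsheet-style rows, then groups name variants that differ only in capitalization and punctuation. The sample lines, the “gave … on …” sentence format, and the helper names are all made up for this illustration. The grouping is a crude stand-in, not Refine’s actual clustering, and it would not merge “Jon J. Doe” with “Jonathan J. Doe”.

```python
import re
from collections import defaultdict

# Hypothetical raw text, e.g. pasted from a PDF or typed up from notes.
RAW_NOTES = """\
Jon J. Doe gave $250.00 on 3/14/2010
JON J DOE gave $1,000 on 11/2/2010
Jane Q. Roe gave $75.50 on 1/5/2011
"""

# One pattern with three named groups: the name, the dollar amount, the date.
LINE_PATTERN = re.compile(
    r"^(?P<name>.+?) gave \$(?P<amount>[\d,]+(?:\.\d{2})?)"
    r" on (?P<date>\d{1,2}/\d{1,2}/\d{4})$"
)


def parse_lines(text):
    """Turn each matching line into a dict; skip lines that don't match."""
    rows = []
    for line in text.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match:
            rows.append({
                "name": match.group("name"),
                "amount": float(match.group("amount").replace(",", "")),
                "date": match.group("date"),
            })
    return rows


def name_key(name):
    """Lowercase, strip punctuation, and collapse whitespace."""
    no_punct = re.sub(r"[^\w\s]", "", name)
    return " ".join(no_punct.lower().split())


def totals_by_name(rows):
    """Sum the amounts for names that share the same normalized key."""
    totals = defaultdict(float)
    for row in rows:
        totals[name_key(row["name"])] += row["amount"]
    return dict(totals)


rows = parse_lines(RAW_NOTES)
totals = totals_by_name(rows)
print(totals)
```

The same pattern, minus the named groups, could be pasted into a text editor’s Find and Replace dialog; the point is that a single line of pattern replaces a lot of retyping.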

If you want to learn coding from the ground up, here’s a short list of places to start:

NYFW11: Moncler @ Grand Central Station Flash Mob (New York Fashion Week)

Usually it’s pretty easy to get to your train at Grand Central, unless someone decides to hold a fashion event in the main terminal. I was lucky enough to have been in the front when this fashion event’s organizers started making room for the 150+ dancers. I put a bunch of photos in this Flickr set.

Moncler @ Grand Central Station, New York Fashion Week 2011

Moncler NYFW Flashmob at Grand Central, NYC

NYFW: Boogie Woogie in Grand Central

NYFW: Paparazzi at the Moncler Grand Central station show

Finish to Moncler show at Grand Central

The New York Times wrote about how difficult it was to put together a show in a landmark like Grand Central post-9/11:

“The city was very specific about not mentioning flash mob,” Mr. Coppers said. Still, a flash mob is what it looked like at 7:25 to the unsuspecting travelers scanning the announcement board for their track numbers and reading about ice conditions on the Hudson shutting down the Haverstraw-Ossining Ferry.

They suddenly found themselves infiltrated by a large and highly coordinated group of what appeared to be chic aliens, appearing out of nowhere to take over the terminal. There were 363 of them, 163 wearing goggles and vividly colored ski clothes and another 200 hired to pass as ordinary travelers.

At a signal from Etienne Russo, the Belgian mastermind of the Moncler Grenoble event (and the man who once had a Swedish iceberg cut into pieces and shipped to the Grand Palais in Paris for a Chanel show), the extras began clearing the concourse for what was surely the most ambitious and spectacular event of Fashion Week and the only one impossible to transplant to any other place.

“For months I thought it was not doable, but I was obsessed,” Mr. Russo had said. Six hours before the show began, he was pacing around a rehearsal studio in a warehouse set by the East River in the outer reaches of Brooklyn, as the choreographer Luam put her dancers — some trained but many not — through their paces.

“We wanted to do something in Times Square, but because of what happened, that’s impossible,” Mr. Russo added, referring to the attempted car bombing. “But as soon as we came to Grand Central, I said it has to be here.”

More photos in this Flickr set.

Egypt’s Mubarak resignation on Twitscoop at 11:58 A.M. Feb. 11, 2011…almost a clean sweep

Egypt’s Mubarak steps down: the first time that I’ve seen Twitscoop dominated by a single topic (“Bush” refers to tweets telling GWB that Middle East change can happen without war)…except for the “annoyingorange” meme….Yeah, that really *is* annoying.

Is Solitary Confinement Torture? From Atul Gawande and the New Yorker

Punishment Cells. From: Page 257 of part II of Vlas Mikhailovich Doroshevich «Sakhalin (Katorga)», Moscow. Sytin publisher, 1905.

Thanks to for spotlighting another thought-provoking piece by Dr. Atul Gawande in the New Yorker. The tag line is: Hellhole: The United States holds tens of thousands of inmates in long-term solitary confinement. Is this torture?

Dr. Gawande’s reporting builds a strong case for “Yes.” Some interesting bullet points:

  • America holds at least 25,000 inmates in solitary confinement in Supermax prisons
  • More than a century ago, the U.S. Supreme Court considered banning solitary confinement
  • A 2003 analysis of Arizona, Illinois, and Minnesota found that levels of inmate-on-inmate violence were unchanged after their supermax prisons opened
  • The state of Maine has more inmates in long-term solitary than does all of England

Supermax prisons and the long-term isolation of large numbers of inmates, Dr. Gawande notes, are only decades-old concepts in the American prison system. However, in the 1890 SCOTUS case In re Medley, the court takes note of a solitary confinement system in Philadelphia back in 1787. The conditions and consequences, noted more than two centuries ago, aren’t much different from what Dr. Gawande describes today:

The peculiarities of this system were the complete isolation of the prisoner from all human society, and his confinement in a cell of considerable size, so arranged that he had no direct intercourse with or sight of any human being and no employment or instruction….

A considerable number of the prisoners fell, after even a short confinement, into a semi-fatuous condition, from which it was next to impossible to arouse them, and others became violently insane; others still committed suicide, while those who stood the ordeal better were not generally reformed, and in most cases did not recover sufficient mental activity to be of any subsequent service to the community.

It became evident that some changes must be made in the system, and the separate system was originated by the Philadelphia Society for Ameliorating the Miseries of Public Prisons, founded in 1787.

Following the standard journalistic narrative, Dr. Gawande leads with his best anecdote and ends with his second-best. The entire piece is a must-read, but the last anecdote is particularly astonishing. Gawande describes the case of Robert Felton, who spent 14 of his 36 years on earth in solitary confinement. The isolation drove him crazy, Gawande writes, and Felton tried so many times to set his cell on fire with a lightbulb that “the walls of his cell were black with soot.”

Gawande writes about one of his last meetings with Felton. Felton had just found out that the prison director who kept him in solitary confinement had been convicted of taking bribes from lobbyists (a side story that would probably illuminate why America holds on to certain prison strategies regardless of effect) and sentenced to two years in prison:

“Two years in prison,” Felton marvelled. “He could end up right where I used to be.”

I asked him, “If he wrote to you, asking if you would release him from solitary, what would you do?”

Felton didn’t hesitate for a second. “If he wrote to me to let him out, I’d let him out,” he said.

This surprised me. I expected anger, vindictiveness, a desire for retribution. “You’d let him out?” I said.

“I’d let him out,” he said, and he put his fork down to make the point. “I wouldn’t wish solitary confinement on anybody. Not even him.”

Read Dr. Gawande’s story in the New Yorker.