Tag Archives: journalism

Small Data Journalism – practical lessons of data journalism

Next week, my short course on data journalism at New York University (SCPS) begins. It’s only five weeks long, but I wanted the material and readings I used for the class to be accessible to anyone. You can check it out at smalldatajournalism.com, a site I built using Jekyll and which will serve as the home for future musings and data projects.

“Better know a developer” at AAJA 2013

I had the privilege of being on a panel with the New York Times’s Chase Davis and former YouTube designer Hong Qu at this year’s Asian American Journalists Association convention.

The panel was titled “Better Know a Developer,” and my part of it was to discuss how non-programming journalists can work best with programmers.

You can see the slides here. The advice boils down to: Don’t believe in magic. Think about how you would do it yourself. And use a spreadsheet.

Republia Times: possibly the best game about newspapering ever made

Update: I guess I’m not being completely hyperbolic; Mr. Pope’s “Republia Times” is nominated for “Most Significant Impact” and “Best Gameplay” awards at this year’s Games for Change Festival…not bad for a game he made in 48 hours as practice.

Ever wondered what it’s like to edit a newspaper and influence what the public thinks and cares about? The small, but financially stable Republia Times has an opening for editor-in-chief. The job duties are simply “increase [the public’s] loyalty by editing the Republia Times carefully. Pick only stories that highlight the good things about Republia and its government.”

“The Republia Times” was created by developer Lucas Pope and is as sharp a satire of newspapering as I’ve ever seen in the gaming world. Its crude mechanics and appearance may be off-putting, but as a whole, “The Republia Times” is remarkable considering that Pope wrote it to practice for a 48-hour game development competition. Not only that, but it was his first Flash game, which, if you’ve never tried learning the Flash development environment, is astonishing in itself.

I don’t think Pope has been a newspaper editor before, either, but he manages to capture the cynicism behind modern and classic yellow journalism: political articles bore the readership, weather and sports attract it. The twist here is that the Republia Times is the mouthpiece of the state, and so you have to balance the interesting tabloid material (“C&J Tie the Knot!”) with boilerplate to make the government look good (“Latest poll shows broad satisfaction with government leaders”). There’s a little mini-Tetris challenge in fitting the stories in (you choose how much real estate each article gets) before the clock runs out, and an additional plot twist halfway through the game.

The game is probably too cynical for most journalists, at least the ones who don’t fancy themselves government spokespeople, but even the most idealistic of editors will get a kick out of how Pope distills the profession into something so simplistic. That he manages to make it entertaining and thought-provoking despite the limits he was working with is a notable achievement. I can’t think of any news-related game that has been better executed, though, admittedly, the field is small. The Knight Foundation News Challenge has given hundreds of thousands of dollars in grants to journalism-themed games. If I were them, I’d give Pope six figures to make something, even though it may be more subversive than the journalism industry would prefer.

I’ve actually buried the lede here. I only came across the Republia Times, which Pope created last year, because I read about his upcoming game, “Papers, Please,” which puts you in the shoes of a border inspector in a Cold War-era nation. It’s only in playable beta (free for Mac and PC), but I wouldn’t be surprised if it’s my favorite game of the year. The trailer speaks for itself:

Pope says the game will hopefully be out this summer. If you’re on Steam, give Pope an upvote on Greenlight.

A tiny website wins 2013’s Pulitzer for National Reporting

I used to work with Susan White at ProPublica, but even I was completely surprised yesterday when InsideClimate News, the non-profit news website she now leads, won the Pulitzer Prize for National Reporting for an in-depth investigation of a 2010 pipeline spill in Michigan.

Don’t remember that spill? Maybe that’s why InsideClimate titled its story, “the biggest oil spill you’ve never heard of.”

You might also describe InsideClimate News as “the online news startup you’ve never heard of” – I wouldn’t know anything about it if Susan hadn’t moved there. The surprise isn’t that she led yet another Pulitzer Prize project (she had already edited two such projects, at the San Diego Union-Tribune and ProPublica); it’s that InsideClimate News just seemed too small and too novel a news organization to earn the Pulitzer committee’s notice.

At just 5 years old and with only 7 full-time reporters, InsideClimate News is likely the smallest news organization ever to win in the National Reporting category (see table below), and perhaps the smallest news organization ever to win any Pulitzer since the Point Reyes Light in 1979.

Here’s another size measurement: According to the AP, InsideClimate had about 200,000 page views last month. The winner of last year’s National Reporting Pulitzer, the Huffington Post, is also an online-only news site. But it reportedly racks up a billion page views a month: i.e., 5,000 times the page views at InsideClimate.

Numbers may seem like a superficial metric, but there’s a reason why big papers dominate every Pulitzer category (except for maybe Public Service) – big investigations require big resources. InsideClimate’s investigation occupied 3 of their reporters for 7 months, a major commitment for a news organization still struggling to draw a daily readership. Even more impressive: InsideClimate is based in Brooklyn, but they invested time and money (i.e., a travel budget) in a story several states away.

As InsideClimate reporter Elizabeth McGowan told the AP:

“That’s quite a sacrifice to make when you’re trying to get eyeballs on your website,” said McGowan, who started her reporting with a trip out to Marshall, Mich., in November 2011. “We made the commitment to this story because we thought this story mattered.”

“Pulling me off, their most seasoned reporter, was an act of faith to some degree because I could’ve been pounding out five, six, seven stories a week”

I didn’t read InsideClimate’s project when it came out, and the comment/social-media sections on the early stories didn’t show much pickup initially. The presentation is what you’d expect from a small, no-frills operation: nearly all the photos come from government sources and the graphics are relatively straightforward and non-interactive. But thankfully, the stories were judged by the quality and impact of their investigation rather than the fanciness of their presentation.

A screenshot of the first story in InsideClimate's series


The future of journalism as a profession, never mind investigative news, is still uncertain. But InsideClimate’s Pulitzer is a great validation of how passionate startups can still make a huge impact in the proud tradition of watchdog journalism. Congrats to InsideClimate and its lead reporters, Lisa Song, Elizabeth McGowan and David Hasemyer.

You can read the entire series on the Pulitzer’s official website. Or you can download the story in ebook format here.

An aggregated list of National Reporting Pulitzers

The list below is scraped from the Pulitzer’s official list, and I used OpenRefine to cluster the names together. Interestingly, the last three National Reporting Pulitzers have been won by online-only organizations: InsideClimate News, Huffington Post, and ProPublica. In 2009, the St. Petersburg Times won a National Reporting Pulitzer for its PolitiFact project. PolitiFact had a print component but it can be reasonably seen as the first Pulitzer-winning website.

Fifteen years ago, there was debate over whether the Pulitzer committee should have a separate prize for online-only submissions. The committee has wisely decided to judge journalism by its quality and not what format it comes in, and the success of news websites in this prestigious category is a good sign of how forward-thinking the Pulitzers have become.

Name | National Reporting Pulitzers
New York Times | 17
Wall Street Journal | 14
Philadelphia Inquirer | 13
Washington Post | 13
Des Moines Register and Tribune | 7
Los Angeles Times | 7
Associated Press | 5
Chicago Tribune | 5
Boston Globe | 5
United Press International | 3
St. Petersburg Times | 3
Dallas Times Herald | 2
Dayton Daily News | 2
Christian Science Monitor | 2
Oregonian | 2
Seattle Times | 2
Washington Star | 2
Minneapolis Tribune | 2
Albuquerque Tribune | 1
Bloomberg News | 1
Chattanooga Times | 1
Chicago Daily News | 1
Gannett News Service | 1
InsideClimate News | 1
Kansas City Star | 1
Knight Newspapers | 1
Knight-Ridder, Inc. | 1
Miami Herald | 1
Nashville Tennessean | 1
New York Daily News | 1
New York Herald Tribune | 1
Newhouse News Service | 1
Pittsburgh Post-Gazette | 1
ProPublica | 1
Providence Journal and Evening Bulletin | 1
Scripps-Howard Newspaper Alliance | 1
Arizona Republic | 1
Atlanta Journal and Constitution | 1
Baltimore Sun | 1
Boston Phoenix | 1
Dallas Morning News | 1
Huffington Post | 1
Kansas City Times | 1
Miami (FL) News | 1
Times-Picayune | 1
Washington Daily News | 1
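For anyone curious what that kind of name clustering looks like in code, here's a rough Ruby sketch of the idea. It is not necessarily how the list above was produced. It reduces each name to a "fingerprint" key (lowercase, punctuation turned into spaces, unique words sorted), loosely modeled on OpenRefine's fingerprint clustering method, and then counts entries per key. The `fingerprint` and `tally_by_cluster` helpers and the sample names are hypothetical. A key like this only merges variants that differ in case, punctuation or word order; it would not merge "The New York Times" with "New York Times", which is why you still eyeball the clusters.

```ruby
# Reduce a name to a crude "fingerprint" key: lowercase it, turn punctuation
# into spaces, then sort the unique words. Loosely modeled on OpenRefine's
# fingerprint clustering; a hypothetical helper, not OpenRefine's own code.
def fingerprint(name)
  name.downcase
      .gsub(/[[:punct:]]/, " ")
      .split
      .uniq
      .sort
      .join(" ")
end

# Group names that share a key, label each cluster with the first spelling
# seen, and return [label, count] pairs sorted by count (descending), then label.
def tally_by_cluster(names)
  clusters = names.group_by { |n| fingerprint(n) }
  pairs = clusters.values.map { |variants| [variants.first, variants.size] }
  pairs.sort_by { |label, count| [-count, label] }
end

# Hypothetical sample input, not the actual scraped Pulitzer data.
winners = [
  "New York Times", "new york times", "NEW YORK TIMES.",
  "Knight-Ridder, Inc.", "Knight Ridder Inc", "ProPublica"
]

# Prints:
# New York Times | 3
# Knight-Ridder, Inc. | 2
# ProPublica | 1
tally_by_cluster(winners).each { |label, count| puts "#{label} | #{count}" }
```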

John Sullivan, 48. A notable non-notable obituary in the New York Times

John Sullivan, 48, profiled in the New York Times


One of my favorite assignments as a metro newspaper reporter was the occasional obituary. Not so much the ones about people whose lives (or deaths) were notable in a news sense (such as a prominent local politician, or a murder victim) and necessitated a timely story. My favorite obits were about people who came and went without any public announcement and were mostly unknown in life, at least over the past decade, but who had been selected by the obit editor out of the hundreds of other recently deceased, non-notable locals.

But they were “non-notable” only in that their names weren’t immediately connected to any famous event or accomplishment that most readers would remember or had ever heard about. Because even with just a half-day’s worth of interviews to learn about a complete stranger who had just died, you could find out at least one notable accomplishment from his or her surviving relatives, as well as details of personal drama universal to us all, and distill that life into a profile as interesting and inspiring as the celebrity obits that shared space in the next day’s section.

I hadn’t read many non-celeb obits since moving to NYC. But while waiting for take-out, I checked the Times on my phone and came across this obit about a young, well-off salesman turned social worker:

After she had unpacked, and her toothbrush was on the sink, the woman realized something was missing. She turned to John Sullivan, the tall, smiling social worker who had discovered her on a bench in the Broadway median. The woman was a nurse who had lost her grip and had been living in a tent on the Upper West Side, until Mr. Sullivan coaxed her off the street. She was delighted to be in an apartment of her own.

“Just one thing,” she told him. “I really need a tent for here.”

Mr. Sullivan left. He came back with a tent, which she pitched in the living room. Some time and medication later, she put it away.

In Mr. Sullivan’s line of work, there was no instruction manual.

Mr. Sullivan grew up in Sleepy Hollow, N.Y., a star high school quarterback and pitcher who took his golden personality and looks into sales. He made a fine living that provided him, as he once said, “lots of travel and a closet full of Brooks Brothers clothes.” He also drank too much. Then he stopped.

One morning, on his way to a run around the reservoir in Central Park, he passed homeless people in the street. The next day, he applied to Fordham University to begin graduate school in social work. In 1995, he got a job with Pathways to Housing, an agency that finds homes and help for people with mental illness and addiction living on the street. He prowled East Harlem before it was gentrified, meeting people living under railroad tracks and in abandoned buildings.

Read the rest of the obit here: In Helping Others, Finding What Was Never Truly Lost, by Jim Dwyer

NICAR 2011 wrapup

Just came back from an inspiring week at the National Institute for Computer-Assisted Reporting in Raleigh, NC. Of all the journalism conferences I’ve been to, this one had the most to learn from and the most attendees excited to learn. There was real discussion about news apps being their own form of storytelling and art, and not just a bunch of numbers uploaded as HTML.

Chrys Wu has a compilation of the tipsheets and the highly technical tutorials. It’s a great trove for anyone – journalists or not – wanting to learn how to collect and process data and build powerful news applications. Some of my favorites, for their step-by-step nature: Jacob Fenton’s R tutorial, David Huynh’s detailed guide to his own Google Refine, Andy Boyle’s guide to setting up Varnish, and Timothy Barmann’s walkthrough of JavaScript mapping. My colleague Jeff Larson shows off his own JavaScript skills with this MVC framework.

I led a couple of sessions. One boiled down to, basically, use Firebug, which you can pretty much glean from a tutorial I wrote for ProPublica on how I grabbed the data from drugmaker Cephalon’s Flash site. For the other, I wrote a Ruby tutorial going from “Hello World” to building a Foursquare/Google Maps mashup…which would have been almost doable in an hour-long session had I been better prepared with presentation materials.

One reason to try learning how to code now is that teaching resources have never been more abundant. The NICAR resources collected on Chrys’s blog are more proof of this.

dataist blog: An inspiring case for journalists learning to code

About a year ago I threw up a long, rambling guide hoping to teach non-programming journalists some practical code. Looking back at it, it seems inadequate. Actually, I misspoke: I haven’t looked back at it, because I’m sure I’ll just spend the next few hours cringing. For example, what a dumb idea it was to put everything from “What is HTML” to actual Ruby scraping code all in one gigantic, badly formatted post.

The series of articles has gotten a fair number of hits, but I don’t know how many people were able to stumble through it. Last week, though, I noticed this recent trackback from dataist, a new “blog about data exploration” by Finnish journo Jens Finnäs. He writes that he has “almost no prior programming experience” but, after going through my tutorials and checking out Scraperwiki, was able to produce this cool network graph of the Ratata blog network after about “two days of trial and error”:

Mapping of Ratata blogging network by Jens Finnäs of dataist.wordpress.com


I hope other non-coders who are still intimidated by the thought of learning programming are inspired by Finnäs’s example. Becoming good at coding is not a trivial task. But even the first steps can teach a non-coder some profound lessons about data, lessons important enough on their own. And if you’re the curious type with a question you want to answer, you’ll soon figure out a way to put something together, as Finnäs did.

ProPublica’s Dollars for Docs project originated in part from this Pfizer-scraping lesson I added on to my programming tutorial: I needed a timely example of public data that wasn’t as useful as it should be.

My colleagues Charles Ornstein and Tracy Weber may not be programmers (yet), but they are experienced enough with data to know its worth as an investigative resource, and turned an exercise in transparency into a focused and effective investigation. It’s not trivial to find a story in data. Besides being able to do Access queries themselves, C&T knew both the limitations of the data (for example, it’s difficult to make comparisons between the companies because of different reporting periods) and its possibilities, such as the cross-checking of names en masse from the payment lists with state and federal doctor databases.
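That kind of cross-checking boils down to matching names across two lists. Here's a minimal Ruby sketch, assuming two plain lists of names; the `normalize` and `cross_check` helpers and the sample names are made up for illustration and are not the actual Dollars for Docs code. It only smooths over case, punctuation and extra spaces, and a name match is a lead to verify by hand, not proof that two records describe the same doctor.

```ruby
require "set"

# Hypothetical normalizer: uppercase, turn anything that isn't A-Z or a space
# into a space (accented letters become spaces too), then collapse and trim spaces.
def normalize(name)
  name.upcase.gsub(/[^A-Z ]/, " ").squeeze(" ").strip
end

# Return the names from the first list whose normalized form also appears in
# the second list. Each hit is only a lead to check by hand.
def cross_check(payment_names, other_names)
  lookup = other_names.map { |n| normalize(n) }.to_set
  payment_names.select { |n| lookup.include?(normalize(n)) }
end

# Made-up sample names.
payments = ["Jane Q. Smith", "John Roe", "Ann Lee-Park"]
board    = ["JANE Q SMITH", "Ann Lee Park"]

# Prints "Jane Q. Smith" and "Ann Lee-Park", one per line.
puts cross_check(payments, board)
```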

Their investigation into the poor regulation of California nurses – a collaboration with the LA Times that was a Pulitzer finalist in the Public Service category – was similarly data-oriented. They (and the LA Times’ Maloy Moore and Doug Smith) had been diligently building a database of thousands of nurses – including their disciplinary records and the time it took for the nursing board to act – which made my part in building a site to graphically represent the data extremely simple.

The point of all this is: don’t put off your personal data-training because you think it requires a computer science degree, or that you have to become great at it for it to be useful. Even if, after a week of learning, you can barely put together a script to alphabetize your tweets, you’ll likely gain enough insight into how data is structured and made useful to aid just about every other aspect of your reporting repertoire.

In fact, just knowing to avoid taking notes like this:

Colonel Mustard used the revolver in the library? (not library)
Miss Scarlet used the Candlestick in the dining room? (not Scarlet)
“Mrs. Peacock, in the dining room, with the revolver?”
“Colonel Mustard, rope, conservatory?”
Mustard? Dining room? Rope (nope)?
“Was it Mrs. Peacock with the candlestick, inside the dining room?”

And instead, recording them like this:

Who/What? | Role? | Ruled out?
Mustard | Suspect | N
Scarlet | Suspect | Y
Peacock | Suspect | N
Revolver | Weapon | Y
Candlestick | Weapon | Y
Rope | Weapon | Y
Conservatory | Place | Y
Dining Room | Place | N
Library | Place | Y

…will make you a significantly more effective reporter, as well as position you to have your reporting and research become much more ready for thorough analysis and online projects.
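Once notes are in rows like that, a question such as "who and what is still in play?" becomes a one-line filter. Here's a small Ruby sketch using the rows from the structured Clue notes; the `Clue` struct and the `remaining` helper are hypothetical names chosen for illustration.

```ruby
# One record per row of the notes: name, role, and whether it's ruled out.
Clue = Struct.new(:name, :role, :ruled_out)

NOTES = [
  Clue.new("Mustard", "Suspect", false),
  Clue.new("Scarlet", "Suspect", true),
  Clue.new("Peacock", "Suspect", false),
  Clue.new("Revolver", "Weapon", true),
  Clue.new("Candlestick", "Weapon", true),
  Clue.new("Rope", "Weapon", true),
  Clue.new("Conservatory", "Place", true),
  Clue.new("Dining Room", "Place", false),
  Clue.new("Library", "Place", true)
]

# Names with a given role that haven't been ruled out yet.
def remaining(notes, role)
  notes.select { |c| c.role == role && !c.ruled_out }.map(&:name)
end

# Prints "Suspects still open: Mustard, Peacock"
# and "Places still open: Dining Room".
puts "Suspects still open: #{remaining(NOTES, 'Suspect').join(', ')}"
puts "Places still open: #{remaining(NOTES, 'Place').join(', ')}"
```

The messy free-form notes could answer the same question, but only by rereading every line; the table version lets the computer do the rereading.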

There’s a motherlode of programming resources available through a single Google search. My high school journalism teacher told us that if you want to do journalism, don’t major in it, just do it. I think the same can be said for programming. I’m glad I chose a computer field as an undergraduate so that I’m familiar with the theory. But if you have a career in reporting or research, you have real-world data needs that most undergrads don’t. I’ve found that having those goals and needing to accomplish them has pushed my coding expertise along far faster than any coursework did.

If you aren’t set on learning to program, but want to get a better grasp of data, I recommend learning:

  • Regular expressions – a set of character patterns, easily printable on a cheat-sheet for memorization, that you use in a text-editor’s Find and Replace dialog to turn a chunk of text into something you can put into a spreadsheet, as well as clean up the data entries themselves. Regular-expressions.info is the most complete resource I’ve found. A cheat-sheet can be found here. Wikipedia has a list of some simple use cases.
  • Google Refine – A spreadsheet-like program that makes cleaning and normalizing messy data easy. Ever go through campaign contribution records and wish you could easily group together, and count as one, all the variations of “Jon J. Doe”, “Jonathan J. Doe”, “Jon Johnson Doe”, “JON J DOE”, etc.? Refine will do that. Refine developer David Huynh has an excellent screencast demonstrating Refine’s power. I wrote a guide as part of the Dollars for Docs tutorials. Even if you know Excel like a pro – which I do not – Refine may make your data-life much more enjoyable.
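As an example of the Find-and-Replace trick, lines like "New York Times 17" (a name, a space, a number) can be split into two spreadsheet columns with a pattern that captures everything before the trailing number. In many text editors the Find pattern would be something like `^(.+?)\s+(\d+)$` with a replacement of `\1\t\2`, though some editors write `$1` instead of `\1`, so check yours. Here is the same idea as a short Ruby sketch that produces tab-separated text, which pastes straight into a spreadsheet; `lines_to_tsv` is a hypothetical helper.

```ruby
# Capture the name (everything before the final number) and the trailing
# integer. Lines that don't fit the pattern are skipped.
ROW = /\A(.+?)\s+(\d+)\z/

# Turn "Name 17"-style lines into tab-separated rows with a header row.
def lines_to_tsv(text)
  matches = text.each_line.map { |line| ROW.match(line.strip) }.compact
  (["name\tcount"] + matches.map { |m| "#{m[1]}\t#{m[2]}" }).join("\n")
end

sample = <<~TEXT
  New York Times 17
  Knight-Ridder, Inc. 1
  Miami (FL) News 1
TEXT

puts lines_to_tsv(sample)
```

The lazy `(.+?)` matters: it lets names that contain spaces, commas or parentheses stay in one column while only the last number goes into the second.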

If you want to learn coding from the ground up, here’s a short list of places to start:

“We are detectives for the people” – Village Voice’s Wayne Barrett’s Final Column

Legendary reporter Wayne Barrett filed his last column for the Village Voice this week. It reads like it’s from someone who has muckraked for nearly 40 years and has had a lot of time to think about his job:

When I was asked in recent years to blog frequently, I wouldn’t do it unless I had something new to tell a reader, not just a clever regurgitation of someone else’s reporting.

My credo has always been that the only reason readers come back to you again and again over decades is because of what you unearth for them, and that the joy of our profession is discovery, not dissertation.

There is also no other job where you get paid to tell the truth. Other professionals do sometimes tell the truth, but it’s ancillary to what they do, not the purpose of their job. I was asked years ago to address the elementary school that my son attended and tell them what a reporter did and I went to the auditorium in a trenchcoat with the collar up and a notebook in my pocket, baring it to announce that “we are detectives for the people.”

…It never mattered to me what the party or ideology was of the subject of an investigative piece; the reporting was as nonpartisan as the wrongdoing itself. I never looked past the wrist of any hand in the public till. It was the grabbing that bothered me, and there was no Democratic or Republican way to pick up the loot.

Pulitzer Prize at ProPublica

Pulitzer Prize

It’s been a huge last few days for ProPublica. My colleagues Jesse Eisinger and Jake Bernstein unveiled the result of 7+ months of reporting, a much anticipated collaboration with “This American Life” on how the hedge fund Magnetar Capital helped prolong the housing bubble by betting against risky investments that it advocated for. Also, our story on private jet owners hiding in public airspace, uncovered by Michael Grabell (after our lawyers’ successful litigation), was one of our most viewed, thanks to it getting top play by USA Today and Yahoo.

Either of those alone would’ve made it one of ProPublica’s most prominent weeks, but then Sheri Fink won the Pulitzer for Investigative Reporting for her massive investigation, published in the NYT Magazine, into how doctors at a New Orleans hospital, amid the chaos after Katrina, reportedly put patients to death in the name of mercy. Sheri’s win is extremely gratifying, because her subject had a lot going against it: Katrina was a painful, chaotic, four-year-old memory that most Americans wanted to forget. And among New Orleans residents, the overwhelming sentiment seemed to favor the doctors and other authorities who did what they could. Anna Pou, the doctor at the center of Sheri’s story, had been exonerated (and the prosecutor who went after her was removed). And since Sheri’s story, no new charges have been made against her.

The story itself is a long read. On top of the factors above, it also doesn’t deliver an immediate payoff for the ADD-afflicted reader. It’s only at the end that you can appreciate the light Sheri shed on a universally important, yet opaque topic: who deserves life in a time of crisis? I think Sheri’s story, and her subsequent follow-ups on swine flu preparations, raised the alarm that not even our medical professionals are on the same page, and moved the ball in such a way that her findings would shock even the most cynical skeptics of the medical profession.

Also, congrats to my colleagues Charles Ornstein and Tracy Weber for being finalists in the Public Service category for their exposure of California’s broken nursing board. That they were even considered for that prize, having won it not long ago for work in the same area (lax oversight of medical care), is a testament to how thorough their work was again, and how much impact their stories had (Gov. Schwarzenegger immediately sacked or forced out a majority of the board afterwards).

I think our office felt confident that our work was as good as any Pulitzer contender’s and that it wouldn’t be a shock to win, even though we would be the first online-only organization (and possibly the youngest, at two years old) to do it. The drama was less about whether we would win than about which of our reporters would. For example, T. Christian Miller and his work on defense contractors was, in my mind, as deserving as any. Like Sheri, he shed light, in an exhaustive, dogged fashion, on a subject that most people would rather not think about: the treatment of civilians who are injured in war zones while working as contractors. Given Blackwater’s bad rep, it’s proof of T’s herculean reporting and writing efforts that he got lawmakers to make some real moves in an easily overlooked (for political reasons) but essential area of our national security (in terms of prizes, though, T already brought home the Selden Ring).

And of course, all those stories above would’ve had a harder hill to climb without the collaboration of all our great editors and research staff. And in my own department, Krista Kjellman and Jeff Larson put in just as much dedication and deliberation to further illuminate the stories in their online presentation (and in the process, often provided research and work important to the stories themselves).

Congrats to the other Pulitzer winners. I haven’t had time to look through all their work. I did save WaPo’s Gene Weingarten’s winning feature, on the hellish punishment of parents who left their children to die in overheated cars, to Instapaper on my iPad. I got about a quarter of the way through before I had to put it away so I wouldn’t be crying in the subway car.

Coding for Journalists 101: A four-part series


Photo by Nico Cavallotto on Flickr

Update, January 2012: Everything…yes, everything, is superseded by my free online book, The Bastards Book of Ruby, which is a much more complete walkthrough of basic programming principles with far more practical and up-to-date examples and projects than what you’ll find here.

I’m only keeping this old walkthrough up as a historical reference. I’m sure the code is so ugly that I’m not going to even try re-reading it.

So check it out: The Bastards Book of Ruby


Update, Dec. 30, 2010: I published a series of data collection and cleaning guides for ProPublica, to describe what I did for our Dollars for Docs project. There is a guide for Pfizer which supersedes the one I originally posted here.

So a little while ago, I set out to write some tutorials that would guide the non-coding-but-computer-savvy journalist through enough programming fundamentals that he or she could write a web scraper to collect data from public websites. A “little while” turned out to be more than a month and a half. I actually wrote most of it in a week and then forgot about it. The timeliness of the fourth lesson, which shows how to help Pfizer in its mission to be more transparent, compelled me to just publish them in incomplete form. There are probably inconsistencies in the writing and some of the code examples, but the final code sections at the end of each tutorial do seem to execute as expected.

As the tutorials are aimed at people who aren’t experienced at programming, the code is pretty verbose, pedantic, and in some cases, a little inefficient. It was my attempt to make the code as readable as possible, and I very much welcome edits.

DISCLAIMER: The code, data files, and results are meant for reference and example only. You use it at your own risk.