Coding for Journalists 101: Go from knowing nothing to scraping Web pages. In an hour. Hopefully.

UPDATE (12/1/2011): Ever since writing this guide, I’ve wanted to put together a site that is focused both on teaching the basics of programming and showing examples of practical code. I finally got around to making it: The Bastards Book of Ruby.

I’ve since learned that trying to teach the fundamentals of programming in one blog post is completely dumb. Also, I hope I’m a better coder now than I was a year and a half ago when I first wrote this guide. Check it out and let me know what you think:

http://ruby.bastardsbook.com

Someone asked in this online chat for journalists: I want to program/code, but where does a non-programmer journalist begin?

My colleague Jeff Larson gave what I believe is the most practical and professionally useful answer: web-scraping (jump to my summary of web-scraping here, or read this more authoritative source).

This is my attempt to walk someone through the most basic computer science theory so that he/she can begin collecting data in an automated way off of web pages, which I think is one of the most useful (and time-saving) tools available to today’s journalist. And thanks to the countless hours of work by generous coders, the tools are already there to put this within the grasp of a beginning programmer.

You just have to know where the tools are and how to pick them up.

Click here for this page’s table of contents. Or jump to the theory lesson. Or to the programming exercise. Or, if you already know what functions and variables are and have Ruby installed, go straight to two of my walkthroughs of building a real-world, journalism-minded web scraper: scraping a jail site, and scraping Pfizer’s doctor payment list.

Or, read on for some more exposition:

Continue reading

The WikiLeaks Hellfire Video vs. Video Games

The WikiLeaks release of classified U.S. military video depicting American helicopters gunning down Iraqis (who appeared to include children and two Reuters staffers) was a milestone of modern journalism. Even though Reuters had reported the story aggressively, the deaths of Namir Noor-Eldeen and Saeed Chmagh were easily forgotten amid the war’s constant news cycle in 2007.

The video below, combined with boots-on-the-ground reporting by WikiLeaks, has an unmatched power to shock, awe, and sicken:

A side-angle to all this is how chillingly similar the released video is to today’s video games. Jane Mayer touched on this point in her excellent New Yorker piece on Obama’s increased use of Predator drones:

Using joysticks that resemble video-game controls, the reachback operators—who don’t need conventional flight training—sit next to intelligence officers and watch, on large flat-screen monitors, a live video feed from the drone’s camera. From their suburban redoubt, they can turn the plane, zoom in on the landscape below, and decide whether to lock onto a target. A stream of additional “signal” intelligence, sent to Langley by the National Security Agency,* provides electronic means of corroborating that a target has been correctly identified. The White House has delegated trigger authority to C.I.A. officials, including the head of the Counter-Terrorist Center, whose identity remains veiled from the public because the agency has placed him under cover.

People who have seen an air strike live on a monitor described it as both awe-inspiring and horrifying. “You could see these little figures scurrying, and the explosion going off, and when the smoke cleared there was just rubble and charred stuff,” a former C.I.A. officer who was based in Afghanistan after September 11th says of one attack. (He watched the carnage on a small monitor in the field.) Human beings running for cover are such a common sight that they have inspired a slang term: “squirters.”

Read more: http://www.newyorker.com/reporting/2009/10/26/091026fa_fact_mayer#ixzz0kIeSEK1h

The striking similarity inspired some soul-searching from this Redditor:

After watching the wikileaks video I found myself thinking back to the aerial segments of Modern Warfare and MW2. I’m not sure I’d want to play them again; the anonymity of the people you’re shooting seems a little too true to life for me.

Modern Warfare and MW2 are part of the highly successful Call of Duty series of first-person shooter video games. One of the segments has the player manning an AC-130 Spectre gunship to wipe out the enemy:

Some interesting comments from Redditors on that angle:

a_culther0 (22 points)
I always believed the main point of those levels in the game was to illustrate that certain things in Modern War can be achieved with the push of a button. The AC-130 level in COD4 has essentially 0 difficulty; which in my eyes makes an excellent statement on its own.

awills (15 points)
This is also how I read this scene. It’s actually the most realistic depiction of war in the entire game, because what you’re doing is remarkably similar to what it would be like in real life, just aiming at tiny targets and destroying them. Interestingly, it was also the most distancing from the actual results of your actions.

bumrushtheshow (35 points)
…The grainy TV footage and shooting at tiny people made me question more than usual what the hell I was doing. I was blowing up “bad” people who looked exactly like the “good” people. I was clearly in the “bad” people’s country, with only vague justifications for why I was there blowing the place up.
I’ve seen some awful footage from Apache gun cams on Youtube. Ones where maimed “bad guys” crawl out of a burning truck, while hillbillies say “that one’s still moving, hit ‘im again” left me literally feeling nauseous. I thought of these throughout the AC-130 level in COD4.

iPad: First Impressions

iPad at Fat Cat

As if chess at a bar wasn't geeky enough. I think we raised the bar at West Village's Fat Cat.

So I plunked down $580 for the 16GB Wi-Fi iPad (that includes tax and the Apple case), plus about $60 to $80 more on apps. I already have a laptop, a BlackBerry, and a netbook…but I justified this luxury purchase because I do think that tablet computing (whether or not the iPad leads it) will be the next boom in computer usage, and I’d like to at least be aware of it.

And I like doodling and watching movies in bed.

That said, I didn’t have a strong answer for everyone who asked, “How is this different from a bigger iPhone?” There’s not much difference, actually, but I think we have yet to see what touch computing can offer, and it’s a decidedly different experience from a traditional laptop. I’ve owned an MSI Wind for a year now and barely use it, unless I need to pack a laptop while carrying camera equipment around. I’ve already played more games and watched more movies on the iPad in the past two days than I have on my netbook; it just seems better suited for that. I think the lack of an attached keyboard makes for a fundamentally different experience, a leap at least as significant as the one the iPod Touch (and, to some extent, the iPhone) made over every MP3 player before it.

One of my first attempts on Brushes

I’m not much of an artist, but I’ve always wanted to just sketch for fun without the hassle of buying and maintaining art supplies. A mouse and Illustrator just doesn’t do it for me, nor would a tablet connected to a laptop. Brushes has been one of my favorite apps so far.

I’ve mostly stopped playing video games. I downloaded a few of the marquee titles, including Real Racing, and barely touched them after a few minutes. But the more social, multiplayer games, like Flight Control and the various board games, were really entertaining when the bar we were at, Fat Cat, didn’t have the games we wanted.

As far as productivity goes…I haven’t used it for anything meaningful. When I was sitting around in my living room with both the laptop and the iPad, I still, out of habit, switched to the laptop for even regular browsing. Typing is a wrist-killer…and touch-navigating the web is still cumbersome.

I didn’t do much reading, but I like that one of the free apps lets you download classics like Alice in Wonderland for free, with pretty decent, readable text. I still do most of my New York Times reading on my BlackBerry while I’m waiting in line or at the subway.

I think I’ll keep the iPad for now…selling it while people still think it’s cool remains a possibility…but I see a lot of potential in it so far. Still, I wholeheartedly agree with Kotaku’s Mike Fahey: I feel like an asshole for owning an iPad and don’t feel comfortable using it in public yet. That kind of reduces the device’s utility…For now, I’m keeping on the plastic wrap it came with.

Some other notes:

Cons:

  • I synced my iPad with my laptop for the first time just now. It took a while to figure out how to transfer photos from the laptop to the iPad (using the not-so-visible Photos tab in iTunes, and then having to create a special folder on my computer with duplicate copies of the photos). And while it was doing that, it decided to delete all the apps from the iPad. File management on the iPad, as on the iPod, is fucking stupid, and possibly the worst part of any iProduct. I stopped using Sony products because of their proprietary – and generally inferior – formats (a $200 voice recorder I bought years ago is useless because Sony no longer produces or updates the software to access its files). I hope Apple doesn’t go down the same path.
  • Apps were generally buggy. Netflix crashed many times.
  • Still haven’t figured out how to comfortably type.
  • Takes a while to charge the battery.
  • It’s hard to find a good chess or card game…either they have hotseat multiplayer or a computer AI, rarely both.
  • Yeah, it’s a bit heavy to hold when it’s not resting on your lap.

Pros:

  • Touch-interface is as solid as it is on the iPod Touch.
  • Being able to lock the screen rotation is great.
  • Lots of decent free apps. My favorites so far are craigsphone (craigslist on the pad), the NYT Editors’ Choice, Netflix, and Free Books.
  • The launch games have been pretty good, including Flight Control, Minigore, Real Racing
  • Netflix streaming on my nightstand is great. Finally, I’ll finish 30 Rock.

The Pope as a Manager, part II

Under siege, the Vatican hits back at the New York Times for its coverage of Pope Ratzinger and his past (in)actions as the Church’s doctrinal chief. As I wrote previously, it can’t be much comfort to miracle-believers that the Vatican’s excuse is that crimes most heinous weren’t stopped because the paperwork got lost/ignored in the system.

William J. Levada, the American cardinal who now heads the doctrinal office, had this defense:

“Anyone can say, ‘Why didn’t you do this?’ ‘You could have done this better.’ That’s part of life, but certainly it’s not the case to say that he is deficient,” Cardinal Levada said. “If anything, he was the architect of this step forward in the church and I think he deserves his credit.”

I’m not sure what the requirements are to be Pope; the apostle Peter was famously deficient but still made the cut. But it’s hard to blame people for thinking that Ratzinger should’ve shown more moral force than the documents reveal. Levada blames the Father Murphy scandal on the slow actions of the Wisconsin church, but the NYT’s documents indicate that it was the archbishop of Milwaukee who had to tell the Vatican that Murphy’s alleged molestation of 200 boys required more than just prayer and a restriction of sacraments.

Ironically, Ratzinger is credited with being far more aggressive in dealing with the sex abuse crisis than Pope John Paul II. So did JPII, who really was God’s man during the time of the scandal, avoid judgment (he’s currently in the running to become a saint) simply because he passed away in time? The more the Vatican argues that Ratzinger should be commended for taking action, the more it implies that inaction took place under JPII.

David Brooks: Maybe Sandra Bullock should’ve stayed in the kitchen

So David Brooks in the NYT, using an almost-current event (Sandra Bullock winning an Oscar, then getting humiliated by hubby Jesse James), takes another (not half-bad) try at an argument that feminists might characterize as “Maybe women would be happier if they focused less on their career and more on their man and family”:

Two things happened to Sandra Bullock this month. First, she won an Academy Award for best actress. Then came the news reports claiming that her husband is an adulterous jerk. So the philosophic question of the day is: Would you take that as a deal? Would you exchange a tremendous professional triumph for a severe personal blow?

Nonetheless, if you had to take more than three seconds to think about this question, you are absolutely crazy. Marital happiness is far more important than anything else in determining personal well-being.

To be fair, Brooks doesn’t explicitly focus on wives and their role in gluing a family together; it just happens that Bullock is a woman…but it’s hard not to accuse Brooks of patriarchy when he makes this an either-or situation, as if the only two options for Sandra to choose between are “Win an Oscar” or “Have a faithful husband”…ignoring the fact that it was Jesse James who chose between being true to his Oscar-winning wife and bonking some tattooed bimbo. And he completely ignores Tiger Woods, who really did choose a world-famous career (and the attendant porn stars that come with it) over his family.

Even aside from that, Brooks’ try at the “money and power isn’t everything” philosophy left the conservative Brooks open to a zinger from the comments, based on a more-currenter event:

B. Starks, Austin, TX: Mr. Brooks, great argument for ensuring health care for all and legalizing gay marriage. I imagine your conservative allies will not see it this way, but the facts noted in the column could be used to shore up both positions, and I hope they are indications you are in favor of both.

The banality of Godliness: The Vatican and Sex Scandals and a Slow Mail System

Ahhh, yeahhhh...Did you get the memo?

Ross Douthat, the cherubic Catholic on the NYT’s column head, tries to take a stab at his church and its recent spate of sex scandal revelations:

There has been some accountability for the abusers, but not nearly enough for the bishops who enabled them. And now the shadow of past sins threatens to engulf this papacy.

Popes do not resign. But a pope can clean house. And a pope can show contrition, on his own behalf and on behalf of an entire generation of bishops, for what was done and left undone in one of Catholicism’s darkest eras.

This is Holy Week, when the first pope, Peter, broke faith with Christ and wept for shame. There is no better time for repentance.

Douthat is rightly taking flak from commenters for trying to blame some of the scandal on the “silly season of the ’70s,” as if disco clubs and second-wave feminism were gateways for priests to molest boys in the confessional (as one commenter points out, the moral progressivism of the ’70s, which presumably weakened the Church’s influence, may have helped give victims an opening to speak out).

But if I were to play devil’s advocate, I’d point to this passage in Douthat’s column as one that raises doubts about the holiness of the Catholic Church:

There are two charges against Benedict XVI: first, that he allowed a pedophile priest to return to ministry while archbishop of Munich in 1980; and second, that as head of the Vatican’s Congregation for the Doctrine of the Faith in the 1990s, he failed to defrock a Wisconsin priest who had abused deaf children 30 years before.

The second charge seems unfair. The case was finally forwarded to the Vatican by the archbishop of Milwaukee, Rembert Weakland, more than 20 years after the last allegation of abuse.

One of the supposed upsides of having a papacy is that the Pope is presumably the supreme decider, as direct a link to God’s authority as we can have on earth. He definitively settles doctrinal and moral matters, and his words and his mind are pretty much in tune with God’s, and no one who believes in the supremacy of the Catholic Church should have any doubt otherwise.

So, basically, one of Douthat’s (and the official Vatican spokesperson’s) defenses is that, well, the man who was destined to be God’s infallible voice didn’t stop a man (Father Murphy in Wisconsin) from molesting 200 deaf boys because the relevant memo didn’t get to him (or to the Pope at the time) soon enough.

Father Lawrence Murphy, in a flyer distributed by his accusers

This Times interactive timeline of how Father Murphy got away is worth clicking through. Here are some of the important dates (after the last known allegations against him):

– In Dec. 1993, Father Murphy is evaluated by the archdiocese.

– Three years later, the Milwaukee archbishop gets around to writing a letter to Ratzinger’s office.

– About half a year later, the pre-trial proceedings get held up by arguments over the statute of limitations (which, according to one interpretation of canon law, is as short as 30 days).

– About a year after the Milwaukee archbishop wrote to Ratzinger, the letter is finally received by whoever is supposed to forward mail to Ratzinger’s office.

– Another year passes, and Ratzinger’s secretary recommends that Father Murphy be allowed to live the rest of his years in dignity. In July 1998, the Vatican sends its meeting notes to Wisconsin…and it takes about a month for them to be translated from Italian into rough English.

– In August, the Milwaukee Archdiocese puts Murphy out to pasture, promising, with almost hilarious understatement, that he plans on “strengthening the precepts that have already been placed upon Father Murphy…to assure that Father Murphy does not continue to seek contact with members of the deaf community, which often in the past has resulted in considerable dismay in the deaf community.”

– A few weeks later, Father Murphy dies. In defiance of church orders, his family gives him an open-casket funeral, with him decked out in full vestments, with invites sent out to the deaf community.

The most passionate of anti-Catholics would argue that the Church was actively covering up, and maybe even encouraging, the abusive behavior of its pedophile priests. But it’s understandable how even an ardent Catholic, after reading the above trove of documents, might conclude that the Vatican may not be covering up for predators, but it sure is dependent on an all-too-human, painfully slow bureaucracy. Its officials spend as much time arguing over the interpretation of church rules as they do over criminal law, and important letters, though sent by courier in diplomatic pouches, still take a year to reach the relevant official’s secretary (in another case of an abusive priest, the memo never got forwarded at all, according to the Vatican).

And even when it does reach the right official, Ratzinger in this case, there’s no guarantee that there are enough hours in the day for him to get to it (though he did find time to punish a priest and force him out of the priesthood for participating in a peace protest). If only Skype had been invented back then, maybe Father Murphy would have been punished before he was too old, as the church judged, to deserve the indignity of a trial.

I think most mature believers come to realize that, for the most part, God shouldn’t be expected to deliver miracles in quite the same immediate and dramatic fashion as depicted in the Bible. But after the case of the future Pope Ratzinger and Father Murphy, now young believers have to accept that not only will God (and his human proxy) not strike down the most evil of sinners (Matt 18:6: but whoever causes one of these little ones who believe in Me to stumble, it would be better for him to have a heavy millstone hung around his neck, and to be drowned in the depth of the sea)…He may not even make sure that the memo gets forwarded.

The alternative is that Ratzinger (and the Pope before him) did receive and read the memos, and then did nothing. The Vatican and Douthat don’t have much wiggle room between trampling on Catholic and Christian theology and offending basic human sensibilities.