My name is Dan Nguyen and I'm a journalist, programmer and photographer.
Follow me on Twitter: @dancow
Read my guide to programming: The Bastards Book of Ruby.
Category Archives: works
The Most Viewed Portraits: Marina Abramović: The Artist Is Present, the MOMA
Thought it’d be fun to see the 200 most viewed portraits on the MOMA’s “Marina Abramović: The Artist Is Present” Flickr set. So I wrote a scraper to collect each portrait’s stats, including page views. A number of celebrities participated… Continue reading
Marina Abramović Melts Before Your Eyes
Wrote a quick scrape of the Museum of Modern Art’s gallery of Marina Abramović’s “The Artist is Present”. This is Abramović’s portrait for the last 68 days (I guess the upload isn’t complete yet). (Update, all 72 days are up,… Continue reading
Pfizer Data Redux
Updated the code and results for my guide on how to scrape Pfizer’s list of payments to doctors. It now contains a more normalized file that has a line for every doctor and payment. The aggregate totals changed marginally.
Coding for Journalists 101 : A four-part series
Update, January 2012: Everything…yes, everything, is superseded by my free online book, The Bastards Book of Ruby, which is a much more complete walkthrough of basic programming principles with far more practical and up-to-date examples and projects than what you’ll… Continue reading
Discussion
24 Comments
Category works
Tags coding, journalism, pfizer, programming, ruby, tutorial, web scraping
Coding for Journalists 104: Pfizer’s Doctor Payments; Making a Better List
Update (12/30): So about an eon later, I’ve updated this by writing a guide for ProPublica. Heed that one. This one will remain in its obsolete state. Update (4/28): Replaced the code and result files. Still haven’t written out a… Continue reading
Coding for Journalists 103: Who’s been in jail before: Cross-checking the jail log with the court system; Use Ruby’s mechanize to fill out a form
This is part of a four-part series on web-scraping for journalists. As of Apr. 5, 2010, it was published a bit incomplete because I wanted to post a timely solution to the recent Pfizer doctor payments list release, but… Continue reading
Discussion
4 Comments
Category works
Tags coding, courts, journalism, mechanize, programming, ruby, tutorial
Coding for Journalists 102: Who’s in Jail Now: Collecting info from a county jail site
This is part 2 of a 4-part series in introductory coding for journalists. Go here for the first lesson. This lesson and code will still be verbose, but will have a lot less hand-holding than the previous one.
Coding for Journalists 101: Go from knowing nothing to scraping Web pages. In an hour. Hopefully.
UPDATE (12/1/2011): Ever since writing this guide, I’ve wanted to put together a site that is focused both on teaching the basics of programming and showing examples of practical code. I finally got around to making it: The Bastards Book… Continue reading
Discussion
21 Comments
Category works
Tags html, journalism, programming, ruby, tutorial, web scraping
Day of the Tiger: How Newspapers, Networks, and News Aggregators Played Tiger Woods on Friday
On Friday, golfer Tiger Woods held a TV appearance to talk about life after marital problems. At around 2:30 p.m., I screen-capped the websites of several of the largest news organizations and aggregators. Today, I looked at the screen-caps, cropped them to the top 1600 pixels, and marked in green the areas of the pages devoted to Woods coverage (or related coverage, such as “Slideshow: Top 10 Adultery Confessions”).
Even three hours after what was generally considered a highly-scripted 10 minutes of non-revelations (the Golf Writers Association of America boycotted it), Woods pretty much dominated the most visible spaces on general news websites. Of the major American news organizations, CNN probably had the most real estate devoted to Tiger; New York Times, the least. Both Drudge Report and Huffington Post had Woods as the lede. Asian publications (the few that I could read) gave little space.
Among social/computerized news aggregators, Google News gave Woods front placement…unsurprising considering its algorithm is driven by what news organizations have. Neither Reddit nor Digg had any mention of it in their news sub-sections. This may be a little unfair, as these sites’ users may be stricter about keeping all sports-related news in their sports subsections. But I did check their front pages (which are the top stories from all the major sections) and Tiger didn’t make an appearance.
Continue reading
ProPublica tracks the bailout, a year or so later
Today, my ProPublica colleague Paul Kiel and I put out some graphical revisions to PP’s bank bailout tracking site, including our master list of companies to get taxpayer bailout money: Nothing fancy, mostly made the numbers easier to find and… Continue reading
Category works
Tags bailout, banks, money, propublica, ruby on rails, treasury, web design, website design