<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Dan Nguyen pronounced fast is danwin &#187; Dan Nguyen</title>
	<atom:link href="http://danwin.com/author/admin/feed/" rel="self" type="application/rss+xml" />
	<link>http://danwin.com</link>
	<description>The &#039;g&#039; is mostly silent</description>
	<lastBuildDate>Sat, 12 May 2012 02:01:03 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>Louis C.K. releases new $5 DRM-free comedy recordings: Carnegie Hall (2010) and Shameless (previously on HBO)</title>
		<link>http://danwin.com/2012/05/louis-c-k-releases-new-5-drm-free-comedy-recordings-carnegie-hall-2010-and-shameless-previously-on-hbo/</link>
		<comments>http://danwin.com/2012/05/louis-c-k-releases-new-5-drm-free-comedy-recordings-carnegie-hall-2010-and-shameless-previously-on-hbo/#comments</comments>
		<pubDate>Sat, 12 May 2012 01:57:01 +0000</pubDate>
		<dc:creator>Dan Nguyen</dc:creator>
				<category><![CDATA[thoughts]]></category>

		<guid isPermaLink="false">http://danwin.com/?p=1974</guid>
		<description><![CDATA[I just got a mass-email from Louis C.K., apparently sent to everyone who bought his $5 Beacon Theater show. He&#8217;s offering audio-recordings of two previous shows with the popular $5-no-damn-DRM price. The email itself is hilarious as well. Here is&#8230; <a href="http://danwin.com/2012/05/louis-c-k-releases-new-5-drm-free-comedy-recordings-carnegie-hall-2010-and-shameless-previously-on-hbo/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I just got a mass-email from Louis C.K., apparently sent to everyone who bought his <a href="http://danwin.com/2011/12/louis-c-k-thanks-his-fans-for-buying-his-5-beacon-theater-show/">$5 Beacon Theater show</a>. He&#8217;s <a href="http://louisck.com">offering audio-recordings of two previous shows with the popular $5-no-damn-DRM price</a>. The email itself is hilarious as well.</p>
<p>Here is what it says for those who aren&#8217;t on the mailing list:<br />
&#8212;&#8212;&#8211;</p>
<p>Hello there. I am Louis C.K. for now. You are a person who opted into my email list, when you bought my Live at the Beacon standup special. As I promised, I have left you alone for a long time. Well, those days are over. I am writing now to let you know that I am offering some more stuff on my site, which you are more than welcome to buy. What does &#8220;More than welcome&#8221; mean? Well, it means you can totally buy this stuff. Like, totally. </p>
<p>Okay so there are two new products. They are both audio comedy specials. One is called&#8230;<br />
Louis CK: WORD &#8211; Live at Carnegie Hall</p>
<p>This is about an hour long and it&#8217;s a recording of a live standup show that I did at Carnegie Hall in November of 2010 as part of a national tour I was on entitled &#8220;WORD&#8221; I&#8217;ve had a lot of requests from people to release that show as a speical or as a CD. I hadn&#8217;t done so because a lot of the material that I did on the WORD tour, was in the second season of my show &#8220;LOUIE&#8221; on FX. But I decided since it&#8217;s never been released as an entire show, and some of the material was not on my show, I&#8217;m releasing this now. I&#8217;m giving you this long and boring explaination because, as most of you know, I release about an hour or more of new standup material every year and folks can count on seeing a new show every year. This is old material, so I don&#8217;t want to be a dick and pretend it isn&#8217;t. </p>
<p>Anyway, Louis CK: WORD &#8211; Live at Carnegie hall is available for the same 5 dollars as everything will be on louisck.com. It is the same deal as before that you get 4 downloads and the file is drm free. YOu can burn it onto a CD, play it on your ipod, whatever you want. The special is broken up into separate tracks because I think that&#8217;s more fun for a comedy album, but they are all just one thing you buy all at one time. </p>
<p>The second new thing is even older, actually. It&#8217;s an audio release of &#8220;Shameless&#8221;, my very first hour long standup special that I did for HBO. It was never released as an audio CD, so I asked HBO to let me offer it on this site and they agreed. They also agreed to let me offer it, the same as the rest, DRM free, for 5 dollars. Obviously I&#8217;ll be sharing the Shameless money with HBO but I think it&#8217;s pretty cool that they&#8217;re letting this be out there unprotected like this. Shameless is also 5 dollars, drm free, and you can download it a bunch of times for the price. </p>
<p>Lastly, I&#8217;m offering Live at the Beacon Theater as an audio version, for those many of you who have asked for it. This is just exactly an audio version of the video special. Those of you who have already bought Live at the Beacon theater already own this. If you just return to the site louisck.com with your password, it is now live and available for you to download at no extra cost. Those of you who now buy LIve at the beacon theater for 5 dollars, will also have the audio version availbable to you. It&#8217;s simply been added to the video downloads and streams you already were getting. </p>
<p>Later, I am going to make a version of Live at the Beacon theater, that is a separate audio special, which will be much longer. That will cost money. Because I&#8217;m an asshole. But that&#8217;s later.</p>
<p>Also later, actually soon, I&#8217;ll be putting my first feature film &#8220;Tomorrow Night&#8217; up for sale on the site. And also other things. Soon. For now. Please feel free to click on the button below, to purchase some of the new stuff, using Paypal or Amazon payments, we now accept both. Or go to louisck.com and peruse the new items. I think we have some samples there that you can check out. </p>
<p>You may have noticed that Louis CK LIve at the beacon theater is airing on the FX network. FX agreed to air it 10 times over the next few months. The version on FX is only 42 minutes long and we had to take out the fucks. The reason I chose to air the special on FX is that FX is my people. They gave me my show LOUIE (season 3 premieres on June 28th at 10:30pm) and they have never aired a standup special. So I thought it would be cool to let them air it and bring more people to the site who want to get the complete unexpurgated version. Also FX doesn&#8217;t make me cut things for content. Just the big words (fuck, etc) </p>
<p>Okay. that was exhausting. Sorry. I didn&#8217;t even ask you how you are. How are you? Oh yea? Oh good. That&#8217;s great. What? Oh man. That&#8217;s tough. I&#8217;m sorry&#8230; Oh well that sounds like you handled it well, though. So. Yeah. Yeah. I know. I know that&#8217;s&#8230; yeah. Well&#8230; Just remember, time will go by and that&#8217;ll just be on the list of shit that happened to you. You&#8217;ll be okay. Yeah. Huh?&#8230; Oh. Really? HE DID? Oh my GOD! hahaha!! That&#8217;s CRAZY! No. no. I won&#8217;t tell him you told me. Of course not. Alright well&#8230; uhuh? Oh wow. yeah. Alright well.. I really gotta go. Thanks for listening. I&#8217;m glad you&#8217;re basically okay. Stay in touch. </p>
<p>your friend,</p>
<p>Louis C.K.</p>
<p>&#8212;-</p>
<p>The shows are available at his site, <a href="http://louisck.com">louisck.com</a>. If you missed the email he sent out after his huge success with independently releasing his Beacon Theater show, <a href="http://danwin.com/2011/12/louis-c-k-thanks-his-fans-for-buying-his-5-beacon-theater-show/">here it is.</a></p>
]]></content:encoded>
			<wfw:commentRss>http://danwin.com/2012/05/louis-c-k-releases-new-5-drm-free-comedy-recordings-carnegie-hall-2010-and-shameless-previously-on-hbo/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>An HTML GUI for training Tesseract on character sets</title>
		<link>http://danwin.com/2012/05/a-html-gui-for-training-tesseract-on-character-sets/</link>
		<comments>http://danwin.com/2012/05/a-html-gui-for-training-tesseract-on-character-sets/#comments</comments>
		<pubDate>Thu, 10 May 2012 20:02:27 +0000</pubDate>
		<dc:creator>Dan Nguyen</dc:creator>
				<category><![CDATA[works]]></category>
		<category><![CDATA[ocr]]></category>
		<category><![CDATA[tesseract]]></category>

		<guid isPermaLink="false">http://danwin.com/?p=1971</guid>
		<description><![CDATA[The Tesseract OCR Chopper, by data journalist Dino Beslagic. I&#8217;m making this short stub post because ever since I&#8217;ve used tesseract to convert scanned documents into text, I&#8217;ve wondered why the hell is it so hard to train tesseract (to&#8230; <a href="http://danwin.com/2012/05/a-html-gui-for-training-tesseract-on-character-sets/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://pp19dd.com/tesseract-ocr-chopper/">Tesseract OCR Chopper</a>, by data journalist <a href="http://pp19dd.com/things/about/">Dino Beslagic</a>.</p>
<p>I&#8217;m making this short stub post because ever since I started using <a href="http://code.google.com/p/tesseract-ocr/">tesseract</a> to convert scanned documents into text, I&#8217;ve wondered why the hell it is so hard to train tesseract (to make it better at recognizing a font). As it turns out, <a href="http://pp19dd.com/tesseract-ocr-chopper/">Beslagic created a web-app that makes the task comparatively easy and platform-independent</a>.</p>
<p>He posted it about 2 years ago and recently updated it. I can&#8217;t believe I didn&#8217;t find it until now. How did I find it? By stumbling upon the <a href="http://code.google.com/p/tesseract-ocr/wiki/AddOns">&#8220;AddOns&#8221; wiki</a> for the Tesseract project. I love Tesseract but am surprised at how such a useful and popular utility can have such scattered resources.</p>
]]></content:encoded>
			<wfw:commentRss>http://danwin.com/2012/05/a-html-gui-for-training-tesseract-on-character-sets/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Kurt Vonnegut&#8217;s brilliant, brief career at Sports Illustrated: &#8220;He was not good at being an employee&#8221;</title>
		<link>http://danwin.com/2012/05/kurt-vonneguts-brilliant-brief-career-at-sports-illustrated-he-was-not-good-at-being-an-employee/</link>
		<comments>http://danwin.com/2012/05/kurt-vonneguts-brilliant-brief-career-at-sports-illustrated-he-was-not-good-at-being-an-employee/#comments</comments>
		<pubDate>Thu, 10 May 2012 12:48:00 +0000</pubDate>
		<dc:creator>Dan Nguyen</dc:creator>
				<category><![CDATA[thoughts]]></category>

		<guid isPermaLink="false">http://danwin.com/?p=1964</guid>
		<description><![CDATA[Slaughterhouse Five is one of my all-time favorite books. But I hadn&#8217;t known that Vonnegut was also one of the finest sportswriters to have graced equestrianism: From the introduction &#8211; written by his son, Mark: &#8211; to his posthumous work,&#8230; <a href="http://danwin.com/2012/05/kurt-vonneguts-brilliant-brief-career-at-sports-illustrated-he-was-not-good-at-being-an-employee/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><div id="attachment_1966" class="wp-caption aligncenter" style="width: 460px"><img src="http://danwin.com/words/wp-content/uploads/2012/05/vonnegut-random-house.jpeg" alt="" title="vonnegut-random-house" width="450" height="388" class="size-full wp-image-1966" /><p class="wp-caption-text">Kurt Vonnegut (photo courtesy of Random House)</p></div><br />
Slaughterhouse Five is one of my all-time favorite books. But I hadn&#8217;t known that Vonnegut was also one of the finest sportswriters to have graced equestrianism:</p>
<p>From the <a href="http://www.npr.org/templates/story/story.php?storyId=89276309">introduction &ndash; written by his son, Mark &ndash; to his posthumous work</a>, <em>Armageddon in Retrospect</em>:</p>
<blockquote><p>
He often said he had to be a writer because he wasn&#8217;t good at anything else. </p>
<p>He was not good at being an employee. </p>
<p>Back in the mid-1950s, he was employed by Sports Illustrated, briefly. He reported to work, was asked to write a short piece on a racehorse that had jumped over a fence and tried to run away. Kurt stared at the blank piece of paper all morning and then typed, &#8220;The horse jumped over the fucking fence,&#8221; and walked out, self-employed again.
</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://danwin.com/2012/05/kurt-vonneguts-brilliant-brief-career-at-sports-illustrated-he-was-not-good-at-being-an-employee/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Valve&#8217;s New Employees Handbook: &#8220;What is Valve *Not* Good At?&#8221;</title>
		<link>http://danwin.com/2012/04/valves-new-employees-handbook-what-is-valve-not-good-at/</link>
		<comments>http://danwin.com/2012/04/valves-new-employees-handbook-what-is-valve-not-good-at/#comments</comments>
		<pubDate>Sat, 21 Apr 2012 14:23:03 +0000</pubDate>
		<dc:creator>Dan Nguyen</dc:creator>
				<category><![CDATA[thoughts]]></category>
		<category><![CDATA[Valve]]></category>
		<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://danwin.com/?p=1960</guid>
		<description><![CDATA[A copy of gaming company Valve&#8217;s new employee guide made the rounds on Hacker News this morning (read the discussion here). Of all such company-manifestos, Valve&#8217;s ranks as one of the most well-designed, brightly written, and astonishingly honest. Google has its&#8230; <a href="http://danwin.com/2012/04/valves-new-employees-handbook-what-is-valve-not-good-at/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<div id="attachment_1962" class="wp-caption aligncenter" style="width: 550px"><a href="http://cdn.flamehaus.com/Valve_Handbook_LowRes.pdf"><img src="http://danwin.com/words/wp-content/uploads/2012/04/valve-guide-methods-communication.jpg" alt="Valve&#039;s New Employee Guide: Methods to find out what&#039;s going on" title="Valve&#039;s New Employee Guide: Methods to find out what&#039;s going on" width="540" height="529" class="size-full wp-image-1962" /></a><p class="wp-caption-text">Valve admits that one of its weaknesses is internal communication. So its new employee guide provides a helpful illustration of how to stay in the loop.</p></div>
<p>A copy of <a href="http://cdn.flamehaus.com/Valve_Handbook_LowRes.pdf">gaming company Valve&#8217;s new employee guide</a> made the rounds on Hacker News this morning (<a href="http://news.ycombinator.com/item?id=3871463">read the discussion here</a>). Of all such company-manifestos, Valve&#8217;s ranks as one of the most well-designed, brightly written, and astonishingly honest.</p>
<p>Google <a href="http://www.google.com/jobs/lifeatgoogle/englife/index.html">has its 20-percent-time policy</a>; Valve&#8217;s is 100 percent: </p>
<blockquote><p>
We’ve heard that other companies have people allocate a percentage of their time to self-directed projects. At Valve, that percentage is 100.</p>
<p>Since Valve is flat, people don’t join projects because they’re told to. Instead, you’ll decide what to work on after asking yourself the right questions (more on that later). <strong>Employees vote on projects with their feet (or desk wheels).</strong> </p>
<p>Strong projects are ones in which people can<br />
see demonstrated value; they staff up easily. This means there are any number of internal recruiting efforts constantly under way.</p></blockquote>
<p>To be fair, Google&#8217;s policy ostensibly allows that 20 percent time to be directed at non-company-boosting projects. It&#8217;s likely there is some internal mechanism/dynamic that prevents Valve malcontents from going too far off the ranch.</p>
<p>With the attention that Valve puts into just their guide, they&#8217;re obviously betting that their hiring process finds the talent with the right attitude. They describe the model employee as being &#8220;T-shaped&#8221;: skilled in a broad variety of talents and peerless in their narrow discipline.</p>
<p>One of the best sections comes at the end, under the heading &#8220;<strong>What is Valve <em>Not</em> Good At?</strong>&#8221; This is the classic opportunity to do the humblebrag, as when it comes up during hiring interviews (&#8220;My greatest weakness is that I&#8217;m too passionate about my work!&#8221;). Valve&#8217;s list of weaknesses is not harsh or odious &ndash; if you like what they&#8217;ve opined in the guide, then these weaknesses logically follow:</p>
<ul>
<li>Helping new people find their way. We wrote this book to help, but as we said above, a book can only go so far. [<strong>My</strong> reading between the lines: <em>the people we seek to hire are intelligent and experienced enough to navigate unknown territory</em>]</li>
<li>Mentoring people. Not just helping new people figure things out, but proactively helping people to grow in areas where they need help is something we’re organizationally not great at. Peer reviews help, but they can only go so far. [<em>our "T" shaped employees were hired because they are good at a lot of things and especially good at one thing. Presumably, they have enough of a "big picture" mindset to realize how they became an expert in one area, why they chose to become good at it, what it takes to get there, and a reasonable judgment of cost versus benefit </em>]</li>
<li>Disseminating information internally [<em>Since we're a flat organization, it is incumbent on each team member to proactively keep themselves in the loop</em>].</li>
<li>Finding and hiring people in completely new disciplines (e.g., economists! industrial designers!)[<em>what can you say, we started out primarily as a gaming company and were so good at making games that we apparently could thrive on that alone</em>]. </li>
<li>Making predictions longer than a few months out [<em>team members and group leaders don't fill out enough TPS reports for us to keep reliable Gantt charts. Also, having set-in-stone deadlines and guidelines can restrict mobility</em>].</li>
<li>We miss out on hiring talented people who prefer to work within a more traditional structure. Again, this comes with the territory and isn’t something we should change, but it’s worth recognizing as a self-imposed limitation.</li>
</ul>
<p>All of Valve&#8217;s weaknesses can be spun positively, but they would legitimately be critical weaknesses in a company with a differing mindset. For anyone who has read through the <a href="http://cdn.flamehaus.com/Valve_Handbook_LowRes.pdf" title="">entire guide</a>, these bullet points are redundant. But it&#8217;s an excellent approach for doing a concluding summary/tl;dr version (in fact, it reminds me of <a href="http://danwin.com/2011/12/to-find-insights-ask-the-cage-cleaners-not-the-veterinarians/">the pre-mortem tactic</a>: asking team members before a project&#8217;s launch to write a future-dated report describing why the project became a disaster. It reveals problems that should&#8217;ve been discovered during the project&#8217;s planning phases, but in a fashion <em>that rewards employees for being critical</em>, rather than seeing them as negative-nancies). </p>
<p>Read the <a href="http://cdn.flamehaus.com/Valve_Handbook_LowRes.pdf" title="">Valve guide here</a>. And check out the <a href="http://news.ycombinator.com/item?id=3871463" title="Valve Employee Handbook | Hacker News">Hacker News discussion which ponders how well this scales</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://danwin.com/2012/04/valves-new-employees-handbook-what-is-valve-not-good-at/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Dummy Data, Drugs, and Check-lists</title>
		<link>http://danwin.com/2012/04/dummy-data-drugs-and-check-lists/</link>
		<comments>http://danwin.com/2012/04/dummy-data-drugs-and-check-lists/#comments</comments>
		<pubDate>Mon, 16 Apr 2012 14:14:51 +0000</pubDate>
		<dc:creator>Dan Nguyen</dc:creator>
				<category><![CDATA[thoughts]]></category>

		<guid isPermaLink="false">http://danwin.com/?p=1948</guid>
		<description><![CDATA[Using dummy data &#8212; and forgetting to remove it &#8212; is a pretty common and unfortunate occurrence in software development&#8230;and in journalism (check out this headline). If you haven&#8217;t made yourself a pre-publish/produce checklist that covers even the most basic&#8230; <a href="http://danwin.com/2012/04/dummy-data-drugs-and-check-lists/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><img src="http://danwin.com/words/wp-content/uploads/2012/04/facepalm.jpg" alt="" title="facepalm" width="300" height="300" class="alignleft size-full wp-image-1956" />Using dummy data &#8212; and forgetting to remove it &#8212; is a pretty common and unfortunate occurrence in software development&#8230;and in journalism (<a href="http://jimromenesko.com/2012/02/02/suffolk-journal-apologizes-for-profane-headline/">check out this headline</a>). If you haven&#8217;t made yourself a pre-publish/produce checklist that covers even the most basic things (&#8220;do a full-text search for George Carlin&#8217;s <a href="http://en.wikipedia.org/wiki/Seven_dirty_words">seven dirty words</a>&#8221;), now&#8217;s a good time to start.</p>
<p>These catastrophic mistakes can happen even when billions of dollars are on the line.</p>
<p>In his book, <a href="http://www.amazon.com/New-Drugs-Insiders-Scientists-Investors/dp/141969961X">New Drugs: An Insider&#8217;s Guide to the FDA&#8217;s New Drug Approval Process</a>, author Lawrence Friedhoff says he&#8217;s seen this kind of thing happen &#8220;many&#8221; times in the drug research phase. He describes an incident in which his company was awaiting the statistical review of two major studies that would determine if a drug could move into the approval phase:</p>
<blockquote><p>The computer programs to do the analysis were all written. To be sure that the programs worked properly, <strong>the statisticians had tested the programs by making up treatment assignments for each patient without regard to what the patients had actually received, and verifying that the programs worked properly with these “dummy” treatment codes.</strong></p>
<p>&#8230;</p>
<p>The statisticians told us it would take about half an hour to do the analysis of each study and that they would be done sequentially. We waited in a conference room. The air was electric. Tens of millions of dollars of investment hung in the balance. The treatment options of millions of patients would or would not be expanded. Perhaps, billions of dollars of revenue and the future reputation of the development team would be made or lost based on the results of the statistical analyses.</p>
<p>The minutes ticked by. About 20 minutes after we authorized the code break, the door opened and the statisticians walked in. I knew immediately by the looks on their faces that the news was good&#8230;One down, one to go (since both studies had to be successful to support marketing approval).</p>
<p>The statisticians left the room to analyze the second study&#8230;which could still theoretically fail just by chance, so nothing was guaranteed. Finally, after 45 minutes, the door swung open, and the statisticians walked in. I could tell by the looks on their faces that there was a problem. They started presenting the results of the second study, trying to put a good face on a devastatingly bad outcome. The drug looked a little better than control here but worse there… I couldn’t believe it. <strong>How could one study have worked so well and the other be a complete disaster?</strong> The people in the room later told me I looked so horrified that they thought I would just have a heart attack and die on the spot.</p>
<p>The positive results of the first study were very strong, making it exceedingly unlikely that they occurred because of a mistake, and there was no scientific reason why the two studies should have given such disparate results. </p>
<p>After about a minute, I decided it was not possible for the studies to be so inconsistent, and that the statisticians must have made a mistake with the analysis of the second study&#8230;Ultimately they said they would check again, but I knew by their tone of voice that they viewed me with pity, a clinical development person who just couldn’t accept the reality of his failure. </p>
<p>An eternity later, the statisticians re-entered the room with hangdog looks on their faces. They had used the “dummy” treatment randomization for the analysis of the second study.<strong> The one they used had been made up to test the analysis programs, and had nothing to do with the actual treatments the patients had received during the study.</strong></p>
<p><em>From: Friedhoff, Lawrence T. (2009-06-04). <a href="http://www.amazon.com/New-Drugs-Insiders-Scientists-Investors/dp/141969961X">New Drugs: An Insider&#8217;s Guide to the FDA&#8217;s New Drug Approval Process for Scientists, Investors and Patients</a> (Kindle Locations 2112-2118). PSPG Publishing. Kindle Edition. </em></p></blockquote>
<p>So basically, Friedhoff&#8217;s team did the equivalent of what a newspaper does when laying out the page before the articles have been written: put in some filler text to be replaced later. Except that the filler text doesn&#8217;t get replaced at publication time&#8230;again, see this Romenesko link to see the <a href="http://jimromenesko.com/2012/02/02/suffolk-journal-apologizes-for-profane-headline/">disastrous/hilarious results</a>.</p>
<p>Here&#8217;s an example of it <a href="http://blog.fetchnotes.com/post/17155558880/what-happens-when-you-swear-at-your-users">happening in the tech startup world</a>.</p>
<p>What&#8217;s interesting about Friedhoff&#8217;s case, though, is that validation of study results is a relatively rare &ndash; and expensive &ndash; occurrence&#8230;whereas publishing a newspaper happens every day, as does pushing out code and test emails. But Friedhoff says the described incident is &#8220;only one of many similar ones I could write about&#8221;&#8230;which goes to show that the rarity and magnitude of a task won&#8217;t stop you from making easy-to-prevent, yet devastating mistakes.</p>
<p>Relevant: Atul Gawande&#8217;s <a href="http://www.newyorker.com/reporting/2007/12/10/071210fa_fact_gawande">article about the check-list</a>: how a simple list of five steps, as basic as &#8220;Wash your hands&#8221;, prevented thousands of surgical disasters.</p>
]]></content:encoded>
			<wfw:commentRss>http://danwin.com/2012/04/dummy-data-drugs-and-check-lists/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ProPublica at Netexplo</title>
		<link>http://danwin.com/2012/04/propublica-at-netexplo/</link>
		<comments>http://danwin.com/2012/04/propublica-at-netexplo/#comments</comments>
		<pubDate>Wed, 04 Apr 2012 12:51:22 +0000</pubDate>
		<dc:creator>Dan Nguyen</dc:creator>
				<category><![CDATA[thoughts]]></category>
		<category><![CDATA[visuals]]></category>

		<guid isPermaLink="false">http://danwin.com/?p=1950</guid>
		<description><![CDATA[A few weeks ago, I had the honor of joining my colleagues Charlie Ornstein and Tracy Weber in Paris to receive a Netexplo award for our work with Dollars for Docs. Check out the presentation video they prepared for the&#8230; <a href="http://danwin.com/2012/04/propublica-at-netexplo/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>A few weeks ago, I had the honor of joining my colleagues Charlie Ornstein and Tracy Weber in Paris to <a href="http://en.www.netexplo.org/palmares/2012">receive a Netexplo award</a> for our work with Dollars for Docs. Check out the presentation video they prepared for the awards ceremony (held at UNESCO), <a href="http://en.www.netexplo.org/laureat/propublica-dollars-for-docs">featuring us as bobbleheads</a>.</p>
<p>The easiest way to explain Netexplo: one of the organizers told me that it hopes to be the South by Southwest of Paris. Check out the quirky trophy we got:</p>
<div class="photo">
<img src="https://p.twimg.com/AoBvjj3CMAADMX1.jpg" alt="Netexplo trophy" />
</div>
<p>Check out the <a href="http://en.www.netexplo.org/palmares/2012">other great entries in this year&#8217;s ceremony</a>.</p>
<p>This was my first trip to Paris so of course I took photos like a shutterbug tourist. You can <a href="http://www.flickr.com/photos/zokuga/sets/72157629628108473/">view them on my Flickr account</a>:</p>
<p><a href="http://www.flickr.com/photos/zokuga/6862741170/" title="Sony Alpha NEX-7: Paris - Eiffel Tower by Dan Nguyen @ New York City, on Flickr"><img style="width: 100%" src="http://farm8.staticflickr.com/7053/6862741170_4c9ec18389_b.jpg"  alt="Sony Alpha NEX-7: Paris - Eiffel Tower"></a></p>
<p><a href="http://www.flickr.com/photos/zokuga/6871602718/" title="Centre Pompidou, Musée National d'Art Moderne by Dan Nguyen @ New York City, on Flickr"><img  style="width: 100%"  src="http://farm8.staticflickr.com/7066/6871602718_4e8825cfa9_b.jpg" alt="Centre Pompidou, Musée National d'Art Moderne"></a></p>
<p><a href="http://www.flickr.com/photos/zokuga/7008890555/" title="Tuileries Garden by Dan Nguyen @ New York City, on Flickr"><img  style="width: 100%" src="http://farm8.staticflickr.com/7074/7008890555_bc9aedf270_b.jpg"  alt="Tuileries Garden"></a></p>
<p><a href="http://www.flickr.com/photos/zokuga/7002886887/" title="The Eiffel Tower, as seen from the Trocadéro. by Dan Nguyen @ New York City, on Flickr"><img style="width: 100%"  src="http://farm8.staticflickr.com/7088/7002886887_0eebdaeb9c_b.jpg" alt="The Eiffel Tower, as seen from the Trocadéro."></a></p>
]]></content:encoded>
			<wfw:commentRss>http://danwin.com/2012/04/propublica-at-netexplo/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Because of a typo, the government needs to keep your private data 10 times longer?</title>
		<link>http://danwin.com/2012/03/because-of-a-typo-the-government-needs-to-keep-your-private-data-10-times-longer/</link>
		<comments>http://danwin.com/2012/03/because-of-a-typo-the-government-needs-to-keep-your-private-data-10-times-longer/#comments</comments>
		<pubDate>Fri, 23 Mar 2012 16:47:33 +0000</pubDate>
		<dc:creator>Dan Nguyen</dc:creator>
				<category><![CDATA[thoughts]]></category>
		<category><![CDATA[data]]></category>

		<guid isPermaLink="false">http://danwin.com/?p=1938</guid>
		<description><![CDATA[Yesterday the Obama administration approved new rules to greatly extend the time &#8211; from 180 days to 1,826 days (5 years) &#8211; that domestic intelligence services can retain American citizens&#8217; private information. Citizens are eligible to be part of this&#8230; <a href="http://danwin.com/2012/03/because-of-a-typo-the-government-needs-to-keep-your-private-data-10-times-longer/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Yesterday the Obama administration <a href="http://www.nytimes.com/2012/03/23/us/politics/us-moves-to-relax-some-restrictions-for-counterterrorism-analysis.html" title="U.S. Relaxes Some Restrictions for Counterterrorism Analysis - NYTimes.com">approved new rules to greatly extend the time</a> &ndash; from 180 days to 1,826 days (5 years) &ndash; that domestic intelligence services can retain American citizens&#8217; private information. Citizens are eligible to be part of this federal data warehouse even when &#8220;there is no suspicion that they are tied to terrorism.&#8221;
</p>
<p>As <a href="http://www.nytimes.com/2012/03/23/us/politics/us-moves-to-relax-some-restrictions-for-counterterrorism-analysis.html?_r=1&amp;ref=politics" title="U.S. Relaxes Some Restrictions for Counterterrorism Analysis - NYTimes.com">Charlie Savage in the New York Times reports</a>:
</p>
<blockquote><p>Intelligence officials on Thursday said the new rules have been under development for about 18 months, and grew out of reviews launched after the failure to connect the dots about Umar Farouk Abdulmutallab, the “underwear bomber,” before his Dec. 25, 2009, attempt to bomb a Detroit-bound airliner.
</p>
<p>After the failed attack, government agencies discovered they had intercepted communications by Al Qaeda in the Arabian Peninsula and received a report from a United States Consulate in Nigeria that could have identified the attacker, if the information had been compiled ahead of time.</p>
</blockquote>
<p>The case of the &#8220;underwear bomber&#8221; is a strange justification for this expansion of data storage, because the 2009 Christmas terror attempt nearly succeeded thanks to a series of what seem like common human errors, not because of an information drought.
</p>
<p>Shortly after the underwear bomber incident, the White House released a report examining how our vast intelligence network failed to prevent Abdulmutallab, the bomber, from boarding a flight from Amsterdam to Detroit.
</p>
<p>One of the critical failures? Someone at the State Department, when sending information about Abdulmutallab to the National Counterterrorism Center, <strong>misspelled his name</strong>. Even though his father alerted American intelligence officials a full month before the attempted attack, our sophisticated surveillance system was partially stymied by a single misplaced letter.
</p>
<p>As <a href="http://thecable.foreignpolicy.com/posts/2010/01/08/more_on_misspelling_the_underwear_bomber_s_name" title="More on misspelling the underwear bomber’s name | The Cable">Foreign Policy reported in 2010</a>:</p>
<blockquote>
<p>State called an impromptu press briefing late Thursday evening to address the issue. The tone of the briefing was combative, as reporters pressed the &#8220;senior administration official&#8221; for details about the misspelling that he seemed not to want to give up. But here&#8217;s what we learned.
</p>
<p>Someone (they won&#8217;t say who) at the State Department (presumably at the U.S. Embassy in Nigeria) did check to see if Abdulmutallab had a visa (they won&#8217;t say exactly when). That person was working off the Visas Viper cable originally sent from the embassy to the NCTC, which had the name wrong.
</p>
<p>&#8220;There was a dropped letter in that &#8212; there was a misspelling,&#8221; the official said. &#8220;They checked the system. It didn&#8217;t come back positive. And so for a while, no one knew that this person had a visa.&#8221; (They won&#8217;t say for how long)
</p>
</blockquote>
<p>The chain of failures <a href="http://thecable.foreignpolicy.com/posts/2010/01/07/how_much_did_misspelling_abdulmutallabs_name_matter" title="How much did misspelling Abdulmutallab&#039;s name matter? | The Cable">is more complicated than that</a>, but the fact that a typo was a big enough wrench to <a href="http://www.whitehouse.gov/sites/default/files/summary_of_wh_review_12-25-09.pdf" title="">warrant special mention in the White House review</a> is an indication that the government&#8217;s surveillance systems, despite the work of its data architects, engineers and scientists, were compromised by some pretty banal problems, like not having spell-check capability.</p>
<p>In fact, the White House report goes out of its way to assert that the information-sharing problems that failed to prevent the 9/11 attacks &#8220;have now, 8 years later, <a href="http://www.whitehouse.gov/sites/default/files/summary_of_wh_review_12-25-09.pdf" title="">largely been overcome</a>.&#8221; Information about Abdulmutallab (again, <em>his own father</em> met with U.S. officials to warn them of his son a month ahead of the attack), his association with Al Qaeda, and Al Qaeda&#8217;s attack planning, &#8220;was available to all-source analysts at the CIA and the NCTC prior to the attempted attack.&#8221;</p>
<p>In other words, the 9/11 attack was possible because government agencies wouldn&#8217;t share information with each other. Now, they are happily sharing information with each other, they just aren&#8217;t diligently looking at it.</p>
<p>So the best solution is to enact a ten-fold increase in the legal time limit for storing American citizens&#8217; data?</p>
<p>It sounds like the government&#8217;s ability to detect terrorists would be greatly improved with more user-friendly software and adherence to data-handling standards. The ability to catch slight misspellings and do fuzzy data matches is something that Facebook and Google users have enjoyed for years; hell, the basic concept and consumer-friendly implementation have been around <a href="http://en.wikipedia.org/wiki/Microsoft_Word" title="Microsoft Word - Wikipedia, the free encyclopedia">in Microsoft Word for about 20 years</a>. Were software overhauls enacted before officials decided that the government needs more of its citizens&#8217; private information? Or does the review of such technical details and policies seem too unsexy and pedantic for our intelligence bureaucracy?</p>
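<p>To make &#8220;fuzzy data matches&#8221; concrete, here is a minimal sketch using Python&#8217;s standard <code>difflib</code> module. The watchlist, the <code>fuzzy_lookup</code> helper, the 0.85 cutoff and the particular dropped letter are all my own assumptions for illustration; the reports don&#8217;t say what the actual misspelling was or how the government&#8217;s systems do their lookups.</p>

```python
from difflib import get_close_matches

# Hypothetical watchlist; the second and third entries are placeholders.
WATCHLIST = ["Umar Farouk Abdulmutallab", "John Q. Public", "Jane Roe"]


def fuzzy_lookup(name, candidates, cutoff=0.85):
    """Return candidates whose similarity to `name` is at least `cutoff`."""
    # Compare case-insensitively, but hand back the original spellings.
    lowered = {c.lower(): c for c in candidates}
    hits = get_close_matches(name.lower(), list(lowered), n=3, cutoff=cutoff)
    return [lowered[h] for h in hits]


# An illustrative query with one dropped letter ("Abdulmutalab").
query = "Umar Farouk Abdulmutalab"
exact_hit = query in WATCHLIST            # an exact lookup misses the record
matches = fuzzy_lookup(query, WATCHLIST)  # the fuzzy lookup still finds it
print(exact_hit, matches)
```

<p>The cutoff is the judgment call: set it too low and analysts drown in false positives, too high and a single typo hides the record again.</p>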
<p>The Times <a href="http://www.nytimes.com/2012/03/23/us/politics/us-moves-to-relax-some-restrictions-for-counterterrorism-analysis.html?_r=1&amp;ref=politics" title="U.S. Relaxes Some Restrictions for Counterterrorism Analysis - NYTimes.com">article also mentions</a> that the guidelines call for more duplication of entire databases&#8230;which is a bit confusing. I&#8217;m assuming that this doesn&#8217;t refer to making backup copies (in case of a hard drive failure), but to a method of data-sharing between analysts. This is how the Times describes it:</p>
<blockquote>
<p>   The guidelines are also expected to result in the center making more copies of entire databases and “data mining them” using complex algorithms to search for patterns that could indicate a threat.
</p>
</blockquote>
<p>Hopefully, this doesn&#8217;t mean that database files are being copied and passed around so that each department can have their own copy of another department&#8217;s data. This would seem to introduce a few major logistical issues: namely, how do you know the copy you have contains the latest data? Remember that the typo in Abdulmutallab&#8217;s name was one mistake that helped spawn a series of snafus. Are we going to have an incident in which a terrorist slips through because an analyst forgot to update his/her copy of a database before mining it? Also, there&#8217;s the possibility that some of these data copies might end up lying around long after their 5-year limit. </p>
<p>There have been several reports of how intelligence agencies now suffer from <em>too much data</em>, to the point where analysts are &#8220;<a href="http://www.npr.org/2012/01/11/144322791/why-americas-spies-struggle-to-keep-up" title="Why America's Spies Struggle To Keep Up : NPR">drowning in the data</a>.&#8221; If that is ever cited as the reason a future attack went unprevented, I hope the proposed reform is not &#8220;more data.&#8221;</p>
]]></content:encoded>
			<wfw:commentRss>http://danwin.com/2012/03/because-of-a-typo-the-government-needs-to-keep-your-private-data-10-times-longer/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Tools to get to the precipice of programming</title>
		<link>http://danwin.com/2012/03/tools-to-get-to-the-precipice-of-programming/</link>
		<comments>http://danwin.com/2012/03/tools-to-get-to-the-precipice-of-programming/#comments</comments>
		<pubDate>Tue, 13 Mar 2012 15:48:33 +0000</pubDate>
		<dc:creator>Dan Nguyen</dc:creator>
				<category><![CDATA[thoughts]]></category>
		<category><![CDATA[works]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://danwin.com/?p=1928</guid>
		<description><![CDATA[I&#8217;m not a master programmer but it&#8217;s been so long since I&#8217;ve done my first &#8220;Hello World&#8221; that I don&#8217;t remember how people first grok the point of programming (for me, it was to get a good grade in programming&#8230; <a href="http://danwin.com/2012/03/tools-to-get-to-the-precipice-of-programming/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m not a master programmer but it&#8217;s been so long since I&#8217;ve done my first &#8220;Hello World&#8221; that I don&#8217;t remember how people first grok the point of programming (for me, it was to get a good grade in programming class).</p>
<p>So when teaching non-programmers the value of code, I&#8217;m hoping there&#8217;s an even friendlier, shallower first step than the many zero-to-coder references out there, including Zed Shaw&#8217;s excellent <a href="http://learncodethehardway.org/">Learn Code the Hard Way series</a>. </p>
<p>Not only should this first step be &#8220;easy&#8221;, it should also be nearly ubiquitous, free to use, and, most importantly, of immediate benefit to both beginners and experts. The point here is not to teach coding, per se, but to get beginners to a <em>precipice</em> of great things. So that when they stand at the edge, they can at least see something to program towards, even if the end goal is simply labor-aversion, i.e. &#8220;I don&#8217;t want to copy-and-paste 100 web page tables by hand.&#8221;</p>
<p>Here are a few tools I&#8217;ve tried:</p>
<p><img alt="Inspecting a cat photo" src="http://danwin-files.s3.amazonaws.com/nicar/scrape/screenshot-cathair.jpg" title="Inspecting a cat photo" class="aligncenter" width="650" height="609" /></p>
<p><strong>1. Using the web inspector</strong> &ndash; I&#8217;ve never seen the point of taking an in-depth HTML class (unless you want to become a full-time web designer/developer, and even then&#8230;) because so few non-techies even grasp that webpages are (largely) text, external multimedia assets (such as photos and videos), and the text that describes where those assets come from. To them, editing a webpage is as arcane as compiling a binary.</p>
<p>Nothing breaks that illusion better than the web inspector. Its basic element-inspector and network panel illustrate immediately the &#8220;magic&#8221; behind the web. As a bonus, with regular, casual use, the inspector can teach you the HTML and CSS vocabulary if you do intend to be a developer. It&#8217;s hard to think of another tool that is as ubiquitous and easy to use as the web inspector, yet as immensely useful to beginner and expert alike.</p>
<p>Its uses are immediate, especially for anyone who&#8217;s ever wanted to download a video from YouTube. To journalists, I&#8217;ve taught how this simple-to-use tool <a href="http://www.propublica.org/nerds/item/reading-flash-data">has helped me in my investigative reporting</a> when I needed to find an XML file that was obfuscated through a Flash object. </p>
<p>In a hands-on class I taught, a student asked &#8220;So how do I get that XML into Excel?&#8221; &ndash; and that&#8217;s when you can begin to describe the joy of a basic <strong>for</strong> loop.</p>
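<p>For the curious, that <strong>for</strong> loop might look something like this rough Python sketch, which turns XML into CSV text that Excel can open. The element names (<code>rows</code>, <code>row</code>, <code>name</code>, <code>value</code>) and the sample values are made up for illustration; a real file found through the inspector will have its own structure.</p>

```python
import csv
import io
import xml.etree.ElementTree as ET


def sample_xml():
    """Build a tiny made-up XML document standing in for a downloaded file."""
    root = ET.Element("rows")
    for name, value in [("Item A", "10"), ("Item B", "20")]:
        row = ET.SubElement(root, "row")
        ET.SubElement(row, "name").text = name
        ET.SubElement(row, "value").text = value
    return ET.tostring(root, encoding="unicode")


def xml_to_csv(xml_text):
    """Loop over each row element and write one CSV line per row."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["name", "value"])  # header row
    for row in root.findall("row"):     # the for loop does the tedious part
        writer.writerow([row.findtext("name"), row.findtext("value")])
    return out.getvalue()


csv_text = xml_to_csv(sample_xml())
print(csv_text)
```

<p>Save that text to a <code>.csv</code> file and it opens in Excel; the loop is the same whether the file has two rows or two hundred thousand.</p>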
<p>Here&#8217;s an <a href="http://dannguyen.github.com/NICAR/2012/02/25/nicar-2012-inspect-the-web-with-your-browsers-web-inspector/">overview of a hands-on web session</a> I taught at NICAR12. Here&#8217;s the <a href="http://www.propublica.org/nerds/item/reading-flash-data">guide I wrote for my ProPublica project</a>. And here&#8217;s the <a href="http://ruby.bastardsbook.com/chapters/web-inspecting-html/">first of a multi-part introduction to the web inspector</a>.</p>
<p><img alt="Refine WH Visitors" src="http://dannguyen.github.com/NICAR-Google-Refine/images/grefine-wh-visitors.png" title="Refine WH Visitors" class="aligncenter" width="600" height="335" /></p>
<p><strong>2. Google Refine</strong> &ndash; Refine is a spreadsheet-like program that allows you to easily explore and clean data: the most common example is resolving varied entries (&#8220;JOHN F KENNEDY&#8221;, &#8220;John F. Kennedy&#8221;, &#8220;Jack Kennedy&#8221;, &#8220;John Fitzgerald Kennedy&#8221;) into one (&#8220;John F. Kennedy&#8221;). Given that so many great investigative stories and data projects start with &#8220;How many times does this person&#8217;s name appear in this messy database?&#8221;, its uses are immediate and obvious. </p>
<p>Refine is an open-source tool that works out of the web browser and yet is such a powerful point-and-click interface that I&#8217;m happy to take my data out of my scripted workflow in order to use Refine&#8217;s features on it. Not only can you use regular expressions to help filter/clean your data, you can write full-on scripts, making Refine a pretty good environment to show some basic concepts of code (such as variables and functions).</p>
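<p>To give a feel for what this kind of clean-up involves, here is a small Python sketch of name clustering, loosely modeled on the &#8220;key collision&#8221; idea of reducing each value to a normalized key and grouping values that share one. The <code>fingerprint</code> and <code>cluster</code> functions are my own simplified stand-ins, not Refine&#8217;s actual code.</p>

```python
import re
from collections import defaultdict


def fingerprint(value):
    """Normalize a value: lowercase, strip punctuation, sort unique tokens."""
    cleaned = re.sub(r"[^\w\s]", "", value.strip().lower())
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values that reduce to the same fingerprint."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return dict(groups)


names = ["JOHN F KENNEDY", "John F. Kennedy", "Jack Kennedy", "John Fitzgerald Kennedy"]
clusters = cluster(names)
for key, members in clusters.items():
    print(key, members)
```

<p>Only the first two variants collapse together here. Knowing that &#8220;Jack Kennedy&#8221; and &#8220;John Fitzgerald Kennedy&#8221; are the same person still takes a human (or a fuzzier method), which is why a point-and-click review step is so handy.</p>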
<p>I wrote a guide showing how <a href="http://www.propublica.org/nerds/item/using-google-refine-for-data-cleaning">Refine was essential for one of my investigative data projects</a>. Refine&#8217;s official <a href="http://code.google.com/p/google-refine/">video tutorial is also a great place to start</a>.</p>
<p><strong>3. Regular Expressions</strong> &ndash; Maybe it&#8217;s because my own comsci curriculum skipped regexes, leaving me to figure out their worth much, much later than I should have, but I really try to push learning regexes every time the following questions are asked:</p>
<ul>
<li> In Excel, how do I split this &#8220;last_name, first_name middle_name&#8221; column into three different columns?</li>
<li> In Excel, how do I get all these date formats to be the same?</li>
<li> In Excel, how do I extract the zip code from this address field?</li>
</ul>
<p>&#8230;and so on. The use of LEFT, TRIM, RIGHT, etc. functions always seems to be much more convoluted than the regex needed to do this kind of simple parsing. And while regexes aren&#8217;t the answer to every parsing problem, they sure deliver a lot of return for the investment (which can start from a simple cheat sheet next to your computer).</p>
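<p>As a rough illustration of two of the questions above, here is how the name-splitting and zip-code problems might look in Python. The patterns, the helper names and the sample address are my own assumptions, and they only cover tidy input; real-world columns will need more forgiving patterns.</p>

```python
import re

# "last_name, first_name middle_name" where the middle name is optional.
NAME_RE = re.compile(r"^\s*([^,]+),\s*(\S+)(?:\s+(.+?))?\s*$")

# A 5-digit zip code, optionally followed by a +4 suffix.
ZIP_RE = re.compile(r"\b(\d{5})(?:-\d{4})?\b")


def split_name(cell):
    """Split one cell into (last, first, middle); None if it doesn't fit."""
    match = NAME_RE.match(cell)
    if match is None:
        return None
    last, first, middle = match.groups()
    return (last.strip(), first, middle or "")


def extract_zip(address):
    """Return the last 5-digit zip-like number in the address, if any."""
    found = ZIP_RE.findall(address)
    return found[-1] if found else None


print(split_name("Kennedy, John Fitzgerald"))
print(extract_zip("123 Main St, Springfield, IL 62704-1234"))
```

<p>Taking the <em>last</em> match in <code>extract_zip</code> is a deliberate guess that the zip code sits at the end of the address, so a five-digit street number doesn&#8217;t get mistaken for it.</p>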
<p><a href="http://www.regular-expressions.info/">Regular-expressions.info</a> has always been one of my favorite references. Zed Shaw is also <a href="http://regex.learncodethehardway.org/book/">writing a book on regexes</a>. I&#8217;ve also <a href="http://ruby.bastardsbook.com/chapters/regexes/">written a lengthy tutorial on regexes</a>.</p>
<p>&#8211;</p>
<p>So none of these tools or concepts involve programming&#8230;yet. But they&#8217;re immediately useful on their own, opening new doors to <i>useful</i> data just enough to interest beginners into going further. In that sense, I think these tools make for an inviting introduction towards learning programming.</p>
]]></content:encoded>
			<wfw:commentRss>http://danwin.com/2012/03/tools-to-get-to-the-precipice-of-programming/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Code, Don&#8217;t Tell: Programming as an Essential Journalism Skill</title>
		<link>http://danwin.com/2012/02/code-dont-tell-programming-as-an-essential-journalism-skill/</link>
		<comments>http://danwin.com/2012/02/code-dont-tell-programming-as-an-essential-journalism-skill/#comments</comments>
		<pubDate>Wed, 22 Feb 2012 19:56:35 +0000</pubDate>
		<dc:creator>Dan Nguyen</dc:creator>
				<category><![CDATA[thoughts]]></category>

		<guid isPermaLink="false">http://danwin.com/?p=1913</guid>
		<description><![CDATA[(tl;dr: this started out as a short post about how all of journalism can benefit from learning to code. It is now a massive rant that maybe I&#8217;ll split up later. It covers: A quote by Seymour Hersh Two of&#8230; <a href="http://danwin.com/2012/02/code-dont-tell-programming-as-an-essential-journalism-skill/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>(<a href="http://en.wikipedia.org/wiki/Wikipedia:Too_long;_didn't_read" title="Wikipedia:Too long; didn't read - Wikipedia, the free encyclopedia">tl;dr:</a> this started out as a short post about how all of journalism can benefit from learning to code. It is now a massive rant that maybe I&#8217;ll split up later. It covers: </p>
<ul>
<li>A quote by <a href="#hersh">Seymour Hersh</a></li>
<li>Two of my own projects as case studies:</li>
<ul>
<li>		 <a href="#sopaopera">SOPA Opera:</a> Using programming to create greater transparency on a single political issue</li>
<li>		 <a href="#d4d">Dollars for Docs:</a> Using programming to drive a nationwide investigation</li>
</ul>
<li>	How new, important stories <a href="#old">can come from &#8220;old&#8221; ones</a></li>
<li>A <a href="#roadmap">practical roadmap for non-programmers</a> on where to start, with a list of resources and things to download</li>
<li>A short list of <a href="#non-programmers">inspirational ex-non-programmers</a></li>
</ul>
<p>
This post is inspired by a recent discussion on the <a href="http://www.ire.org/nicar/" title="Investigative Reporters and Editors | NICAR">NICAR (National Institute of Computer Assisted Reporting) mailing list</a>, in which a journalism professor asked how her students should position themselves for a newspaper&#8217;s web developer job. The answer I suggested was: have them learn programming and have them  publish projects online, on their own, that they can later show an employer.
</p>
<p>But I&#8217;m becoming more convinced that programming &ndash; a decent grasp of it, not make-the-next-Facebook level &ndash; is an essential skill for <em>all</em> journalists, even ones that never intend to produce a webpage in their career. And for students, or any aspiring journalist, I think I can make the case <strong>that programming is absolutely the most important skill to learn in school</strong> (along with honing your interviewing, research and writing skills at the school paper/radio/TV station) if you want to improve your chances for a serious journalism career.</p>
<p><a id="hersh">#</a></p>
<h2>Hersh and Bamford</h2>
<p>A few years ago, I attended a panel on investigative reporting that featured <a href="http://en.wikipedia.org/wiki/Seymour_Hersh" title="Seymour Hersh - Wikipedia, the free encyclopedia">Seymour Hersh</a> &#8211; the Pulitzer Prize-winning reporter who exposed the <a href="http://en.wikipedia.org/wiki/My_Lai_Massacre" title="My Lai Massacre - Wikipedia, the free encyclopedia">My Lai massacre</a> &ndash; and <a href="http://en.wikipedia.org/wiki/James_Bamford" title="James Bamford - Wikipedia, the free encyclopedia">James Bamford</a>, a former Navy intelligence analyst who is well-respected for writing books that managed to penetrate the workings of even the super-secret <a href="http://en.wikipedia.org/wiki/National_Security_Agency" title="National Security Agency - Wikipedia, the free encyclopedia">NSA</a> (affectionately known as the No Such Agency).
</p>
<div class="wp-caption alignright" style="width: 243px"><img alt="" src="http://www.newyorker.com/images/contributors/p233/contributor_seymourmhershphoto_p233_crop.jpg" title="Seymour Hersh" width="233" height="256" /><p class="wp-caption-text">Seymour Hersh</p></div>
<p>The discussion turned to the use of the <a href="http://en.wikipedia.org/wiki/Freedom_of_Information_Act_%28United_States%29" title="Freedom of Information Act (United States) - Wikipedia, the free encyclopedia">Freedom of Information Act</a>, a law that reporters wield to get sensitive, unpublished documents from the federal government. Given that the NSA isn&#8217;t known for being chatty, Bamford explained how his stories were put together through exhaustive uses of FOIA.
</p>
<p>When an audience member asked Hersh how often he used FOIA, his response &ndash; and I&#8217;m quoting from memory here &ndash; was:</p>
<p> <strong>&#8220;Why the fuck would I FOIA documents?&#8221;</strong></p>
<p>I can&#8217;t read Hersh&#8217;s mind, but I&#8217;m guessing that he wasn&#8217;t wholesale dismissing the importance of FOIA, which has been essential in countless investigative stories.</p>
<p>He probably meant that: He&#8217;s <strong>Seymour Hersh</strong>. He exposed the My Lai massacre. He&#8217;s a <a href="http://www.newyorker.com/magazine/bios/seymour_m_hersh/search?contributorName=seymour%20m%20hersh" title="Search : The New Yorker">regular contributor to the New Yorker</a>. The kinds of stories he writes for the New Yorker involve the type of people who wouldn&#8217;t be caught dead making a statement that would ever be reprinted on a document subject to a FOIA request.</p>
<p>And even if they were FOIAble, those requests take time (sometimes many years) and involve countless lawyers and legal wrangling. In the meantime, he&#8217;s up to his eyeballs in secret officials who, for some reason or other, are eager to spill their secrets to him, because he&#8217;s Seymour Hersh.</p>
<p>So, why the fuck <em>would</em> he FOIA documents?</p>
<p>Bamford doesn&#8217;t have quite that brand power and his targets are likely more reticent. But he&#8217;s learned &ndash; possibly through his Navy days &ndash; that there are plenty of important secrets in the stacks of documents that have been deemed fit for public consumption. It&#8217;s not always obvious, even among intelligence officials, to see how a mass of innocuous information inadvertently reveals big secrets.</p>
<p>So, back to the subject of aspiring journalists: to them, <strong>the already-employed journalists are the Seymour Hershes</strong>. These journalists have established themselves and their beat, which they can focus full-time on because they&#8217;re earning a salary to do so. Their phone contains the cell numbers of all the important officials who won&#8217;t ignore their 9 p.m. Sunday night call. When they write something, a large number of trees, barrels of ink, and/or corporate-purchased bandwidth are readily expended to make it known.</p>
<p>On the other end of the spectrum, the aspiring journalists are the Bamfords. Between work shifts, they have the same right as Joe Public to attend meetings and leave inquiries with contact@yourcity.gov. But for them, even the local police department might as well be the NSA. They aren&#8217;t going to be privy to hush-hush phone calls or be let past the murder scene tape.</p>
<p>If this is your current vantage point, <em>even if you intend to be a Hersh-style reporter</em>, you&#8217;re going to have to Bamford your way into a field that has an increasing amount of noise and a corresponding shrinkage in paid, established positions.</p>
<p>Given this situation, <strong>I can think of no better strategy than to learn programming.</strong> This is a skill that not only makes every other journalism skill (writing, researching, publishing) more efficient but can, like Bamford&#8217;s relentless FOIAs, reveal stories that non-programming journalists will never be able to do or, in an unfortunate number of cases, even conceive of.</p>
<h3>Learning the Hard Way</h3>
<p>Zed Shaw, who isn&#8217;t a journalist but is renowned for both his code contributions and his widely-read (and free!) how-to-program books, <a href="http://learnpythonthehardway.org/book/advice.html" title="Advice From An Old Programmer &mdash; Learn Python The Hard Way, 2nd Edition">puts it this way</a>:</p>
<blockquote><p>&#8220;Programming as a profession is only moderately interesting. It can be a good job, but you could make about the same money and be happier running a fast food joint.<strong> You&#8217;re much better off using code as your secret weapon in another profession</strong>.
</p>
<p>
   People who can code in the world of technology companies are a dime a dozen and get no respect. People who can code in biology, medicine, government, sociology, physics, history, and mathematics are respected and can do amazing things to advance those disciplines.&#8221;
</p>
</blockquote>
<p>Note that he doesn&#8217;t have many romantic notions about  programming as a <em>profession</em>. However, programming is something bigger than just a job: it&#8217;s an essential, game-changing skill.</p>
<h2>Code, don&#8217;t tell</h2>
<p>&#8220;<a href="http://en.wikipedia.org/wiki/Show,_don't_tell" title="Show, don't tell - Wikipedia, the free encyclopedia">Show, don&#8217;t tell</a>&#8221; is how my high school journalism teacher taught us how to write. Instead of <em>telling</em> the reader something:</p>
<p><em>James Smith is one of the toughest football players on the team.</em></p>
<p>&ndash; <em>show</em> it, through observed evidence:</p>
<p><em>When the halftime whistle blew, James Smith walked to the sidelines and collapsed. He later was told that the neck pain he played with through the second quarter was caused by a fracture in his neck.</em>
</p>
<p>I guess &#8220;Code, don&#8217;t tell&#8221; doesn&#8217;t really make sense; it&#8217;s just my made-up-way of saying that we have fantastically more ways than ever to <strong>tell</strong> &ndash; blog posts, retweets, status updates, auto-aggregations and other forms of repurposing &ndash; but we&#8217;re little better equipped to find and develop the actual stories. Programming is a skill that cuts through the noise, allows for the analysis and reporting on new substantive information sources, and even provides a way to create innovative story-telling forms (i.e. the web developer&#8217;s role).</p>
<p>
	So to follow my high school teacher&#8217;s advice, here&#8217;s an overview of my two most successful journalism projects so far, both done at <a href="http://www.propublica.org">ProPublica</a>. As I explain later, both more or less originated from me sitting on my couch, being annoyed by what I saw as a lack of transparency. The first one, <a href="http://projects.propublica.org/sopa/" title="Who in Congress Supports SOPA and PIPA/PROTECT-IP? | SOPA Opera | ProPublica">SOPA Opera</a>, was initially self-published and probably could have been done entirely from the couch. The other, <a href="http://projects.propublica.org/docdollars/">Dollars for Docs</a>, was a full out effort by my colleagues and me. But it was programming-driven at every phase.
</p>
<p><a id="sopaopera">#</a></p>
<h3>SOPA Opera</h3>
<p>I won&#8217;t rehash the debate over this <a href="http://en.wikipedia.org/wiki/Stop_Online_Piracy_Act" title="Stop Online Piracy Act - Wikipedia, the free encyclopedia">now-dead Internet regulation law</a>, but the inspiration for the SOPA Opera news app was simply: I had read plenty of debate about SOPA for months. But when I wanted to see just which legislators actually supported it and their reasons for doing so, there wasn&#8217;t yet a great resource for that.</p>
<p>If you know about the official legislative site, <a href="http://thomas.loc.gov/home/thomas.php" title="THOMAS (Library of Congress)">THOMAS</a> and are familiar with its navigation, you could at least find the list of sponsors. But good luck trawling the Congressional committee sites to find transcripts and testimony related to the law. It goes without saying that a list of opponents doesn&#8217;t exist and is beyond the official scope of THOMAS anyway.</p>
<p>So <a href="http://projects.propublica.org/sopa/" title="Who in Congress Supports SOPA and PIPA/PROTECT-IP? | SOPA Opera | ProPublica">SOPA Opera</a>, boiled down, is a pretty pedantic concept: &#8220;<em>Hey, here&#8217;s a list of Congressmembers and what I&#8217;ve found out so far about their positions on SOPA.</em>&#8221;</p>
<p>In other words, changing this:</p>
<div id="attachment_1916" class="wp-caption aligncenter" style="width: 561px"><img src="http://danwin.com/words/wp-content/uploads/2012/02/thomas.png" alt="" title="SOPA sponsors, on THOMAS" width="551" height="483" class="size-full wp-image-1916" /><p class="wp-caption-text">SOPA sponsors, on THOMAS</p></div>
<p>To this:</p>
<p><img src="http://danwin.com/words/wp-content/uploads/2012/02/sopa-image-640x460.png" alt="" title="sopa-image" width="640" height="460" class="aligncenter size-large wp-image-1917" /></p>
<p>The gist of <a href="http://projects.propublica.org/sopa/" title="Who in Congress Supports SOPA and PIPA/PROTECT-IP? | SOPA Opera | ProPublica">SOPA Opera</a> could be done without any programming whatsoever. You could even build a static version in Photoshop and upload the image onto the Internet. So what role did programming play in this? It made it very easy to gather the already-available information, which included: the official list of sponsors, the boilerplate biographical and district information on every Congressmember (including their mug shots), and <a href="http://www.opensecrets.org/" title="OpenSecrets.org: Money in Politics -- See Who's Giving &#038; Who's Getting">contribution data from the Center for Responsive Politics</a>.</p>
<p><strong>No exaggeration:</strong> a decent programmer could build a nice site from this data in about half-an-hour. The jazzy part of the site &ndash; the dynamic sorting of the list &ndash; was already built and offered as a free plugin to use (courtesy <a href="http://desandro.com/" title="David DeSandro">David DeSandro</a>). It&#8217;s entirely possible to create SOPA Opera by hand, given a few days and an infinite amount of patience.
	</p>
<p>So programming allowed me to save my time and energy for the actual reporting. I thought about building a scraper to go through each Congressmember&#8217;s Facebook and Twitter page to search for the term &#8220;SOPA.&#8221; But until the blackout, most lawmakers had nothing to say on the topic. So at first, my &#8220;research&#8221; largely involved typing &#8220;SOPA [some congressmember's name]&#8221; into Google News and usually finding nothing.</p>
<p>When <a href="http://www.theverge.com/2012/1/18/2715300/sopa-blackout-wikipedia-reddit-mozilla-google-protest" title="The SOPA blackout: Wikipedia, Reddit, Mozilla, Google, and many others protest proposed law | The Verge">the SOPA issue blew up during the Jan. 18 blackout</a>, I didn&#8217;t have to do any searching, as Congressmembers pretty much rushed to make known their opposition to SOPA. <a href="http://projects.propublica.org/sopa/" title="Who in Congress Supports SOPA and PIPA/PROTECT-IP? | SOPA Opera | ProPublica">SOPA Opera</a> was designed in a way to make it easy for constituents to look up their representative and, if I had no information about him/her, tell me what they found out after talking to their representative. Or, in a few cases, Congressional staffers contacted me directly.</p>
<p><a href="http://projects.propublica.org/sopa/" title="Who in Congress Supports SOPA and PIPA/PROTECT-IP? | SOPA Opera | ProPublica">SOPA Opera</a> easily broke the single-day traffic record at ProPublica. This was mostly due to blackout-participating sites like Craigslist that directed their traffic to us as a reference. Clearly, what caused the seismic shift on the SOPA debate were the mega-sites that coordinated the millions of emails and phone calls to lawmakers, and SOPA Opera was an indirect beneficiary of the increased public interest.
	</p>
<p><strong>But I believe that SOPA Opera made at least one important contribution to the debate:</strong> it made very clear the level and characteristics of support enjoyed by SOPA. One thing that <a href="http://thomas.loc.gov/cgi-bin/bdquery/z?d112:HR03261:@@@P" title="Bill Summary Status  -  112th Congress (2011 - 2012)  - H.R.3261 - Cosponsors - THOMAS (Library of Congress)">the THOMAS listing of sponsors fails to do</a> is note the political parties of the lawmakers. I felt that this was a critical piece and it was easy to get and display. The result: many visitors to SOPA Opera who had believed that SOPA was a diabolical scheme by [whatever-party-they-oppose] were shocked at how SOPA&#8217;s support was so bi-partisan and broad.</p>
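<p>The join behind that party breakdown is about as simple as programming gets. Here is a minimal Python sketch of the idea; the member names and party labels are placeholders I made up, not the actual sponsor list, and <code>tally_by_party</code> is a hypothetical helper rather than anything from the SOPA Opera codebase.</p>

```python
from collections import Counter

# Placeholder data: a sponsor list (as a site like THOMAS might give you)
# and a separate lookup of each member's party.
sponsors = ["Member A", "Member B", "Member C"]
party_by_member = {
    "Member A": "D",
    "Member B": "R",
    "Member C": "R",
    "Member D": "D",  # not a sponsor, so never counted
}


def tally_by_party(sponsor_names, party_lookup):
    """Count sponsors per party, flagging anyone missing from the lookup."""
    return Counter(party_lookup.get(name, "unknown") for name in sponsor_names)


tally = tally_by_party(sponsors, party_by_member)
print(tally)
```

<p>The &#8220;unknown&#8221; bucket is there on purpose: when two data sources spell a name differently, you want the mismatch to show up in the count instead of silently disappearing.</p>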
<p>I heard from a number of people who had been highly energized about the anti-SOPA debate yet were completely <a href="http://projects.propublica.org/sopa/F000457" title="Sen. Al Franken's position on PIPA (S.968) | Who in Congress Supports SOPA and PIPA/PROTECT-IP? | SOPA Opera | ProPublica">shocked that Sen. Al Franken</a> &ndash; automatically assumed to be on the side of &#8220;Internet freedom&#8221; &ndash; was, in fact, a sponsor of SOPA&#8217;s counterpart in the Senate. This was no state secret: Sen. Franken has been passionate and outspoken in his support and <a href="http://blog.alfranken.com/2012/01/20/lets-talk-about-intellectual-property/" title="Let&#8217;s talk about intellectual property | Al Franken - U.S. Senator, Minnesota">was one of the few who didn&#8217;t back away after political support collapsed post-blackout</a>.</p>
<p>SOPA Opera&#8217;s success probably owes less to my skill than to the dismal state of accessibility in our legislative process. This goes to show that even when you arrive <em>extremely late</em> to the game, it&#8217;s possible to make a significant impact by simply having an idea of how things can be better. This applies in just about any situation and profession. Programming just makes it much easier to push your creation forward.</p>
<h4>Some technical details on self-publishing</h4>
<p>I don&#8217;t want to dwell too much on the web-side of things, as that is just one specific use of programming. But to get back on the topic of how someone can position themselves for a web-related media job, SOPA Opera is a really excellent example of the potential in self-publishing.</p>
<p>SOPA Opera spent about a week on a domain (sopaopera.org) that I purchased for $10. It didn&#8217;t bear the ProPublica brand then, and I didn&#8217;t have time to promote it beyond a few tweets and submitting it to Hacker News and Reddit.
</p>
<p>But in just a week, sopaopera.org had racked up about 150,000 pageviews before we migrated it to ProPublica:</p>
<p><img src="http://danwin.com/words/wp-content/uploads/2012/02/sopa-stats-640x354.png" alt="" title="sopa-stats" width="640" height="354" class="aligncenter size-large wp-image-1918" /></p>
<p>That&#8217;s not a huge number in itself, and traffic to it increased exponentially under ProPublica&#8217;s umbrella. But it had gained enough notice that prominent sites were linking to it. The problem I had at the start &ndash; Googling a random lawmaker&#8217;s name and the term &#8220;SOPA&#8221; and finding absolutely nothing &ndash; was solved, as Google highly-indexed all of the auto-generated lawmaker pages on SOPA Opera. At least one Congressmember&#8217;s office emailed me to update his page.</p>
<p>Not bad for a holiday break project and $10, using free resources and tools that are available to anyone with a computer. Back when I applied for newspaper jobs, I had to carry a portfolio of cut-out newspaper articles to show editors that at least a few people (my college paper and the one newspaper I interned at) had been willing to waste trees and ink on me. You were out of luck if all you had were a bunch of  links to blog posts.
	</p>
<p>The mindset is different today: articles published on a traditional publication&#8217;s website, or at an online-only organization like Huffington Post, can count as legit clips. But I&#8217;d like to think that showing a full-blown website that includes not only traditional reporting content, but examples of how visual and interface design can tell a new story, as well as being able to provide concrete metrics (pageviews, referring links) of impact, would be even more impressive to today&#8217;s news editors.</p>
<p><a id="d4d">#</a></p>
<p><img src="http://danwin.com/words/wp-content/uploads/2012/02/d4d-640x434.png" alt="" title="d4d" width="640" height="434" class="aligncenter size-large wp-image-1919" /></p>
<h3>Dollars for Docs</h3>
<p>Like SOPA Opera, <a href="http://projects.propublica.org/docdollars">Dollars for Docs</a> (aka &#8220;D4D&#8221;) is late to its respective topic. It is a long accepted practice for medical companies to pay doctors to promote their products, not much different from a notable athlete who endorses a shoe that she considers to be the best for her sport. But in recent years, lawmakers and regulators have called for more transparency of these financial ties to prevent cases in which doctors are unduly influenced by their benefactors.</p>
<p>Data on company-to-doctor payments is at least two decades old: Minnesota enacted a law in 1993 requiring companies to disclose their payments. However, that &#8220;data&#8221; came in the form of paper records that had to be hand-entered into a computer &ndash; after, of course, you visited the records&#8217; actual storage location and photocopied each page at 25 cents a pop. For that reason, the records were collected but unexamined for at least a decade until Dr. Joseph Ross (now at Yale University) and the <a href="http://www.citizen.org/Page.aspx?pid=2306" title="About Public Citizen">Public Citizen advocacy group</a> collected and analyzed the records.</p>
<p>In 2007, they <a href="http://jama.ama-assn.org/content/297/11/1216.full" title="Pharmaceutical Company Payments to Physicians, March 21, 2007, Ross et al. 297 (11): 1216  —  JAMA">published their findings in the Journal of the American Medical Association</a>, with the conclusion that the payment records were “compromised by incomplete disclosure as well as insufficient access.”:  </p>
<ul>
<li>In Vermont (which enacted a public disclosure law in 2001), most disclosures were redacted for &#8220;trade secrets&#8221; reasons. Of the publicly disclosed payments, 75% of them lacked information identifying the recipient</li>
<li>In Minnesota, many of the companies had years in which they reported nothing.</li>
<li>The &#8220;public&#8221; disclosures were pretty much inaccessible to the public. Dr. Ross and Public Citizen had to go to court to get the Vermont records.</li>
</ul>
<p>Dr. Ross told me that after his study was published, not only was it apparent that the public was in the dark, but doctors themselves had no idea that the data were even being collected. The Minnesota pharmacy board was subsequently so swamped by requests from other researchers, hospitals and litigants that it began publishing the disclosures online.</p>
<p>At around the same time, the New York Times <a href="http://www.nytimes.com/2007/05/10/health/10psyche.html?pagewanted=all" title="Psychiatrists, Children and Drug Industry&#8217;s Role - New York Times">published its own investigation using the Minnesota records</a>. They analyzed both the company disclosures and Medicaid payments to Minnesota psychiatrists and found that during a time period in which company payments to Minnesota psychiatrists increased, antipsychotic drug prescriptions for children jumped more than <strong>900 percent</strong>. </p>
<p>The <a href="http://www.nytimes.com/2007/05/10/health/10psyche.html?pagewanted=all" title="Psychiatrists, Children and Drug Industry&#8217;s Role - New York Times">Times investigation</a> (also in 2007) sparked a large political fight in which U.S. Senate investigators targeted prominent psychiatrists whose work had expanded the use of antipsychotic drugs as treatment for children. </p>
<p>The end result of this was a <a href="http://www.npr.org/templates/story/story.php?storyId=126281414" title="Law Requires Docs To Disclose Monies From Pharma : NPR">proposed federal law</a> to mandate these payment disclosures nationwide. This law was later folded into  the <a href="http://en.wikipedia.org/wiki/Patient_Protection_and_Affordable_Care_Act" title="Patient Protection and Affordable Care Act - Wikipedia, the free encyclopedia">2010 health care reform package</a>. By 2013, the federal government will publish a database of these disclosures.</p>
<h3>Couch database</h3>
<p>The idea for D4D was sparked from something I wrote for my blog one evening. I was <a href="http://danwin.com/2010/04/coding-for-journalists-101-a-four-part-series/" title="Coding for Journalists 101 : A four-part series | Dan Nguyen pronounced fast is danwin">writing some programming tutorials</a> to show journalists, well, how programming could be used for everyday reporting, and I needed a current example. I came across this Times article, <a href="http://www.nytimes.com/2010/04/01/business/01payments.html" title="Pfizer Details Payments to Doctors and Researchers - NYTimes.com">Pfizer Gives Details on Payments to Doctors</a>, which reported that Pfizer was fulfilling the terms of a legal settlement by publishing a <a href="http://www.pfizer.com/responsibility/working_with_hcp/payments_report.jsp" title="Payment Report | Pfizer: the world's largest research-based pharmaceutical company">searchable database</a> of the health professionals it paid. At that point, it was the fourth such drug company that had disclosed its payments in advance of the 2013 law &ndash; most of the others had done so also <a href="http://www.propublica.org/article/lawsuits-say-pharma-illegally-paid-doctors-to-push-their-drugs/" title="Lawsuits Say Pharma Illegally Paid Doctors to Push Their Drugs - ProPublica">as part of settling their own lawsuits</a>.</p>
<p>At that point, I knew virtually nothing about the issue. But what I did see was that <a href="http://www.pfizer.com/responsibility/working_with_hcp/payments_report.jsp" title="Payment Report | Pfizer: the world's largest research-based pharmaceutical company">Pfizer&#8217;s site seemed unnecessarily cumbersome</a>. Though the disclosures were mandated, Pfizer&#8217;s site made it difficult to do simple analyses such as finding the professionals who had received substantial amounts or even the sum of the database&#8217;s payments. So I wrote <a href="http://danwin.com/2010/04/pfizer-web-scraping-for-journalists-part-4-pfizers-doctor-payments" title="Coding for Journalists 104: Pfizer&#8217;s Doctor Payments; Making a Better List | Dan Nguyen pronounced fast is danwin">a scraper and published the code and data for others to use</a>.</p>
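<p>Once the records are in a delimited file, those &#8220;simple analyses&#8221; take only a few lines. What follows is a minimal Python sketch of the idea, not the scraper linked above. The column names (<code>doctor</code>, <code>amount</code>) and the sample rows are invented for illustration and are not Pfizer&#8217;s actual fields or figures.</p>

```python
import csv
import io
from collections import defaultdict

# Made-up rows standing in for a scraped payments file; not real data.
SAMPLE = """doctor,amount
Doctor A,1500.00
Doctor B,250.50
Doctor A,3000.00
Doctor C,75.25
"""


def summarize_payments(csv_text, top_n=2):
    """Return (grand_total, top_recipients) for a delimited payments file."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Add each payment to the running total for that recipient.
        totals[row["doctor"].strip()] += float(row["amount"])
    # Largest recipients first.
    ranked = sorted(totals.items(), key=lambda pair: pair[1], reverse=True)
    return sum(totals.values()), ranked[:top_n]


if __name__ == "__main__":
    total, top = summarize_payments(SAMPLE)
    print("Total paid:", total)
    for name, amount in top:
        print(name, amount)
```

<p>The same function works whether the file has four rows or four hundred thousand, which is the point of getting the data out of a search-one-name-at-a-time web form.</p>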
<p>Data-scraping and pharmaceutical payments aren&#8217;t high-traffic topics, but the blog post caught the eyes of a few important people. My colleagues Charles Ornstein and Tracy Weber, both Pulitzer Prize-winning health reporters, were well-informed about the issue but didn&#8217;t know how feasible it would be to do a broad analysis of the data. I think I would eventually have written a scraper and published the delimited data for every drug company that had so far been required to disclose (this article in the Times, <a href="http://www.nytimes.com/2010/04/13/business/13docpay.html" title="Data on Fees to Doctors Is Called Hard to Parse - NYTimes.com">Data on Fees to Doctors Is Called Hard to Parse</a>, was a particular inspiration), but Charlie and Tracy knew how to turn it into a <a href="http://www.propublica.org/series/dollars-for-docs" title="Dollars for Doctors - ProPublica">strong investigation</a>.</p>
<p><strong>Another side-benefit of self publishing:</strong> A reporter at PBS had also been working to collect and parse the data. He noticed my Pfizer post, though, and rather than being competitors, PBS teamed up with ProPublica in <a href="http://www.propublica.org/series/dollars-for-docs" title="Dollars for Doctors - ProPublica">conducting the investigation</a>.</p>
<p><a id="old">#</a></p>
<h3>Done stories are never dead. They aren&#8217;t even done.</h3>
<p>Even with the groundbreaking work already done by Dr. Ross, Public Citizen and the Times in 2007, the subsequent Senate investigations, and the impending official database in 2013, there was still room for D4D to become a valuable, innovative investigation. It had considerable impact on the debate, prompting companies and medical institutions to change their disclosure and conflict-of-interest policies. D4D is currently the most-viewed resource at ProPublica. </p>
<p>You can <a href="http://www.propublica.org/series/dollars-for-docs" title="Dollars for Doctors - ProPublica">read our series coverage here</a>.</p>
<p>The most obvious way that D4D differed from previous investigations is that it looked at the available nationwide set, not just Minnesota&#8217;s. Some of the data-driven angles we took included:</p>
<ul>
<li>Cross-referencing our payments database against all the state medical licensing databases to see if any of the highly-paid doctors had serious disciplinary issues. Prescription data is not publicly available,<a href="http://www.propublica.org/article/pharma-payments-to-doctors-with-sanctions" title="Dollars for Doctors - ProPublica"> so this was an alternative way of scrutinizing companies&#8217; assertions that they paid doctors for their prestige</a>, and not their prescribing habits. It essentially involved writing a scraper for each state website.</li>
<li>Cross-referencing our payments database against the faculty list at various medical schools to see <a href="http://www.propublica.org/article/medical-schools-policies-on-faculty-and-drug-company-speaking-circuit" title="Dollars for Doctors - ProPublica">if there were discrepancies in what the doctors disclosed to their institutions</a>.</li>
<li>Examining the differences between the quarterly reports, to see if payment levels dropped and why. Since the disclosures are largely unaudited, collecting and normalizing the data for comparison is the only way to double-check the completeness of the reports.</li>
<li>Much of D4D&#8217;s success came in our willingness <a href="http://www.propublica.org/blog/item/doctors-on-pharma-payroll-what-our-partners-found" title="Doctors on Pharma Payroll: What Our Partners Found - ProPublica">to share our data with our reporting partners</a>, and then later, to any newsroom that asked. This spawned hundreds of independently reported stories.</li>
</ul>
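<p>The cross-referencing in the first two bullets boils down to joining two lists on a normalized name. Here is a rough Python sketch of that idea, not D4D&#8217;s actual code. The records, the field names and the matching rule are all invented for illustration, and the rule is deliberately naive: a real match would need more than state plus name to rule out two different people who share a name.</p>

```python
import re

# Invented records for illustration only.
PAYMENTS = [
    {"name": "Smith, Jane A.", "state": "MN", "amount": 12000},
    {"name": "DOE, JOHN", "state": "VT", "amount": 8000},
]
SANCTIONS = [
    {"name": "Jane A. Smith", "state": "MN", "action": "license suspended"},
    {"name": "John Doe", "state": "NY", "action": "reprimand"},
]


def name_key(name):
    """Lowercase, drop punctuation, and sort the words so order is ignored."""
    words = re.sub(r"[^a-z\s]", "", name.lower()).split()
    return " ".join(sorted(words))


def cross_reference(payments, sanctions):
    """Pair each payment with sanctions sharing the same state and name key."""
    index = {}
    for record in sanctions:
        key = (record["state"], name_key(record["name"]))
        index.setdefault(key, []).append(record)

    matches = []
    for payment in payments:
        key = (payment["state"], name_key(payment["name"]))
        for record in index.get(key, []):
            matches.append((payment["name"], payment["amount"], record["action"]))
    return matches


if __name__ == "__main__":
    for match in cross_reference(PAYMENTS, SANCTIONS):
        print(match)
```

<p>In this made-up sample, only the Minnesota record matches: the two John Does are in different states, so the sketch treats them as different people. Any hit from a join like this is a lead to verify by hand, not a finding.</p>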
<p>So even on a well-trodden story, there are still countless angles when you combine a keen reporter&#8217;s instinct with an ability to collect data. A prevalent theme in the journalism industry &ndash; especially in the age of Twitter &ndash; is the drive to be first. I understand that it&#8217;s important in a ratings-sweep context, and it certainly gets the blood going when you&#8217;re competing for a story, but I&#8217;ve never thought that it was good for journalism or particularly useful to news consumers.</p>
<p>This is why I enjoy data-driven journalism. On any given topic, there are so many valid and substantial ways to cross-examine the evidence behind a story and produce meaningful stories. <strong>And as time passes, the analysis only becomes <em>more interesting</em>, not less</strong>, because more and more data is added to the picture. Data-driven journalism can be done through simple use of Excel. But programming, as I explain in the next section, can vastly increase the opportunities and depth of these analyses.</p>
<p><em>A sidenote: The hardest part of D4D was the logistics, which were only manageable through programming. It&#8217;s too boring to go into detail, but D4D required a collaborative reporting process more disciplined than &#8220;just send that Word.doc as an attachment.&#8221; I don&#8217;t foresee a project of D4D&#8217;s scale being attempted by many other organizations, because of the difficulty in managing all the moving parts.</em> </p>
<h3>Never attribute to malice that which&#8230;</h3>
<p>The most interesting reaction I got from D4D came not from doctors,  but from researchers, compliance officers, and even federal investigators whose jobs it was to monitor these disclosures. They were thrilled that D4D made it so easy for them to check up on things. What was surprising to me was that I just assumed that everyone who had a professional stake in overseeing these disclosures had already collected the data themselves. The scraping-and-collecting of the company reports was by far the easiest part of D4D, and even if you couldn&#8217;t program you could at least hack together a system of copying-and-pasting from the various data-sources, if such information was vital to your job.</p>
<p>
The truth is that the concept of &#8220;user interface&#8221; is as critical to an investigation as it is in separating successful tech startups from their clunky, failed competitors. I occasionally get asked for advice by researchers on their own projects. What stalls a surprising number of interesting investigative projects and analyses is not something as malicious as a shady CEO or the threat of a lawsuit, but problems as benign as: <em>a company over the years has hundreds of datafiles, all zipped and scattered across many webpages. Is it possible to somehow download them all, unzip them, and put them into a database (or Excel) just so I can find if someone&#8217;s name is in there?</em> Or: <em>If I could only fix the few places where the agency screwed up in outputting this comma-delimited data file, I could analyze it in Excel.</em>
</p>
<p>If you can&#8217;t program, this isn&#8217;t a trivial problem: working with ten such files is an inconvenience. When there are 100 files, then the momentum for an inquiry might just stop dead, especially if the inquiry arises from curiosity instead of certainty (think of how many great stories and investigations have come out of such casual inquiries). However, with some basic programming, the difference between organizing 10 files and 10,000 files is a matter of <em>milliseconds</em>. A programmer thus has the power to not only work with already-normalized datasets and produce interesting stories, but he/she can (efficiently) create datasets that otherwise would have never been examined.</p>
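<p>To make the zipped-files scenario concrete, here is a minimal Python sketch using only the standard library. It assumes the archives have already been downloaded into one folder and that they contain CSV files; the function name and the folder layout are hypothetical. It skips the &#8220;put them into a database&#8221; step and simply reports where a name turns up.</p>

```python
import csv
import io
import zipfile
from pathlib import Path


def find_name_in_zips(folder, name):
    """Search every CSV inside every .zip in folder for a name.

    Returns a list of (zip file, member file, row number) tuples.
    The match is a case-insensitive substring test on each cell.
    """
    needle = name.lower()
    hits = []
    for zip_path in sorted(Path(folder).glob("*.zip")):
        with zipfile.ZipFile(zip_path) as archive:
            for member in archive.namelist():
                if not member.lower().endswith(".csv"):
                    continue  # ignore anything that is not a CSV
                with archive.open(member) as raw:
                    # Read the member as text without unzipping it to disk.
                    text = io.TextIOWrapper(
                        raw, encoding="utf-8", errors="replace", newline=""
                    )
                    for row_number, row in enumerate(csv.reader(text), start=1):
                        if any(needle in cell.lower() for cell in row):
                            hits.append((zip_path.name, member, row_number))
    return hits
```

<p>Calling <code>find_name_in_zips("downloads", "jane example")</code> walks every archive in a (hypothetical) <code>downloads</code> folder. The loop is the same whether that folder holds ten archives or ten thousand.</p>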
<p>To reiterate: there are an astonishing number of stories and inquiries that are derailed by what are trivial technical issues to any half-competent programmer. This is both <em>alarming</em> from a civic perspective and <em>extremely</em> exciting if you&#8217;re someone with the right skills at the right time, as you&#8217;ll never want for ideas.</p>
<p><a id="roadmap">#</a></p>
<h2>A practical road to programming</h2>
<p>OK, enough abstract talk. The amazing promise of programming is that there are so many opportunities. This leads to its biggest problem when trying to learn it: there are too many places to start. </p>
<p>This section contains some advice. It may not be the best advice for everyone, but at least everything I mention below is absolutely free to use and to learn from.</p>
<h3>The Basics</h3>
<p>If you haven&#8217;t already, create a <a href="http://twitter.com">Twitter account</a>. Stop kvetching about how &#8220;no one wants to read what I ate for breakfast&#8221; because that casually implies that people <em>would</em> want to read your 100,000 word opus, as soon as you finish it. They won&#8217;t.<br />
But having a Twitter account provides one avenue to spread your work and just as importantly, a channel to learn from people who <em>aren&#8217;t</em> just tweeting about their breakfasts.
</p>
<p>Get a <a href="http://dropbox.com">Dropbox</a>. Get used to putting stuff on the cloud. Not your sensitive documents, but things like e-reference books and datasets and code. This is much better (and in some ways, more secure) than emailing things to yourself.
</p>
<p>Create a <a href="https://accounts.google.com/SignUp?">Google account</a>. Even if you don&#8217;t use it for email, Google Documents is extremely useful. And you may find use for the other parts of Google&#8217;s ecosystem.
</p>
<p><strong>Get a second or third browser:</strong> If you&#8217;re paranoid about Twitter/Facebook/Google cookies tracking you, then use one browser to handle those accounts and another browser to do all your other web-browsing.</p>
<h3>Data stuff</h3>
<p>If you don&#8217;t have Excel, you can download OpenOffice&#8217;s capable suite. That said, Google Docs is probably the easiest way to get into keeping spreadsheets, with the added bonus of being in the cloud and thus easy to do collaborations and to use your programming with. Again, be cautious about putting very sensitive data there. But I&#8217;d argue that the cloud is still safer than keeping everything on a stealable-Macbook.</p>
<p><a href="http://code.google.com/p/google-refine/" title="google-refine - Google Refine, a power tool for working with messy data (formerly Freebase Gridworks) - Google Project Hosting">Google Refine</a>: this was a project formerly known as Gridworks. It runs in the browser but, unlike Google Docs, you don&#8217;t need a Google Account or to be online to use it. It&#8217;s similar to a spreadsheet except that you won&#8217;t use it to calculate an average/sum of a column or to make charts. It&#8217;s for cleaning data, to quickly determine that &#8220;John F. Kennedy&#8221;, &#8220;Jack F. Kennedy&#8221;, &#8220;John Fitzgerald Kennedy&#8221; and &#8220;J.F. Kennedy&#8221; are all the same person. There has been some investigative data-work that would not have been possible without this tool. Check out the <a href="http://code.google.com/p/google-refine/">video introduction</a> here; <a href="http://www.propublica.org/nerds/item/using-google-refine-for-data-cleaning" title="Chapter 1. Using Google Refine to Clean Messy Data - ProPublica">I&#8217;ve also written a tutorial at ProPublica</a>.
</p>
<p>Given the number of important stories that basically boil down to finding someone&#8217;s name several times in a database, it&#8217;s a little amazing to me that every serious reporter hasn&#8217;t at least tried <a href="http://code.google.com/p/google-refine/">Google Refine</a>.
</p>
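<p>For a feel of what this kind of name-cleaning involves, here is a simplified Python sketch in the spirit of the &#8220;fingerprint&#8221; key-collision clustering that Refine offers. It is an approximation written for illustration, not Refine&#8217;s code, and the function names are my own. It groups names that differ only in case, punctuation, accents or word order.</p>

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Build a key that ignores case, accents, punctuation and word order."""
    text = unicodedata.normalize("NFKD", value.strip().lower())
    text = text.encode("ascii", "ignore").decode("ascii")  # drop accents
    text = re.sub(r"[^a-z0-9\s]", "", text)  # drop punctuation
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Group values sharing a fingerprint; keep only groups with variants."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [names for names in groups.values() if len(set(names)) > 1]


if __name__ == "__main__":
    names = ["John F. Kennedy", "Kennedy, John F.", "JOHN F KENNEDY", "J.F. Kennedy"]
    for group in cluster(names):
        print(group)
```

<p>Note the limit of this simple approach: it groups the first three spellings but leaves &#8220;J.F. Kennedy&#8221; on its own, and it would not connect &#8220;Jack&#8221; to &#8220;John&#8221; either. Initials and nicknames need fuzzier matching plus a human to confirm each merge, which is why an interactive tool is so handy for this work.</p>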
<h3>Programming</h3>
<p>Don&#8217;t get stalled by trying to figure out which is the best language. The three most popular right now &ndash; Ruby, Python and JavaScript &ndash; will serve your needs well, and you&#8217;ll find it relatively easy to pick up the other two after learning one.
</p>
<p>That said, there&#8217;s one big difference: Ruby and Python are more general-purpose scripting languages. You can use them to sort your files, process (and build) a database, and even build a full-blown website (you may have heard of Ruby on Rails and Django).
</p>
<p>JavaScript is most typically used for web interactivity in the browser, everything from animating buttons to full-fledged applications. Because it&#8217;s in every browser, it takes no work to try it out and to produce interactive bits. It takes a little more work to set up JS to do things like web-scraping or local file processing.</p>
<p>JS has an additional advantage in that there are many interactive tutorials that you can access through your browser. <a href="http://www.codecademy.com/#!/exercises/0" title="Learn to code | Codecademy">Codecademy</a> is one of the best known ones.</p>
<h3>Programming resources</h3>
<p>Zed Shaw&#8217;s &#8220;<a href="http://learnpythonthehardway.org/" title="Learn Python The Hard Way | A Beginner Programming Book">Learn Python the Hard Way</a>&#8221; is one of the most popular beginner-level (and free) ebooks. There is also a <a href="http://ruby.learncodethehardway.org/" title="Learn Code The Hard Way -- Books And Courses To Learn To Code">Ruby version</a>.</p>
<p>A little self-promotion: for people who best learn through practical projects, I&#8217;ve been working on my own Ruby beginner&#8217;s guide, tentatively titled the <a href="http://ruby.bastardsbook.com/" title="The Bastards Book of Ruby">Bastards Book of Ruby.</a> It&#8217;s a work in progress but you&#8217;ll find some ideas on starter projects to work towards (a <a href="http://ruby.bastardsbook.com/chapters/intro_tweet_fetch/" title="Tweet Fetching | The Bastards Book of Ruby">good start is writing a script to download and store all your tweets</a>).</p>
<h3>HTML</h3>
<p>Don&#8217;t learn HTML. That is, <strong>don&#8217;t take a course</strong> in HTML. Learn enough to know that the HTML behind a webpage is just plain text. And learn enough to understand how:
</p>
<p>&lt;a target=&quot;_blank&quot; href=&quot;http://en.wikipedia.org/wiki/HTML&quot;&gt;Wikipedia&#x27;s entry on HTML&lt;/a&gt;
</p>
<p>Creates a link that takes you to Wikipedia in a new window, like this: <a target="_blank" href="http://en.wikipedia.org/wiki/HTML">Wikipedia&#8217;s entry on HTML</a>
</p>
<p>That&#8217;s basically enough to get the concept of HTML (and the idea of meta-information) and to begin scraping webpages. One of the fastest ways to learn as you go is to get acquainted with your <a href="http://ruby.bastardsbook.com/chapters/web-inspecting-html/" title="Meet Your Web Inspector | The Bastards Book of Ruby">web browser&#8217;s inspector.</a>
</p>
<p><a id="non-programmers">#</a></p>
<h2>People to learn from</h2>
<p>I&#8217;m not a particularly inspiring example of a journo-coder: I took up computer engineering because I was afraid there wouldn&#8217;t be many journalism jobs, so I kind of half-stumbled into combining reporting with code, since it&#8217;s easy to learn programming fundamentals during college. This is why, if you&#8217;re a college student now, I strongly advise you to pick up programming at a time when learning is your main job in life.</p>
<p>Much more impressive to me are people who were doing well in their day jobs but decided to pick up programming in their spare hours &ndash; and then returned to do their day jobs with newfound inspiration and possibilities.</p>
<p><strong>John Keefe (WNYC)</strong> &ndash; About a year-and-a-half ago, I remember John coming to Hacks/Hackers events to watch people code and to continually apologize for having to ask what he thought were dumb questions. In an incredibly short time, <a href="http://johnkeefe.net/" title="johnkeefe.net - Data news + journalism technology">Keefe learned enough hacking to produce some great, creative apps</a> and now heads WNYC&#8217;s data team, and also leads the discussion among news orgs on how to modernize the way we do things like election coverage.</p>
<p><strong>Zach Sims</strong> &ndash; Sims is a co-founder of <a href="http://codecademy.com">Codecademy</a>. He was a <a href="http://www.businessweek.com/magazine/computer-coding-not-for-geeks-only-01262012.html" title="Computer Coding: Not for Geeks Only - Businessweek">poli-sci major who ventured into tech entrepreneurship</a> but was frustrated that his lack of technical skills hindered his work. He learned programming on his own and, with a co-founder, created Codecademy to teach others how to program. <a href="http://www.codecademy.com/#!/exercises/0" title="Learn to code | Codecademy">Codecademy</a> itself is one of the hottest recent startups.</p>
<p><strong>Neil Saunders</strong> &ndash;  I stumbled across <a href="http://nsaunders.wordpress.com/about-2/about/" title="Why desperate? | What You&#8217;re Doing Is Rather Desperate">Neil Saunders&#8217; blog</a> while looking for R + Ruby examples. His blog is titled &#8220;<a href="http://nsaunders.wordpress.com/about-2/about/" title="Why desperate? | What You&#8217;re Doing Is Rather Desperate">What You&#8217;re Doing is Rather Desperate</a>&#8221;, inspired by the reaction of a colleague who <a href="http://nsaunders.wordpress.com/about-2/about/" title="Why desperate? | What You&#8217;re Doing Is Rather Desperate">was apparently unimpressed with his use of programming in his bioscience job</a>.<br />
   It&#8217;s a misconception that scientifically-minded professionals also know how to program. In fact, some don&#8217;t even have basic computer skills. Saunders not only publishes his code, but shows how others in his field can greatly improve their research with programming skills.
</p>
<p>
   <strong>Kaitlyn Trigger</strong> &ndash;<br />
   As this <a href="http://techcrunch.com/2012/02/09/awwwwwwwwwwwwwwwwww/" title="Instagram Founder&#8217;s Girlfriend Learns How To Code For V-Day, Builds Lovestagram | TechCrunch">TechCrunch article</a> puts it, <a href="https://twitter.com/#!/kaitlyntrigger">Kaitlyn Trigger</a> was a poli-sci major who &#8220;never took any computer classes.&#8221; She has been together with Instagram co-founder Mike Krieger but had been frustrated that she didn&#8217;t understand his work. So she picked up/downloaded <a href="http://learnpythonthehardway.org/" title="Learn Python The Hard Way | A Beginner Programming Book">Learn Python the Hard Way</a>, learned the Python-based web framework Django, and created Lovestagram as a Valentine&#8217;s Day present.</p>
<p>
   It&#8217;s not just a cute story &ndash; learning Python and Django and making something within 2 months in your spare time is a pretty incredible achievement. It&#8217;s an awesome example of how having a project in mind can really help you learn code.
</p>
<p><strong>Matt Waite</strong> &ndash; was an <a href="http://journalism.unl.edu/cojmc/about/bios/waite.shtml" title="CoJMC | Matt Waite">award-winning newspaper reporter</a> before becoming a web developer. He went on, as a web developer, to win the most prestigious of journalism prizes: a Pulitzer for <a href="http://www.politifact.com/" title="">PolitiFact</a>. He now teaches at the University of Nebraska-Lincoln <a href="http://blog.mattwaite.com/" title="Research notes">and keeps a blog related to his work</a> with journalism students.</p>
]]></content:encoded>
			<wfw:commentRss>http://danwin.com/2012/02/code-dont-tell-programming-as-an-essential-journalism-skill/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Photos from outside of NYFW 2012 (Fall/Winter)</title>
		<link>http://danwin.com/2012/02/photos-from-outside-of-nyfw-2012-fallwinterl/</link>
		<comments>http://danwin.com/2012/02/photos-from-outside-of-nyfw-2012-fallwinterl/#comments</comments>
		<pubDate>Sun, 19 Feb 2012 23:35:45 +0000</pubDate>
		<dc:creator>Dan Nguyen</dc:creator>
				<category><![CDATA[visuals]]></category>

		<guid isPermaLink="false">http://danwin.com/?p=1907</guid>
		<description><![CDATA[Another Fashion Week come and gone. I didn&#8217;t have the time to go to any actual events but I did take part in a couple of related things: I did portraits and some scenery photos during Correll Correll&#8217;s casting call.&#8230; <a href="http://danwin.com/2012/02/photos-from-outside-of-nyfw-2012-fallwinterl/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<div class="photo imgwrap">
	<a href="http://www.flickr.com/photos/zokuga/6829977789/in/set-72157627606603854"><img style="width: 100%"  src="http://danwin-files.s3.amazonaws.com/pics/blog/7158/6829977789_a9731e708a_b.jpg" alt="Fashion Week"></a>
</div>
<p>
	Another Fashion Week come and gone. I didn&#8217;t have the time to go to any actual events but I did take part in a couple of related things:
</p>
<p>I did portraits and some scenery photos during <a href="http://www.correllcorrell.com/" title="CORRELLCORRELL | Home">Correll Correll&#8217;s casting call</a>. I didn&#8217;t really get time to set much up but it wasn&#8217;t too hectic of a shoot.</p>
<p>To make it easier for the casting director to keep track of them, the models were asked to hold up their cards while their portraits were taken. I think it&#8217;s really interesting how similar (or different) they look compared to their cards, without fancy makeup, lights or post-processing:</p>
<div class="photo imgwrap">
<a href="http://500px.com/photo/5004135"><img style="width: 100%"  src="http://danwin-files.s3.amazonaws.com/pics/blog/11f14d4bddcbf0cb602ea626c02ff83a771c8e16/4.jpg" alt="models and cards"></a>
</div>
<p>The light changed drastically throughout the day and afternoon:</p>
<div class="photo imgwrap">
	<a href="http://www.flickr.com/photos/zokuga/6835326327/in/set-72157627606603854/"><img style="width: 100%"  src="http://danwin-files.s3.amazonaws.com/pics/blog/7151/6835326327_4f03bd3d9f_b.jpg" alt="Fashion Week"></a>
</div>
<div class="photo imgwrap">
<a href="http://www.flickr.com/photos/zokuga/6829987827/" title="NYFW 2012 Casting Call by Dan Nguyen @ New York City, on Flickr"><img src="http://farm8.staticflickr.com/7027/6829987827_d90d2f1cab_b.jpg" style="width: 100%" alt="NYFW 2012 Casting Call"></a>
</div>
<p>This wasn&#8217;t officially part of Fashion Week in any way, but it was still the coolest fashion-related (and celebrity-sighting) experience I&#8217;ve had in the city. While my co-worker and I were exploring the ticker-paper-apocalypse after the Giants&#8217; Super Bowl Parade, she spotted <a href="http://www.newyorker.com/reporting/2009/03/16/090316fa_fact_collins" title="Bill Cunningham takes Manhattan : The New Yorker">NYT fashion photog Bill Cunningham</a> making his way down Broadway through the crowds and paper piles:</p>
<div class="photo imgwrap">
<a href="http://www.flickr.com/photos/zokuga/6841056093/" title="Bill Cunningham, on the street at the Super Bowl Giants Parade by Dan Nguyen @ New York City, on Flickr"><img style="width: 100%"  src="http://danwin-files.s3.amazonaws.com/pics/blog/7014/6841056093_d1d1e84820_b.jpg"   alt="Bill Cunningham, on the street at the Super Bowl Giants Parade"></a></p>
</div>
<div class="photo imgwrap">
	<a href="http://www.flickr.com/photos/zokuga/6841067449/" title="Bill Cunningham, on the street, after the Super Bowl Giants Parade by Dan Nguyen @ New York City, on Flickr"><img style="width: 100%"  src="http://danwin-files.s3.amazonaws.com/pics/blog/7143/6841067449_43cfd0ab84_b.jpg"   alt="Bill Cunningham, on the street, after the Super Bowl Giants Parade"></a>
</div>
<p>I hadn&#8217;t yet <a href="http://www.imdb.com/title/tt1621444/" title="Bill Cunningham New York (2010) - IMDb">finished watching his documentary</a>, so I was too intimidated to say &#8220;Hello.&#8221; Afterwards, I went home and watched it, and I wish I had at least given him a thumbs up. One of my favorite parts of the documentary is when Cunningham describes how he maintains his outsider status when invited to glam events. He&#8217;ll eat a modest meal beforehand and won&#8217;t even accept a glass of water while doing his work. Just like any other standup journalist. I hope I get to properly meet him one of these days.</p>
<p>You can see my other <a href="http://www.flickr.com/photos/zokuga/sets/72157627606603854/" title="New York Fashion Week - a set on Flickr">Fashion Week-related photos here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://danwin.com/2012/02/photos-from-outside-of-nyfw-2012-fallwinterl/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

