<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	
	>
<channel>
	<title>Comments on: Coding for Journalists 104: Pfizer&#8217;s Doctor Payments; Making a Better List</title>
	<atom:link href="https://danwin.com/2010/04/pfizer-web-scraping-for-journalists-part-4-pfizers-doctor-payments/feed/" rel="self" type="application/rss+xml" />
	<link>https://danwin.com/2010/04/pfizer-web-scraping-for-journalists-part-4-pfizers-doctor-payments/</link>
	<description>Words, photos, and code by Dan Nguyen. The &#039;g&#039; is mostly silent.</description>
	<lastBuildDate>Sun, 07 Dec 2025 04:13:29 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.2.39</generator>
	<item>
		<title>By: Code, Don&#8217;t Tell: Programming as an Essential Journalism Skill &#124; Dan Nguyen pronounced fast is danwin</title>
		<link>https://danwin.com/2010/04/pfizer-web-scraping-for-journalists-part-4-pfizers-doctor-payments/#comment-2045</link>
		<dc:creator><![CDATA[Code, Don&#8217;t Tell: Programming as an Essential Journalism Skill &#124; Dan Nguyen pronounced fast is danwin]]></dc:creator>
		<pubDate>Wed, 22 Feb 2012 19:58:23 +0000</pubDate>
		<guid isPermaLink="false">https://danwin.com/?p=643#comment-2045</guid>
		<description><![CDATA[[...] At that point, I knew virtually nothing about the issue. But what I did see was that Pfizer&#8217;s site seemed unnecessarily cumbersome. Though the disclosures were mandated, Pfizer&#8217;s site made it difficult to do simple analyses such as finding the professionals who had received substantial amounts or even the sum of the database&#8217;s payments. So I wrote a scraper and published the code and data for others to use. [...]]]></description>
		<content:encoded><![CDATA[<p>[&#8230;] At that point, I knew virtually nothing about the issue. But what I did see was that Pfizer&#8217;s site seemed unnecessarily cumbersome. Though the disclosures were mandated, Pfizer&#8217;s site made it difficult to do simple analyses such as finding the professionals who had received substantial amounts or even the sum of the database&#8217;s payments. So I wrote a scraper and published the code and data for others to use. [&#8230;]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: dataist blog: An inspiring case for journalists learning to code &#124; Dan Nguyen pronounced fast is danwin</title>
		<link>https://danwin.com/2010/04/pfizer-web-scraping-for-journalists-part-4-pfizers-doctor-payments/#comment-928</link>
		<dc:creator><![CDATA[dataist blog: An inspiring case for journalists learning to code &#124; Dan Nguyen pronounced fast is danwin]]></dc:creator>
		<pubDate>Wed, 16 Feb 2011 13:02:01 +0000</pubDate>
		<guid isPermaLink="false">https://danwin.com/?p=643#comment-928</guid>
		<description><![CDATA[[...] Dollars for Docs project originated in part from this Pfizer-scraping lesson I added on to my programming tutorial: I needed a timely example of public data that wasn&#8217;t [...]]]></description>
		<content:encoded><![CDATA[<p>[&#8230;] Dollars for Docs project originated in part from this Pfizer-scraping lesson I added on to my programming tutorial: I needed a timely example of public data that wasn&#8217;t [&#8230;]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: amul</title>
		<link>https://danwin.com/2010/04/pfizer-web-scraping-for-journalists-part-4-pfizers-doctor-payments/#comment-848</link>
		<dc:creator><![CDATA[amul]]></dc:creator>
		<pubDate>Thu, 09 Dec 2010 20:14:33 +0000</pubDate>
		<guid isPermaLink="false">https://danwin.com/?p=643#comment-848</guid>
		<description><![CDATA[really really awesome that you did this.. I think if there were more GUI-oriented tools built into the browser itself via plugins and panels, this practice of large data-set analysis could really catch on.. especially with these cool examples.

On a side note, these CEOs sitting on multiple boards is fascinating..]]></description>
		<content:encoded><![CDATA[<p>really really awesome that you did this.. I think if there were more GUI-oriented tools built into the browser itself via plugins and panels, this practice of large data-set analysis could really catch on.. especially with these cool examples.</p>
<p>On a side note, these CEOs sitting on multiple boards is fascinating..</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dan Nguyen&#8217;s coding for journalists 101</title>
		<link>https://danwin.com/2010/04/pfizer-web-scraping-for-journalists-part-4-pfizers-doctor-payments/#comment-846</link>
		<dc:creator><![CDATA[Dan Nguyen&#8217;s coding for journalists 101]]></dc:creator>
		<pubDate>Tue, 07 Dec 2010 20:01:55 +0000</pubDate>
		<guid isPermaLink="false">https://danwin.com/?p=643#comment-846</guid>
		<description><![CDATA[[...] Click here for this page&#8217;s table of contents. Or jump to the theory lesson. Or to the programming exercise. Or, if you already know what a function and variable is, and have Ruby installed, go straight to two of my walkthroughs of building a real-world journalistic-minded web scraper: Scraping a jail site, and scraping Pfizer&#8217;s doctor payment list. [...]]]></description>
		<content:encoded><![CDATA[<p>[&#8230;] Click here for this page&#8217;s table of contents. Or jump to the theory lesson. Or to the programming exercise. Or, if you already know what a function and variable is, and have Ruby installed, go straight to two of my walkthroughs of building a real-world journalistic-minded web scraper: Scraping a jail site, and scraping Pfizer&#8217;s doctor payment list. [&#8230;]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Joseph</title>
		<link>https://danwin.com/2010/04/pfizer-web-scraping-for-journalists-part-4-pfizers-doctor-payments/#comment-389</link>
		<dc:creator><![CDATA[Joseph]]></dc:creator>
		<pubDate>Wed, 12 May 2010 17:10:03 +0000</pubDate>
		<guid isPermaLink="false">https://danwin.com/?p=643#comment-389</guid>
		<description><![CDATA[Dan

Where is &#039;process_row&#039;? I can&#039;t see it defined anywhere. Thanks

Joe]]></description>
		<content:encoded><![CDATA[<p>Dan</p>
<p>Where is &#8216;process_row&#8217;? I can&#8217;t see it defined anywhere. Thanks</p>
<p>Joe</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Pfizer Data Redux &#124; Danwin: Dan Nguyen, in short</title>
		<link>https://danwin.com/2010/04/pfizer-web-scraping-for-journalists-part-4-pfizers-doctor-payments/#comment-379</link>
		<dc:creator><![CDATA[Pfizer Data Redux &#124; Danwin: Dan Nguyen, in short]]></dc:creator>
		<pubDate>Wed, 28 Apr 2010 14:22:43 +0000</pubDate>
		<guid isPermaLink="false">https://danwin.com/?p=643#comment-379</guid>
		<description><![CDATA[[...] the code and results to my guide on how to scrape Pfizer&#8217;s list of payments to doctors. It now contains a more normalized file that has a line for every doctor and payment. The aggregate [...]]]></description>
		<content:encoded><![CDATA[<p>[&#8230;] the code and results to my guide on how to scrape Pfizer&#8217;s list of payments to doctors. It now contains a more normalized file that has a line for every doctor and payment. The aggregate [&#8230;]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Pharma Conduct Guy</title>
		<link>https://danwin.com/2010/04/pfizer-web-scraping-for-journalists-part-4-pfizers-doctor-payments/#comment-339</link>
		<dc:creator><![CDATA[Pharma Conduct Guy]]></dc:creator>
		<pubDate>Tue, 06 Apr 2010 15:20:42 +0000</pubDate>
		<guid isPermaLink="false">https://danwin.com/?p=643#comment-339</guid>
		<description><![CDATA[Hi Dan,

What an excellent write-up.  I was pondering the idea of whether or not my readers would be interested in a tutorial for how I aggregated the data so quickly.  Thus, by posting such an excellent overview of web-scraping, you have saved me a tremendous amount of time.  I&#039;m a PHP/MySQL user myself, but anyone with a modicum of programming skill should be able to follow your example.  Posting the raw data, which I had hoped to do later this week (I&#039;m traveling at the moment), was a nice thing to do.  I didn&#039;t want to post the full set until I&#039;d had a chance to perform some serious QC on the data.  Looking over your numbers, we seem to have found most of the same things, as well as some of the same problems.

For example, regarding Duke, I think I understand where the discrepancy lies.  I alluded to this briefly in a previous post (see &lt;a href=&quot;http://blog.pharmaconduct.org/2010/04/who-were-top-5-recipients-of-money-from.html?src=Danwin.com+20100406&quot; rel=&quot;nofollow&quot;&gt;Who were the top 5 recipients of money from Pfizer during the period 2009 Q3-Q4?&lt;/a&gt;), but I have a more complete post in the works.  Thanks again for writing such an excellent tutorial, for making the data available, and for posting on my blog.  In one of my next posts, I&#039;ll be sure to spotlight this post.

-Eric]]></description>
		<content:encoded><![CDATA[<p>Hi Dan,</p>
<p>What an excellent write-up.  I was pondering the idea of whether or not my readers would be interested in a tutorial for how I aggregated the data so quickly.  Thus, by posting such an excellent overview of web-scraping, you have saved me a tremendous amount of time.  I&#8217;m a PHP/MySQL user myself, but anyone with a modicum of programming skill should be able to follow your example.  Posting the raw data, which I had hoped to do later this week (I&#8217;m traveling at the moment), was a nice thing to do.  I didn&#8217;t want to post the full set until I&#8217;d had a chance to perform some serious QC on the data.  Looking over your numbers, we seem to have found most of the same things, as well as some of the same problems.</p>
<p>For example, regarding Duke, I think I understand where the discrepancy lies.  I alluded to this briefly in a previous post (see <a href="http://blog.pharmaconduct.org/2010/04/who-were-top-5-recipients-of-money-from.html?src=Danwin.com+20100406" rel="nofollow">Who were the top 5 recipients of money from Pfizer during the period 2009 Q3-Q4?</a>), but I have a more complete post in the works.  Thanks again for writing such an excellent tutorial, for making the data available, and for posting on my blog.  In one of my next posts, I&#8217;ll be sure to spotlight this post.</p>
<p>-Eric</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Coding for Journalists 101 : A four-part series &#124; Danwin: Dan Nguyen, in short</title>
		<link>https://danwin.com/2010/04/pfizer-web-scraping-for-journalists-part-4-pfizers-doctor-payments/#comment-337</link>
		<dc:creator><![CDATA[Coding for Journalists 101 : A four-part series &#124; Danwin: Dan Nguyen, in short]]></dc:creator>
		<pubDate>Tue, 06 Apr 2010 14:32:16 +0000</pubDate>
		<guid isPermaLink="false">https://danwin.com/?p=643#comment-337</guid>
		<description><![CDATA[[...] Coding for Journalists 104: Pfizer&#8217;s Doctor Payments; Making a Better List [...]]]></description>
		<content:encoded><![CDATA[<p>[&#8230;] Coding for Journalists 104: Pfizer&#8217;s Doctor Payments; Making a Better List [&#8230;]</p>
]]></content:encoded>
	</item>
</channel>
</rss>
