<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>danwin.com &#187; pfizer</title>
	<atom:link href="https://danwin.com/tag/pfizer/feed/" rel="self" type="application/rss+xml" />
	<link>https://danwin.com</link>
	<description>Words, photos, and code by Dan Nguyen. The &#039;g&#039; is mostly silent.</description>
	<lastBuildDate>Thu, 21 Nov 2019 12:29:57 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.2.39</generator>
	<item>
		<title>Pfizer Data Redux</title>
		<link>https://danwin.com/2010/04/pfizer-data-redux/</link>
		<comments>https://danwin.com/2010/04/pfizer-data-redux/#comments</comments>
		<pubDate>Wed, 28 Apr 2010 14:22:36 +0000</pubDate>
		<dc:creator><![CDATA[Dan Nguyen]]></dc:creator>
				<category><![CDATA[works]]></category>
		<category><![CDATA[coding]]></category>
		<category><![CDATA[doctors]]></category>
		<category><![CDATA[journalists]]></category>
		<category><![CDATA[pfizer]]></category>
		<category><![CDATA[tutorial]]></category>

		<guid isPermaLink="false">https://danwin.com/?p=763</guid>
		<description><![CDATA[<p>Updated the code and results to my guide on how to scrape Pfizer&#8217;s list of payments to doctors. It now contains a more normalized file that has a line for every doctor and payment. The aggregate totals changed marginally.</p>
<p>The post <a rel="nofollow" href="https://danwin.com/2010/04/pfizer-data-redux/">Pfizer Data Redux</a> appeared first on <a rel="nofollow" href="https://danwin.com">danwin.com</a>.</p>
]]></description>
				<content:encoded><![CDATA[<p>Updated the code and results to my <a href="https://danwin.com/works/pfizer-web-scraping-for-journalists-part-4-pfizers-doctor-payments/">guide on how to scrape Pfizer&#8217;s list of payments to doctors</a>. It now contains a more normalized file that has a line for every doctor and payment. The aggregate totals changed marginally.</p>
<p>The post <a rel="nofollow" href="https://danwin.com/2010/04/pfizer-data-redux/">Pfizer Data Redux</a> appeared first on <a rel="nofollow" href="https://danwin.com">danwin.com</a>.</p>
]]></content:encoded>
			<wfw:commentRss>https://danwin.com/2010/04/pfizer-data-redux/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Coding for Journalists 101 : A four-part series</title>
		<link>https://danwin.com/2010/04/coding-for-journalists-101-a-four-part-series/</link>
		<comments>https://danwin.com/2010/04/coding-for-journalists-101-a-four-part-series/#comments</comments>
		<pubDate>Tue, 06 Apr 2010 13:51:40 +0000</pubDate>
		<dc:creator><![CDATA[Dan Nguyen]]></dc:creator>
				<category><![CDATA[works]]></category>
		<category><![CDATA[coding]]></category>
		<category><![CDATA[journalism]]></category>
		<category><![CDATA[pfizer]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[tutorial]]></category>
		<category><![CDATA[web scraping]]></category>

		<guid isPermaLink="false">https://danwin.com/?p=661</guid>
		<description><![CDATA[<p>Update, January 2012: Everything&#8230;yes, everything, is superseded by my free online book, The Bastards Book of Ruby, which is a much more complete walkthrough of basic programming principles with far more practical and up-to-date examples and projects than what you&#8217;ll find here. I&#8217;m only keeping this old walkthrough up as a historical reference. I&#8217;m sure [&#8230;]</p>
<p>The post <a rel="nofollow" href="https://danwin.com/2010/04/coding-for-journalists-101-a-four-part-series/">Coding for Journalists 101 : A four-part series</a> appeared first on <a rel="nofollow" href="https://danwin.com">danwin.com</a>.</p>
]]></description>
				<content:encoded><![CDATA[<div id="attachment_663" style="width: 510px" class="wp-caption aligncenter"><a href="http://www.flickr.com/photos/nicocavallotto/363251198/"><img src="https://danwin.com/words/wp-content/uploads/2010/04/363251198_9537fe7c6d.jpg" alt="nico.cavallotto" title="nico.cavallotto 363251198_9537fe7c6d" width="500" height="357" class="size-full wp-image-663" /></a><p class="wp-caption-text">Photo by Nico Cavallotto on Flickr</p></div>
<p><strong>Update, January 2012:</strong> Everything&#8230;yes, everything, is superseded by my free online book, <a href="http://ruby.bastardsbook.com">The Bastards Book of Ruby</a>, which is a much more complete walkthrough of basic programming principles with far more practical and up-to-date examples and projects than what you&#8217;ll find here. </p>
<p>I&#8217;m only keeping this old walkthrough up as a historical reference. I&#8217;m sure the code is so ugly that I&#8217;m not going to even try re-reading it.</p>
<p>So check it out: <a href="http://ruby.bastardsbook.com">The Bastards Book of Ruby</a></p>
<p>-Dan</p>
<p>&#8212;</p>
<p><strong>Update, Dec. 30, 2010:</strong> I published <a href="http://www.propublica.org/nerds/item/doc-dollars-guides-collecting-the-data">a series of data collection and cleaning guides for ProPublica</a>, to describe what I did for our Dollars for Docs project. There is a <a href="http://www.propublica.org/nerds/item/scraping-websites">guide for Pfizer which supersedes the one I originally posted here</a>.</p>
<p>So a little while ago, I set out to write some tutorials that would guide the non-coding-but-computer-savvy journalist through enough programming fundamentals so that he/she could write a web scraper to collect data from public websites. A &#8220;little while&#8221; turned out to be more than a month-and-a-half. I actually wrote most of it in a week and then forgot about it. The timeliness of the fourth lesson, which shows <a href="https://danwin.com/works/pfizer-web-scraping-for-journalists-part-4-pfizers-doctor-payments/">how to help Pfizer in its mission to be more transparent</a>, compelled me to just publish them in incomplete form. There are probably inconsistencies in the writing and some of the code examples, but the final code sections at the end of each tutorial do seem to execute as expected.</p>
<p>As the tutorials are aimed at people who aren&#8217;t experienced in programming, the code is pretty verbose, pedantic, and in some cases, a little inefficient. It was my attempt to think how to make the code most readable, and I very much welcome suggested edits.</p>
<p><strong>DISCLAIMER:</strong> <em>The code, data files, and results are meant for reference and example only. You use it at your own risk.</em></p>
<ul>
<p><strong>Tutorial 1: <a href="https://danwin.com/works/coding-for-journalists-go-from-a-know-nothing-to-web-scraper-in-an-hour-hopefully/">Go from knowing nothing to scraping Web pages. In an hour. Hopefully</a></strong> &#8211; A massive, sprawling tutorial that attempts to take you from learning what HTML is, to the definition of an &#8220;if <del datetime="2010-04-06T18:25:14+00:00">loop</del> statement&#8221;, and finally, to using a Ruby library to scrape some information from Wikipedia. It may be too confusing for total neophytes and laughably basic for self-taught programmers. But at least you can kind of see, from beginning to end, one roadmap for going from nothing to something in the programming world.</p>
<p><strong>Tutorial 2: <a href="https://danwin.com/works/coding-for-journalists-102-collecting-info-from-a-county-jail-site/">Scraping a County Jail Website to Find Out Who&#8217;s in Jail </a></strong> &#8211; This uses all the concepts from the first tutorial and applies them to something that a cops reporter might actually want to try out.</p>
<p><strong>Tutorial 3: <a href="https://danwin.com/works/coding-for-journalists-part-3-cross-checking-the-jail-log-with-the-court-system-use-rubys-mechanize-to-fill-out-a-form/">Who&#8217;s Been in Jail Before: Cross-checking the jail logs with the court system with Ruby&#8217;s Mechanize</a></strong> &#8211; This lesson introduces you to another Ruby library that allows you to automate the filling-out of forms so that you can access online databases, in this case, California criminal case histories to see if current inmates are repeat-alleged-offenders.</p>
<p><strong>Tutorial 4: <a href="https://danwin.com/works/pfizer-web-scraping-for-journalists-part-4-pfizers-doctor-payments/">Improving Pfizer&#8217;s Dollars-to-Doctors Pay List</a></strong> &#8211; Last week, <strong>Pfizer</strong> <a href="http://www.nytimes.com/2010/04/01/business/01payments.html">released a list of nearly 5,000 doctors and medical institutions</a> to which it made $35 million in consulting and expense payments. Fun. Unfortunately, the list, <a href="http://www.pfizer.com/responsibility/working_with_hcp/payments_report.jsp">as it initially existed online</a>, is just about useless to anyone wanting to examine trends. This tutorial provides a script to make the list more interesting to journalists.</p>
</ul>
<p>The post <a rel="nofollow" href="https://danwin.com/2010/04/coding-for-journalists-101-a-four-part-series/">Coding for Journalists 101 : A four-part series</a> appeared first on <a rel="nofollow" href="https://danwin.com">danwin.com</a>.</p>
]]></content:encoded>
			<wfw:commentRss>https://danwin.com/2010/04/coding-for-journalists-101-a-four-part-series/feed/</wfw:commentRss>
		<slash:comments>25</slash:comments>
		</item>
		<item>
		<title>Coding for Journalists 104: Pfizer&#8217;s Doctor Payments; Making a Better List</title>
		<link>https://danwin.com/2010/04/pfizer-web-scraping-for-journalists-part-4-pfizers-doctor-payments/</link>
		<comments>https://danwin.com/2010/04/pfizer-web-scraping-for-journalists-part-4-pfizers-doctor-payments/#comments</comments>
		<pubDate>Tue, 06 Apr 2010 13:50:19 +0000</pubDate>
		<dc:creator><![CDATA[Dan Nguyen]]></dc:creator>
				<category><![CDATA[works]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[journalism]]></category>
		<category><![CDATA[pfizer]]></category>
		<category><![CDATA[scraper]]></category>
		<category><![CDATA[tutorial]]></category>

		<guid isPermaLink="false">https://danwin.com/?p=643</guid>
		<description><![CDATA[<p>Update (12/30): So about an eon later, I&#8217;ve updated this by writing a guide for ProPublica. Heed that one. This one will remain in its obsolete state. Update (4/28): Replaced the code and result files. Still haven&#8217;t written out a thorough explainer of what&#8217;s going on here. Update (4/19): After revisiting this script, I see [&#8230;]</p>
<p>The post <a rel="nofollow" href="https://danwin.com/2010/04/pfizer-web-scraping-for-journalists-part-4-pfizers-doctor-payments/">Coding for Journalists 104: Pfizer&#8217;s Doctor Payments; Making a Better List</a> appeared first on <a rel="nofollow" href="https://danwin.com">danwin.com</a>.</p>
]]></description>
				<content:encoded><![CDATA[<p><strong>Update (12/30): So about an eon later, <a href="http://www.propublica.org/nerds/item/scraping-websites">I&#8217;ve updated this by writing a guide for ProPublica</a>. Heed that one. This one will remain in its obsolete state.</strong></p>
<p><strong>Update (4/28): Replaced the code and result files. Still haven&#8217;t written out a thorough explainer of what&#8217;s going on here.</strong></p>
<p><strong>Update (4/19): After revisiting this script, I see that it fails to capture some of the payments to doctors associated with entities. I&#8217;m going to rework this script and post an update soon.</strong></p>
<p>So the world&#8217;s largest drug maker, <strong>Pfizer</strong>, decided to tell everyone which doctors it paid to speak and consult on its behalf in the latter half of 2009. These doctors are the same ones who, from time to time, recommend the use of Pfizer products.</p>
<p> <a href="http://www.nytimes.com/2010/04/01/business/01payments.html">From the NYT</a>:</p>
<blockquote><p>
				Pfizer, the world&#8217;s largest drug maker, said Wednesday that it paid about $20 million to 4,500 doctors and other medical professionals for consulting and speaking on its behalf in the last six months of 2009, its first public accounting of payments to the people who decide which drugs to recommend. Pfizer also paid $15.3 million to 250 academic medical centers and other research groups for clinical trials in the same period.</p>
<p> A spokeswoman for Pfizer, Kristen E. Neese, said <strong>most of the disclosures were required by an integrity agreement that the company signed in August to settle a federal investigation into the illegal promotion of drugs for off-label uses</strong>.
			</p></blockquote>
<p>
So, not an entirely altruistic release of information. But it&#8217;s out there nonetheless. You can <a href="http://www.pfizer.com/responsibility/working_with_hcp/payments_report.jsp">view their list here</a>. <strong>Jump to <a href="#results">my results here</a></strong><br />
<br />
<a href="http://www.pfizer.com/responsibility/working_with_hcp/payments_report.jsp"><img src="https://danwin.com/words/wp-content/uploads/2010/04/pfizer-list.gif" alt="" title="pfizer-list" width="917"  class="aligncenter size-full wp-image-677"></a> Not bad at first glance. However, on further examination, it&#8217;s clear that the list is nearly useless unless you intend to click through all 480-plus pages manually, or you have one doctor in mind and care only about that doctor&#8217;s relationship with Pfizer. As a journalist, you probably have other questions. Such as:</p>
<ul>
<li>Which doctor received the most?
				</li>
<li>What was the largest kind of expenditure?
				</li>
<li>Were there any unusually large single-item payments?
				</li>
</ul>
<p>None of these questions are answerable unless you have the list in a spreadsheet. As I mentioned in earlier lessons&#8230;there are cases when the information is freely available, but the provider hasn&#8217;t made it easy to analyze. Technically, they are fulfilling their requirement to be &#8220;transparent.&#8221; </p>
<p>I&#8217;ll give them the benefit of the doubt that they truly want this list to be as accessible and visible as possible&#8230;I tried emailing them to ask for the list as a single spreadsheet, but the email function was broken. So, let&#8217;s just write some code to save them some work and to get our answers a little quicker.<br />
<span id="more-643"></span></p>
<link rel='stylesheet' href='https://danwin.com/css/code.css' type='text/css' media='all'>
<div class="code-doc">
<div class='over-note' style='font-size: 12pt; color: #a44; border: 1px solid black; margin: 20px; padding: 20px;'>This is part of a <a href="https://danwin.com/works/coding-for-journalists-101-a-four-part-series/">four-part series on web-scraping for journalists</a>. As of <strong>Apr. 5, 2010</strong>, it was published a bit incomplete because I wanted to post a timely solution to the <a href="https://danwin.com/works/pfizer-web-scraping-for-journalists-part-4-pfizers-doctor-payments/">recent Pfizer doctor payments list release</a>, but the code at the bottom of each tutorial should execute properly. The code examples are meant for reference and I make no claims to the accuracy of the results. Contact <a href="mailto:dan@danwin.com">dan@danwin.com</a> if you have any questions, or leave a comment below.</p>
<p><strong>DISCLAIMER:</strong> <em>The code, data files, and results are meant for reference and example only. You use it at your own risk.</em></p>
</div>
<div class="sec">
<h2>
					The Code<br />
				</h2>
<p>The following code uses the same nokogiri strategies as the past three lessons. But here are the specific considerations that we have to make for Pfizer&#8217;s list:</p>
<ul>
<li>The base URL is: <a href="http://www.pfizer.com/responsibility/working_with_hcp/payments_report.jsp?enPdNm=All&amp;iPageNo=1">http://www.pfizer.com/responsibility/working_with_hcp/payments_report.jsp?enPdNm=All&amp;<strong>iPageNo=1</strong></a> The most interesting parameter, <strong>iPageNo</strong>, is bolded. If you replace &#8216;1&#8217; with any number, you&#8217;ll see you can progress through the list. There appear to be <a href="http://www.pfizer.com/responsibility/working_with_hcp/payments_report.jsp?enPdNm=All&amp;iPageNo=486">486 pages</a>.
					</li>
<li>So each page has a table of data with id <strong>#hcpPayments</strong>. The rows of data aren&#8217;t very normalized. For example, each &#8220;Entity Paid&#8221; can have many services/activities listed, with each of those items having another name attached to it. Then there are &#8220;cash&#8221; and &#8220;non-cash&#8221; values, which may or may not be numeric (&#8220;&#8212;&#8221; apparently means 0). There&#8217;s no easy CSS selector to grab each entity&#8230;but it seems that we can safely assume that if the first table column has a name (and the second and third contain city and state), this is a new entity.
					</li>
<p>
						These are the steps we&#8217;ll take:</p>
<ul>
<li>Download pages 1 to 486 of the list (each page has 10 entries)</li>
<li>Run a method that gathers all the doctor names from the pages we just downloaded onto our hard drive</li>
<li>From that list of doctors, query the Pfizer site and gather the individual payments to every doctor.</li>
</ul>
<div class='sec'>
<p>	At the top, I&#8217;ve written a few convenience methods to deal with strings. Also included: <strong>get_doc_query</strong>, a function we call to extract the doctor name from the links on the site.
					</p>
<p><strong>puts_error</strong> is a quick function to log any errors we might have, both to the screen and to <strong>pfizer_error_log.txt</strong>.</p>
<pre name="code" class="ruby">
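						# Libraries used below: nokogiri parses the HTML, open-uri lets open() fetch URLs
						require 'rubygems'
						require 'nokogiri'
						require 'open-uri'

						# Some general functions to deal with strings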
					class String

					  alias_method :old_strip, :strip

					  def strip
						  self.old_strip.gsub(/^[\302\240|\s]*|[\302\240|\s]*$/, '').gsub(/[\r\n]/, " ")
					  end

					  def strip_for_num
					    self.strip.gsub(/[^0-9]/, '')
					  end

					  def blank?
						respond_to?(:empty?) ? empty? : !self
					  end
					end
					
					
					END_PAGE=486
					BASE_URL=''
					DOC_QUERY_URL='http://www.pfizer.com/responsibility/working_with_hcp/payments_report.jsp?hcpdisplayName='


					def get_doc_query(str)
					  str.match(/hcpdisplayName\=(.+)/)[1]
					end

					def puts_error(str)
					  err = "#{Time.now}: #{str}"
					  puts err
					  File.open("pfizer_error_log.txt", 'a+'){|f| f.puts(err)}
					end
					
					
						</pre>
</p></div>
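<p>To see why <strong>strip_for_num</strong> is enough for the cash and non-cash columns, here is a minimal standalone sketch of the same idea. The method name <strong>amount_to_i</strong> is made up for this example and is not part of the script; it also assumes the amounts are whole dollars, since stripping every non-digit would mangle a value with cents.</p>
<pre name="code" class="ruby">
# Hypothetical helper, not part of the original script.
# Assumes whole-dollar amounts such as "$1,250"; anything with no digits
# (a dash, an empty cell) comes out as 0.
def amount_to_i(str)
  str.to_s.gsub(/[^0-9]/, '').to_i
end

amount_to_i('$1,250')   # => 1250
amount_to_i('---')      # => 0
amount_to_i('')         # => 0
</pre>
<p>The script itself keeps the stripped values as strings and writes them straight to the text files, so a conversion like this would only matter once you start adding numbers up.</p>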
<div class='sec'>
<p>I found it easiest to download all the pages onto the hard drive first, using something like <a href='http://en.wikipedia.org/wiki/CURL'>CURL</a>, and then run the following code on it.</p>
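<p>If you would rather stay in Ruby than use cURL, the download step could be sketched roughly like this. Everything here is an assumption on my part rather than what I actually ran: the names <strong>LIST_URL</strong>, <strong>list_page_url</strong> and <strong>download_pages</strong> are invented for the example, and the one-second pause is just a guess at being polite to the server.</p>
<pre name="code" class="ruby">
require 'net/http'
require 'uri'
require 'fileutils'

# The list URL described above; the page number gets appended to it.
LIST_URL = 'http://www.pfizer.com/responsibility/working_with_hcp/payments_report.jsp?enPdNm=All&iPageNo='

# Hypothetical helper: the URL of page n of the list.
def list_page_url(n)
  "#{LIST_URL}#{n}"
end

# Hypothetical helper: saves pages first..last as 1.html, 2.html, ... in dir,
# which is the file naming that process_local_pages reads.
# fetcher is anything that responds to call(url) and returns the page body.
def download_pages(first, last, dir, pause = 1,
                   fetcher = lambda { |url| Net::HTTP.get(URI.parse(url)) })
  FileUtils.mkdir_p(dir)
  (first..last).each do |n|
    path = File.join(dir, "#{n}.html")
    next if File.exist?(path)   # already saved, so an interrupted run can resume
    File.open(path, 'w') { |f| f.write(fetcher.call(list_page_url(n))) }
    sleep(pause)
  end
end

# Example call (hits the network, so it is left commented out):
# download_pages(1, 486, 'pfizer_pages')
</pre>
<p>With the pages saved this way, setting BASE_URL to 'pfizer_pages/' (trailing slash included) should let the code below read them from disk.</p>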
<p><strong>process_local_pages</strong> is a method that will iterate through every page (you can set BASE_URL to either your hard drive, if you&#8217;ve downloaded all the pages yourself, or to the Pfizer page), run <strong>process_row</strong> on each table row, store all the doctor names and payees in separate files, and keep a list of the total amounts.</p>
<p> The three resulting files that you get are:</p>
<ul>
<li><strong>pfizer_doctors.txt</strong> &#8211; Every doctor name listed. We will use this in the next step to query each doctor individually on Pfizer&#8217;s site</li>
<li><strong>pfizer_entities.txt</strong> &#8211; A list of every payment made to Entities</li>
<li><strong>pfizer_entity_totals.txt</strong> &#8211; A list of the total payments made to Entities</li>
</ul>
<pre name="code" class="ruby">


						def process_row(row, i, current_entity, arrays)  

						  tds = row.css('td').collect{|r| r.text.strip}

						   if !tds[3].blank? 
						     if !tds[1].blank?
						     # new entity
						     puts tds[0]
							     current_entity = {:name=>tds[0],:city=>tds[1], :state=>tds[2], :page=>i, :services=>[]} 
							     arrays[:entities].push(current_entity) if arrays[:entities]
						  	   current_class = row['class']
							   end

						     if tds[3].match(/Total/)
						       arrays[:totals].push([current_entity[:name], tds[4].strip_for_num, tds[5].strip_for_num].join("\t")) if arrays[:totals]

						     else
						        # new service
						   	   services_td = row.css('td')[3]
						   	   service_name = services_td.css("ul li a")[0].text.strip 
						   	   puts "#{current_entity[:name]}\t#{service_name}" 
						   	   current_entity[:services].push([service_name, tds[4].strip_for_num, tds[5].strip_for_num]) 

						   	   arrays[:doctors].push(services_td.css("ul li ul li a").map{|a| get_doc_query(a['href']) }.uniq) if arrays[:doctors]
						     end
						   elsif tds.reject{|t| t.blank?}.length == 0
						     #blank row
						   else
						     puts_error "Page #{i}: Encountered a row and didn't know what to do with it: #{tds.join("\t")}"
						   end

						   return current_entity
						end





						def process_local_pages

						  doctors_arr = []
						  entities_arr = []
						  totals_arr =[]

						  for i in 1..END_PAGE
						    begin
						  	   page = Nokogiri::HTML(open("#{BASE_URL}#{i}.html"))

						    	 count1, count2 = page.css('#pagination td.alignRight').last.text.match(/([0-9]{1,}) - ([0-9]{1,})/)[1..2].map{|c| c.to_i}
						    	 count = count2-count1+1

						    	 puts_error("Page #{i} WARNING: Pagination count is bad") if count < 0
						    	 puts("Page #{i}: #{count1} to #{count2}")

						    	 rows = page.css('#hcpPayments tbody tr')

						    	 current_entity=nil

						    	 rows.each do |row|  	   
						    	   current_entity= process_row(row, i, current_entity, {:doctors=>doctors_arr, :entities=>entities_arr, :totals=>totals_arr})
						       end

						     rescue Exception=>e
						  	   # Log the failure and move on to the next page
						  	   puts_error "Oops, had a problem getting the #{i}-page: #{[e.message, e.backtrace.map{|b| "\n\t#{b}"}].join("\n")}"
						     end
						  end

						  File.open("pfizer_doctors.txt", 'w'){|f|
						    doctors_arr.uniq.each do |d|
						        f.puts(d)
						    end
						  }

						  File.open("pfizer_entities.txt", 'w'){|f|
						    entities_arr.each do |e|
						      e[:services].each do |s|
						        f.puts("#{e[:name]}\t#{e[:page]}\t#{e[:city]}\t#{e[:state]}\t#{s[0]}\t#{s[1]}\t#{s[2]}")
						      end  
						    end
						  }


						  File.open("pfizer_entity_totals.txt", 'w'){|f|
						    totals_arr.uniq.each do |d|
						        f.puts(d)
						    end
						  }
						end

					</pre>
</p></div>
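<p>The row logic in <strong>process_row</strong> is easier to follow without the HTML parsing around it. Below is a stripped-down sketch that works on plain arrays of cell text instead of nokogiri rows. The method names and the sample rows are invented for illustration, and the column order (name, city, state, service, cash, non-cash) is my reading of the table, so treat it as an assumption.</p>
<pre name="code" class="ruby">
# Hypothetical helpers, working on arrays of six cell strings:
# name, city, state, service/activity, cash, non-cash.
def blank_cell?(s)
  s.nil? || s.strip.empty?
end

# A row with a service and a city starts a new entity.
def starts_entity?(cells)
  !blank_cell?(cells[3]) && !blank_cell?(cells[1])
end

def classify_row(cells)
  return :blank   if cells.all? { |c| blank_cell?(c) }
  return :unknown if blank_cell?(cells[3])
  return :total   if cells[3] =~ /Total/
  :service
end

# Groups rows into entities, each carrying its own list of services.
def group_rows(rows)
  entities = []
  rows.each do |cells|
    if starts_entity?(cells)
      entities.push({:name => cells[0], :city => cells[1], :state => cells[2], :services => []})
    end
    if classify_row(cells) == :service && !entities.empty?
      entities.last[:services].push([cells[3], cells[4], cells[5]])
    end
  end
  entities
end

# Made-up sample rows, not real data from the list
rows = [
  ['EXAMPLE CLINIC', 'SPRINGFIELD', 'IL', 'Professional Advising', '1,000', '---'],
  ['', '', '', 'Expert-Led Forums', '2,500', '150'],
  ['', '', '', 'Total', '3,500', '150'],
  ['', '', '', '', '', '']
]
entities = group_rows(rows)
puts "#{entities[0][:name]}: #{entities[0][:services].length} services"
</pre>
<p>The first row both opens the entity and counts as its first service, which is why the entity check and the service check are separate steps rather than an either/or.</p>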
<div class='sec'>
<p><strong>process_doctor</strong> is what we run after we&#8217;ve compiled the list of doctor names that show up on the Pfizer list. Each doctor has his/her own page with detailed spending. The data rows are roughly in the same format as the main list, so we reuse <strong>process_row</strong>.</p>
<pre name="code" class="ruby">

						def process_doctor(r, time='')
						  begin
						    url = "#{DOC_QUERY_URL}#{r}"
						    page = Nokogiri::HTML(open("#{url}"))
						  rescue Exception=>e
							   # Log the failure and skip this doctor; there is no page to parse
							   puts_error "Oops, had a problem getting the #{r}-entry: #{[e.message, e.backtrace.map{|b| "\n\t#{b}"}].join("\n")}"
							   return []
						  end

						  rows = page.css('#hcpPayments tbody tr')
						  entities_arr = []
						  current_entity=nil

						   rows.each do |row|  	   
						     current_entity= process_row(row, '', current_entity, {:entities=>entities_arr})
						   end


						   name = r.split('+')
						   puts_error("Should've been a last name at #{r}") if !name[0].match(/,$/)
						   name = "#{name[0].gsub(/,$/, '')}\t#{name[1..-1].join(' ')}"

						   vals=[]
						   entities_arr.each do |e| 
						     e[:services].each do |s|
						       vals.push("#{name}\t#{e[:name]}\t#{e[:page]}\t#{e[:city]}\t#{e[:state]}\t#{s[0]}\t#{s[1]}\t#{s[2]}\t#{url}\t#{time}")
						    end
						   end

						  vals.each{|val| File.open("pfizer_doctor_details.txt", "a"){ |f| 
						    f.puts val
						  }}

						  puts vals
						  return vals
						end


					</pre>
</p></div>
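<p>The name handling inside <strong>process_doctor</strong> leans on the query string having the shape LAST,+FIRST+MIDDLE, as in the link for the top-paid doctor further down. Here is that one step pulled out into a small sketch; <strong>split_doc_name</strong> is a made-up name for this example, not a method in the script.</p>
<pre name="code" class="ruby">
# Hypothetical helper mirroring the name handling in process_doctor.
# Assumes a query string shaped like "LAST,+FIRST+MIDDLE".
def split_doc_name(query)
  parts = query.strip.split('+')
  last  = parts[0].to_s.sub(/,$/, '')   # drop the comma after the last name
  first = parts[1..-1].to_a.join(' ')   # everything else is first/middle names
  [last, first]
end

split_doc_name('SACKS,+GERALD+MICHAEL')   # => ["SACKS", "GERALD MICHAEL"]
</pre>
<p>The script logs an error when the first piece doesn&#8217;t end in a comma, which is a cheap way to spot names that don&#8217;t fit this pattern.</p>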
<div class='sec'>
<p><strong>process_doctor_pages</strong> is just a function that calls <strong>process_doctor</strong> for each name in the <strong>pfizer_doctors.txt</strong> file we previously gathered.</p>
<p>The final result is pfizer_doctor_details.txt, which contains a line for every payment to every doctor.</p>
<pre name="code" class="ruby">
						def process_doctor_pages
						  time = Time.now

						  File.open("pfizer_doctors.txt", 'r'){|f|
						     f.readlines.each do |r|
						        # chomp drops the trailing newline so it doesn't end up in the URL or the name
						        vals = process_doctor(r.chomp, time)
						     end 
						  }
						end		

					</pre>
</p></div>
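<p>Once <strong>pfizer_doctor_details.txt</strong> exists, the first question from the top of this post (which doctor received the most?) comes down to a sum per name. A rough sketch, assuming the tab-separated column order that <strong>process_doctor</strong> writes: last name, first name, entity, page, city, state, service, cash, non-cash, url, time. The method name <strong>doctor_totals</strong> is invented for this example.</p>
<pre name="code" class="ruby">
# Hypothetical helper: sums cash + non-cash per doctor and returns
# [name, total] pairs, largest total first.
def doctor_totals(lines)
  totals = Hash.new(0)
  lines.each do |line|
    cols = line.chomp.split("\t", -1)   # -1 keeps empty trailing columns
    next if cols.length < 9             # skip lines that don't have the amount columns
    doctor = "#{cols[0]}, #{cols[1]}"
    totals[doctor] += cols[7].to_i + cols[8].to_i
  end
  totals.sort_by { |name, sum| -sum }
end

# Example call, once the details file has been written:
# doctor_totals(File.readlines('pfizer_doctor_details.txt')).first(10)
</pre>
<p>Because the details file is opened in append mode, running the scraper twice would double every doctor&#8217;s total, so it is worth deleting the file (or de-duplicating lines) before trusting the sums.</p>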
</p></div>
<div class='sec'>
<h2><a name="results"></a><br />
					The Results</h2>
<p>				After Googling the top-Pfizer-paid-doctor on the list (<a href="http://www.pfizer.com/responsibility/working_with_hcp/payments_report.jsp?hcpdisplayName=SACKS,+GERALD+MICHAEL">Gerald Michael Sacks for ~$150K</a>), I came across the <a href='http://blog.pharmaconduct.org/'>Pharma Conduct</a> blog, which had <a href='http://blog.pharmaconduct.org/2010/04/who-were-top-5-recipients-of-money-from.html?src=PharmaConduct+20100403'>already posted partial aggregations of the list</a>, including the <a href='http://blog.pharmaconduct.org/2010/04/which-doctors-received-highest.html?src=PharmaConduct+20100405'>top 5 doctors</a>, complete with profiles and pics.</p>
<p>				As Pharma Conduct has already been on the ball, I&#8217;ll defer to its analysis. It has some good background here on how lame pharma companies have been in <a href='http://blog.pharmaconduct.org/2010/02/pharma-gets-failing-grades-for-initial.html'>past releases of data</a>. Overall, Pharma Conduct is <a href='http://blog.pharmaconduct.org/2010/03/pfizer-releases-payments-to-physicians.html'>less than impressed</a> with Pfizer:</p>
<blockquote><p>
				Despite reporting more information than some its peers, Pfizer&#8217;s interface is still very limited.  For one, to use the search filtering, you must know a physician&#8217;s first name and last name, as well as the state where the payment was made.  Also, the data cannot be sorted by payment amount, which is a big limitation.  Pfizer should be given credit for releasing the information and being so thorough.  However, by releasing it in a format that is not really amenable to data analysis and is more suited to simply looking up results one physician at a time, I echo John Mack&#8217;s sentiment, namely, that this data is translucent, but not transparent.	</p></blockquote></div>
<p>The post <a rel="nofollow" href="https://danwin.com/2010/04/pfizer-web-scraping-for-journalists-part-4-pfizers-doctor-payments/">Coding for Journalists 104: Pfizer&#8217;s Doctor Payments; Making a Better List</a> appeared first on <a rel="nofollow" href="https://danwin.com">danwin.com</a>.</p>
]]></content:encoded>
			<wfw:commentRss>https://danwin.com/2010/04/pfizer-web-scraping-for-journalists-part-4-pfizers-doctor-payments/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
	</channel>
</rss>
