Author Archives: Dan Nguyen

Google’s search has been dumbed down for the novices and solipsistic

In response to Google’s latest plan to combine all your usage data on all of its platforms (GMail, Youtube, etc.) into one tidy user-and-advertiser-friendly package, I’m mostly sitting on the fence. This is because I’ve always assumed everything I type into Google Search will inextricably be linked to my personal GMail account…so I try not to search for anything job/life-sensitive in the same browser that I use GMail for.

But even before this policy, Google’s vanilla search (not the one inside Google+) has noticeably gotten too personalized. Not in a creepy sense, but in a you’re-too-dumb-to-figure-out-an-address bar way. And this is not a good feature for us non-novice Internet users.

For example, I’ve been in a admittedly-petty, losing competition with the younger, better-muscled Dan Nguyen for the top of Google’s search results. My identity (this blog, danwin.com) has always come in second-place or lower…unless I perform a search for my name while logged into my Google/GMail account:

Me on Google Search. I am logged in on the left browser, logged out on the right.

The problem isn’t that my blog shows up first for my little search universe. It’s that my Google+ profile is on top, pushing all the other search results below the fold.

This seems really un-useful to me. The link to my own Google+ profile already occupies the top-left corner of my browser every time I visit a Google-owned site. I don’t need another prominent link to it. But I’ll give Google the benefit of the doubt here; they’re making the reasonable guess that someone who is searching for their own name is just looking for their own stuff…though conveniently, Google thinks the most important stuff about the searcher happens to be the searcher’s Google+ profile.

So here’s a more general example. I do a lot of photography and am always interested in what other people are doing. So here’s a search for “times square photos” in normal search (image search seems to behave the same way logged in or out):

'times square photos' on Google Search. I am logged in on the left browser, logged out on the right.

I generally love how Google automatically includes multimedia when relevant; for example, I rarely go to Google Maps now because typing in an address in the general search box, like “50 broadway” will bring up a nice local map. But in the case of “times square photos,” Google automatically assumes that I’m most interested in my own Times Square photos.

I may be a little solipsistic, but this is going overboard. And it seems counter-productive. If I’m the type of user to continually look up different kind of photos and all I see right away are my own photos, my search universe is going to be slightly duller.

Wasn’t the original assumption of search was that the user is looking for something he/she doesn’t currently know? Like, the hours of my favorite bookstore. Doing that search pulls up a helpful sidebox, with the hours, next to the search results:

The Strand's opening hours

This is fantastic. And I do appreciate Google catering to my caveman of the question, especially when I’m on a mobile device.

But in the case of my example photo and name search, Google has gone a step too far in dumbing things down.

My hypothesis is that they are catering to the legion of users who get to yahoo.com by going to Google and typing in “Yahoo.” I imagine Google’s massive analytics system has told them that this is how many users get to GMail, as opposed to typing in gmail.com.

Google seems to be making this apply to every kind of search: when I type in a search query for “dan nguyen” or “times square photos”, Google checks to see if these are terms in my Google profile. If so, it pushes them to the top of the search pile because I must be one of those idiots who doesn’t realize that the Dan+ in the top left corner is how I get to my Google profile or that is too lazy to go to Flickr to look up my own Times Square photos.

The kicker is that that assumption contradicts my behavior. If I’m a user who was technical enough to figure out how to fill out my Google profile and properly link up third-party accounts…aren’t I the type of user who’s technical enough to get to my own Flickr photos by myself?

Searching for my own name is stupid, and kind of an edge case. But what if I’m working on a business site and have linked it (and/or its Google+ page) to my profile? And then I’m constantly doing searches to see how well that site is doing in SEO and SiteRank compared to similarly named/themed sites? Since I’m not in that situation, I can only guess: but will I have to use a separate browser just to get a reliable, business-savvy search?

I realize that this dumbing-down “feature” is the kind of thing that has to be auto-opt-in for its target audience. But I can think of a slightly non-intrusive way to make it manually opt-in. If what I really want are my own Times Square photos, then wait for me to prepend a “my” to the query. I’d think even the novice users could get into this habit.

Analyzing the U.S. Senate Smiles: A Ruby tutorial with the Face.com and NYT Congress APIs

U.S. Senate Smiles, ranked by Face.com face-detection algorithm

The smiles of your U.S. Senate from most smiley-est to least, according to Face.com's algorithm

Who’s got the biggest smile among our U.S. senators? Let’s find out and exercise our Ruby coding and civic skills. This article consists of a quick coding strategy overview (from the full code is at my Github). Or jump here to see the results, as sorted by Face’s algorithm.

About this tutorial

This is a Ruby coding lesson to demonstrate the basic features of Face.com’s face-detection API for a superficial use case. We’ll mash with the New York Times Congress API and data from the Sunlight Foundation.

The code comprehension is at a relatively simple level and is intended for learning programmers who are comfortable with RubyGems, hashes, loops and variables.

If you’re a non-programmer: The use case may be a bit silly here but I hope you can view it from an abstract-big-picture level and see the use of programming to: 1) Make quick work of menial work and 2) create and analyze datapoints where none existed before.

On to the lesson!

The problem with portraits

For the SOPA Opera app I built a few weeks ago, I wanted to use the Congressional mugshots to illustrate
the front page. The Sunlight Foundation provides a convenient zip file download of every sitting Congressmember’s face. The problem is that the portraits were a bit inconsistent in composition (and quality). For example, here’s a usable, classic head-and-shoulders portrait of Senator Rand Paul:

Sen. Rand Paul

But some of the portraits don’t have quite that face-to-photo ratio; Here’s Sen. Jeanne Shaheen’s portrait:

Sen. Jeanne Shaheen

It’s not a terrible Congressional portrait. It’s just out of proportion compared to Sen. Paul’s. What we need is a closeup crop of Sen. Shaheen’s face:

Sen. Jeanne Shaheen's face cropped

How do we do that for a given set of dozens (even hundreds) of portraits that doesn’t involve manually opening each image and cropping the heads in a non-carpal-tunnel-syndrome-inducing manner?

Easy face detection with Face.com’s Developer API

Face-detection is done using an algorithm that scans an image and looks for shapes proportional to the average human face and containing such inner shapes as eyes, a nose and mouth in the expected places. It’s not as if the algorithm has to have an idea of what an eye looks like exactly; two light-ish shapes about halfway down what looks like a head might be good enough.

You could write your own image-analyzer to do this, but we just want to crop faces right now. Luckily, Face.com provides a generous API that when you send it an image, it will send you back a JSON file in this format:

{
    "photos": [{
        "url": "http:\/\/face.com\/images\/ph\/12f6926d3e909b88294ceade2b668bf5.jpg",
        "pid": "F@e9a7cd9f2a52954b84ab24beace23046_1243fff1a01078f7c339ce8c1eecba44",
        "width": 200,
        "height": 250,
        "tags": [{
            "tid": "TEMP_F@e9a7cd9f2a52954b84ab24beace23046_1243fff1a01078f7c339ce8c1eecba44_46.00_52.40_0_0",
            "recognizable": true,
            "threshold": null,
            "uids": [],
            "gid": null,
            "label": "",
            "confirmed": false,
            "manual": false,
            "tagger_id": null,
            "width": 43,
            "height": 34.4,
            "center": {
                "x": 46,
                "y": 52.4
            },
            "eye_left": {
                "x": 35.66,
                "y": 44.91
            },
            "eye_right": {
                "x": 58.65,
                "y": 43.77
            },
            "mouth_left": {
                "x": 37.76,
                "y": 61.83
            },
            "mouth_center": {
                "x": 49.35,
                "y": 62.79
            },
            "mouth_right": {
                "x": 57.69,
                "y": 59.75
            },
            "nose": {
                "x": 51.58,
                "y": 56.15
            },
            "ear_left": null,
            "ear_right": null,
            "chin": null,
            "yaw": 22.37,
            "roll": -3.55,
            "pitch": -8.23,
            "attributes": {
                "glasses": {
                    "value": "false",
                    "confidence": 16
                },
                "smiling": {
                    "value": "true",
                    "confidence": 92
                },
                "face": {
                    "value": "true",
                    "confidence": 79
                },
                "gender": {
                    "value": "male",
                    "confidence": 50
                },
                "mood": {
                    "value": "happy",
                    "confidence": 75
                },
                "lips": {
                    "value": "parted",
                    "confidence": 39
                }
            }
        }]
    }],
    "status": "success",
    "usage": {
        "used": 42,
        "remaining": 4958,
        "limit": 5000,
        "reset_time_text": "Tue, 24 Jan 2012 05:23:21 +0000",
        "reset_time": 1327382601
    }
}		
	

The JSON includes an array of photos (if you sent more than one to be analyzed) and then an array of tags – one tag for each detected face. The important part for cropping purposes are the attributes dealing with height, width, and center:

		"width": 43,
      "height": 34.4,
      "center": {
          "x": 46,
          "y": 52.4
      },	
	

These numbers represent percentage values from 0-100. So the width of the face is 43% of the image’s total width. If the image is 200 pixels wide, then the face spans 86 pixels.

Using your favorite HTTP-calling library (I like the RestClient gem), you can simply ping the Face.com API’s detect feature to get these coordinates for any image you please.

Image manipulation with RMagick

So how do we do the actual cropping? By using the RMagick (a Ruby wrapper for the ImageMagick graphics library) gem, which lets us do crops with commands as simple as these:

img = Magick::Image.read("somefile.jpg")[0]

# crop a 100x100 image starting from the top left corner
img = img.crop(0,0,100,100) 

The RMagick documentation page is a great place to start. I’ve also written an image-manipulation chapter for The Bastards Book of Ruby.

The Process

The code for all of this is stored at my Github account.

I’ve divided this into two parts/scripts. You could combine it into one script but to make things easier to comprehend (and to lessen the amount of best-practices error-handling code for me to write), I divide it into a “fetch” and “process” stage.

In the fetch.rb stage, we essentially download all the remote files we need to do our work:

  • Download a zip file of images from Sunlight Labs and unzip it at the command line
  • Use NYT’s Congress API to get latest list of Senators
  • Use Face.com API to download face-coordinates as JSON files

In the process.rb stage, we use RMagick to crop the photos based from the metadata we downloaded from the NYT and Face.com. As a bonus, I’ve thrown in a script to programmatically create a crude webpage that ranks the Congressmembers’ faces by smile, glasses-wearingness, and androgenicity. How do I do this? The Face.com API handily provides these numbers in its response:

	"attributes": {
            "glasses": {
                "value": "false",
                "confidence": 16
            },
            "smiling": {
                "value": "true",
                "confidence": 92
            },
            "face": {
                "value": "true",
                "confidence": 79
            },
            "gender": {
                "value": "male",
                "confidence": 50
            },
            "mood": {
                "value": "happy",
                "confidence": 75
            },
            "lips": {
                "value": "parted",
                "confidence": 39
            }
        }
	

I’m not going to reprint the code from my Github account, you can see the scripts yourself there:

https://github.com/dannguyen/Congressmiles

First things first: sign up for API keys at the NYT and Face.com

I also use the following gems:

The Results

Here’s what you should see after you run the process.rb script (all judgments made by Face.com’s algorithm…I don’t think everyone will agree with about the quality of the smiles):


10 Biggest Smiles

Sen. Wicker (R-MS)
Sen. Wicker (R-MS) [100]
Sen. Reid (D-NV)
Sen. Reid (D-NV) [100]
Sen. Shaheen (D-NH)
Sen. Shaheen (D-NH) [99]
Sen. Hagan (D-NC)
Sen. Hagan (D-NC) [99]
Sen. Snowe (R-ME)
Sen. Snowe (R-ME) [98]
Sen. Kyl (R-AZ)
Sen. Kyl (R-AZ) [98]
Sen. Klobuchar (D-MN)
Sen. Klobuchar (D-MN) [98]
Sen. Crapo (R-ID)
Sen. Crapo (R-ID) [98]
Sen. Johanns (R-NE)
Sen. Johanns (R-NE) [98]
Sen. Hutchison (R-TX)
Sen. Hutchison (R-TX) [98]


10 Most Ambiguous Smiles

Sen. Inouye (D-HI)
Sen. Inouye (D-HI) [40]
Sen. Kohl (D-WI)
Sen. Kohl (D-WI) [43]
Sen. McCain (R-AZ)
Sen. McCain (R-AZ) [47]
Sen. Durbin (D-IL)
Sen. Durbin (D-IL) [49]
Sen. Roberts (R-KS)
Sen. Roberts (R-KS) [50]
Sen. Whitehouse (D-RI)
Sen. Whitehouse (D-RI) [52]
Sen. Hoeven (R-ND)
Sen. Hoeven (R-ND) [54]
Sen. Alexander (R-TN)
Sen. Alexander (R-TN) [54]
Sen. Shelby (R-AL)
Sen. Shelby (R-AL) [62]
Sen. Johnson (D-SD)
Sen. Johnson (D-SD) [63]

The Non-Smilers

Sen. Bingaman (D-NM)
Sen. Bingaman (D-NM) [79]
Sen. Coons (D-DE)
Sen. Coons (D-DE) [77]
Sen. Burr (R-NC)
Sen. Burr (R-NC) [72]
Sen. Hatch (R-UT)
Sen. Hatch (R-UT) [72]
Sen. Reed (D-RI)
Sen. Reed (D-RI) [71]
Sen. Paul (R-KY)
Sen. Paul (R-KY) [71]
Sen. Lieberman (I-CT)
Sen. Lieberman (I-CT) [59]
Sen. Bennet (D-CO)
Sen. Bennet (D-CO) [55]
Sen. Udall (D-NM)
Sen. Udall (D-NM) [51]
Sen. Levin (D-MI)
Sen. Levin (D-MI) [50]
Sen. Boozman (R-AR)
Sen. Boozman (R-AR) [48]
Sen. Isakson (R-GA)
Sen. Isakson (R-GA) [41]
Sen. Franken (D-MN)
Sen. Franken (D-MN) [37]


10 Most Bespectacled Senators

Sen. Franken (D-MN)
Sen. Franken (D-MN) [99]
Sen. Sanders (I-VT)
Sen. Sanders (I-VT) [98]
Sen. McConnell (R-KY)
Sen. McConnell (R-KY) [98]
Sen. Grassley (R-IA)
Sen. Grassley (R-IA) [96]
Sen. Coburn (R-OK)
Sen. Coburn (R-OK) [93]
Sen. Mikulski (D-MD)
Sen. Mikulski (D-MD) [93]
Sen. Roberts (R-KS)
Sen. Roberts (R-KS) [93]
Sen. Inouye (D-HI)
Sen. Inouye (D-HI) [91]
Sen. Akaka (D-HI)
Sen. Akaka (D-HI) [88]
Sen. Conrad (D-ND)
Sen. Conrad (D-ND) [86]


10 Most Masculine-Featured Senators

Sen. Bingaman (D-NM)
Sen. Bingaman (D-NM) [94]
Sen. Boozman (R-AR)
Sen. Boozman (R-AR) [92]
Sen. Bennet (D-CO)
Sen. Bennet (D-CO) [92]
Sen. McConnell (R-KY)
Sen. McConnell (R-KY) [91]
Sen. Nelson (D-FL)
Sen. Nelson (D-FL) [91]
Sen. Rockefeller IV (D-WV)
Sen. Rockefeller IV (D-WV) [90]
Sen. Carper (D-DE)
Sen. Carper (D-DE) [90]
Sen. Casey (D-PA)
Sen. Casey (D-PA) [90]
Sen. Blunt (R-MO)
Sen. Blunt (R-MO) [89]
Sen. Toomey (R-PA)
Sen. Toomey (R-PA) [88]


10 Most Feminine-Featured Senators

Sen. McCaskill (D-MO)
Sen. McCaskill (D-MO) [95]
Sen. Boxer (D-CA)
Sen. Boxer (D-CA) [93]
Sen. Shaheen (D-NH)
Sen. Shaheen (D-NH) [93]
Sen. Gillibrand (D-NY)
Sen. Gillibrand (D-NY) [92]
Sen. Hutchison (R-TX)
Sen. Hutchison (R-TX) [91]
Sen. Collins (R-ME)
Sen. Collins (R-ME) [90]
Sen. Stabenow (D-MI)
Sen. Stabenow (D-MI) [86]
Sen. Hagan (D-NC)
Sen. Hagan (D-NC) [81]
Sen. Ayotte (R-NH)
Sen. Ayotte (R-NH) [79]
Sen. Klobuchar (D-MN)
Sen. Klobuchar (D-MN) [79]

For the partisan data-geeks, here’s some faux analysis with averages:

Party Smiles Non-smiles Avg. Smile Confidence
D 44 7 85
R 42 5 86
I 1 1 85

There you have it, the Republicans are the smiley-est party of them all.

Further discussion

This is an exercise to show off the very cool Face.com API and to demonstrate the value of a little programming knowledge. Writing the script doesn’t take too long, though I spent more time than I liked on idiotic bugs of my own making. But this was way preferable than cropping photos by hand. And once I had the gist of things, I not only had a set of cropped files, I had the ability to whip up any kind of visualization I needed with just a minute’s more work.

And it wasn’t just face-detection that I was using, but face-detection in combination with deep data-sources like the Times’s Congress API and the Sunlight Foundation. For the SOPA Opera app, it didn’t take long at all to populate the site with legislator data and faces. (I didn’t get around to using this face-detection technique to clean up the images, but hey, I get lazy too…)

Please don’t judge the value of programming by my silly example here – having an easy-to-use service like Face.com API (mind the usage terms, of course) gives you a lot of great possibilities if you’re creative. Off the top of my head, I can think of a few:

  • As a photographer, I’ve accumulated thousands of photos but have been quite lazy in tagging them. I could conceivably use Face.com’s API to quickly find photos without faces for stock photo purposes. Or maybe a client needs to see male/female portraits. The Face.com API gives me an ad-hoc way to retrieve those without menial browsing.
  • Data on government hearing webcasts are hard to come by. I’m sure there’s a programmatic way to split up a video into thousands of frames. Want to know at which points Sen. Harry Reid shows up? Train Face.com’s API to recognize his face and set it loose on those still frames to find when he speaks.
  • Speaking of breaking up video…use the Face API to detect the eyes of someone being interviewed and use RMagick to detect when the eyes are closed (the pixels in those positions are different in color than the second before) to do that college-level psych experiment of correlating blinks-per-minute to truthiness.

Thanks for reading. This was a quick post and I’ll probably go back to clean it up. At some point, I’ll probably add this to the Bastards Book.

A Million Pageviews, Thousands of Dollars Poorer, and Still Countlessly Richer.

Snowball fight in Times Square, Manhattan, New York

Update: This post rambled longer than I intended it to and I forgot that I had meant to include some observations on what I’ve noticed about Flickr’s traffic pattern. I’ve added some grafs to the bottom of this post.

My Flickr account hit 1,000,000 pageviews this weekend. Two years ago, I bought a Pro account shortly after the above photo of some punk kid throwing a snowball at me in Times Square was posted on Flickr’s blog. Since then I set my account to share all of my photos under the Creative Commons Non-commercial license (but I’ve let anyone who asks use them for free).

My account was on track to have 500K pageviews by October (of this past year) but then this photo of pilots marching on Wall Street hit Reddit and attracted 150K views all by itself, so then a million total views seemed just around the corner :).

Net Profit

Mermaid Parade 2010, Coney Island

I was paid $120 for this photo, which was used in New York’s campaign to remind people that they can’t smoke in Coney Island (or any other public park).


So how much have I gained monetarily in these two years of paying for a Flickr Pro account?

Two publications offered a total of $135 for my work. Minus the two years of Pro fees ($25 times 2 years) and that comes to about $80. If I spent at minimum 1 minute to shoot, edit, process, and upload each of my ~3,100 photos, I made a rate of $1.50/hour for my work.

Of course, I’ve spent much more time than one minute per photo. And I’ve taken far more than 3,100 photos (I probably have 15 to 20 times as many stored on my backup drives). And of course, thousands of dollars for my photo equipment, including repairs and replacements. So:

  • + $135 from publications
  • – $50 for Flickr Pro fees
  • – $8,000 (and change) for Canon 5D Mark 2, Canon S90, lenses, repairs from constant use in the rain/snow/etc.

So doing the math…I’m several thousands of dollars in the hole.

Gains

Monetarily, my photography is a large loss for me. I’m lucky enough to have a job (and, for better or worse, no car or mortgage and few other hobbies to pay for) to subsidize it. So why do I keep doing it and, in general, giving away my work for free?

Well, there is always the promise of potential gain:

  • I made a $1,000 (mostly to cover expenses) to shoot a friend’s wedding because his fiance liked the work I posted on my Facebook account…but weddings are so much work that I’ve decided to avoid shooting them if I can help it.
  • I’ve also taken photos for my job at ProPublica, including this portrait for a story that was published in the Washington Post. I’m not employed specifically to take photos, but it’s nice to be able to do it on office time.
  • I also now have a large cache of stock photos to use for the random sites I build. For example, I used the Times Square snowball photo to illustrate a programming lesson on image manipulation and face-recognition technology.
  • Even if my photos were up to professional par, I’m not the type to declare (in person) to others, “Hey, one of my hobbies is photography. Look at these pictures I took.” Flickr/Facebook/Tumblr is a nice passive-humblebrag way to show this side passion to others. And I’ve made a few good friends and new opportunities because of the visibility of my work.

In the scheme of things, a million pageviews is not a lot for two years…A photo might get that in a few days if it’s a popular enough meme. And pageviews have only a slight correlation to actual artistic merit (neither the above snowball or pilot photos are my favorite of the series). But it’s amazing and humbling to think that – if the average visitor who stumbles on my account might look at 4 photos – something I’ve done as a hobby might have reached nearly a quarter million people (not counting the times when sites take advantage of the CC-licensing and reprint my photos).

Having any kind of audience, no matter how casual, is necessary to practice improve my art if I were to ever try to become a paid professional photographer. So that’s one important way that I’m getting something from my online publishing.

Photos are as free as the photographer wants them to be

My personal milestone coincidently comes after the posting of two highly-linked-to articles on the costs of a photo: This Photograph is Not Free by John Mueller and This Photograph is Free by Tristan Nitot. They both make good points (Mueller’s response to Nitot is nuanced and deserves to also be considered).

Mueller and Nitot aren’t necessarily at odds at each other so there’s not much for me to add. Photos are worth good money. To cater to a client, to buy the (extra) professional equipment, to spend more time in editing and post-processing (besides cropping, color-correction and contrast, I don’t do much else to my photos), to take more time to be there at an assignment – this is all most definitely worth charging for.

And that is precisely why I don’t put the effort into marketing or selling mine. The money isn’t worth taking that amount of time and energy from what I currently consider my main work and passion. However, what I’ve gotten so far from my photography – the extra incentive to explore the great city I live in, the countless friends and memories, and of course, the photos to look back on and reuse for whatever I want – the $8,000 deficit is easily covered by that. Having the option to easily share my photos to (hopefully) inspire and entertain others is icing.

One more side-benefit of using a public publishing system like Flickr: I couldn’t devise a better way to organize and browse my own work with minimal effort. And I’m often rediscovering what I considered to be throwaway photos because others find them interesting.

Here are a few other photos I’ve taken over the years that were either frequently-viewed or considered “interesting” by Flickr’s bizarre algorithm:

Jumping for joy during New York blizzard, Times Square
The Cat is the Hat
Sunset over Battery Park and Statue of Liberty
Woman in white, pilots
Pushing a Taxi - New York Blizzard Snowstorm Thundersnow Blaaaaagh
Lightning strikes the Empire State Building
Brooklyn Bridge photographer-tourist, Photo of
Atrium, Museum of Natural History
Union Square Show
Casting Couch (#NYFW Spring 2012)
Williamsburg: Beautiful dogs
New York Snow Blizzard 2011, Lone Man on the Brooklyn Bridge
Ground Zero NY celebrates news of Osama bin Laden's death
Grand Central Moncler NYFW Flash Mob Dancin
Broadway Rainstorm
Towers of Light 9/11
Manhattanhenge from a Taxi

A few more observations on Flickr pageviews: It’s hard to say if 1,000,000 page views is a lot especially considering the number of photos I have uploaded in total. Before the pilots on Wall Street photo, I averaged about 200-500 pageviews a day. After that, I put more effort into maintaining my account and regularly uploading photos. Now on a given day, if I don’t upload anything particularly interesting the account averages about 1,500 views.

Search engines bring very little traffic. So other than what (lack of) interest my photos have for the general Internet, I think my upload-and-forget mindset towards my account also limits my pageviews. I have a good friend on Flickr who gets far fewer pageviews but gets far more comments than I do. I rarely comment on my contacts’ photos and barely participate in the various groups.

I’m disconnected enough from the Flickr social scene that I only have a very vague understanding of how its Explore section works. Besides the blog, the Explore collection is the best way to get seen on Flickr. It features “interesting” photos as determined by an algorithm that, as best I can tell, is affected by some kind of in-group metric.

I’ve only had three photos make it to Explore: the snowball fight in Times Square, the lightning hitting the Empire State Building, and this one where my subway train got stuck and we had to walk out the tunnel. The pilots photo did not make it to Explore, so I’m guessing that amount of traffic (particularly if a huge portion of it comes from one link on Reddit) is not necessarily a prime factor to getting noticed by Flickr’s algorithm.

New York in 2011, the photo version

A little late on this but I posted a few photos I took in NYC this year over at my Tumblr, Eye Heart New York.

This year seemed like my most sheltered, uncreative year yet…even so, according to Flickr’s count, 3/4 of the 3,000+ photos I’ve uploaded in total took place in 2011. I guess when so much just happens next to me (basically, OccupyWallStreet camping out a few blocks away) it’s hard not to snap a few pics. I almost broke the million views mark (for the two years that I’ve been on Flickr) and one of my photos finally made it on someone’s dining room wall, so not too bad a year no matter what it felt.

Here’s a few of the photos; visit Eye Heart New York for the rest:

Pre-Hurricane Irene: Naked Cowboy

WTC, eve of 9/11/2011

Casting Call, New York Fashion Week Spring 2012

Lightning strikes the Empire State Building

OccupyWallSt, day of canceled city cleaning Sit-in, jazz hands! OccupyWallStreet goes to occupy Times Square, 10/15/2011

High Line Park, Art Installation, Section 2

New Museum slide!

See the rest at Eye Heart New York

United/Continental pilots march on Wall Street

The SOPA Debate and How It’s Affected by Congress’s Understanding of Child Porn

Rep. Lamar Smith, chairman of the House Judiciary Committee and SOPA sponsor

Rep. Lamar Smith, chairman of the House Judiciary Committee and SOPA sponsor

Update (1/22/2012): SOPA was indefinitely postponed by Rep. Lamar Smith on Friday (PIPA is likewise stalled). Rep. Smith also has another Internet rights bill on deck though: the The Protecting Children from Internet Pornographers Act of 2011, which mandates that Internet services store customer data for up to 18 months to make it easier for law enforcement to investigate them for child porn trafficking. This proposed bill is discussed in the latter half of this post, including how its level of support is similar (and different) than SOPA’s.

H.R. 1981 has made it farther than SOPA did. It made it out of the Judiciary Committee (which is chaired by Rep. Lamar Smith and also handled SOPA) with a 19-10 vote in July of last year and is placed on the Union Calendar. Compare HR.1981’s progress compared to SOPA’s). H.R. 1981 has 39 cosponsors, compared to SOPA’s original 31. Read the text of HR 1981.

One thing I’ve learned from the whole SOPA affair is how obscure our lawmaking process is even in this digital age. The SOPA Opera site I put up doesn’t do anything but display publicly available information: which legislators support/oppose SOPA and why. But it still got a strong reaction from users, possibly because they misunderstand our government’s general grasp of technology issues.

Sen. Al Franken

Sen. Al Franken is one of the co-sponsors for PROTECT-IP, the Senate's version of SOPA

The most common refrain I saw was: “I cannot believe that Rep/Senator [insert name] is for SOPA! [insert optional expletive].” In particular, “Al Franken” was a frequently invoked name because his fervent advocacy on net neutrality seemed to make the Minnesota senator, in many of his supporters’ opinions, an obvious enemy of SOPA. In fact, one emailer accused me of being out to slander Franken, even though the official record shows that Franken has spoken strongly for PROTECT-IP (the Senate version of SOPA) and even co-sponsored it.

So there’s been a fair amount of confusion as to what mindset is responsible for SOPA. Since party lines can’t be used to determine the rightness/wrongess of SOPA, fingers have been pointed at the money trail: SOPA’s proponents reportedly receive far more money from media/entertainment-affiliated donors than they do from the tech industry. The opposite trend exists for the opponents.

It’s impossible of course to know exactly what’s in the our legislators’ minds. But a key moment during the Nov. 16 House Judiciary hearing on SOPA suggest that their opinions may be rooted less in malice/greed (if you’re of the anti-SOPA persuasion) than in something far more prosaic: their level of technological comprehension.

You can watch the entire, incredibly-inconvenient-to-access webcast at the House Judiciary’s hearing page. I’ve excerpted a specific clip in which Rep. Tom Marino (R-PA) is asking Katherine Oyama (Google’s copyright lawyer) about why Google can stop child porn but not online piracy:

REP. MARINO: I want to thank Google for what it did for child pornography – getting it off the website. I was a prosecutor for 18 years and I find it commendable and I put those people away. So if you can do that with child pornography, why can you not do it [with] these rogue websites [The Pirate Bay, et al.]? Why not hire some whiz kids out of college to come in and monitor this and work for the company to take these off?

My daughter who is 16 and my son who is 12, we love to get on the Internet and we download music and we pay for it. And I get to a site and I say this is a new one, this is good, we can get some music here. And my daughter says Dad, don’t go near that one. It’s illegal, it’s free, and given the fact that you’re on Judiciary, I don’t think you should be doing that…Maybe we need to hire her [laugh]…but, why not?


OYAMA: The two problems are similar in that they’re both very serious problems they’re both things that we all should be working to fighting against. But they’re very different in how you go about combatting it. So for child porn, we are able to design a machine that is able to detect child porn. You can detect certain colors that would show up in pornography, you can detect flesh tones. You can have manual review, where someone would look at the content and they would say this is child porn and this shouldn’t appear.

We can’t do that for copyright just on our own. Because any video, any clip of content, it’s going to appear to the user to be the same thing. So you need to know from the rights holder…have you licensed it, have you authorized it, or is this infringement?”


REP. MARINO: I only have a limited amount of time here and I appreciate your answer. But we have the technology, Google has the technology, we have the brainpower in this country, we certainly can figure it out.

The subject of child pornography is so awful that it’s little wonder that no one really thinks about how it’s actually detected and stopped. As it turns out, it’s not at all complicated.

When I was a college reporter, I had the idea to drive down to the county district attorney’s office and go through all the search warrants. Search warrants become part of the public record, but district attorneys can seal them if police worry that details in an affidavit or search warrant would jeopardize an investigation. I wanted to count how many times this was done at the county DA, because some major cases had been sealed for months. And I wondered if the DA was too overzealous in keeping private what should be the people’s business.

But there were plenty of big cases among the unsealed warrants. I went to college in a small town but there was a bizarre, seemingly constant stream of students being charged with child porn possession. Either college students were becoming particularly perverse or the campus police happened to be crack cyber-sleuths in rooting out the purveyors.

I don’t know about the former, but I learned that the police were not particularly skilled at hacking, based on their notes in the search warrants. In fact, finding the suspects was comically easy because of the unique setup of our college network. Everyone in the dorms had an ethernet hookup but there was no Google, Napster or BitTorrent at the time. So one of the students built a search engine that allowed any student to search the shared files of every other student. And since Windows apparently made this file sharing a default (and at the time, 90+ percent of students’ computers were PCs), the student population had inadvertent access to a huge breadth of files, including MP3s and copied movies and even homework papers.

So to find out if anyone had child porn, the police could just log onto the search engine and type in the appropriate search terms. But the police didn’t even have to do this. Other students would stumble upon someone’s porn collection (you had the option of exploring anyone’s entire shared folder, not just files that came up on the search) and report it. The filenames were all the sickening indication needed to suspect someone of possession.

Google’s Oyama alludes to more technically sophisticated ways of detecting it, but the concept is just as simple as it was at my college: no matter how it’s found, child pornography is easy to categorize as child porn because of its visual characteristics, whether it’s the filename or the images itself. In fact, it’s not even necessary for a human to view a suspected file to know (within a high mathematical probability) that it contains the purported illegal content.

If you’ve ever used Shazam or any of the other song-recognition services, you’ve put this concept into practice. When you hold up a phone to identify a song playing over the bar’s speakers, it’s not as if your phone dials up one of Shazam’s resident music experts who then responds with her guess of the song. The Shazam app looks for certain high points (as well as their spacing, i.e. the song’s rhythm) to generate a “fingerprint” of the song, and then compares it against Shazam’s master database of song “fingerprints”.

No human actually has to “listen” to the song. This is not a new technological concept; it’s as old as, well, the fingerprint database used by law enforcement.

So what Rep. Marino essentially wants is for Google to build a Shazam-like service that doesn’t just identify a song by “listening” to it, but also determines if whoever playing that song has the legal right to do so. Thus, this anti-pirate-Shazam would have to determine from the musical signature of a song such things as whether it came from an iTunes or Amazon MP3 or a CD. And not only that, it would have to determine whether or not the MP3 or CD is a legal or illegal copy.

In a more physical sense, this is like detecting a machine that can determine from a photograph of your handbag whether it’s a cheap knockoff and whether or not you actually own that bag – as opposed to having stolen it, or having bought it from someone who did steal it.

I’m not a particularly skilled engineer but I can’t fathom how this would be done and neither can Google, apparently. But Rep. Marino and at least a few others on the House Judiciary committee have more faith in Google’s technical prowess and they don’t believe that Google is doing enough.

And frankly, I can’t blame them.

From their apparently non-technical vantage point, what they see is that Google is an amazing company who seems to have no limit in its capabilities. It can instantly scour billions of webpages. It can plot in seconds the driving route from Des Moines ot Oaxaca, Mexico. And at some point, might even make a car that drives that route all by itself.

And Google has demonstrated the power to stop evil acts, because it has effectively prevented the spread of child porn in its search engine and other networks. Child porn is a terrible evil; software/media piracy less so. It stands to reason – in a non-technical person’s thinking – that anyone who can stop a great evil must surely be able to stop a lesser evil.

And so, to continue this line of reasoning, if Google doesn’t stop a lesser evil such as illegal MP3 distribution, then it must be because it doesn’t care enough. Or, as some House members noted, Google is loathe to take action because it makes money off of sites that trade in ill-gotten intellectual property.

So you can see how one’s position on SOPA may be inspired not as much out of devotion to an industry but more from a particular (or lack thereof) understanding of the technological tradeoffs and hurdles.

Rep. Marino et. all sees this as something within the realm of technological possibility for Google’s wizards, if only they had some legal incentive. Google and other SOPA opponents see that the problem that SOPA ostensibly tackles is not one that can be solved with any amount of technological expertise. Thus, each side can be as anti-online-piracy/pro-intellectual-property as the other and yet fight fiercely over SOPA.

Smith’s anti-child porn, database-building bill

Though SOPA has taken the spotlight, there is another Internet-related bill on the House Judiciary’s agenda. It’s H.R. 1981, a.k.a The Protecting Children from Internet Pornographers Act of 2011, which proposed a mandate that Internet sites keep track of their users IP information for up to 18 months, to make it easier to investigate Internet crimes – such as downloading child pornography.

H.R. 1981 was introduced by House Judiciary Chairman Rep. Lamar Smith (R-Tex.) who is, of course, the legislator who introduced SOPA. And like SOPA, the support for H.R. 1981 is non-partisan because child pornography is neither a Republican or Democratic cause.

And also like SOPA, the opposition to H.R. 1981 is along non-partisan lines. Among the most vocal opponents to the child porn bill is the Judiciary committee’s ranking member Rep. John Conyers (D-MI). Is it because he is in the pocket of the child porn lobby? No; Conyers argues that even though child porn is bad, H.R. 1981 relies on using technology in a way that is neither practical nor ethical. From CNET:

The bill is mislabeled,” said Rep. John Conyers of Michigan, the senior Democrat on the panel. “This is not protecting children from Internet pornography. It’s creating a database for everybody in this country for a lot of other purposes.”

Rep. John Conyers (D-MI)

Rep. Conyers apparently understands that just because a law purports to fight something as evil (and, of course, politically unpopular) as child pornography doesn’t mean that the law’s actual implementation will be sound.

So when the wrong-to-be-righted is online piracy – i.e. SOPA – what is Conyers’ stance? He is one of its most vocal supporters:

The Internet has regrettably become a cash-cow for the criminals and organized crime cartels who profit from digital piracy and counterfeit products. Millions of American jobs are at stake because of these crimes.

Is it because Conyers is in the pocket of big media? Or that he hates the First Amendment? That’s not an easily apparent conclusion judging from his past votes and legislative history.

It’s of course possible that Conyers takes this particular stance on SOPA because SOPA, all things considered, happens to be a practical and fair law in the way that H.R. 1981 isn’t.

But a more cynical viewpoint is that Conyer’s technological understanding for one bill does not apply to the other. Everyone has been screwed over at some point by a massive, faceless database so it’s easy to be fearful of online databases – in fact, the less you know about computers, the more concerned you’ll be of the misuse of databases.

The technological issues underlying SOPA are arguably far more complex, though, and it’s not clear – as evidenced by Rep. Marino’s line of questioning – that Congressmembers, whether they support or oppose SOPA, have a full understanding of them.

As it stands though, SOPA had 31 cosponsors at its heyday. H.R. 1981 has 39. It will be interesting to see if this bill by Rep. Smith will face any residual backlash after what happened with SOPA.

SOPAopera.org – A hand-made list of SOPA / PROTECT-IP Congressional supporters and opponents

I’ve always been interested in exploring the various online Congressional information sources and the recent SOPA debate seemed like a good time to put some effort in it…also, I’ve always wanted to try out the excellent isotope Javascript library.

I had been passively paying attention to the debate and was surprised at how hard it was to find a list of supporters and opponents, given how much it’s dominated my (admittedly small bubblish) internet communities.

When I set out to compile the list, though, I could see why…the official government sites don’t make it easy to find or interpret the information. So SOPAopera is my game attempt at putting some basic information about it…the feedback I’ve gotten so far indicates that even constituents who have been reading a lot about SOPA/PROTECT-IP are surprised at the level and diversity of support the laws have among Congressmembers.

The Irrationality of Price Anchoring

Roulette wheel

Roulette wheel, by Conor Ogle from London, UK

tl;dr summary – people will unconsciously anchor their judgment on a random number given to them, even if that number was fabricated in front of their own eyes.

If you’ve never heard the term price anchor, you’ve undoubtedly experienced it anytime you’ve had to negotiate a purchase, whether it was for a used car, an online auction, or even cheap souvenirs at a Chinatown bazaar.

Even if you manage to haggle the seller down to half the initial stated price, don’t be too quick to congratulate yourself; the merchant might have gotten what he/she wanted despite the discount, if the asking price was over the top.

And the kicker is: even if you knew the initial asking price was too high, you still might have been fooled into overpaying.

Daniel Kahneman, a psychologist who won the Nobel Prize for Economics, conducted a famous experiment in which college students were influenced by a random number that they knew to be random:

Amos and I once rigged a wheel of fortune. It was marked from 0 to 100, but we had it built so that it would stop only at 10 or 65. We recruited students of the University of Oregon as participants in our experiment. One of us would stand in front of a small group, spin the wheel, and ask them to write down the number on which the wheel stopped, which of course was either 10 or 65.

We then asked them two questions:

  • Is the percentage of African nations among UN members larger or smaller than the number you just wrote?
  • What is your best guess of the percentage of African nations in the UN?

The spin of a wheel of fortune – even one that is not rigged – cannot possibly yield useful information about anything, and the participants in our experiment should simply have ignored it. But they did not ignore it. The average estimates of those who saw 10 and 65 were 25% and 45%, respectively.

Kahneman, Daniel (2011-10-25). Thinking, Fast and Slow (p. 120). Macmillan.

In his recently published book (well worth purchasing, BTW), Thinking, Fast and Slow, Kahneman reflects that “we were not the first to observe the effects of anchors, but our experiment was the first demonstration of its absurdity: people’s judgments were influenced by an obviously uninformative number.”

This lesson is something to keep in mind as a content-creator, particularly in a digital age in which it’s very easy to distribute your work for free. The fact that so many developers sell their apps for free (or, “freemium”) creates a price anchor that makes even 99 cents seem too much for a high-quality game (see Tom Oatmeal’s excellent comic on this phenomenon).

Kahneman’s experiment suggests that some people might be fooled into paying higher than they might, but in today’s inter-connected world where price comparisons are instant, it’s harder to get away with irrationally high prices. The more pressing concern is that pricing too low can cause customers to assign an irrationally low value to your product.

Rather than get offended when people demand that your product be free, see it as a quirk in human psychology – people can’t help but be fooled by even random numbers – and adjust accordingly. You may just have to adjust your product or its pitch to emphasize its uniqueness before you have the freedom to set a palatable price anchor.

Louis C.K. thanks his fans for buying his $5 Beacon Theater Show

Louis C.K. just sent an email to everyone who bought his recent Beacon Theater performance through his experimental sale. Here it is, just incase you haven’t forked over the measly $5 for a great hour of entertainment, and a significant milestone in online distribution:

Louis C.K. info@louisck.com via cmail4.com 11:44 AM (2 hours ago)

Hi. This is LOuie. It seriously is me. Im even going to leave the O stuipdly capatalized because who would pay an intern to do that?? Okay so you bought the thing with my fat face on it and you clicked the button that said i could email you. And i know that now you are thinking “aw shit. Why’d i let this guy into my life this way?”. Well dont worry. Because i really swear it that i wont bug you. I will not abuse this privalage of having your email. You wont hear from me again… Probably, unless i have something new to offer you. The reason i’m writing now, in the back of a car taking me to the Tonight Show set, is to let you know that as of now there is some new and cool stuff on my site, related to Live at the Beacon Theater. Theres a thing where you can download and print a dvd box cover and label so you can burn and make your own dvd of the video. And theres a new option where you can gift the special to as many people as you want (for 5 bucks each) and they’ll get a nice gifty email from you with a link to the video.

Also, some of you may know, i recently made a statement (that sounds so dumb. Like i’m the president or something) about how the video has been doing online. Im pasting it in here below in case you missed it.

Lastly I’m planning to put some more outtakes of the show on youtube and i think i will put one on the site that is only available for free to you folks on this list, who bought the thing and opted in. But dont hold me to that because really i just thought of it and typed it.

Okay well please have a happy rest of the year and more happy years after that. And please even have been happy in your past. What?

Thanks again for giving me 5 dollars. I bought 3 cokes with it.

Regards. Sincerely, Actually,

Louis

=========================== People of Earth (minus the ones who don’t give a shit about this): it’s been amazing to conduct this experiment with you. The experiment was: if I put out a brand new standup special at a drastically low price ($5) and make it as easy as possible to buy, download and enjoy, free of any restrictions, will everyone just go and steal it? Will they pay for it? And how much money can be made by an individual in this manner?

It’s been 4 days. A lot of people are asking me how it’s going. I’ve been hesitant to share the actual figures, because there’s power in exclusive ownership of information. What I didn’t expect when I started this was that people would not only take part in this experiment, they would be invested in it and it would be important to them. It’s been amazing to see people in large numbers advocating this idea. So I think it’s only fair that you get to know the results. Also, it’s just really cool and fun and I’m dying to tell everybody. I told my Mom, I told three friends, and that wasn’t nearly enough. So here it is.

First of all, this was a premium video production, shot with six cameras over two performances at the Beacon Theater, which is a high-priced elite Manhattan venue. I directed this video myself and the production of the video cost around $170,000. (This was largely paid for by the tickets bought by the audiences at both shows). The material in the video was developed over months on the road and has never been seen on my show (LOUIE) or on any other special. The risks were thus: every new generation of material I create is my income, it’s like a farmer’s annual crop. The time and effort on my part was far more than if I’d done it with a big company. If I’d done it with a big company, I would have a guarantee of a sizable fee, as opposed to this way, where I’m actually investing my own money.

The development of the website, which needed to be a very robust, reliable and carefully constructed website, was around $32,000. We worked for a number of weeks poring over the site to make sure every detail would give buyers a simple, optimal and humane experience for buying the video. I edited the video around the clock for the weeks between the show and the launch.

The show went on sale at noon on Saturday, December 10th. 12 hours later, we had over 50,000 purchases and had earned $250,000, breaking even on the cost of production and website. As of Today, we’ve sold over 110,000 copies for a total of over $500,000. Minus some money for PayPal charges etc, I have a profit around $200,000 (after taxes $75.58). This is less than I would have been paid by a large company to simply perform the show and let them sell it to you, but they would have charged you about $20 for the video. They would have given you an encrypted and regionally restricted video of limited value, and they would have owned your private information for their own use. They would have withheld international availability indefinitely. This way, you only paid $5, you can use the video any way you want, and you can watch it in Dublin, whatever the city is in Belgium, or Dubai. I got paid nice, and I still own the video (as do you). You never have to join anything, and you never have to hear from us again.

I really hope people keep buying it a lot, so I can have shitloads of money, but at this point I think we can safely say that the experiment really worked. If anybody stole it, it wasn’t many of you. Pretty much everybody bought it. And so now we all get to know that about people and stuff. I’m really glad I put this out here this way and I’ll certainly do it again. If the trend continues with sales on this video, my goal is that i can reach the point where when I sell anything, be it videos, CDs or tickets to my tours, I’ll do it here and I’ll continue to follow the model of keeping my price as far down as possible, not overmarketing to you, keeping as few people between you and me as possible in the transaction. (Of course i reserve the right to go back on all of this and sign a massive deal with a company that pays me fat coin and charges you straight up the ass.). (This is you: yes Louie. And we’ll all enjoy torrenting that content. You fat sweaty dolt).

I probably sound kind of crazy right now. It’s been a really fun and intense few days. This video was paid for by people who bought tickets, and then bought by people who wanted to see that same show. I got to do exactly the show I wanted, and exactly the show you wanted.

I also got an education. And everything i learned are things i was happy to learn. I learned that people are interested in what happens and shit (i didn’t go to college)

I learned that money can be a lot of things. It can be something that is hoarded, fought over, protected, stolen and withheld. Or it can be like an energy, fueled by the desire, will, creative interest, need to laugh, of large groups of people. And it can be shuffled and pushed around and pooled together to fuel a common interest, jokes about garbage, penises and parenthood.

I want to thank Blair Breard who produced this video and produces my series LOUIE, and I want to thank Caspar and Giles at Version Industries, who created the website.

I hope with all of my heart that I stay funny. Otherwise this all goes to hell. Please have a safe and happy holiday, and thank you again for all this crazy shit.

Sincerely, Louis C.K.

To me, there’s no comedian out there right now who is as inspiring as Louis. He shoots his own TV show and edits it on his own laptop. When a fan uploads a bootleg recording of him on a torrent site, he emails the fan personally to ask to take it down; not because he thinks it’s theft, but because he doesn’t want he sees as his rough draft work to be floating around. And then to try to one-up the torrenters, he experiments with distributing his own show.

I don’t begrudge people who have made and positioned their careers around the traditional media model (including newspapers and music, of course). I grew up with computers and it still took me by surprise. I don’t know what Louis’s background, but judging by the caliber of his comedy, he seems to have been too busy paying his dues in the comedy club circuit to have become a formally trained multi-media artist/videographer/techie/marketer. And yet even though he’s in his mid-40s and could likely coast on his career, he’s undertaken the kind of sacrifice and experimentation that you rarely see among artists and creators who have a much greater need to figure out the digital transition.

And he’s pretty funny. And, if it’s not already unfair that he’s got so much going for him, he’s incredibly eloquent and insightful even when doing serious interviews. Check out this interview with him on Fresh Air:

So I got a lot of emails from people saying, ‘Why can’t you just keep it clean? Because I am now shut off from your act by the horrible things you said, and that’s such a shame.’ And I would not usually respond to them because I don’t return emails, but in my head and to a few of them I said, ‘Well, you’re the one putting the limit. Not me. I’m saying a bunch of stuff, and you’re the one saying I should only say one facet of it.’ That’s a limit. But at the same time, when these people would write to me I’d kind of like them. Whenever I’ve encountered a Christian saying, ‘Why don’t you stop talking like that so I can hear you?’ I think, ‘Well you’re the one putting the earmuffs on, but I wish you could hear me because I like you.’


“There are things in the show I’m able to show [my daughters]. There’s an episode about Halloween that I showed them parts of. There’s a lot of things they’re able to see. They’re just fun stories. And my daughters, I think they really enjoy what I do. There are certainly some things they can’t see in Louie because … the language is grown-up and is for adults. They know that. They get it. I’ve played them some George Carlin clips that have cursing in them. I explain it to my kids that some people get uncomfortable or their feelings get hurt by certain words, so you want to respect that in regular life, but there is a reason for these words. They’re not just ‘bad.’ So I’m bringing them along. They’ll see this stuff when it’s appropriate to see it.”

I was one of the lucky people who got $10 tickets to his show in Brooklyn a few months back. So many people rushed to get tickets online that the ticketseller’s site crashed. Later in the day, Louis announced that he would stick around for a third show. What a class act.

To find insights: ask the cage cleaners, not the veterinarians

A chair with its front worn out.

A chair with its front worn out: image cropped from Sapolsky's book, Why Zebras Don't Get Ulcers, Third Edition

Sometimes the great insights about how things really work can come from the people who are thought to be too far down the ladder to possibly understand the big picture. Robert M. Sapolsky, an author and professor of neurology at Stanford, coined a proverb for this phenomenon:

“If you want to know if the elephant at the zoo has a stomachache, don’t ask the veterinarian, ask the cage cleaner.”

In other words, the low-level employees who fix what’s broken may often be in the best position to first notice when, why, and how something broke. Or, as Sapolsky puts it: “People who clean up messes become attuned to circumstances that change the amount of mess there is.”

This is not just a cute aphorism to remind you to tip your cleaning lady, but a lesson that has manifested itself at least a few times in scientific history. Sapolsky retells the confession of Dr. Meyer Friedman, who – with his partner R. H. Rosenman – is credited with discovering the link between Type-A personalities and heart disease. This revelation sparked off the field of research into how physical and mental health are intertwined.

According to Friedman, though, the breakthrough observation first came from his upholsterer:

It was the mid-1950s, Friedman and Rosenman had their successful cardiology practice, and they were having an unexpected problem. They were spending a fortune having to reupholster the chairs in their waiting rooms. This is not the sort of issue that would demand a cardiologist’s attention. Nonetheless, there seemed to be no end of chairs that had to be fixed. One day, a new upholsterer came in to see to the problem, took one look at the chairs, and discovered the Type A-cardiovascular disease link.

“What the hell is wrong with your patients? People don’t wear out chairs this way.” It was only the front-most few inches of the seat cushion and of the padded armrests that were torn to shreds, as if some very short beavers spent each night in the office craning their necks to savage the fronts of the chairs. The patients in the waiting rooms all habitually sat on the edges of their seats, fidgeting, clawing away at the armrests.

The rest should have been history: up-swelling of music as the upholsterer is seized by the arms and held in a penetrating gaze—“Good heavens, man, do you realize what you’ve just said?” Hurried conferences between the upholsterer and other cardiologists. Frenzied sleepless nights as teams of idealistic young upholsterers spread across the land, carrying the news of their discovery back to Upholstery/Cardiology Headquarters—“Nope, you don’t see that wear pattern in the waiting-room chairs of the urologists, or the neurologists, or the oncologists, or the podiatrists, just the cardiologists. There’s something different about people who wind up with heart disease”—and the field of Type-A therapy takes off.

Sapolsky, Robert M. Why Zebras Don’t Get Ulcers, Third Edition: The Acclaimed Guide to Stress, Stress-Related Diseases, and Coping (p. 408). Macmillan.

Unfortunately for the upholsterer, Dr. Friedman was too busy to listen to him. Only years later, when Friedman and Rosenman conducted studies of their patients did Friedman finally grasp the importance of what his upholsterer had discovered (although not the upholsterer’s name).

It’s hard to know how many other “Eureka, the janitor is right!” moments that science and technology are beholden to. Unlike the case of Sir Alexander Fleming, it’s one thing to say how, out of genius and keen observation, you made lemonade out of lemons (in Fleming’s case, penicillin after forgetting to put away his staph samples during summer vacation). It’s a little more deflating to admit that the maintenance worker beat you to the discovery.

Cleaning up the mess in online advertising

Eric Veach, who author Steven Levy describes as the “Google engineer who created the most successful ad system in history,” was no mere low-level grunt when he designed the implementation for AdWords (although apparently, this accomplishment isn’t enough to inspire someone to write a Wikipedia entry on him). He came to Google in 2000 after working on Pixar’s movie-rendering software and was assigned to the ad department, which Veach describes to Levy as “a backwater of the company.” (Levy, Steven (2011-04-12). In The Plex p. 83.)

At that time, Google ads were still sold by actual people and Google had declined an offer to merge with Overture Services, then the leader in auction-bid online advertising and later acquired by Yahoo. So Veach was part of the team to create Google’s own auction system. Veach had observed – and strongly disliked – how Overture’s system invited a kind of “cat-and-mouse game” between bidders: If the winning bidder bid $100, it would have to pay $100, even if the next bidder had only bid $50. The optimal strategy, of course, was to bid the lowest increment possible to edge out the other bidders, which led to the use of automated software to game the system.

This was not an ideal competitive situation. And, according to other reports, Veach and Salar Kamangar were dismayed at its impact on server load, since advertisers would frequently log in to make these minor bid modifications. To clean up this mess, Veach designed a different model. As Levy describes it:

The winner of the auction wouldn’t be charged for the amount of his victorious bid but instead would pay a penny more than the runner-up bid. (Example: If Joe bids 10 cents a click, Alice bids 6, and Sue bids 2, Joe wins the top slot and pays 7. Alice is in the next slot, paying 3.) It was incredibly liberating because it eliminated the fear of “winner’s remorse,” where the high bidder in an auction feels suckered by paying too much.

Levy, Steven (2011-04-12). In The Plex (p. 90). Simon & Schuster, Inc.. Kindle Edition.

Veach’s model was so counter-intuitive that he had to constantly defend it, even to Larry Page and Sergey Brin. He was vindicated when AdWords (which had several other technical innovations behind it) went on to help Google to its first profitable year in 2002.

As Levy writes, Veach’s method had already been vouched for, in high places:

Part of her [Sheryl Sandberg, former chief of staff to the secretary of the treasury in the Clinton administration] job at Google was explaining its innovative auction. She kept staring at the formula, wondering why it seemed so familiar. So she called her former boss, Treasury Secretary Larry Summers. “Larry, we have this problem,” she said. “I’m trying to explain how our auction works—it seems familiar to me.” She described it to him. “Oh yeah,” said Summers. “That’s a Vickery second-bid auction!” He explained that not only was this a technique used by the government to sell Federal Reserve bonds but the economist who had devised it had won a Nobel Prize.

Veach had reinvented it from scratch.

Levy, Steven (2011-04-12). In The Plex (p. 90). Simon & Schuster, Inc.

While Veach was not some low-level “cage cleaner,” his job did involve cleaning up a seedy part in the world of online advertising. Levy’s book may gloss over whatever mathematical proofs and logic Veach used in deciding on the second-bid auction method. But given that Veach’s strategy was initially doubted by Google’s founders, who themselves were not shy to contrarian thinking, Veach is a good example of how those who get their hands dirty may end up with a clearer “big picture” of how a system truly works.

The pre-mortem

The case of Friedman’s upholsterer is by definition, a rare occurrence. A more common – and tragically so – kind of cage-cleaner is the whistleblower.

Dr. Richard Feynman, the Nobel Prize-winning quantum physicist, might be best remembered by the general public for his role in the investigation of the Challenger disaster. During a televised press conference, Feynman used a cup of ice water to demonstrate how NASA’s managers apparently overlooked a simple tenet of physics that helped lead to the shuttle explosion:

But as Dr. Feynman tells in the second half of his book, What Do You Care What Other People Think? (which should be required reading for all scientists, engineers, and journalists), he may have gotten credit for the revelation of the shuttle’s O-ring, but he did not come up with it himself. One of his fellow commission members pointed out the possible defect to him. And that commission member had been told about it by an anonymous astronaut, who said that NASA had data demonstrating the O-ring’s problem but apparently hadn’t used it.

The truism that “Hindsight is 20/20″ is sometimes used to excuse the most boneheaded of screw-ups. It’s not that it took the Challenger to blow up before we could discover that the O-rings, like most other solids in existence, lose resilience when it’s cold out. High-level managers had the information available to them; they just chose to overlook the complaints and observations of low-level engineers. The good-natured Feynman was so alarmed by NASA’s “fantastic faith in the machinery” that he threatened to quit the investigation unless they included his criticisms in the final report.

But anyone who works in an organization of more than 2 people can attest to how fear of being ostracized can make it difficult to stop a project or plan in motion. During a Let’s-Go!-type of team meeting, there’s nothing more of a buzz-kill than someone in the back constantly complaining how the i’s aren’t all properly dotted.

In his book “Thinking, Fast and Slow,” psychologist Daniel Kahneman (a Nobel Prize winner himself) describes how psychologist Gary Klein devised a sort of “partial remedy” that I had never heard of before – but that I wish were more commonplace: the “premortem“:

The procedure is simple: when the organization has almost come to an important decision but has not formally committed itself, Klein proposes gathering for a brief session a group of individuals who are knowledgeable about the decision. The premise of the session is a short speech: “Imagine that we are a year into the future. We implemented the plan as it now exists. The outcome was a disaster. Please take 5 to 10 minutes to write a brief history of that disaster.”

The premortem has two main advantages: it overcomes the groupthink that affects many teams once a decision appears to have been made, and it unleashes the imagination of knowledgeable individuals in a much-needed direction. As a team converges on a decision—and especially when the leader tips her hand—public doubts about the wisdom of the planned move are gradually suppressed and eventually come to be treated as evidence of flawed loyalty to the team and its leaders. The suppression of doubt contributes to overconfidence in a group where only supporters of the decision have a voice. The main virtue of the premortem is that it legitimizes doubts. Furthermore, it encourages even supporters of the decision to search for possible threats that they had not considered earlier.

Kahneman, Daniel (2011-10-25). Thinking, Fast and Slow (pp. 264-266). Macmillan.

Just like the egotistical scientist who is loathe to give credit to a janitor for first making a groundbreaking observation, no project manager likes admitting that disaster was averted only through the foresight of an underling. The problem is such that in bureaucracies, managers may actively avoid receiving such momentum-killing feedback, and underlings who wish to keep their jobs may lean toward keeping quiet rather than being pegged as the negative nancy.

The pre-mortem, which (ideally) rewards contributors for thinking destructively, may be one of the few ways to recognize the cage-cleaners who deal with the muck. Or at least, the people who take the time to listen to them.