Monthly Archives: January 2012

Woody Allen: Every step is part of the writing process

Woody Allen (2006), photo by Colin Swan

One of the best books I’ve picked up recently is Eric Lax’s Conversations with Woody Allen: His Films, the Movies, and Moviemaking, which is basically a 400+ page interview, spanning decades, between the author and Allen. I’m a fair-weather fan myself: I’ve only seen a few of his movies, but I’ve always admired his relentless pursuit of his art, even when some of it seems to be just screwball comedy.

The book is divided into 8 parts for different facets of Allen’s work, including “Writing It”, “Shooting, Sets, Locations” and “Directing.” The following excerpt comes from the “Editing” part and in it, Allen talks about how he sees every step of filmmaking as part of the writing process (emphasis added):

[Eric Lax]: You’re involved with the details of every step of a film, and I’ve noticed that you do not delegate any part of its creation, even assembling a first cut from takes you’ve already selected.

[Woody Allen]: To me the movie is a handmade product. I was watching a documentary on editing on television the other day and many wonderful filmmakers were on and wonderful editors and everyone was talking briefly about how they edit. Years ago, they would turn it over to an editor. Or there are people I know who finish shooting and go away for a vacation and let the editor do a draft; then they come back and they check it out and do their changes.

I can’t do that. It would be unthinkable for me not to be in on every inch of movie – and this is not out of any sort of ego or sense of having to control; I just can’t imagine it any other way. How could I not be in on the editing, on the scoring, because I feel that the whole project is one big writing project?

You may not be writing with a typewriter once you get past the script phase, but when you’re picking locations and casting and on the set, you’re really writing. You’re writing with film, and you’re writing with film when you edit it together and you put some music in. This is all part of the writing process for me.

Lax, Eric (2009-08-12). Conversations with Woody Allen (p. 284). Random House, Inc.. Kindle Edition.

I feel the exact same way about any kind of modern storytelling. Whether it’s done as a photo essay, movie, or news application/website, each step of the process can profoundly affect and be affected by your editorial vision. Back in the day of traditional journalism, it was possible to have one person do just the interviewing and research and another put it into story form. But the feedback in that process – an unexpectedly emotional interview that alters what you previously thought the story arc should be – would be almost entirely lost.

Google’s search has been dumbed down for the novices and solipsistic

In response to Google’s latest plan to combine all your usage data on all of its platforms (GMail, Youtube, etc.) into one tidy user-and-advertiser-friendly package, I’m mostly sitting on the fence. This is because I’ve always assumed everything I type into Google Search will inextricably be linked to my personal GMail account…so I try not to search for anything job/life-sensitive in the same browser that I use GMail for.

But even before this policy, Google’s vanilla search (not the one inside Google+) has noticeably gotten too personalized. Not in a creepy sense, but in a you’re-too-dumb-to-figure-out-an-address bar way. And this is not a good feature for us non-novice Internet users.

For example, I’ve been in an admittedly petty, losing competition with the younger, better-muscled Dan Nguyen for the top of Google’s search results. My identity (this blog) has always come in second place or lower…unless I perform a search for my name while logged into my Google/GMail account:

Me on Google Search. I am logged in on the left browser, logged out on the right.

The problem isn’t that my blog shows up first for my little search universe. It’s that my Google+ profile is on top, pushing all the other search results below the fold.

This seems really un-useful to me. The link to my own Google+ profile already occupies the top-left corner of my browser every time I visit a Google-owned site. I don’t need another prominent link to it. But I’ll give Google the benefit of the doubt here; they’re making the reasonable guess that someone who is searching for their own name is just looking for their own stuff…though conveniently, Google thinks the most important stuff about the searcher happens to be the searcher’s Google+ profile.

So here’s a more general example. I do a lot of photography and am always interested in what other people are doing. So here’s a search for “times square photos” in normal search (image search seems to behave the same way logged in or out):

'times square photos' on Google Search. I am logged in on the left browser, logged out on the right.

I generally love how Google automatically includes multimedia when relevant; for example, I rarely go to Google Maps now because typing in an address in the general search box, like “50 broadway” will bring up a nice local map. But in the case of “times square photos,” Google automatically assumes that I’m most interested in my own Times Square photos.

I may be a little solipsistic, but this is going overboard. And it seems counter-productive. If I’m the type of user to continually look up different kinds of photos and all I see right away are my own photos, my search universe is going to be slightly duller.

Wasn’t the original assumption of search that the user is looking for something he/she doesn’t currently know? Like the hours of my favorite bookstore. Doing that search pulls up a helpful sidebox, with the hours, next to the search results:

The Strand's opening hours

This is fantastic. And I do appreciate Google catering to my caveman phrasing of the question, especially when I’m on a mobile device.

But in the case of my example photo and name search, Google has gone a step too far in dumbing things down.

My hypothesis is that they are catering to the legion of users who get to by going to Google and typing in “Yahoo.” I imagine Google’s massive analytics system has told them that this is how many users get to GMail, as opposed to typing in

Google seems to be making this apply to every kind of search: when I type in a search query for “dan nguyen” or “times square photos”, Google checks to see if these are terms in my Google profile. If so, it pushes them to the top of the search pile because I must be one of those idiots who doesn’t realize that the Dan+ in the top left corner is how I get to my Google profile or who is too lazy to go to Flickr to look up my own Times Square photos.

The kicker is that that assumption contradicts my behavior. If I’m a user who was technical enough to figure out how to fill out my Google profile and properly link up third-party accounts…aren’t I the type of user who’s technical enough to get to my own Flickr photos by myself?

Searching for my own name is stupid, and kind of an edge case. But what if I’m working on a business site and have linked it (and/or its Google+ page) to my profile? And then I’m constantly doing searches to see how well that site is doing in SEO and SiteRank compared to similarly named/themed sites? Since I’m not in that situation, I can only guess: but will I have to use a separate browser just to get a reliable, business-savvy search?

I realize that this dumbing-down “feature” is the kind of thing that has to be auto-opt-in for its target audience. But I can think of a slightly non-intrusive way to make it manually opt-in. If what I really want are my own Times Square photos, then wait for me to prepend a “my” to the query. I’d think even the novice users could get into this habit.

Analyzing the U.S. Senate Smiles: A Ruby tutorial with the Face and NYT Congress APIs

U.S. Senate Smiles, ranked by face-detection algorithm

The smiles of your U.S. Senate from most smiley-est to least, according to Face's algorithm

Who’s got the biggest smile among our U.S. senators? Let’s find out and exercise our Ruby coding and civic skills. This article consists of a quick coding strategy overview (the full code is at my Github). Or jump ahead to see the results, as sorted by Face’s algorithm.

About this tutorial

This is a Ruby coding lesson to demonstrate the basic features of Face’s face-detection API for a superficial use case. We’ll mash it with the New York Times Congress API and data from the Sunlight Foundation.

The code comprehension is at a relatively simple level and is intended for learning programmers who are comfortable with RubyGems, hashes, loops and variables.

If you’re a non-programmer: The use case may be a bit silly here but I hope you can view it from an abstract-big-picture level and see the use of programming to: 1) Make quick work of menial work and 2) create and analyze datapoints where none existed before.

On to the lesson!

The problem with portraits

For the SOPA Opera app I built a few weeks ago, I wanted to use the Congressional mugshots to illustrate the front page. The Sunlight Foundation provides a convenient zip file download of every sitting Congressmember’s face. The problem is that the portraits were a bit inconsistent in composition (and quality). For example, here’s a usable, classic head-and-shoulders portrait of Senator Rand Paul:

Sen. Rand Paul

But some of the portraits don’t have quite that face-to-photo ratio; here’s Sen. Jeanne Shaheen’s portrait:

Sen. Jeanne Shaheen

It’s not a terrible Congressional portrait. It’s just out of proportion compared to Sen. Paul’s. What we need is a closeup crop of Sen. Shaheen’s face:

Sen. Jeanne Shaheen's face cropped

How do we do that for a set of dozens (even hundreds) of portraits in a non-carpal-tunnel-syndrome-inducing manner, without manually opening each image and cropping the heads?

Easy face detection with Face’s Developer API

Face-detection is done using an algorithm that scans an image and looks for shapes proportional to the average human face and containing such inner shapes as eyes, a nose and mouth in the expected places. It’s not as if the algorithm has to have an idea of what an eye looks like exactly; two light-ish shapes about halfway down what looks like a head might be good enough.

You could write your own image-analyzer to do this, but we just want to crop faces right now. Luckily, Face provides a generous API: send it an image and it sends back a JSON response in this format:

    {
        "photos": [{
            "url": "http:\/\/\/images\/ph\/12f6926d3e909b88294ceade2b668bf5.jpg",
            "pid": "F@e9a7cd9f2a52954b84ab24beace23046_1243fff1a01078f7c339ce8c1eecba44",
            "width": 200,
            "height": 250,
            "tags": [{
                "tid": "TEMP_F@e9a7cd9f2a52954b84ab24beace23046_1243fff1a01078f7c339ce8c1eecba44_46.00_52.40_0_0",
                "recognizable": true,
                "threshold": null,
                "uids": [],
                "gid": null,
                "label": "",
                "confirmed": false,
                "manual": false,
                "tagger_id": null,
                "width": 43,
                "height": 34.4,
                "center": {
                    "x": 46,
                    "y": 52.4
                },
                "eye_left": {
                    "x": 35.66,
                    "y": 44.91
                },
                "eye_right": {
                    "x": 58.65,
                    "y": 43.77
                },
                "mouth_left": {
                    "x": 37.76,
                    "y": 61.83
                },
                "mouth_center": {
                    "x": 49.35,
                    "y": 62.79
                },
                "mouth_right": {
                    "x": 57.69,
                    "y": 59.75
                },
                "nose": {
                    "x": 51.58,
                    "y": 56.15
                },
                "ear_left": null,
                "ear_right": null,
                "chin": null,
                "yaw": 22.37,
                "roll": -3.55,
                "pitch": -8.23,
                "attributes": {
                    "glasses": {
                        "value": "false",
                        "confidence": 16
                    },
                    "smiling": {
                        "value": "true",
                        "confidence": 92
                    },
                    "face": {
                        "value": "true",
                        "confidence": 79
                    },
                    "gender": {
                        "value": "male",
                        "confidence": 50
                    },
                    "mood": {
                        "value": "happy",
                        "confidence": 75
                    },
                    "lips": {
                        "value": "parted",
                        "confidence": 39
                    }
                }
            }]
        }],
        "status": "success",
        "usage": {
            "used": 42,
            "remaining": 4958,
            "limit": 5000,
            "reset_time_text": "Tue, 24 Jan 2012 05:23:21 +0000",
            "reset_time": 1327382601
        }
    }

The JSON includes an array of photos (if you sent more than one to be analyzed) and then an array of tags – one tag for each detected face. The important parts for cropping purposes are the attributes dealing with height, width, and center:

    "width": 43,
    "height": 34.4,
    "center": {
        "x": 46,
        "y": 52.4
    }

These numbers represent percentage values from 0-100. So the width of the face is 43% of the image’s total width. If the image is 200 pixels wide, then the face spans 86 pixels.
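
To make the percentage arithmetic concrete, here is a minimal sketch in plain Ruby that turns one tag's width, height, and center into a pixel box. The helper name face_box_in_pixels is made up for illustration, and it assumes that height and the center's y value are percentages of the image's height in the same way that width is a percentage of its width. The 200x250 image size and the tag values are copied from the sample response above.

```ruby
require 'json'

# Hypothetical helper (not part of any API): converts a tag's percentage
# values into a pixel box, given the pixel dimensions of the photo.
# Assumes height and center y are percentages of the image height.
def face_box_in_pixels(tag, img_width, img_height)
  w = tag['width']  / 100.0 * img_width
  h = tag['height'] / 100.0 * img_height
  # "center" is the middle of the face, so step back half the box
  # in each direction to find the top-left corner
  x = tag['center']['x'] / 100.0 * img_width  - w / 2
  y = tag['center']['y'] / 100.0 * img_height - h / 2
  { x: x.round, y: y.round, width: w.round, height: h.round }
end

# Values copied from the sample response
tag = JSON.parse('{"width": 43, "height": 34.4, "center": {"x": 46, "y": 52.4}}')
box = face_box_in_pixels(tag, 200, 250)
puts box.inspect
```

For the sample tag this works out to an 86x86 pixel box whose top-left corner sits at (49, 88), which is the kind of x, y, width, height set that a crop routine expects.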

Using your favorite HTTP-calling library (I like the RestClient gem), you can simply ping the API’s detect feature to get these coordinates for any image you please.

Image manipulation with RMagick

So how do we do the actual cropping? By using the RMagick gem (a Ruby wrapper for the ImageMagick graphics library), which lets us do crops with commands as simple as these:

require 'rmagick'

# Image.read returns an array of images; take the first one
img = Magick::Image.read("somefile.jpg")[0]

# crop a 100x100 image starting from the top left corner
img = img.crop(0,0,100,100)

The RMagick documentation page is a great place to start. I’ve also written an image-manipulation chapter for The Bastards Book of Ruby.

The Process

The code for all of this is stored at my Github account.

I’ve divided this into two parts/scripts. You could combine it into one script but to make things easier to comprehend (and to lessen the amount of best-practices error-handling code for me to write), I divide it into a “fetch” and “process” stage.

In the fetch.rb stage, we essentially download all the remote files we need to do our work:

  • Download a zip file of images from Sunlight Labs and unzip it at the command line
  • Use NYT’s Congress API to get the latest list of Senators
  • Use the Face API to download face-coordinates as JSON files

In the process.rb stage, we use RMagick to crop the photos based on the metadata we downloaded from the NYT and Face. As a bonus, I’ve thrown in a script to programmatically create a crude webpage that ranks the Congressmembers’ faces by smile, glasses-wearingness, and androgenicity. How do I do this? The API handily provides these numbers in its response:

    "attributes": {
        "glasses": {
            "value": "false",
            "confidence": 16
        },
        "smiling": {
            "value": "true",
            "confidence": 92
        },
        "face": {
            "value": "true",
            "confidence": 79
        },
        "gender": {
            "value": "male",
            "confidence": 50
        },
        "mood": {
            "value": "happy",
            "confidence": 75
        },
        "lips": {
            "value": "parted",
            "confidence": 39
        }
    }
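
As a rough sketch of how a ranking could be built from those numbers (not necessarily how the scripts on Github do it), suppose each senator's "smiling" attribute has been collected into an array of hashes. The array layout and variable names here are assumptions for illustration; the handful of scores are taken from the results listed in this post.

```ruby
# Hypothetical records: one per senator, holding the "smiling" attribute
# from the API response. Scores are taken from the results in this post.
senators = [
  { name: 'Sen. Franken (D-MN)',  smiling: { 'value' => 'false', 'confidence' => 37 } },
  { name: 'Sen. Wicker (R-MS)',   smiling: { 'value' => 'true',  'confidence' => 100 } },
  { name: 'Sen. Inouye (D-HI)',   smiling: { 'value' => 'true',  'confidence' => 40 } },
  { name: 'Sen. Bingaman (D-NM)', smiling: { 'value' => 'false', 'confidence' => 79 } },
  { name: 'Sen. Shaheen (D-NH)',  smiling: { 'value' => 'true',  'confidence' => 99 } }
]

# "value" is a string in the JSON, so compare against 'true', not true
smilers, non_smilers = senators.partition { |s| s[:smiling]['value'] == 'true' }

# Most confident smiles first
biggest = smilers.sort_by { |s| -s[:smiling]['confidence'] }
# Least confident smiles first
most_ambiguous = smilers.sort_by { |s| s[:smiling]['confidence'] }

puts biggest.map { |s| s[:name] }.inspect
puts most_ambiguous.first[:name]
puts non_smilers.map { |s| s[:name] }.inspect
```

The same partition-then-sort pattern would apply to the "glasses" and "gender" attributes, since each carries its own value and confidence.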

I’m not going to reprint the code here; you can see the scripts yourself at my Github account.

First things first: sign up for API keys at the NYT and Face.

I also use the following gems:

The Results

Here’s what you should see after you run the process.rb script (all judgments made by Face’s algorithm…I don’t think everyone will agree with it about the quality of the smiles):

10 Biggest Smiles

Sen. Wicker (R-MS) [100]
Sen. Reid (D-NV) [100]
Sen. Shaheen (D-NH) [99]
Sen. Hagan (D-NC) [99]
Sen. Snowe (R-ME) [98]
Sen. Kyl (R-AZ) [98]
Sen. Klobuchar (D-MN) [98]
Sen. Crapo (R-ID) [98]
Sen. Johanns (R-NE) [98]
Sen. Hutchison (R-TX) [98]

10 Most Ambiguous Smiles

Sen. Inouye (D-HI) [40]
Sen. Kohl (D-WI) [43]
Sen. McCain (R-AZ) [47]
Sen. Durbin (D-IL) [49]
Sen. Roberts (R-KS) [50]
Sen. Whitehouse (D-RI) [52]
Sen. Hoeven (R-ND) [54]
Sen. Alexander (R-TN) [54]
Sen. Shelby (R-AL) [62]
Sen. Johnson (D-SD) [63]

The Non-Smilers

Sen. Bingaman (D-NM) [79]
Sen. Coons (D-DE) [77]
Sen. Burr (R-NC) [72]
Sen. Hatch (R-UT) [72]
Sen. Reed (D-RI) [71]
Sen. Paul (R-KY) [71]
Sen. Lieberman (I-CT) [59]
Sen. Bennet (D-CO) [55]
Sen. Udall (D-NM) [51]
Sen. Levin (D-MI) [50]
Sen. Boozman (R-AR) [48]
Sen. Isakson (R-GA) [41]
Sen. Franken (D-MN) [37]

10 Most Bespectacled Senators

Sen. Franken (D-MN) [99]
Sen. Sanders (I-VT) [98]
Sen. McConnell (R-KY) [98]
Sen. Grassley (R-IA) [96]
Sen. Coburn (R-OK) [93]
Sen. Mikulski (D-MD) [93]
Sen. Roberts (R-KS) [93]
Sen. Inouye (D-HI) [91]
Sen. Akaka (D-HI) [88]
Sen. Conrad (D-ND) [86]

10 Most Masculine-Featured Senators

Sen. Bingaman (D-NM) [94]
Sen. Boozman (R-AR) [92]
Sen. Bennet (D-CO) [92]
Sen. McConnell (R-KY) [91]
Sen. Nelson (D-FL) [91]
Sen. Rockefeller IV (D-WV) [90]
Sen. Carper (D-DE) [90]
Sen. Casey (D-PA) [90]
Sen. Blunt (R-MO) [89]
Sen. Toomey (R-PA) [88]

10 Most Feminine-Featured Senators

Sen. McCaskill (D-MO) [95]
Sen. Boxer (D-CA) [93]
Sen. Shaheen (D-NH) [93]
Sen. Gillibrand (D-NY) [92]
Sen. Hutchison (R-TX) [91]
Sen. Collins (R-ME) [90]
Sen. Stabenow (D-MI) [86]
Sen. Hagan (D-NC) [81]
Sen. Ayotte (R-NH) [79]
Sen. Klobuchar (D-MN) [79]

For the partisan data-geeks, here’s some faux analysis with averages:

Party   Smiles   Non-smiles   Avg. Smile Confidence
D       44       7            85
R       42       5            86
I       1        1            85

There you have it, the Republicans are the smiley-est party of them all.

Further discussion

This is an exercise to show off the very cool Face API and to demonstrate the value of a little programming knowledge. Writing the script doesn’t take too long, though I spent more time than I liked on idiotic bugs of my own making. But this was way preferable to cropping photos by hand. And once I had the gist of things, I not only had a set of cropped files, I had the ability to whip up any kind of visualization I needed with just a minute’s more work.

And it wasn’t just face-detection that I was using, but face-detection in combination with deep data-sources like the Times’s Congress API and the Sunlight Foundation. For the SOPA Opera app, it didn’t take long at all to populate the site with legislator data and faces. (I didn’t get around to using this face-detection technique to clean up the images, but hey, I get lazy too…)

Please don’t judge the value of programming by my silly example here – having an easy-to-use service like the Face API (mind the usage terms, of course) gives you a lot of great possibilities if you’re creative. Off the top of my head, I can think of a few:

  • As a photographer, I’ve accumulated thousands of photos but have been quite lazy in tagging them. I could conceivably use Face’s API to quickly find photos without faces for stock photo purposes. Or maybe a client needs to see male/female portraits. The API gives me an ad-hoc way to retrieve those without menial browsing.
  • Data on government hearing webcasts are hard to come by. I’m sure there’s a programmatic way to split up a video into thousands of frames. Want to know at which points Sen. Harry Reid shows up? Train Face’s API to recognize his face and set it loose on those still frames to find when he speaks.
  • Speaking of breaking up video…use the Face API to detect the eyes of someone being interviewed and use RMagick to detect when the eyes are closed (the pixels in those positions are different in color than the second before) to do that college-level psych experiment of correlating blinks-per-minute to truthiness.

Thanks for reading. This was a quick post and I’ll probably go back to clean it up. At some point, I’ll probably add this to the Bastards Book.

A Million Pageviews, Thousands of Dollars Poorer, and Still Countlessly Richer.

Snowball fight in Times Square, Manhattan, New York

Update: This post rambled longer than I intended it to and I forgot that I had meant to include some observations on what I’ve noticed about Flickr’s traffic pattern. I’ve added some grafs to the bottom of this post.

My Flickr account hit 1,000,000 pageviews this weekend. Two years ago, I bought a Pro account shortly after the above photo of some punk kid throwing a snowball at me in Times Square was posted on Flickr’s blog. Since then I set my account to share all of my photos under the Creative Commons Non-commercial license (but I’ve let anyone who asks use them for free).

My account was on track to have 500K pageviews by October (of this past year) but then this photo of pilots marching on Wall Street hit Reddit and attracted 150K views all by itself, so then a million total views seemed just around the corner :).

Net Profit

Mermaid Parade 2010, Coney Island

I was paid $120 for this photo, which was used in New York’s campaign to remind people that they can’t smoke in Coney Island (or any other public park).

So how much have I gained monetarily in these two years of paying for a Flickr Pro account?

Two publications offered a total of $135 for my work. Minus the two years of Pro fees ($25 times 2 years), that comes to $85. If I spent at minimum 1 minute to shoot, edit, process, and upload each of my ~3,100 photos, I made a rate of about $1.65/hour for my work.

Of course, I’ve spent much more time than one minute per photo. And I’ve taken far more than 3,100 photos (I probably have 15 to 20 times as many stored on my backup drives). And of course, thousands of dollars for my photo equipment, including repairs and replacements. So:

  • + $135 from publications
  • – $50 for Flickr Pro fees
  • – $8,000 (and change) for Canon 5D Mark 2, Canon S90, lenses, repairs from constant use in the rain/snow/etc.

So doing the math…I’m several thousands of dollars in the hole.
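
For anyone who wants to check the arithmetic, here is a small sketch that works through the figures quoted above. The variable names are made up for illustration, and the "and change" on the equipment cost is ignored, so the final number is a floor rather than an exact total.

```ruby
# Figures quoted in the post
publication_income = 135      # dollars from two publications
flickr_pro_fees    = 25 * 2   # $25/year for two years
equipment          = 8_000    # camera, lenses, repairs ("and change" ignored)
photos_uploaded    = 3_100    # approximate count
minutes_per_photo  = 1        # the deliberately low estimate

net_before_gear = publication_income - flickr_pro_fees
hours_worked    = photos_uploaded * minutes_per_photo / 60.0
hourly_rate     = net_before_gear / hours_worked
net_after_gear  = net_before_gear - equipment

puts "Net before equipment: #{net_before_gear} dollars"
puts "Hourly rate: #{hourly_rate.round(2)} dollars/hour"
puts "Net after equipment: #{net_after_gear} dollars"
```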


Monetarily, my photography is a large loss for me. I’m lucky enough to have a job (and, for better or worse, no car or mortgage and few other hobbies to pay for) to subsidize it. So why do I keep doing it and, in general, giving away my work for free?

Well, there is always the promise of potential gain:

  • I made $1,000 (mostly to cover expenses) to shoot a friend’s wedding because his fiancée liked the work I posted on my Facebook account…but weddings are so much work that I’ve decided to avoid shooting them if I can help it.
  • I’ve also taken photos for my job at ProPublica, including this portrait for a story that was published in the Washington Post. I’m not employed specifically to take photos, but it’s nice to be able to do it on office time.
  • I also now have a large cache of stock photos to use for the random sites I build. For example, I used the Times Square snowball photo to illustrate a programming lesson on image manipulation and face-recognition technology.
  • Even if my photos were up to professional par, I’m not the type to declare (in person) to others, “Hey, one of my hobbies is photography. Look at these pictures I took.” Flickr/Facebook/Tumblr is a nice passive-humblebrag way to show this side passion to others. And I’ve made a few good friends and new opportunities because of the visibility of my work.

In the scheme of things, a million pageviews is not a lot for two years…A photo might get that in a few days if it’s a popular enough meme. And pageviews have only a slight correlation to actual artistic merit (neither the snowball photo nor the pilot photo above is my favorite of its series). But it’s amazing and humbling to think that – if the average visitor who stumbles on my account looks at 4 photos – something I’ve done as a hobby might have reached nearly a quarter million people (not counting the times when sites take advantage of the CC-licensing and reprint my photos).

Having any kind of audience, no matter how casual, is necessary to practice and improve my art if I were to ever try to become a paid professional photographer. So that’s one important way that I’m getting something from my online publishing.

Photos are as free as the photographer wants them to be

My personal milestone coincidentally comes after the posting of two highly-linked-to articles on the costs of a photo: This Photograph is Not Free by John Mueller and This Photograph is Free by Tristan Nitot. They both make good points (Mueller’s response to Nitot is nuanced and deserves to also be considered).

Mueller and Nitot aren’t necessarily at odds with each other so there’s not much for me to add. Photos are worth good money. To cater to a client, to buy the (extra) professional equipment, to spend more time in editing and post-processing (besides cropping, color-correction and contrast, I don’t do much else to my photos), to take more time to be there at an assignment – this is all most definitely worth charging for.

And that is precisely why I don’t put the effort into marketing or selling mine. The money isn’t worth taking that amount of time and energy from what I currently consider my main work and passion. However, the $8,000 deficit is easily covered by what I’ve gotten so far from my photography: the extra incentive to explore the great city I live in, the countless friends and memories, and of course, the photos to look back on and reuse for whatever I want. Having the option to easily share my photos to (hopefully) inspire and entertain others is icing.

One more side-benefit of using a public publishing system like Flickr: I couldn’t devise a better way to organize and browse my own work with minimal effort. And I’m often rediscovering what I considered to be throwaway photos because others find them interesting.

Here are a few other photos I’ve taken over the years that were either frequently-viewed or considered “interesting” by Flickr’s bizarre algorithm:

Jumping for joy during New York blizzard, Times Square
The Cat is the Hat
Sunset over Battery Park and Statue of Liberty
Woman in white, pilots
Pushing a Taxi - New York Blizzard Snowstorm Thundersnow Blaaaaagh
Lightning strikes the Empire State Building
Brooklyn Bridge photographer-tourist, Photo of
Atrium, Museum of Natural History
Union Square Show
Casting Couch (#NYFW Spring 2012)
Williamsburg: Beautiful dogs
New York Snow Blizzard 2011, Lone Man on the Brooklyn Bridge
Ground Zero NY celebrates news of Osama bin Laden's death
Grand Central Moncler NYFW Flash Mob Dancin
Broadway Rainstorm
Towers of Light 9/11
Manhattanhenge from a Taxi

A few more observations on Flickr pageviews: It’s hard to say if 1,000,000 page views is a lot especially considering the number of photos I have uploaded in total. Before the pilots on Wall Street photo, I averaged about 200-500 pageviews a day. After that, I put more effort into maintaining my account and regularly uploading photos. Now on a given day, if I don’t upload anything particularly interesting the account averages about 1,500 views.

Search engines bring very little traffic. So other than what (lack of) interest my photos have for the general Internet, I think my upload-and-forget mindset towards my account also limits my pageviews. I have a good friend on Flickr who gets far fewer pageviews but gets far more comments than I do. I rarely comment on my contacts’ photos and barely participate in the various groups.

I’m disconnected enough from the Flickr social scene that I only have a very vague understanding of how its Explore section works. Besides the blog, the Explore collection is the best way to get seen on Flickr. It features “interesting” photos as determined by an algorithm that, as best I can tell, is affected by some kind of in-group metric.

I’ve only had three photos make it to Explore: the snowball fight in Times Square, the lightning hitting the Empire State Building, and this one where my subway train got stuck and we had to walk out of the tunnel. The pilots photo did not make it to Explore, so I’m guessing that amount of traffic (particularly if a huge portion of it comes from one link on Reddit) is not necessarily a prime factor in getting noticed by Flickr’s algorithm.

New York in 2011, the photo version

A little late on this but I posted a few photos I took in NYC this year over at my Tumblr, Eye Heart New York.

This year seemed like my most sheltered, uncreative year yet…even so, according to Flickr’s count, 3/4 of the 3,000+ photos I’ve uploaded in total were taken in 2011. I guess when so much just happens next to me (basically, OccupyWallStreet camping out a few blocks away) it’s hard not to snap a few pics. I almost broke the million views mark (for the two years that I’ve been on Flickr) and one of my photos finally made it on someone’s dining room wall, so not too bad a year no matter how it felt.

Here are a few of the photos; visit Eye Heart New York for the rest:

Pre-Hurricane Irene: Naked Cowboy

WTC, eve of 9/11/2011

Casting Call, New York Fashion Week Spring 2012

Lightning strikes the Empire State Building

OccupyWallSt, day of canceled city cleaning

Sit-in, jazz hands!

OccupyWallStreet goes to occupy Times Square, 10/15/2011

High Line Park, Art Installation, Section 2

New Museum slide!

See the rest at Eye Heart New York

United/Continental pilots march on Wall Street

The SOPA Debate and How It’s Affected by Congress’s Understanding of Child Porn

Rep. Lamar Smith, chairman of the House Judiciary Committee and SOPA sponsor


Update (1/22/2012): SOPA was indefinitely postponed by Rep. Lamar Smith on Friday (PIPA is likewise stalled). Rep. Smith also has another Internet rights bill on deck though: the Protecting Children from Internet Pornographers Act of 2011, which mandates that Internet services store customer data for up to 18 months to make it easier for law enforcement to investigate them for child porn trafficking. This proposed bill is discussed in the latter half of this post, including how its level of support is similar to (and different from) SOPA’s.

H.R. 1981 has made it farther than SOPA did. It made it out of the Judiciary Committee (which is chaired by Rep. Lamar Smith and also handled SOPA) with a 19-10 vote in July of last year and was placed on the Union Calendar. (Compare H.R. 1981’s progress to SOPA’s.) H.R. 1981 has 39 cosponsors, compared to SOPA’s original 31. Read the text of H.R. 1981.

One thing I’ve learned from the whole SOPA affair is how obscure our lawmaking process is even in this digital age. The SOPA Opera site I put up doesn’t do anything but display publicly available information: which legislators support/oppose SOPA and why. But it still got a strong reaction from users, possibly because they misunderstand our government’s general grasp of technology issues.

Sen. Al Franken is one of the co-sponsors for PROTECT-IP, the Senate's version of SOPA

The most common refrain I saw was: “I cannot believe that Rep/Senator [insert name] is for SOPA! [insert optional expletive].” In particular, “Al Franken” was a frequently invoked name because his fervent advocacy on net neutrality seemed to make the Minnesota senator, in many of his supporters’ opinions, an obvious enemy of SOPA. In fact, one emailer accused me of being out to slander Franken, even though the official record shows that Franken has spoken strongly for PROTECT-IP (the Senate version of SOPA) and even co-sponsored it.

So there’s been a fair amount of confusion as to what mindset is responsible for SOPA. Since party lines can’t be used to determine the rightness/wrongness of SOPA, fingers have been pointed at the money trail: SOPA’s proponents reportedly receive far more money from media/entertainment-affiliated donors than they do from the tech industry. The opposite trend exists for the opponents.

It’s impossible, of course, to know exactly what’s in our legislators’ minds. But a key moment during the Nov. 16 House Judiciary hearing on SOPA suggests that their opinions may be rooted less in malice/greed (if you’re of the anti-SOPA persuasion) than in something far more prosaic: their level of technological comprehension.

You can watch the entire, incredibly-inconvenient-to-access webcast at the House Judiciary’s hearing page. I’ve excerpted a specific clip in which Rep. Tom Marino (R-PA) is asking Katherine Oyama (Google’s copyright lawyer) about why Google can stop child porn but not online piracy:

REP. MARINO: I want to thank Google for what it did for child pornography – getting it off the website. I was a prosecutor for 18 years and I find it commendable and I put those people away. So if you can do that with child pornography, why can you not do it [with] these rogue websites [The Pirate Bay, et al.]? Why not hire some whiz kids out of college to come in and monitor this and work for the company to take these off?

My daughter who is 16 and my son who is 12, we love to get on the Internet and we download music and we pay for it. And I get to a site and I say this is a new one, this is good, we can get some music here. And my daughter says Dad, don’t go near that one. It’s illegal, it’s free, and given the fact that you’re on Judiciary, I don’t think you should be doing that…Maybe we need to hire her [laugh]…but, why not?

OYAMA: The two problems are similar in that they’re both very serious problems; they’re both things that we all should be working to fight against. But they’re very different in how you go about combatting it. So for child porn, we are able to design a machine that is able to detect child porn. You can detect certain colors that would show up in pornography, you can detect flesh tones. You can have manual review, where someone would look at the content and they would say this is child porn and this shouldn’t appear.

We can’t do that for copyright just on our own. Because any video, any clip of content, it’s going to appear to the user to be the same thing. So you need to know from the rights holder…have you licensed it, have you authorized it, or is this infringement?

REP. MARINO: I only have a limited amount of time here and I appreciate your answer. But we have the technology, Google has the technology, we have the brainpower in this country, we certainly can figure it out.

The subject of child pornography is so awful that it’s little wonder that no one really thinks about how it’s actually detected and stopped. As it turns out, it’s not at all complicated.

When I was a college reporter, I had the idea to drive down to the county district attorney’s office and go through all the search warrants. Search warrants become part of the public record, but district attorneys can seal them if police worry that details in an affidavit or search warrant would jeopardize an investigation. I wanted to count how many times this was done at the county DA, because some major cases had been sealed for months. And I wondered if the DA was overzealous in keeping private what should be the people’s business.

But there were plenty of big cases among the unsealed warrants. I went to college in a small town but there was a bizarre, seemingly constant stream of students being charged with child porn possession. Either college students were becoming particularly perverse or the campus police happened to be crack cyber-sleuths in rooting out the purveyors.

I don’t know about the former, but I learned that the police were not particularly skilled at hacking, based on their notes in the search warrants. In fact, finding the suspects was comically easy because of the unique setup of our college network. Everyone in the dorms had an ethernet hookup but there was no Google, Napster or BitTorrent at the time. So one of the students built a search engine that allowed any student to search the shared files of every other student. And since Windows apparently made this file sharing a default (and at the time, 90+ percent of students’ computers were PCs), the student population had inadvertent access to a huge breadth of files, including MP3s and copied movies and even homework papers.

So to find out if anyone had child porn, the police could just log onto the search engine and type in the appropriate search terms. But the police didn’t even have to do this. Other students would stumble upon someone’s porn collection (you had the option of exploring anyone’s entire shared folder, not just files that came up on the search) and report it. The filenames were all the sickening indication needed to suspect someone of possession.

Google’s Oyama alludes to more technically sophisticated ways of detecting it, but the concept is just as simple as it was at my college: no matter how it’s found, child pornography is easy to categorize as child porn because of its visual characteristics, whether it’s the filename or the images themselves. In fact, it’s not even necessary for a human to view a suspected file to know (within a high mathematical probability) that it contains the purported illegal content.

If you’ve ever used Shazam or any of the other song-recognition services, you’ve put this concept into practice. When you hold up a phone to identify a song playing over the bar’s speakers, it’s not as if your phone dials up one of Shazam’s resident music experts who then responds with her guess of the song. The Shazam app looks for certain high points (as well as their spacing, i.e. the song’s rhythm) to generate a “fingerprint” of the song, and then compares it against Shazam’s master database of song “fingerprints”.

No human actually has to “listen” to the song. This is not a new technological concept; it’s as old as, well, the fingerprint database used by law enforcement.
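To make the fingerprint idea concrete, here is a toy sketch in Python. It is not Shazam’s actual algorithm (which I haven’t seen); it just assumes a recording is a list of loudness samples, treats local peaks as the “high points,” and uses the spacing between them as the fingerprint. The sample data and the names `find_peaks`, `fingerprint` and `identify` are all made up for illustration.

```python
from typing import Dict, List, Optional, Tuple

Fingerprint = Tuple[int, ...]


def find_peaks(samples: List[float]) -> List[int]:
    """Return the positions of samples that are higher than both neighbors."""
    return [
        i
        for i in range(1, len(samples) - 1)
        if samples[i] > samples[i - 1] and samples[i] > samples[i + 1]
    ]


def fingerprint(samples: List[float]) -> Fingerprint:
    """Toy fingerprint: the spacing between consecutive peaks (a crude rhythm)."""
    peaks = find_peaks(samples)
    return tuple(later - earlier for earlier, later in zip(peaks, peaks[1:]))


def identify(samples: List[float], database: Dict[Fingerprint, str]) -> Optional[str]:
    """Look the fingerprint up in the master database; None means no match."""
    return database.get(fingerprint(samples))


# Hypothetical loudness samples for two songs (invented data).
song_a = [0, 5, 0, 0, 7, 0, 6, 0, 0, 0, 9, 0]
song_b = [0, 3, 0, 4, 0, 0, 0, 8, 0, 2, 0, 0]
database = {fingerprint(song_a): "Song A", fingerprint(song_b): "Song B"}

# A quieter recording of Song A: every sample is halved, but the peaks
# fall in the same places, so the fingerprint is unchanged.
quiet_recording = [s * 0.5 for s in song_a]
print(identify(quiet_recording, database))
```

The lookup is a plain dictionary match on numbers; nothing in it requires a person (or the program) to understand what the song is.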

So what Rep. Marino essentially wants is for Google to build a Shazam-like service that doesn’t just identify a song by “listening” to it, but also determines if whoever is playing that song has the legal right to do so. Thus, this anti-pirate-Shazam would have to determine from the musical signature of a song such things as whether it came from an iTunes or Amazon MP3 or a CD. And not only that, it would have to determine whether or not the MP3 or CD is a legal or illegal copy.
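A rough Python sketch of why that is hard, under some loud assumptions: a SHA-256 hash stands in for an audio fingerprint, the audio bytes are invented, and the license table and the `is_authorized` function are hypothetical. The point is only that a legal copy and a pirated copy of the same file produce the same fingerprint, so the answer to “is this authorized?” has to come from somewhere outside the file.

```python
import hashlib
from typing import Dict, Tuple


def content_fingerprint(audio_bytes: bytes) -> str:
    """Stand-in for an audio fingerprint: a SHA-256 hash of the raw bytes."""
    return hashlib.sha256(audio_bytes).hexdigest()


# Hypothetical audio data: the pirated file is a byte-for-byte copy.
purchased_copy = b"hypothetical audio bytes for one song"
pirated_copy = bytes(purchased_copy)

# Identical content gives identical fingerprints, whatever the provenance.
same_content = content_fingerprint(purchased_copy) == content_fingerprint(pirated_copy)
print(same_content)

# Hypothetical table supplied by a rights holder: (user, fingerprint) -> licensed?
LicenseTable = Dict[Tuple[str, str], bool]
licenses: LicenseTable = {("alice", content_fingerprint(purchased_copy)): True}


def is_authorized(user: str, audio_bytes: bytes, table: LicenseTable) -> bool:
    """Authorization comes from the external table, never from the audio itself."""
    return table.get((user, content_fingerprint(audio_bytes)), False)


print(is_authorized("alice", purchased_copy, licenses))
print(is_authorized("bob", pirated_copy, licenses))
```

Everything that distinguishes the two users lives in the `licenses` table, which is the kind of information Oyama says has to come from the rights holder.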

In a more physical sense, this is like building a machine that can determine from a photograph of your handbag whether it’s a cheap knockoff and whether or not you actually own that bag – as opposed to having stolen it, or having bought it from someone who did steal it.

I’m not a particularly skilled engineer but I can’t fathom how this would be done and neither can Google, apparently. But Rep. Marino and at least a few others on the House Judiciary committee have more faith in Google’s technical prowess and they don’t believe that Google is doing enough.

And frankly, I can’t blame them.

From their apparently non-technical vantage point, what they see is that Google is an amazing company that seems to have no limit to its capabilities. It can instantly scour billions of webpages. It can plot in seconds the driving route from Des Moines to Oaxaca, Mexico. And at some point, it might even make a car that drives that route all by itself.

And Google has demonstrated the power to stop evil acts, because it has effectively prevented the spread of child porn in its search engine and other networks. Child porn is a terrible evil; software/media piracy less so. It stands to reason – in a non-technical person’s thinking – that anyone who can stop a great evil must surely be able to stop a lesser evil.

And so, to continue this line of reasoning, if Google doesn’t stop a lesser evil such as illegal MP3 distribution, then it must be because it doesn’t care enough. Or, as some House members noted, Google is loath to take action because it makes money off of sites that trade in ill-gotten intellectual property.

So you can see how one’s position on SOPA may be inspired less by devotion to an industry than by a particular understanding (or lack thereof) of the technological tradeoffs and hurdles.

Rep. Marino et al. see this as something within the realm of technological possibility for Google’s wizards, if only they had some legal incentive. Google and other SOPA opponents see that the problem that SOPA ostensibly tackles is not one that can be solved with any amount of technological expertise. Thus, each side can be as anti-online-piracy/pro-intellectual-property as the other and yet fight fiercely over SOPA.

Smith’s anti-child porn, database-building bill

Though SOPA has taken the spotlight, there is another Internet-related bill on the House Judiciary’s agenda. It’s H.R. 1981, a.k.a. the Protecting Children from Internet Pornographers Act of 2011, which proposes a mandate that Internet sites keep track of their users’ IP information for up to 18 months, to make it easier to investigate Internet crimes – such as downloading child pornography.

H.R. 1981 was introduced by House Judiciary Chairman Rep. Lamar Smith (R-Tex.) who is, of course, the legislator who introduced SOPA. And like SOPA, the support for H.R. 1981 is non-partisan because child pornography is neither a Republican nor a Democratic cause.

And also like SOPA, the opposition to H.R. 1981 is along non-partisan lines. Among the most vocal opponents of the child porn bill is the Judiciary committee’s ranking member, Rep. John Conyers (D-MI). Is it because he is in the pocket of the child porn lobby? No; Conyers argues that even though child porn is bad, H.R. 1981 relies on using technology in a way that is neither practical nor ethical. From CNET:

“The bill is mislabeled,” said Rep. John Conyers of Michigan, the senior Democrat on the panel. “This is not protecting children from Internet pornography. It’s creating a database for everybody in this country for a lot of other purposes.”

Rep. John Conyers (D-MI)

Rep. Conyers apparently understands that just because a law purports to fight something as evil (and, of course, politically unpopular) as child pornography doesn’t mean that the law’s actual implementation will be sound.

So when the wrong-to-be-righted is online piracy – i.e. SOPA – what is Conyers’ stance? He is one of its most vocal supporters:

The Internet has regrettably become a cash-cow for the criminals and organized crime cartels who profit from digital piracy and counterfeit products. Millions of American jobs are at stake because of these crimes.

Is it because Conyers is in the pocket of big media? Or that he hates the First Amendment? That’s not an easily apparent conclusion judging from his past votes and legislative history.

It’s of course possible that Conyers takes this particular stance on SOPA because SOPA, all things considered, happens to be a practical and fair law in the way that H.R. 1981 isn’t.

But a more cynical viewpoint is that Conyers’ technological understanding of one bill does not carry over to the other. Everyone has been screwed over at some point by a massive, faceless database, so it’s easy to be fearful of online databases – in fact, the less you know about computers, the more concerned you’ll be about the misuse of databases.

The technological issues underlying SOPA are arguably far more complex, though, and it’s not clear – as evidenced by Rep. Marino’s line of questioning – that Congressmembers, whether they support or oppose SOPA, have a full understanding of them.

As it stands, though, SOPA had 31 cosponsors in its heyday. H.R. 1981 has 39. It will be interesting to see if this bill by Rep. Smith will face any residual backlash after what happened with SOPA.

A hand-made list of SOPA / PROTECT-IP Congressional supporters and opponents

I’ve always been interested in exploring the various online Congressional information sources and the recent SOPA debate seemed like a good time to put some effort into it…also, I’ve always wanted to try out the excellent Isotope JavaScript library.

I had been passively paying attention to the debate and was surprised at how hard it was to find a list of supporters and opponents, given how much it’s dominated my (admittedly small, bubblish) internet communities.

When I set out to compile the list, though, I could see why…the official government sites don’t make it easy to find or interpret the information. So SOPAopera is my game attempt at putting together some basic information about it…the feedback I’ve gotten so far indicates that even constituents who have been reading a lot about SOPA/PROTECT-IP are surprised at the level and diversity of support the laws have among Congressmembers.