Author Archives: Dan

How to compile OpenCV 2.4.10 on Ubuntu 14.04 and 14.10

For my upcoming Computational Methods in the Civic Sphere class at Stanford, I wanted my students to have access to OpenCV so that they could explore computer-vision algorithms, such as face detection with Haar classifiers.

On the Stanford FarmShare machines (which run on Ubuntu 13.10), I had trouble getting their installation of OpenCV working, but was able to use the Anaconda distribution to install both Python 2.7.8 and OpenCV via the Binstar package repo.

Briefly, here are the instructions:

  1. Get the Anaconda download link
  2. curl (*the-anaconda-URL-script*) -o /tmp/ && bash /tmp/
  3. conda install binstar
  4. conda install -c opencv

Note: For Mac users for whom `brew install opencv` isn’t working: Anaconda worked well enough for me, though I had to install from a different package repo:

conda install -c opencv

The Anaconda system, which I hadn't used before but find really convenient, automatically upgrades/downgrades the necessary dependencies (such as numpy).

Using Anaconda works fine on fresh Ubuntu installs (I tested on AWS and Digital Ocean), but I wanted to see if I could compile it from source just in case I couldn't use Anaconda. This ended up being a very painful time of wading through blog articles and Github issues. Admittedly, I'm not at all an expert at *nix administration, but it's obvious there are a lot of incomplete and varying answers out there.

The Ubuntu community docs on OpenCV are the most extensive, but right at the top, they state:

Ubuntu's latest incarnation, Utopic Unicorn, comes with a new version of libav, and opencv sources will fail to build with this new library version. Likewise, some packages required by the script no longer exist (libxine-dev, ffmpeg) in the standard repositories. The procedures and script described below will therefore not work at least since Ubuntu 14.10!

The removal of ffmpeg from the official Ubuntu package repo is, from what I can tell, the main source of errors when trying to compile OpenCV for Ubuntu 14.04/14.10. Many of the instructions deal with getting ffmpeg from a personal-package-archive and then trying to build OpenCV. That approach didn't work for me, but admittedly, I didn't test out all the possible variables (such as version of ffmpeg).

In the end, what worked was simply to set the flag to build without ffmpeg:

  cmake [etc] -D WITH_FFMPEG=OFF

I've created a gist to build out all the software I want for my class machines, but here are the relevant parts for OpenCV:

sudo apt-get update && sudo apt-get -y upgrade
sudo apt-get -y dist-upgrade && sudo apt-get -y autoremove

# build developer tools. Some of these are probably non-pertinent
sudo apt-get install -y git-core curl zlib1g-dev build-essential \
     libssl-dev libreadline-dev libyaml-dev libsqlite3-dev \
     libxml2-dev libxslt1-dev libcurl4-openssl-dev

# numpy is a dependency for OpenCV, so most of these other
# packages are probably optional
sudo apt-get install -y python-numpy python-scipy python-matplotlib ipython ipython-notebook python-pandas python-sympy python-nose
## Other scientific libraries (obviously not needed for OpenCV)
pip install -U scikit-learn
pip install -U nltk

### opencv from source
# first, installing some utilities
sudo apt-get install -y qt-sdk unzip
curl "${OPENCV_VER}.zip" -o opencv-${OPENCV_VER}.zip
unzip "opencv-${OPENCV_VER}.zip" && cd "opencv-${OPENCV_VER}"
mkdir build && cd build
# build without ffmpeg

A recurring issue I had come across – I didn't test it myself, but just saw it in the variety of speculation regarding the difficulty of building OpenCV – is that building with a Python other than the system's Python would cause problems. So, for what it's worth, the above process works with 14.04's Python 2.7.6, and 14.10's 2.7.8. I'm not much of a Python user myself so I don't know much about best practices regarding environment…pyenv works pretty effortlessly (that is, it works just like rbenv), but I didn't try it in relation to building OpenCV.

Also, this isn't the bare minimum…I'm not sure what dev tools or which cmake flags are absolutely needed, or if qt-sdk is needed if you don't build with Qt support. But it works, so hopefully anyone Googling this issue will be able to make some progress.
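
A quick way to confirm that the build is visible to Python is to try importing the `cv2` module and printing its version. This is only a minimal sketch: the helper name `opencv_version` is made up for illustration, and it reports whether the import works, not whether every OpenCV feature was compiled in.

```python
import importlib


def opencv_version():
    """Return the OpenCV version string, or None if cv2 cannot be imported."""
    try:
        cv2 = importlib.import_module("cv2")
    except ImportError:
        # cv2 is not on this interpreter's path (e.g. built against another Python)
        return None
    return getattr(cv2, "__version__", None)


if __name__ == "__main__":
    version = opencv_version()
    if version is None:
        print("cv2 is not importable from this Python")
    else:
        print("OpenCV version: " + version)
```

If this prints that `cv2` is not importable, one thing worth checking is whether the Python you are running is the same one the build was configured against.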

Note: Other things I tried that did not work on clean installs of Ubuntu 14.04/14.10:

The Python code needed to do simple face-detection looks something like this (based off of examples from OpenCV-Python and Practical Python and OpenCV):

(You can find pre-built XML classifiers at the OpenCV repo)

import cv2
face_cascade_path = '/YOUR/PATH/TO/haarcascade_frontalface_default.xml'
face_cascade = cv2.CascadeClassifier(face_cascade_path)

scale_factor = 1.1
min_neighbors = 3
min_size = (30, 30)
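# OpenCV 2.4 constant; on OpenCV 3 and later the equivalent is cv2.CASCADE_SCALE_IMAGE
flags = cv2.cv.CV_HAAR_SCALE_IMAGE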

# load the image
image_path = "YOUR/PATH/TO/image.jpg"
image = cv2.imread(image_path)

# this does the work
rects = face_cascade.detectMultiScale(image, scaleFactor = scale_factor,
  minNeighbors = min_neighbors, minSize = min_size, flags = flags)

for (x, y, w, h) in rects:
  cv2.rectangle(image, (x, y), (x + w, y + h), (255, 255, 0), 2)

cv2.imwrite("YOUR/PATH/TO/output.jpg", image)
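
A hedged variation on the snippet above: `cv2.imread` returns `None` rather than raising an error when it can't read a file, and a `CascadeClassifier` built from a bad path is simply empty, so a wrong path can fail silently. The sketch below wraps the same calls with explicit checks. The function names `detect_faces` and `rects_to_corners` are hypothetical, and converting to grayscale first is a common convention, not something the code above requires.

```python
def rects_to_corners(rects):
    """Turn (x, y, w, h) boxes into ((x1, y1), (x2, y2)) corner pairs."""
    return [((x, y), (x + w, y + h)) for (x, y, w, h) in rects]


def detect_faces(image_path, cascade_path):
    """Return face boxes as (x, y, w, h), raising if either file fails to load."""
    import cv2  # imported here so the helper above works without OpenCV

    cascade = cv2.CascadeClassifier(cascade_path)
    if cascade.empty():
        raise IOError("could not load classifier: " + cascade_path)

    image = cv2.imread(image_path)
    if image is None:
        raise IOError("could not read image: " + image_path)

    # Haar cascades work on intensity, so detect on a grayscale copy
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    return cascade.detectMultiScale(
        gray, scaleFactor=1.1, minNeighbors=3, minSize=(30, 30))


if __name__ == "__main__":
    # Placeholder paths, same as in the snippet above
    boxes = detect_faces("YOUR/PATH/TO/image.jpg",
                         "/YOUR/PATH/TO/haarcascade_frontalface_default.xml")
    for top_left, bottom_right in rects_to_corners(boxes):
        print(top_left, bottom_right)
```

The corner pairs are the same two points that `cv2.rectangle` takes in the loop above.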

The Computer Science of Tomorrow, Today, and the Past

“The thing you are doing has likely been done before. And that might seem depressing, but I think it’s the most wonderful thing ever. Because it means an education in computer science is worth something.”

The quote above comes from an informative and entertaining talk that John Graham-Cumming gave at OSCON in 2013, in which he points out that there hasn’t been much new in computing since 1983, what with wireless networking first implemented in 1971, markup languages in the 1960s, and, as pictured above, “hypertext with clickable links, 1967”.

Because progress has largely consisted of performance and interface improvements, it is comforting to know that applying yourself to the knowledge of computing is nearly as vital and timeless a pursuit as math and literacy. Fittingly, I saw this video after someone linked to it on Hacker News, in response to a 1964 Atlantic article I linked to: Martin Greenberger’s “Computers of Tomorrow”.

In his 50-year-old essay, Greenberger effectively predicts the Internet, net neutrality, cloud computing, and the automation of the New York Stock Exchange. But the best line is the essay’s last line, which aligns with Graham-Cumming’s optimism about human knowledge in computing:

By 2000 AD man should have a much better comprehension of himself and his system, not because he will be innately any smarter than he is today, but because he will have learned to use imaginatively the most powerful amplifier of intelligence yet devised.

Graham-Cumming’s talk is available on SlideShare too.

MySQL (and SQLite) for Data Journalists

My first task since joining Stanford was to create Public Affairs Data Journalism I, a required course for all students in the graduate program. As public records and government workings deserve their own class, I didn’t know for sure if it’d be worth teaching SQL to my students, most of whom hadn’t gone beyond Excel.

But after running out of patience with the finicky nature of spreadsheet GUIs, I decided to unload a bevy of SQL syntax on my students earlier this month. They picked it up so quickly that last week, I based their midterm almost entirely on evaluating their SQL prowess, and I can say with some admiration, they now have more knowledge of SQL than I did after a year or so of self-learning…even though for many of them, this is their first time learning a programming language in the context of journalism.

I’ve been creating tutorials for their convenience, and you can use them too. Because I’m dealing with a variety of operating systems, from Windows XP to OSX 10.6 to 10.9, I decided to give them the option of doing the lessons in MySQL or SQLite…and it wasn’t too frustrating, though I spent more time than I’d like creating multiplatform datasets and lessons.

I’ll write more about my thoughts on teaching SQL in a longer post, but I can say that I am most definitely now a believer in moving past spreadsheets to SQL’s expressive way of data querying.
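
To give a sense of what that expressiveness looks like, here is a small sketch using Python's built-in `sqlite3` module. The `inspections` table and its rows are invented for illustration and are not from the course datasets; the point is only that a grouped, filtered, sorted summary is one declarative query instead of a series of spreadsheet steps.

```python
import sqlite3

# Hypothetical sample data: (city, restaurant, violations)
SAMPLE_ROWS = [
    ("Palo Alto", "Cafe A", 2),
    ("Palo Alto", "Cafe B", 5),
    ("San Jose", "Diner C", 1),
    ("Oakland", "Grill D", 0),
]


def violations_by_city(rows, min_total=1):
    """Count inspections and sum violations per city, largest totals first."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE inspections (city TEXT, restaurant TEXT, violations INTEGER)")
    conn.executemany("INSERT INTO inspections VALUES (?, ?, ?)", rows)
    query = """
        SELECT city, COUNT(*) AS n, SUM(violations) AS total
        FROM inspections
        GROUP BY city
        HAVING SUM(violations) >= ?
        ORDER BY total DESC, city ASC
    """
    result = conn.execute(query, (min_total,)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for city, n, total in violations_by_city(SAMPLE_ROWS):
        print(city, n, total)
```

The same `SELECT` statement should run largely unchanged in MySQL, which is part of what makes it practical to offer both options.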



Preparing eggs and programming

As an egg fan, I loved this Times dining article about a “tasting expedition” of the high- and low-brow egg dishes in New York. As a programmer, there were two passages that stuck out to me about the nature of skill, complexity, and genius behind cooking (and programming):

“In the French Laundry book, no one step is very difficult,” [author Michael Ruhlman] said. “There are just so many that it takes technique to its farthest reaches.” For instance, Mr. Keller insists that fava beans be peeled before cooking. “If you’re good, it takes 20 seconds per bean,” Mr. Ruhlman said. “Someone in his kitchen put a batch of them in the water once it lost its boil. Thomas [Keller] said, ‘Get rid of those.’ That guy didn’t last.”

This next passage comes after the Times writer and Ruhlman visit Aldea in the Flatiron district to try George Mendes’ “signature Knoll Krest Farm Egg with bacalao (salt cod), olive and potato.”

After we left, I expressed surprise that so much effort went into a dish billed on the menu as a “snack.” Mr. Ruhlman nodded. “Working as a chef can be mind-numbingly boring,” he said. “The reason dishes are so good is not because someone is a genius, but because he or she has done it a thousand times. They are looking to keep their minds active and energetic.”

I couldn’t describe programming better myself: no one line is difficult; it’s the order and arrangement of thousands of steps that makes a useful program. And you don’t have to be a genius, but because programming inherently involves repetitive processes, you have to keep your mind alive, and be continuously observant and critical of the patterns you come across.

Everything is (even more) broken

Tech journalist Quinn Norton believes Everything is Broken in computing and in computer security. And so do I. But I’ve rarely disagreed so strongly with someone over something we both ostensibly agree on.

Part of the problem is that Norton’s essay is a bit of a pointless sprawl. I agree completely that the “average piece-of-shit Windows desktop is so complex that no one person on Earth really knows what all of it is doing, or how.” And that this complacency is a bad thing. However, Norton then goes on to list a bunch of government-led security attacks, such as the NSA-Snowden revelations and Stuxnet, in such a way that her message is inescapably, “Windows is bad because the government wants it so.” Or, as Norton puts it, “The NSA is doing so well because software is bullshit.”

Or, maybe the NSA (yes, the same NSA that hired someone who very publicly flouted government surveillance to be their systems admin) is “doing so well” because our political status quo chooses to fund and enable it, and exploiting weaknesses in software is just one tool in the NSA’s politically-supported mission? In which case, improving your software is a very indirect and mostly ineffective way (for reasons that include those inherent to software) to diminish the NSA’s surveillance power.

This conflating of cause and effect is reflected in how Norton obviously understands how and why software is flawed, but somehow manages to draw the wrong conclusions. For me, the most disagreeable part of Norton’s essay is at the end:

Computers don’t serve the needs of both privacy and coordination not because it’s somehow mathematically impossible. There are plenty of schemes that could federate or safely encrypt our data, plenty of ways we could regain privacy and make our computers work better by default. It isn’t happening now because we haven’t demanded that it should, not because no one is clever enough to make that happen.

This notion that “if only those programmers got their priorities in order, things would be good” is so ass-backwards that I believe Norton’s well-intentioned essay ends up being unintentionally harmful. Even a Manhattan Project of the world’s most diligent and ethical programmers would still be bound by the thesis of Alan Turing and Alonzo Church, that some computational problems basically are “mathematically impossible.” While I don’t have the computer science chops or patience to write out a proof, I would humbly submit that the kind of program needed to provide predictable security for all the kinds of wondrous, but unpredictable things humans want to communicate, could be reduced to an Entscheidungsproblem.

So not only is “everything broken”, but there are things broken in such a way that they can’t be fixed in the way we want them to be fixed, just like the proverbial cake we want to eat and have. We’re never going to get a Facebook that makes it possible to find, within milliseconds, 5 select friends, out of a userbase of 1 billion spread out across the world, and share with them an intimately personal photo in such a way that only those five friends will see it and ensure that they never share it in such a way that a potential employer, 5 years from now, might come across it — and to provide such privacy that doesn’t severely impede the convenience and power of social sharing.

The problem is not a horde of incompetent, inhuman programmers at Facebook. It’s not the NSA that’s pulling the levers here. It’s not the corporate-industrial complex that seeks to strip away our privacy for commercial greed. The problem is us (and by us, I mean what Norton describes as “the normal people living their lives under all this insanity”) and our natural desire to wield this amazing power. But unless the range of human thought, action, and desire becomes so limited that it can be summed by a Turing machine, then we must accept that power and privacy involve trade-offs that not just software companies, but that we, “the normals”, have to make. We have to choose to limit our dependence on systems that are never truly “fixed” in the way humans want them to be.

There’s a whole essay’s worth of tangential argument about how we, “the normals,” have to raise our standard of computing literacy, that we must teach the computer, and not the other way around, but I don’t think it’s fair for me to critique Norton’s essay for being sprawling by writing an even more sprawling piece of my own. But what I find most ironic in Norton’s piece is the distorted concept of agency; her notion that Facebook and Google are not all-powerful, and in fact, “live about a week from total ruin all the time” if only “the normals” would rise up and protest so that those otherwise clever software developers would prove old man Turing wrong.

To put it another way, imagine a literary critic writing an essay about how the state of society’s literacy is “just fucked” because look at how well Tom Clancy and the Twilight series have sold, despite their derivative, formulaic content. And that publishers and authors would produce more intellectually-edifying books, if only readers everywhere would rise up and demand those intellectually-edifying books to be written. Yes, those very same readers who caused those popular bestsellers to be bestsellers in the first place.

This begging the question is obviously not Norton’s intent. And again, I can’t argue against the notion that “everything is broken” and that everyone needs to be much more aware of it. But I think Norton’s need to cram every hot-tech issue into her critique, that we are all getting hacked because NSA/Stuxnet, ends up conveying a solution that is even less useful than had it been your typical angry, non-actionable essay.


Our complex addiction to medical spending – the New Yorker on the “pain-pills problem”

What we extravagantly spend on healthcare has become an even more pressing topic with the recent release of Medicare spending data – the most detailed dataset yet made public – and of course, the ongoing implementation of Obamacare. Last week, The New Yorker’s Rachel Aviv brought focus to a microlevel of medical spending: a doctor who thought he could save the most rejected of patients, and who now will spend up to 33 years in prison for “the unlawful distribution of controlled substances” that led to the deaths of several patients.

Unfortunately, Aviv’s article, titled “Prescription for Disaster; The heartland’s pain-pills problem” is behind a paywall. Here’s part of the abstract:

In 2005, the medical examiner in Wichita, Kansas, noticed a cluster of deaths that were unusually similar in nature: in three years, sixteen men and women, between the ages of twenty-two and fifty-two, had died in their sleep. In the hours before they lost consciousness, they had been sluggish and dopey, struggling to stay awake. A few had complained of chest pain. “I can’t catch my breath,” one kept saying. All of them had taken painkillers prescribed by a family practice called the Schneider Medical Clinic.

On September 13, 2005, Schneider arrived at work to find the clinic cordoned off with police tape…Agents from the Kansas Bureau of Investigation and the Drug Enforcement Administration led Schneider into one of the clinic’s fourteen exam rooms and asked him why he had been prescribing so many opioid painkillers.

He responded that sixty per cent of his patients suffered from chronic pain, and few other physicians in the area would treat them. The agents wrote, “He tries to believe his patients when they describe their health problems and he will believe them until they prove themselves wrong.” When asked how many of his patients had died, Schneider said that he didn’t know.

Aviv’s article is powerful, more so because it managed to cover an impressive number of dysfunctional systems while detailing the very human aspect of failure. Dr. Schneider, as Aviv portrays him, is almost the archetype of the ideal heartland doctor. He was a manager of the local grocery’s meat department until he became inspired by how his hospital treated his daughter for pneumonia. He became the first in his family to graduate from college; his daughter tells Aviv that Schneider ‘was “never comfortable with the level of status” that came with the job.’

But Dr. Schneider’s humility and kind-heartedness ran into an ill-timed storm of palliative care research, social dysfunction, and market forces. After he opened his own practice, Dr. Schneider told Aviv that:

Pharmaceutical reps came in and enlightened me that it was O.K. to treat chronic pain because there is no real cure. They had all sorts of studies showing that the long-acting medications were appropriate.

Other doctors in Wichita sent their unwanted patients to Dr. Schneider. And “nearly a dozen sales representatives” would visit him each day, taking him out to meals and cluttering his office with branded gifts. I looked for Dr. Schneider’s name in ProPublica’s Dollars for Docs database, but his clinical work happened well before the wave of financial disclosures that came in 2007. Cephalon, which would later become notorious and criminally charged for illegally marketing its narcotics, was a frequent patron of Dr. Schneider’s. From Aviv’s report:

The company sent Schneider’s physician assistant to New York for an “Actiq consultants meeting”; it paid for her to stay at the W hotel and to ride a boat on the Hudson. In 2003, Schneider was sent to an Actiq conference in New Orleans, sponsored by Cephalon. He said that a specialist told him, “You could stick multiple Actiq suckers in your mouth and your rear end and you still wouldn’t overdose. It’s clinically impossible.”

People shocked by the revelation of financial ties between doctors and drug companies often assume (sometimes without enough justification, in my opinion) that the doctors are traitors to the Hippocratic Oath and humanity. But Aviv’s report describes a doctor who is so Pollyannish that a prison guard chides him for talking to The New Yorker and Aviv: “you know she’s just going to tear you apart,” Schneider apparently confides to Aviv.

There’s more going on here than just the chase for money by the drug companies, or the naiveté/cravenness of the doctors who prescribe the drugs. There’s the huge issue of palliative care – how do we know whether patients really “need” painkillers? – and the pressure of politics, including the role of the D.E.A. and patient advocates, and of course, how much government should subsidize health care at all. There’s even the peripheral issue of electronic medical records and bureaucracy; Dr. Schneider’s clinic was so poorly managed that patients, who were rejected by one of the clinic’s doctors, would simply sign up with another doctor who worked at Schneider’s clinic, thanks to the clinic’s sloppy record keeping. It didn’t help that the clinic took in so many patients that “appointments were generally scheduled every ten minutes.”

It’s worth picking up a print copy – or even subscribing – just to read Aviv’s article on Dr. Schneider. It reveals the astonishingly heart-breaking complexity behind medical spending, and yet, even pushing the limits of the longform article format, it barely begins to describe the depth of that complexity.

A guide to using Github for non-developers

I’m constantly being asked by friends to help them with their websites, and I’m constantly not at all enthusiastic to do it. I mean, I enjoy helping friends out and creating things, but web development is not at all the “fun part.” It’s a complex field, but more annoyingly, it’s difficult to scaffold a site so that a web-novice can maintain it. So you either have to settle for being the site’s maintenance person in perpetuity, or, not be bothered that your friends will waste countless hours hacking and breaking a brittle, barely visited website.

Github Pages has been a great and convenient way to publish websites. So I’ve been telling my non-dev friends, hey, just create a Github account and publish away! Unfortunately, while there are many great Github and Git resources, all of them presume that you actually want to use the many cool collaborative, developer-focused features of Git/Github. Whereas I want my non-dev friends just to piggyback off of Github to quickly build a website from scratch.

So in the past month, I’ve slowly been putting together a guide that is as basic as possible, even to the point of showing which buttons to click, and explaining how HTML is different than raw text. Check it out here: Build a Web Portfolio from Scratch with Github Pages.

Check out the Reddit discussion here. To my surprise, even aspiring developers have found it useful, even though the guide is aimed at people who do not intend to be web developers.

Creating this guide isn’t an act of altruism for me, though. It’s another way to experiment with online publishing, namely, how to reduce the friction between thinking of things to write about and getting them onto the Web. I stuck to using Jekyll but kind of wish I had gone with using Middleman. In any case, I feel much further along in having a refined CMS-workflow than I did with the Bastards Books and with my Small Data Journalism site, which is also built on Jekyll.

Fran Allen and the social relevance of computer science

If you haven’t read it yet, Peter Seibel’s Coders at Work (2009) is one of the best books about computer programming that doesn’t have actual code in it. It distills “nearly eighty hours of conversations with fifteen all-time great programmers and computer scientists,” with equal parts given to fascinating technical minutiae (including the respondents’ best/worst bug hunting stories) and to learning how these coders came to think the way they do.

So in a book full of interviews worth reading, it’s not quite accurate to say that Fran Allen stands out. It’s better to say that Allen is different; as a Turing Award recipient for her “pioneering contributions to the theory and practice of optimizing compiler techniques,” Allen spends much of her interview arguing that compiler optimization is woefully unstudied. Allen even argues that the popular adoption of C was a step backwards for computer science, which is kind of an alien concept for those of us today who almost exclusively study and use high-level languages.

Allen is also different in that she’s the only woman in Seibel’s book, and understandably, she has a few thoughts about women’s place in computer science. The summary of it is that she’s not at all optimistic about the “50/50 by 2020” initiative (the goal to have women make up half of computer science degree earners by 2020). And the problem, says Allen (who was a math teacher herself), is not in the curriculum:

I feel it’s our problem to solve. It’s not telling the educators to change their training; we in the field have to make it more appealing.

What I found particularly insightful in Allen’s interview with Seibel is that it’s not just about the need for more role models, because the current lack of women programmers is going to place a limit on that. In Allen’s opinion, girls have shown an equal aptitude for science, especially in medicine, biology, and ecology. So she suspects that the problem is with how _limited_ computer science can appear as a profession.

At my little high school in Croton, New York, we had a Westinghouse person nationally come in fifth. And they have a nice science program. Six of the seven people in it this year at the senior level are women doing amazing pieces of individual science.

What’s happening with those women is that they’re going into socially relevant fields. Computer science could be extremely socially relevant, but they’re going into earth sciences, biological sciences, medicine. Medicine is going to be 50/50 very soon. A lot of fields have belied that theory, but we [in computer science] haven’t.

I don’t necessarily think this perception that programming doesn’t seem to have a purpose behind obsessively sitting in front of a computer all day is exclusive to women. Even for those who’ve pursued a degree in computer science, it’s not clear how programming has relevance that is not an end in itself.

Check out this 2008 Slashdot thread, in which a recent computer science undergrad asks for suggestions of “Non-Programming Jobs for a Computer Science Major?” because he can’t think of ways to use computational thinking that don’t directly involve code. Or more recently, this screed by an NYU journalism professor, who sees coding as a trend du jour, little more than a pointless struggle to learn more code before a new language becomes hot and makes you obsolete.

I can’t claim to have insight myself, because when I left college with a computer engineering degree, I had no idea how to use it except to be a computer engineer, which I didn’t want to be, so I ditched it entirely at my first journalism job. Years later, I’ve slowly learned how to use programming to, well, practice journalism’s core function of interpreting and disseminating information. However, I attribute this to how much our world has become digitized with far fewer bottlenecks in applying computational thinking. So now it seems much more obvious that computer science can be as directly relevant to general society as medicine and ecology.

Non-scientists often assume that all scientists, and similarly left-brained people, can equally grok the concepts of programming. But this is as wrong an assumption as thinking that any programmer can easily pass the MCATs. Within the field of biological research, for example, there’s a difference of roles for biologists who can program and those who cannot.

The two fields of research are described as “wet-lab” and “dry-lab” work. In a recent issue of Nature, Roberta Kwok writes about how “biologists frustrated with wet-lab work can find rewards in a move to computational research”:

During her master’s programme in genetics from 2005 to 2008, Sarah Hird dreaded going into the lab. She was studying subspecies of red-tailed chipmunks and had become discouraged and frustrated by the uncertainties of molecular-biology experiments. She spent six weeks trying to amplify repetitive sequences in chipmunk DNA as part of an experiment to identify genetic differences between populations — but to no avail. Hird tried replacing reagents, switching to a different machine for running the polymerase chain reaction and decontaminating the sample-preparation area. Nothing worked. And then, for reasons that she never quite deciphered, the technique suddenly started working again.

By the end of her master’s, Hird had come to dislike working in a wet lab, and she decided not to apply for PhD programmes.

About six months after finishing her master’s degree, while working as a part-time technician at Louisiana State University in Baton Rouge, she discovered a better direction. The lab’s principal investigator had suggested that she learn a computer-programming language so that she could help with a simulation project. Hird, who had never programmed before, taught herself the language using a book and online tutorials, and quickly became engrossed.

“Once I started, it was like an addiction,” she says. She enjoyed developing algorithms, and she found the software-debugging process less frustrating than troubleshooting wet-lab problems. The work felt more under her control.

Later in her article, Kwok interviews a German biologist at the Max Planck Institute who offers this insight:

He notes that newcomers may stay more motivated if they can apply computational skills to real scientific problems rather than to the ‘toy’ exercises in a computer-science class. For example, a researcher who works with many image files could write a program to automatically perform processing steps, such as contrast enhancement, on thousands of images.
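
As a rough illustration of the kind of program being described, here is a sketch of one simple contrast enhancement, a linear min-max stretch, applied to a batch of images. Everything here is an assumption made for the example: images are represented as plain lists of 8-bit grayscale values, and the names `stretch_contrast` and `process_batch` are hypothetical. Real work would read and write actual files with an imaging library.

```python
def stretch_contrast(pixels, out_max=255):
    """Linearly rescale pixel values so the darkest maps to 0 and the brightest to out_max."""
    if not pixels:
        return []
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        # A flat image has no contrast to stretch; return a copy unchanged
        return list(pixels)
    return [(p - lo) * out_max // (hi - lo) for p in pixels]


def process_batch(images):
    """Apply the same stretch to every image in a {name: pixels} mapping."""
    return {name: stretch_contrast(pixels) for name, pixels in images.items()}


if __name__ == "__main__":
    batch = {"scan_001": [50, 100, 150], "scan_002": [7, 7]}
    for name, pixels in sorted(process_batch(batch).items()):
        print(name, pixels)
```

The appeal is in `process_batch`: once the step works for one image, running it over thousands is just a loop.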

If young students – male or female – are turned off at the prospect of learning computer science, it’s not enough to just have role models. The uses of computational thinking are far too broad for just that. Why should only dedicated computer scientists benefit from the techniques and theory of programming, as if the importance of writing should only be left up to published writers?

Ideally the importance of computational thinking would be part of the general curriculum, and not just as a separate programming class, but integrated in the same way that you must read and write and perform calculations in your biology, physics, and economics class. But while we wait for that change to come about eventually – if at all – those of us in the field can help to increase diversity in computer science by increasing the visibility of computer science’s diverse impacts and applications.

After Allen complains about computer science’s too-narrow scope, Seibel simply asks, “So why do you like it?” She responds:

Part of it is that there’s the potential for new ideas every day. One sees something, and says, “Oh, that’s new.” The whole field gets refreshed very frequently. It’s very exciting to think about what the potential for all of this is and the impacts it can have.

Isaac Asimov made a statement about the future of computers-I don’t know whether it’s right or not-that what they’ll do is make every one of us much more creative. Computing would launch the age of creativity. One sees some of that happening-particularly in media. Kids are doing things they weren’t able to do before-make movies, create pictures.

We tend to think of creativity as a special gift that a person has, just as being able to read and write were special gifts in the Dark Ages-things only a few people were able to do. I found the idea that computers are the enablers of creativity very inspiring.

There’s a lot of other great stuff and stories in Allen’s interview, including her attempt to teach Fortran to IBM scientists, the need for compiler optimization in the age of petaflop-speed computing, and how other women in the industry, including one “who essentially was the inventor of multiprogramming”, have been robbed of their achievements. Read the rest of Allen’s interview, and 14 other equally great interviews with coders, in Seibel’s book, Coders at Work.

Peter Norvig on cleverness and having enough data

I’m reposting this entry, near verbatim, from The Blog of Justice, which picks out a keen quote from Google/Stanford genius, Peter Norvig, from his Google Tech Talk on August 9, 2011.

“And it was fun looking at the comments, because you’d see things like ‘well, I’m throwing in this naive Bayes now, but I’m gonna come back and fix it up and come up with something better later.’ And the comment would be from 2006. [laughter] And I think what that says is, when you have enough data, sometimes, you don’t have to be too clever about coming up with the best algorithm.”

I think about this insight more times than I’d like to admit, in those frequent situations where you end up spending more time on a clever, graceful solution because you look down on the banal work of finding and gathering data (or, in the classical pre-computer world, fact-finding and research).

But I also think about it in the context of people who are clever, but don’t have enough data to justify a “big data” solution. There’s an unfortunate tendency among these non-tech-savvy types to think that, once someone tells them how to use a magical computer program, they’ll be able to finish their work.

The flaw here is, well, if you don’t have enough data (i.e. more than a few thousand data-points or observations), then no computer program will help you find any worthwhile insight. But what’s more of a tragedy is that, since the datasets involved here are small, these clever people could’ve done their work just fine without waiting for the computerized solution.

So yes, having lots of data can make up for a lack of cleverness, because computers are great at data processing. But if you’re in the opposite situation – a clever person with not a lot of data – don’t overlook your cleverness.

W3rules – a modern HTML and CSS reference

I’m a passable web developer, but one who still needs to constantly Google things like “css box shadow” because I don’t do enough front end design to justify memorizing all of the syntax. Problem is, doing a websearch for HTML/CSS terms returns entire pages of not-quite-up-to-par links to W3schools, a site that has dominated web dev references in search engine results since I first started coding.

Lots of failed attempts have been made to displace w3schools. So I won’t aspire to that. I just want a site, even if I’m the only user, where I can refresh my aging mind on the vagaries of CSS syntax.

I call it: W3Rules. Here’s a sample page for font-family. This will be a good chance to get more familiar with Middleman, which looks very fun to use.