2009/10/26

Knuth on addiction

Donald Knuth, Adventure (PDF):

Clearly the game was potentially addictive, so I forced myself to stop playing — reasoning that it was great fun, sure, but traditional computer science research is great fun too, possibly even more so.

2009/10/22

The nature of procrastination

When your mind is distracted to the point of not starting an impending task.

That’s not quite it.

The adage is that after being interrupted it takes fifteen minutes to get back into the “flow” of working.

The first fifteen minutes, then, are crucial, before you’ve even started anything. If your mind slides off track at any point before the fifteen minutes are up, you’ve got to start again.

But how often does my mental trigger kick in to read email, refresh feeds, check newsgroups (and, in the past, Twitter and Facebook and the New York Times and …)? In fact, I still get mental triggers to visit news sites I haven’t read in years. Which scares me, frankly. What sort of rut did I get myself into that my brain still brings it up multiple times every single day even if I never (in years, now) respond to it? Is that a side-effect of the addictions of youth? When will it go away?

(To see if you have such mental triggers of your own, close all the windows on your screen, open a fresh, blank browser window, and try to think of nothing. What’s the first thing that pops into your head to type into the address bar?)

Worse than all of the above, how easy is it to slide from the hard tasks of involved research writing/reading/coding to the easy tasks of fixing bugs or renaming variables in my latest toy project? Especially if you can trick yourself into thinking that your toy project can substitute for your real work.

Sometimes, I even get a mental trigger to write things on the internet, things which people already know and which help neither the people reading them nor the person writing them.

Unlike Merlin Mann. (I linked him before, but there’s no harm in repetition.) He says everything on the topic much better than I’ll ever be able to (albeit this time in an uncharacteristically difficult-to-quote way):

…developing those invaluable tolerances [to “stick with [your work] at the time you’re most tempted to run away”] requires the exercise of some very small muscles. The muscles are super-hard to locate, and once you do find them, they hurt like a bitch to exercise.

Ain’t that the depressing truth.

Well, I’m off to do the dishes. And then get back to work.

2009/10/12

Matlab vectorisation

Wikipedia tells me that Donald Knuth said:

We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.

(I figure you don’t need a citation when you mention “Wikipedia” in the sentence.)

Back in the old days, the fast way to do things in Matlab was to use “vectorised” code which operated on entire arrays rather than individual elements; loops were the devil. More recently (2003-ish), Matlab gained a just-in-time compiler, eliminating the old bottleneck. (Update: Thanks, Ben, for pointing out my mistake there; not sure what I was thinking when I wrote 2007. Perhaps I didn’t get access to Matlab-with-JIT until some time later; I forget.)

But you still sometimes see advice to use vectorised code wherever possible. In short, following that advice on performance grounds alone is a bad idea.

For example, the above-linked advice gave the trivial example:

% Extremely slow:
for i = 1:length(x)
   x(i) = 2*x(i);
end

% Extremely fast:
x = 2*x;

Interested, I tested this out.

[Update: Ha, “I tested this out” completely incorrectly, because I was hasty and hadn’t used Matlab in a while. So don’t mind me on that particular point. However, the following still stands:]

Your rule of thumb should be: write the code that makes the most sense when you’re writing it. If it’s slow, try and fix it then. Vectorised code can get damned hard to write and harder to read. It’s only worth it if it saves you real time running the code. And I’m talking hours and hours of time difference here.

When you write x=2*x, you should do so simply because that’s the clear logical representation of the operation “multiply each element of x by two”. But just because you use vectorised code here doesn’t mean you always should.

GitHub from the command line

Here are a couple of shell aliases that I’ve found useful recently. To use, add them to your .bash_profile file. All of these commands are intended to be run from the working directory of a Git repository that is connected to GitHub.

  • Used by the following, more useful commands, this returns the GitHub identifier for a repository (e.g., wspr/thesis):

    alias githubname='git config -l | grep '\''remote.origin.url'\'' | sed -En   '\''s/remote.origin.url=git(@|:\/\/)github.com(:|\/)(.+)\/(.+).git/\3\/\4/p'\'''
    
  • On Mac OS X, this opens your GitHub project in the default browser: (I’m guessing it needs some adjustment to work under Linux.)

    alias github='open https://github.com/`githubname`'
    
  • Similarly, this one opens up the Issues page:

    alias issues='open https://github.com/`githubname`/issues'
    
  • Finally, this one returns the number of open issues in the GitHub project:

    alias countissues='curl -s http://github.com/api/v2/yaml/issues/list/`githubname`/open -- | grep '\''\- number:'\'' | wc -l'
    

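To see what githubname is actually doing, here is its sed substitution pulled out as a standalone function, fed a couple of hand-written remote.origin.url lines (one for each URL style the pattern accepts; wspr/thesis is just the example identifier from above, not real `git config` output):

```shell
#!/bin/sh
# The sed pattern from the githubname alias, as a standalone function.
# Input lines below are hand-written samples, not real `git config` output.
extract_reponame () {
  sed -En 's/remote.origin.url=git(@|:\/\/)github.com(:|\/)(.+)\/(.+)\.git/\3\/\4/p'
}

# Both remote URL styles reduce to the same identifier:
printf 'remote.origin.url=git@github.com:wspr/thesis.git\n' | extract_reponame
printf 'remote.origin.url=git://github.com/wspr/thesis.git\n' | extract_reponame
# both print wspr/thesis
```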
Via the GitHub Issue API, it’s possible to extract all sorts of useful information programmatically that could come in handy for project management. Use the output of this URL to get all the juicy info:

http://github.com/api/v2/yaml/issues/list/`githubname`/open

For example, I’d like to write a script to report summary details of all open issues across all of my projects/repositories. Saving it up for a rainy day.
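As a sketch of how that script might start, here’s the counting half on its own. The YAML below is a hypothetical abbreviation of what the API returns (the real response carries many more fields per issue); a real script would pipe in the curl output for each repository in turn:

```shell
#!/bin/sh
# Count open issues in (a hypothetical sample of) the v2 Issues API YAML.
# A real script would replace the sample with, for each repo:
#   curl -s http://github.com/api/v2/yaml/issues/list/$repo/open
count_open () {
  grep -c -e '- number:'
}

sample='---
issues:
- number: 1
  title: First bug
- number: 2
  title: Second bug'

printf '%s\n' "$sample" | count_open    # prints 2
```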

It would also be interesting to write a script to run before pushing that checks which issues you’re closing (via the closes #xyz commit log interface) and shows a brief summary for confirmation before sending off the commits. That’s for a rainy weekend.
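The scanning step of that script is easy enough to sketch now: grep the outgoing commit messages for GitHub’s closing keywords and list the issue numbers. Everything here is a guess at how I’d wire it up; in real use you’d feed it something like git log origin/master..HEAD --pretty='%s %b' rather than the canned messages below:

```shell
#!/bin/sh
# List the issues a batch of commit messages would close via the
# "closes #xyz" (and "fixes #xyz") keywords. Reads messages on stdin.
closed_issues () {
  grep -oiE '(close[sd]?|fix(e[sd])?) #[0-9]+' | grep -oE '#[0-9]+' | sort -u
}

# Canned commit subjects standing in for `git log` output:
printf '%s\n' 'Fix a typo, closes #12' 'Refactor the parser' 'fixes #7' | closed_issues
```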