Instructional Code and Modelling Code

Styles of coding interest me enormously, and it’s many years since I accepted the notion that:

Any fool can write code that a computer can understand. Good programmers write code that humans can understand. ~Martin Fowler

Writing code is most definitely about writing for other people to read, whether other members of your team or people somewhere down the line, maybe long after you’ve moved on. But the number of languages that we have available for coding is just one sign that there isn’t a ‘best’ way of doing everything, just many different trade-offs that fit better or worse for any particular task. What is clear, though, is that languages have evolved since punch cards.

We have procedural languages (C, Pascal), functional languages (Erlang, Haskell, XSLT), object-oriented languages (C++, Objective-C, Java, C#) and logic languages (Prolog) to name several paradigms, but by no means all. People often talk about C still being the best tool for writing performant code as it remains the closest to the underlying hardware – C also remains in common use for work in embedded applications.

So why all the different types of language? The main reason, as I see it, is that they represent different ways of thinking about problems; they allow you to describe solutions in ways that match your model of thinking. In OO languages this approach has been taken to some heights through the use of design patterns – documented ways of combining objects to solve a problem in a particular way of thinking. In other types of languages there are also idiomatic practices and norms that match the way that language’s community thinks. Ever wondered why OO Perl never really got popular? It’s because most people writing Perl don’t think about solving problems in terms of objects.

Different languages can take this to extremes in different ways. Brainfuck, for example, presents a model of a sequence of bytes and a data pointer. Solving problems with only a byte array and increment/decrement operations is, in this case, deliberately obscure; and the syntax of the language has been chosen to make it even more so.
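To make that model concrete, here is a minimal sketch of a Brainfuck interpreter in Python. The function name `run_brainfuck`, the 30,000-cell tape and the choice to read 0 at end of input are assumptions made for illustration; implementations differ on those details.

```python
def run_brainfuck(program: str, input_bytes: bytes = b"", tape_size: int = 30000) -> bytes:
    """Interpret a Brainfuck program and return whatever it outputs."""
    tape = bytearray(tape_size)  # the sequence of bytes
    ptr = 0                      # the data pointer
    pc = 0                       # position in the program text
    inp = iter(input_bytes)
    out = bytearray()

    # Pre-compute matching bracket positions so loops can jump directly.
    jumps = {}
    stack = []
    for i, ch in enumerate(program):
        if ch == "[":
            stack.append(i)
        elif ch == "]":
            j = stack.pop()
            jumps[i], jumps[j] = j, i

    while pc < len(program):
        ch = program[pc]
        if ch == ">":
            ptr += 1
        elif ch == "<":
            ptr -= 1
        elif ch == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif ch == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif ch == ".":
            out.append(tape[ptr])
        elif ch == ",":
            tape[ptr] = next(inp, 0)  # assumption: end of input reads as 0
        elif ch == "[" and tape[ptr] == 0:
            pc = jumps[pc]            # skip the loop body
        elif ch == "]" and tape[ptr] != 0:
            pc = jumps[pc]            # go round the loop again
        pc += 1
    return bytes(out)


# 8 * 8 + 1 = 65, the byte value of "A"
print(run_brainfuck("++++++++[>++++++++<-]>+."))
```

Everything the language can express is some combination of moving `ptr` and adjusting the byte under it, which is why even printing a single letter needs a loop.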

In my mind I’ve always separated out ‘higher level’ languages from, say, assembler. But, of course, they’re really just different ways of thinking, ways of modelling a solution. We often talk about the evolution of languages and speak of more modern languages as being better than older languages as if in some Darwinian competition. Taken outside of the context of time, though, they simply form a series of different models. Sure, some of them build upon the concepts of others, but not always in ways that improve upon them. Assembler is no less a modelling language than Java or Smalltalk; it just so happens that the modelling you do in assembler uses the same conceptual model as the underlying hardware of the machine.

Recognising how best to describe a solution (in code) so that future readers can understand clearly what is happening seems to be at the core of the skills programmers need to hone, and different models will be applicable at different times. For example, Repenning describes the inappropriate use of a “naive” object model in his paper Collaborative Diffusion: Programming Antiobjects (pdf, 2.7MB). Introducing the term Antiobjects, he talks about how responsibilities can be distributed differently from the initially obvious approach, gaining a much more efficient running system as well as a simpler implementation.

The modelling aspects of languages provide ways to group or separate different concerns, but underlying it all is the need to do something. There’s a reason that BASIC stands for Beginner’s All-purpose Symbolic Instruction Code. I recently came across some code that had forgotten this. The author had applied the Observer pattern perfectly: objects that did the work subscribed to another object that monitored the file system for changes. The main() method simply constructed the objects, wired them together and said “Go!”. The net effect of this, however, was that anyone coming to the code had to form a complete mental model of how all the objects were going to interact before they could predict the sequence of things that would get done. The pattern, while elegant and properly implemented, made the code harder to understand. What I needed, as a reader, was a simple sequence of instructions – the flow of the application.
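The difference can be sketched in a few lines of Python. This is not the code described above; `FileWatcher`, `make_steps` and the three steps `validate`, `convert` and `publish` are hypothetical names invented purely to contrast the two styles.

```python
from typing import Callable, List


class FileWatcher:
    """Hypothetical subject: tells subscribers when a file has changed."""

    def __init__(self) -> None:
        self._handlers: List[Callable[[str], None]] = []

    def subscribe(self, handler: Callable[[str], None]) -> None:
        self._handlers.append(handler)

    def notify(self, path: str) -> None:
        for handler in self._handlers:
            handler(path)


def make_steps(log: List[str]):
    """Build three hypothetical work steps that record what they did."""
    def validate(path: str) -> None:
        log.append(f"validate {path}")

    def convert(path: str) -> None:
        log.append(f"convert {path}")

    def publish(path: str) -> None:
        log.append(f"publish {path}")

    return validate, convert, publish


def run_modelling_style(changed: List[str]) -> List[str]:
    """Observer style: construct the objects, wire them together and go."""
    log: List[str] = []
    watcher = FileWatcher()
    for step in make_steps(log):
        watcher.subscribe(step)
    for path in changed:
        watcher.notify(path)  # the order of work is implied by the wiring
    return log


def run_instructional_style(changed: List[str]) -> List[str]:
    """Instructional style: the flow of the application is written out."""
    log: List[str] = []
    validate, convert, publish = make_steps(log)
    for path in changed:
        validate(path)
        convert(path)
        publish(path)
    return log


print(run_modelling_style(["report.txt"]))
print(run_instructional_style(["report.txt"]))
```

Both functions do the same work in the same order. In the first, the reader has to know that subscription order determines execution order; in the second, the sequence is stated in the code itself.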

So, as my eight-year-old starts to ask if he can learn how to make the computer do stuff, I’m wondering if he should start with modelling code first or instructional code first – or if the distinction is even valid. Maybe I should start him off with Flash…

Twitter Updates for 2008-09-03

  • back from lunch with David Peterson and new team member Mark, burger and a pint, not too bad. #
  • caching framework for rdf that keeps track of dependencies and invalidates the cache when you make changes – very pleased with today’s work. #
  • @iand scalded_mouths– in reply to iand #
  • well, Chrome Beta, running in XP under Vmware on a mac is noticeably faster than FF3 running natively. /me likes v.much #
  • @infod1va photo is set to private :-( in reply to infod1va #
  • @iand or maybe that my broadband at home is faster than the pipe at the office? in reply to iand #
  • @edsu v8 is the JavaScript compiler, yes – fast by the looks of it, gmail very responsive for me in reply to edsu #
  • right, going to bed now, to read ‘The Book Thief’ #
  • in the office, about to get a cup of tea, then wrap caching functions in unit tests #
  • @ianibbo it’s written in a test-first style, just without the tests first… ;-) in reply to ianibbo #
  • @PaulMiller and I’m fed up of LiveWriter because it just has to adorn your markup with its name, let me know what you find in reply to PaulMiller #
  • @blisspix your skepticism may prove to be well placed, but you should try it – it’s blisteringly fast in reply to blisspix #

Twitter Updates for 2008-09-02

What's in a name?

RePEc, Research Papers in Economics, have been doing some interesting work on allowing the author community to self-specify variant forms of their name.

The example they give is of John Maynard Keynes:

For John Maynard Keynes (who is not registered), such name variations could be:

John Maynard Keynes
John M. Keynes
John Keynes
J. M. Keynes
J. Keynes
Keynes, John Maynard
Keynes, John M.
Keynes, John
Keynes, J. M.
Keynes, J.

Unfortunately Keynes has not registered these himself; I’d love to see some examples from the 16,000 authors who have registered.
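The ten forms in the Keynes example follow a regular pattern: five ways of writing the given names, each in “forename surname” and “surname, forename” order. A minimal sketch of that pattern in Python follows; the function `name_variants` is hypothetical, it is not RePEc’s code, and it assumes at least one given name.

```python
from typing import List


def name_variants(given_names: List[str], surname: str) -> List[str]:
    """Generate variant forms of a name in the order of the Keynes example."""
    first = given_names[0]
    initials = [f"{name[0]}." for name in given_names]

    forename_forms = [
        " ".join(given_names),             # John Maynard
        " ".join([first] + initials[1:]),  # John M.
        first,                             # John
        " ".join(initials),                # J. M.
        initials[0],                       # J.
    ]
    # With a single given name several forms coincide, so drop repeats
    # while keeping the original order.
    unique_forms = list(dict.fromkeys(forename_forms))

    return ([f"{form} {surname}" for form in unique_forms]
            + [f"{surname}, {form}" for form in unique_forms])


for variant in name_variants(["John", "Maynard"], "Keynes"):
    print(variant)
```

Real authors will of course register forms that no rule like this would predict, which is exactly why self-specified variants are valuable.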

This data comes from published papers as well as monograph data, so the variance is, I suspect, a lot higher than occurs solely within libraries, and the absence of years of birth (and death) makes it highly ambiguous. RePEc have published a list of author name homonyms. Even within published papers on economics this is a reasonably problematic list.

I’ve been working on very similar issues with library data, trying to analyze the meaning in MARC data.

One of the key things for me was developing techniques for dealing with ambiguity, allowing outside systems with limited data to receive the most specific answer possible, while giving less specific answers for less specific matches.
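As a rough illustration of that idea, and not the actual technique used, here is a sketch in Python. The `Author` record, the `match_authors` function and the sample catalogue are all made up for the example: a caller that knows a year of birth gets a single match, while a caller that only knows the name gets every candidate.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Author:
    """Hypothetical author record with an optional year of birth."""
    name: str
    birth_year: Optional[int] = None


def match_authors(catalogue: List[Author], name: str,
                  birth_year: Optional[int] = None) -> List[Author]:
    """Return the most specific set of matches the supplied data allows."""
    same_name = [author for author in catalogue if author.name == name]
    if birth_year is not None:
        exact = [author for author in same_name if author.birth_year == birth_year]
        if exact:
            return exact   # most specific answer: name and year agree
    return same_name       # less specific: every author with that name


# Made-up records, for illustration only.
catalogue = [
    Author("Smith, John", 1950),
    Author("Smith, John", 1972),
    Author("Jones, Mary", 1961),
]

print(match_authors(catalogue, "Smith, John", 1972))  # one author
print(match_authors(catalogue, "Smith, John"))        # both John Smiths
```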