Thursday, February 26, 2009

Coding is both theoretical and experimental

"As soon as we started programming, we found to our surprise that it wasn't as easy to get programs right as we had thought. Debugging had to be discovered. I can remember the exact instant when I realized that a large part of my life from then on was going to be spent in finding mistakes in my own programs."
  - Maurice V. Wilkes, inventor of microprogramming

My father is a computational physicist.  That's a job description that didn't exist when he started working a few decades ago, but he made the decision to transition into that role.

In the past, scientists have generally fallen into two broad categories: theoretical, and experimental.  Theoretical science is distantly related to pure math.  You have an arsenal of known scientific facts, you know the equations, you analyze those equations and learn new consequences that result from them.  Experimental science is more directly involved with measuring and testing the real world.  You come up with ideas, you do something to test if your ideas match reality, and if they do, you've got a theory.  (Ultra-simplified version.)

A computational scientist is a different beast altogether.  A computational scientist can create models of reality using a computer simulation to represent theoretical information.  Then he can run the simulation with different parameters, finding out what would "really" happen if you changed the initial configuration.

When we deal with things like planets, we pretty much know how they behave in general, but you can't say "I wonder what would happen if we put three planets at this distance from each other" because there is no practical way to set up an experiment by moving planets around.  Instead, your computer can simulate what would happen, and you might learn some things from the simulation that weren't obvious just by running the equations.

Programming in general is like that: it's both theoretical and experimental.  You can eyeball your code, trace the logic by hand, and figure out what it should do.  That's theory.  Then you can run the code, either logging the output or stepping through it with effective debugging tools, and find out what it really does.  That's experiment.

In decades long past, computer programs ran on expensive mainframe terminals which could take hours to complete a single execution via, for example, punch cards that were hand-fed to the computer.  Because terminal time was at a premium, writing a program with any bugs could be a costly mistake.  It was more economical to spend hours poring over the punch cards or machine code, making dead certain that the program would run correctly before you fed it to a computer.  In other words, programmer time was cheap and computation time was expensive.

Today, the reverse is true.  Rather than hours, it takes seconds to compile and run a computer program.  Programmers are still free to spend their time making sure that the code "should" run before they run it.  However, in many cases it is easier and smarter just to run it and see if something goes wrong.  Following Moore's Law, computer time has become incredibly cheap.

A smart programmer takes advantage of this and does a mixture of both theory and experiment.  Rather than write an entire program from scratch on paper, then running it and hoping for the best, today's programmers should view a program as a series of components, to be built and tested bit by bit, making sure that each part behaves properly through real demonstration.  Unless I'm in the middle of a major rewrite that deliberately broke old code, I will rarely leave my program in an uncompilable state for more than fifteen minutes, and usually a lot less.  Each change can be tested to make sure that it has predictable behavior.

In an optimal testing environment there are no computer errors; logic errors come from the developers.  That's not a failing on the part of the developer; to err is human, and the point of this style of coding is to recognize and anticipate those errors.  The benefit of working this way is that if you make a mistake, you get immediate feedback that something has gone wrong.

If you write a large number of changes and then discover an error after running them all at once, you are temporarily stuck.  You have no way of knowing which of your changes was responsible, so you have no choice but to backtrack and break your code into smaller fragments to isolate the problem... which is exactly how you should have been writing the program in the first place.

So how do you approach a problem that requires debugging?  I advocate an approach that uses something akin to the scientific method to systematically track down and destroy bugs.  I'll discuss this in a future post.


  1. "See what goes wrong" is certainly nice when the compiler can just quickly tell you that you forgot your semicolon, but it's much trickier when you hit a logical error and the program still runs.

    Worse yet are logical mistakes regarding performance where everything ultimately works correctly, just slowly. It wasn't easy tracking down a rogue SQL query eating up a lot of server time I recently had on my site.

  2. When I talk about something "going wrong," I'm not only talking about a compilation error, but about being aware of what each piece is supposed to do in all conditions and testing for those things early.

    For instance, if you have a mathematical function that works on a numerical input, you should immediately test the function at the extreme cases: does it work for the highest possible value? the lowest possible value? Does it return the correct error for a value outside the expected range?

    If it's reading an array: How does it perform on really large arrays? Empty arrays? Does it make a difference if the arrays contain randomized numbers or sequential numbers?

    In the case of the SQL query, you should already be thinking about what could make it run slowly as you're writing it. But I also agree, of course, that you won't catch everything before the fact and you should prepare things to make debugging easy.

  3. Yup, not disagreeing with any of that. It goes right back to your main point, it's experimental and theoretical.

  4. Interesting topic...
    My .02,

    Speaking from a QA perspective, a good approach is define as many expected failure modes for a piece of code as you can. Typically, once you've defined in what ways your code could fail, you've already gotten a leg over on where you can start looking for the problem.

    This can be a very gross analysis at a pretty high level - say the building and execution of an app while running a test case. 4 general failure modes come to mind:
    - fails to compile (narrows problem to source code)
    - fails to link (narrows problem to source code and/or other binaries in project, build system)
    - crashes at run time (still generally in domain of coding error)
    - runs to completion but exhibits unexpected behavior (points probably to logic errors rather than the more mundane coding errors).

    You can also think do failure mode analysis on smaller scales, such as using arrays like in Russell's example. This helps you figure out how to stress test or debug problems.
    I.e. arrays exhibit crashing type failure modes when:

    - you overrun the array boundary
    - the memory allocation for the array fails. If it's a stack-based array, just making it too large can cause a stack overrun. Heap-based (i.e. globally defined or defined via new/malloc()) can run out of memory too, but allocation failures here can still happen.

    So, this gives you a good preliminary idea of a set of conditions to check for:
    - any logic conditions in the code that can result in an array index < 0 or > arraysize-1?
    - any situations where very large array definitions occur on the stack (i.e. locally to a function)?
    - any very large arrays passed by value into functions?

    Logic errors occur when the wrong element is accessed, or an element that's allocated but not initialized is accessed. This can give you a strategy for what to look for in your code - is there anything apparent that could result in these conditions?

    And so on.

    But this is also an experimental thing too as you can't anticipate every possibility. In the end, you got to just let er rip and see what happens....