In the process of discussing error checking in the comments section of a recent post, I flippantly remarked that "all users are idiots." This is a sentiment that I expect all veteran programmers will have encountered or stated themselves on some occasion.
From a new programmer struggling with this problem, I hear this: "This would be a heck of a lot easier if users just weren't idiots." And of course, that is something we all wish. But then again, if we didn't have to assume user idiocy, we wouldn't be writing programs.
There's a fundamental difference between constructing a program and writing a novel or painting a picture. For static forms of media, you only have to create one thing from beginning to end. Once you've finished writing your book, it's over. For better or for worse, your characters have finished interacting with each other. They've said what they have to say. People will either like it or not like it; they may debate endlessly about what your words or images "really mean," but they can't affect its behavior.
Unfortunately, programs aren't like that. Once you release your program into the wild, users get to do whatever they want with it. And if they do something utterly crazy and your program breaks, they'll blame you.
Murphy's Law ("Whatever can go wrong, will") is named for a twentieth-century aerospace engineer, but software engineers invoke it most. It's not that everything possible will go wrong every single time your program is run. It's that if you have millions of users (which you will, if you are successful), then even a very small chance that one person will do something unexpected must magnify into a virtual certainty that somebody, somewhere will find a way to blow up your program.
With that in mind, writing a program is just as much about covering every possible angle of what some idiot user might do to you, as it is about creating a pleasing presentation when the program is Used As Intended.
As a gamer, I have heard several interviews with voice actors who are veterans of film or television, but new to performing voices for video games. The universal sentiment seems to be "This is the craziest thing I've ever had to do. You have to perform a dozen different lines for every single scene, and they're not just different takes for the editor to select from. They're ALL USED in the game. I'll have to perform a scene featuring my dramatic death, and in the very next scene I'm alive again. I'll have to answer the same question five different ways, just in case the player decides to harass me by asking my character over and over again." And so on.
So, because we must cover every possible use of the program, we create elaborate error scenarios and use all kinds of tricks to keep the user on track. One way to handle user error is to simply give the user very strict instructions, like this: "You MUST TYPE A NUMBER from 1-100, and if you do ANYTHING ELSE, then the program WILL CRASH and that is YOUR RESPONSIBILITY." That's not very satisfying, though. People make mistakes, even people who are not idiots. It's much nicer to recover gracefully from an error. At every step, you're asking yourself: "What can go wrong here?" And then you add a clause to your program: "If bad thing xyz occurs, say something polite to guide the user back on track, and try it again."
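That guard-and-retry clause might look like the following Python sketch. The prompt wording and the function name are my own illustration, not from any particular program:

```python
def ask_for_number():
    """Keep asking until the user types an integer from 1 to 100."""
    while True:
        answer = input("Type a number from 1 to 100: ")
        try:
            value = int(answer)
        except ValueError:
            # Bad thing: not a number at all. Say something polite, try again.
            print("That wasn't a number. Please try again.")
            continue
        if 1 <= value <= 100:
            return value
        # Bad thing: a number, but out of range. Guide the user back on track.
        print("That number is out of range. Please try again.")
```

The loop never crashes on bad input; it just keeps steering the user back until it gets something it can use.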
Often, an even better solution is to tie the user's hands so he can't actually make the mistake in the first place. For example, slider bars and drop-down boxes exist exactly so that the user can pick an integer from 1-100 without the capability to do something stupid.
Obviously, it takes a lot of work to write a bulletproof program, and the more thoroughly you prepare for bad user behavior, the harder the work is going to be. Often you have to strike a happy medium. The smaller your audience is, the less effort you have to put into mistrusting your users. If you are just writing a program for yourself, it's probably less work to be careful with your own input than to write error-handling routines.
Also, the more general the application, the more you have to just let the errors happen, and your only responsibility becomes making sure the errors don't cause a crash. For instance, if you're writing a calculator program, you can't prevent the user from ever attempting to divide by zero. The user might just DECIDE to divide a number by zero. You just have to tell him he made an illegal operation, and move on to the next step.
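A sketch of that calculator behavior in Python (the function is illustrative, not taken from any real calculator):

```python
def divide(a, b):
    """Divide like a calculator: report an illegal operation
    instead of crashing, then let the user move on."""
    try:
        return a / b
    except ZeroDivisionError:
        print("Illegal operation: division by zero.")
        return None

print(divide(10, 2))   # 5.0
print(divide(10, 0))   # prints the message, then None
```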
When you write a program, you're not designing a static thing for people to look at. You're designing a universe of possibilities to cover all uses. The more time you spend thinking about what an idiot might do, the better you can guarantee that you will make it a pleasant experience for those who are not idiots.
A blog about the fun and frustration of software development, about desirable practices and bad habits, and about the importance of computers and technology in everyday life.
Thursday, July 7, 2011
Adventures in junior programming, episode 3
Haven't been posting on this blog much lately. Since I last posted I got a new full-time job at a high-tech marketing firm, and also taught myself the fundamentals of both Android and iPhone programming. Interesting stuff, probably good fodder for a future post. But for now, I'd like to go over recent efforts to teach my son more programming.
We let it slip during the school year, but we've picked it up again over the summer. We're doing about an hour of work several nights a week. Teaching a nine-year-old to program in irregular bursts is like pushing the stone of Sisyphus: every time we take it up again, I have to re-teach concepts that seemed to stick the last time but didn't. Nevertheless, my feeling about programming has always been that you have to learn to "think like a programmer" first, and once you have burned this style into your memory, it becomes much easier to pick up any language or platform in existence.
So what is the core of thinking like a programmer? As far as I can sum up for a kid, it boils down to a few key elements which need to be practiced over and over under a variety of conditions:
- Output
- Tinkering with variables
- Input
- Loops
- Conditionals
- Logical operators
- Functions and classes, which fall broadly under the category of "splitting up the work into manageable smaller chunks."
I've fallen into sort of a systematic bootstrapping pattern of teaching, which goes something like this. I have milestones in mind for things Ben should know how to do well, and they should be second nature. The current milestone is: "Write a Python program that counts to ten." Once he can do this without hesitation and without complaining that he needs more hints, we'll bump it up and move on to another milestone.
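For reference, the milestone program is only a few lines of Python; here is one plausible version:

```python
# Count to ten: set a counter, loop while it hasn't passed ten,
# print it, and add one each time through.
counter = 1
while counter <= 10:
    print(counter)
    counter = counter + 1
```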
Each session, I ask him to hit the milestone. If he can't do it well enough, I'll point out what he's missing and go over how it works again. After we get through this, if it's appropriate, I'll introduce a new concept, which we'll drill in the future until that becomes the new milestone.
I've also been making him follow the program logic step by step. It's not as intuitive as you might think. For instance, I ask him to explain HOW the computer goes through the stages of counting to ten, and this is the kind of response I'm looking for:
"First I set the counter to one. One is less than ten, so I enter the loop. I print the counter value. Then I add one to the counter, making it two. Now I return to the beginning of the loop. Two is less than ten, so I enter the loop..."
The kind of response I get takes frequent shortcuts, like "I keep doing that until it's ten." That may be technically correct, but it doesn't capture the essence of thinking through every step, which is critical to catching bugs. If you wind up writing a program that only counts to nine, or goes into an infinite loop, you can keep modifying lines until you get lucky, but if you can see what it's doing at every step, then you're less likely to make mistakes in the first place.
Programming can seem tedious and repetitive, but at its best you get those "light bulb" moments when it's suddenly crystal clear what you want your program to be doing and how. And sometimes it's even more fun to write the code that solves a puzzle than to grind out the solution on your own.
Thursday, February 11, 2010
Angle math
I've got a new applet posted on my web page, which I wrote for my son Ben. He is seven. It is a demonstration of how angles work. You can drag the points of the triangle around, and it will continuously update the display of numbers showing what angle is formed at each point. It also gives a little readout showing that the three angles will, indeed, always add up to 180 degrees.
I like to do a little graphical application every once in a while just to stay in practice. There are a lot of concepts from trigonometry that have to be applied. Debugging graphics is sometimes a tricky affair, because if you don't do the right thing then you might wind up with nothing but a blank screen, or a line might appear wildly out of place. Finding the angles required remembering what sines and cotangents and such represent. (SOH CAH TOA! That's one that never leaves your memory, but I got sin and asin mixed up for a while.)
I also had to fudge the numbers a little bit. For instance, one angle might be 70.14 and another might be 50.34, making the third angle 59.52. However, if you round each of those to one decimal place, you find that 70.1+50.3+59.5 = 179.9. So I had to fake the third angle (a3) as displaying 180-a1-a2.
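The rounding problem is easy to reproduce in a few lines of Python (the variable names here follow the a1/a2/a3 from the applet, but the snippet is my own illustration):

```python
# The three angles as computed from the triangle's coordinates.
a1, a2 = 70.14, 50.34
a3 = 180.0 - a1 - a2          # 59.52, the true third angle

# Rounding each angle to one decimal place independently lets the sum drift:
naive_sum = round(a1, 1) + round(a2, 1) + round(a3, 1)
print(f"{naive_sum:.1f}")     # 179.9, so the display no longer adds up to 180

# The fudge: display the third angle as 180 minus the two *displayed* values.
shown_a3 = 180.0 - round(a1, 1) - round(a2, 1)
print(f"{shown_a3:.1f}")      # 59.6
print(f"{round(a1, 1) + round(a2, 1) + shown_a3:.1f}")   # 180.0
```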
The hardest challenge came after I decided to change "nearly" right angles into true right angles. Notice that if you drag one corner so that it forms an angle between 80 and 100 degrees, it will automatically snap to the correct position so that it is 90 degrees. I had to put some thought into making that work, and here's the solution I wound up with.
- Write an expression of the line segment opposite the point being moved.
- Find a vector perpendicular to that line.
- Project the moved point along that vector to find where it intersects the opposite segment.
- Find the actual distance that the point must be from the segment in order to make a 90 degree angle.
- Move the point there.
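The steps above might be sketched in Python like this. This is my reconstruction, not the applet's actual code; it leans on the fact that the altitude from the right angle onto the hypotenuse is the geometric mean of the two pieces it cuts, and it assumes the foot of the perpendicular lands between the segment's endpoints, which holds for nearly right angles:

```python
import math

def snap_to_right_angle(a, b, c):
    """Move point a so that the angle at a is exactly 90 degrees.
    Points are (x, y) tuples; segment b-c is the side opposite a.
    Assumes the foot of the perpendicular from a falls inside b-c."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    # Unit vector along the opposite segment b->c.
    dx, dy = cx - bx, cy - by
    seg_len = math.hypot(dx, dy)
    ux, uy = dx / seg_len, dy / seg_len
    # Project a onto the line through b and c: t is the distance
    # from b to the foot of the perpendicular.
    t = (ax - bx) * ux + (ay - by) * uy
    fx, fy = bx + t * ux, by + t * uy
    # For a right angle at a, the altitude must equal the geometric
    # mean of the two pieces the foot cuts the segment into.
    h = math.sqrt(t * (seg_len - t))
    # Unit vector perpendicular to the segment, pointing toward a.
    px, py = -uy, ux
    if (ax - fx) * px + (ay - fy) * py < 0:
        px, py = -px, -py
    return (fx + h * px, fy + h * py)
```

Dragging the corner to (1, 1) over a base running from (0, 0) to (4, 0), for example, snaps it to (1, sqrt(3)), where the two sides meeting at that corner really are perpendicular.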
I had originally used only a "Point" class and a "Triangle" class to represent the problem space; I soon realized I needed a "Vector" class as well (one which could be used to offset a point, normalized, reversed, or made perpendicular to another vector). Then I had to make yet another class, "Line," in order to properly calculate where intersections can be found. I borrowed a lot from this page, and found out just how long it had been since I needed to do that math -- I had a totally incorrect idea of how a line formula is expressed.
Basically the key to correcting your math mistakes -- and I made a lot! -- is to create either a printout or a visual representation at every step. For instance: I think I've normalized the vector correctly, better print out the coordinates and make sure the length is really 1. I need to draw an extra line to make sure it really goes through this point. I need to draw an extra point to make sure it really intersects that line. Do these two lines LOOK perpendicular to me? And so on.
If I had remembered the math better then some of this rigor wouldn't have been necessary. But really, caution and constant testing while coding eliminates the need to be a perfect math whiz. Coding is both theoretical and experimental, you see.
Friday, February 27, 2009
Search and destroy missions for your bugs
Pop quiz, hotshot. You're new on the job, you have a million lines of code that you've never looked at before, and there's a bug. It's doing something really bad, like taking five minutes to load and then displaying a blank screen. Your boss has some vague idea of what the program should show, or used to show, but it's not doing that. The most specific instruction you can get is "make it look like this" or worse, "just fix it." What do you do??
Take a deep breath. First of all, don't be intimidated by the size of the code. It really doesn't matter. The whole program may have a million lines, but you can be sure it's not actually doing a million things when it tries to run your specific command. Think of the million lines of code as locations on an extremely detailed map. If you want to find a street in Austin, you won't get very far by carefully reading every street name on the whole Austin map. Instead, you want to narrowly focus on the specific path your program takes when you execute your command.
In a nutshell, you are a detective. A crime has been committed: a bug has murdered the effective execution of your program. You know that the perpetrator went from point A (the beginning of the program) to point B (the place where the program is misbehaving). Your first task is to follow a trail of clues and find the suspect. After you know where the error occurred, the last part of your job -- move the suspect into custody by making your program do what it's supposed to -- is relatively easy.
Let me talk math for a minute. I'm going to solve a puzzle live in this post. I have a large number: let's pick 987,654. I want to know the whole-number part of its square root, but I'm not allowed to just hit the square root button on my calculator. Instead, I can only pick a number and multiply it by itself to see if my answer is right. How will I find the answer?
I'll indent my solution in the next several paragraphs so you can see where the rest of the post continues. Think of how you might solve it before reading on.
We could brute-force it with trial and error. 1*1 = 1. Nope. 2*2 = 4. Nope. 3*3 = 9. Nope. And so on. This could take a while.
Let's be smarter about this. We want to get CLOSE to the number and then home in from there. So let's start with an educated guess. About what order of magnitude is my number? 987,654 is pretty close to a million, and it's easy to figure out the square root of 1,000,000: it's 1000. (Just cut the number of zeros in half.) So we know the answer is less than 1000.
Is it 900? 900*900 = 810,000. Too small. Let's pick a bigger number. Go halfway up, and we get 950.
950*950 = 902,500. Still too small. Let's try 990.
990*990 = 980,100. Hey, we're getting closer! But it's still too small. How about 995?
995*995 = 990,025. Now it's too big, but just a little bit.
At this point we know that it's between 990 and 995, so let's just step backwards from 995 until we find it.
994*994 = 988,036. Too big.
993*993 = 986,049. Too small.
Ah ha! We've found the answer! We know that the square root of 987,654 is more than 993 and less than 994. In other words, it's 993 point something. If we wanted to, we could keep playing this game to guess more places after the decimal. But I said that we'd only go to the whole number, so "993 point something" is good enough.
Let me check with the calculator now that we've settled on an answer: sqrt(987654) = 993.808. See? I was right.
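The same hunt can be written as a short Python function. This sketch uses only pick-and-multiply plus interval-halving, just like the walkthrough:

```python
def isqrt_floor(n):
    """Whole-number part of sqrt(n), found by binary search:
    no square-root button allowed, only pick a number and multiply."""
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi + 1) // 2   # guess halfway up the remaining range
        if mid * mid <= n:
            lo = mid               # guess is not too big; keep it
        else:
            hi = mid - 1           # guess overshot; discard it
    return lo

print(isqrt_floor(987654))   # 993, matching the hand search above
```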
I've just demonstrated something very much like the bisection method of approximation (what programmers call a binary search). Instead of brute-forcing the solution by walking through every possible number, we started in a likely spot and quickly converged on an answer. Now, what insight does this give into debugging?
Figure out a likely entry point for your code. For instance, if your program errors out when you click the button that says "Display all prices" then search all your files for the exact string "Display all prices". If it only exists in one place, you've got a location to start looking, and you're off.
Quick sanity check. Once you've found this spot in the code, you might want to temporarily change the text to "Display all prices!!!" Then reload the program and see if the exclamation points show up. I can't tell you how often I've found what I thought was the right spot in the code, but then wasted a bunch of time wondering "Why didn't THIS change do anything?" when I'm actually not in the right spot at all. This hearkens back to the important principle from yesterday's post: Programming is both theoretical and experimental. Don't just trust your guesses to be right, do the experiment!
So you know where your code started, now where is it going? Off the top of my head, there are three important tools you have for narrowing down the location of your bug.
- Use a debugger. Decent program development environments all have a debugger. Learn how yours works; it is your best friend. With the debugger, you can step through your code a line at a time, see exactly where it's going, spot-check your variables, and trace back where you've been. However, some types of development don't make debugging easy. For instance, if you're developing web pages, the program runs in your browser and not in your editing environment. Sometimes code can load an unrelated program and you have to go to a different tool. So if you have no debugger, or you run out of use for it, you have to go to your backup plan:
- Print statements. Lots of print statements. Print what you're doing: "X was set to 15!" or "Entering function..." "leaving function...". Just remember that debug statements in shipped code look really bad, so be sure to clean up all your print statements when you're done. A good habit is to put a distinctive string in front of all your prints so that you don't miss any. For instance, sometimes I will make debugging statements like
print "RG Entering function foo";
The string "RG " rarely shows up in code, so I can search for all occurrences and delete them when I've fixed my problem. (Watch out for near-misses like "ARGV", which contains "RG"; including the trailing space in your search keeps those out of the results.) If you are willing to put in a little extra time, the preferred solution is to write a "printDebug" routine that only prints when a flag is set, so that you can globally turn off all debug messages if necessary. This isn't always worth the effort, though, when working across multiple files that don't share libraries.
- Comment out large blocks of code. Just remove them entirely. Don't underestimate this technique. If your problem is that the program is slow, and you comment out an entire function, and it STILL takes five minutes, you immediately know "Guess that function isn't causing the slowness!" Then you don't have to waste your time looking there.
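The "printDebug" routine might look like this in Python; the flag name and the "RG " tag are just examples:

```python
DEBUG = True   # flip to False to silence every debug message at once

def print_debug(*args):
    """Print a tagged debug message, but only while the DEBUG flag is set.
    The "RG " tag makes every message easy to find and delete later."""
    if DEBUG:
        print("RG", *args)

print_debug("Entering function foo")   # prints: RG Entering function foo
```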
A word about print statements: It's not necessarily that easy. Sometimes you will add a "printf" command (C) or "cout" (C++) or "System.out.println" (Java) and you see nothing at all. "Print" in this case means "do whatever you can to make it visible." If you're coding to a web page, write to the web page's output stream. If you're writing Javascript, use "alert" statements to pop up a window. If your program uses a log file, write to the log. If you're writing a windowed program, you might want to make a special panel that you can use to display debug messages. Just figure out what makes the most sense to make your messages visible.
Basically the objective here is to narrow down all the possible locations where the bug might be. If your program crashed, where did it crash? If it was supposed to display a picture of a flower and didn't, did it even reach the "drawFlower()" function? If so, why is drawFlower broken? If not, where did it make a wrong turn?
So you're a detective, casting a wide search net at first and then tightening it. From the million lines of code, we've found that the error must be happening in this 1,000 lines. Then we cut it down to 100 lines, then 10, then 1: this exact line is where it misbehaved.
As you tighten your search net, you will dig deeper into the code. If you get down to one line and discover that it's a function you control, you're not done yet: you have to step into that function and keep going. For instance, I comment out a single line and discover that the Bad Thing no longer happens. I uncomment the line, step into the function, and comment out the entire function body. Same result. Good, my guess is confirmed. Now uncomment half the body. Now does it still do the Bad Thing? Is it getting inside this "if" block, or the "else" block? Why did it go here and not there? What values is it seeing when this decision is made?
Ultimately, fixing a bug in a million lines of code often comes down to changing one line. Finding where the bug is turns out to be 99% of the work. Hence, this is probably the single most important skill you can develop.
Labels:
best practices,
debugging,
programming as science
Thursday, February 26, 2009
Coding is both theoretical and experimental
"As soon as we started programming, we found to our surprise that it wasn't as easy to get programs right as we had thought. Debugging had to be discovered. I can remember the exact instant when I realized that a large part of my life from then on was going to be spent in finding mistakes in my own programs."
- Maurice V. Wilkes, inventor of microprogramming
My father is a computational physicist. That's a job description that didn't exist when he started working a few decades ago, but he made the decision to transition into that role.
In the past, scientists have generally fallen into two broad categories: theoretical, and experimental. Theoretical science is distantly related to pure math. You have an arsenal of known scientific facts, you know the equations, you analyze those equations and learn new consequences that result from them. Experimental science is more directly involved with measuring and testing the real world. You come up with ideas, you do something to test if your ideas match reality, and if they do, you've got a theory. (Ultra-simplified version.)
A computational scientist is a different beast altogether. A computational scientist can create models of reality using a computer simulation to represent theoretical information. Then he can run the simulation with different parameters, finding out what would "really" happen if you changed the initial configuration.
When we deal with things like planets, we pretty much know how they behave in general, but you can't say "I wonder what would happen if we put three planets at this distance from each other" because there is no practical way to set up an experiment by moving planets around. Instead, your computer can simulate what would happen, and you might learn some things from the simulation that weren't obvious just by running the equations.
Programming in general is like that: it's both theoretical and experimental. You can eyeball your code, trace the logic by hand, and figure out what it should do. That's theory. Then you can run the code, either logging the output or stepping through it with effective debugging tools, and find out what it really does. That's experiment.
In decades long past, computer programs ran on expensive mainframes that could take hours to complete a single execution, fed, for example, from punch cards prepared by hand. Because machine time was at a premium, writing a program with any bugs could be a costly mistake. It was more economical to spend hours poring over the punch cards or machine code, making dead certain that the program would run correctly before you fed it to the computer. In other words, programmer time was cheap and computation time was expensive.
Today, the reverse is true. Rather than hours, it takes seconds to compile and run a computer program. Programmers are still free to spend their time making sure that the code "should" run before they run it. However, in many cases it is easier and smarter just to run it and see if something goes wrong. Following Moore's Law, computer time has become incredibly cheap.
A smart programmer takes advantage of this and does a mixture of both theory and experiment. Rather than writing an entire program from scratch on paper, running it, and hoping for the best, today's programmers should view a program as a series of components, to be built and tested bit by bit, making sure that each part behaves properly through real demonstration. Unless I'm in the middle of a major rewrite that deliberately broke old code, I will rarely leave my program in an uncompilable state for more than fifteen minutes, and usually a lot less. Each change can be tested to make sure that it has predictable behavior.
In a working development environment, the computer does exactly what it's told; the errors that remain are logic errors introduced by the developer. That's not a failing on the developer's part; to err is human, and the point of this style of coding is to recognize and anticipate those errors. The benefit of working this way is that when you make a mistake, you get immediate feedback that something has gone wrong.
If you write a large number of changes and then discover an error after running them all at once, you are temporarily stuck. You have no way of knowing which of your changes was responsible, so you have no choice but to backtrack and break your code into smaller fragments to isolate the problem... which is exactly how you should have been writing the program in the first place.
So how do you approach a problem that requires debugging? I advocate an approach that uses something akin to the scientific method to systematically track down and destroy bugs. I'll discuss this in a future post.