Take a deep breath. First of all, don't be intimidated by the size of the code. It really doesn't matter. The whole program may have a million lines, but you can be sure it's not actually doing a million things when it tries to run your specific command. Think of the million lines of code as locations on an extremely detailed map. If you want to find a street in Austin, you won't get very far by carefully reading every street name on the whole Austin map. Instead, you want to narrowly focus on the specific path your program takes when you execute your command.
In a nutshell, you are a detective. A crime has been committed: a bug has murdered the effective execution of your program. You know that the perpetrator went from point A (the beginning of the program) to point B (the place where the program is misbehaving). Your first task is to follow a trail of clues and find the suspect. Once you know where the error occurred, the last part of your job (taking the suspect into custody by making your program do what it's supposed to do) is relatively easy.
Let me talk math for a minute. I'm going to solve a puzzle live in this post. I have a large number: let's pick 987,654. I want to know the square root of this number to the nearest integer, but I'm not allowed to just hit the square root button on my calculator. Instead, I can only pick a number and multiply it by itself to see if my guess is right. How will I find the answer?
I'll indent my solution in the next several paragraphs so you can see where the rest of the post continues. Think of how you might solve it before reading on.
We could brute-force it with trial and error. 1*1 = 1. Nope. 2*2 = 4. Nope. 3*3 = 9. Nope. And so on. This could take a while.
Let's be smarter about this. We want to get CLOSE to the number and then home in from there. So let's start with an educated guess. About what order of magnitude is my number? 987,654 is pretty close to a million, and it's easy to figure out the square root of 1,000,000: it's 1,000. (Just cut the number of zeros in half.) So we know the answer is less than 1,000.
Is it 900? 900*900 = 810,000. Too small. Let's pick a bigger number. Go halfway up and we get 950. 950*950 = 902,500. Still too small. Let's try 990. 990*990 = 980,100. Hey, we're getting closer! But it's still too small. How about 995? 995*995 = 990,025. Now it's too big, but only by a little.
At this point we know the answer is between 990 and 995, so let's just step backwards from 995 until we find it. 994*994 = 988,036. Too big. 993*993 = 986,049. Too small.
Ah ha! We've cornered it! The square root of 987,654 is more than 993 and less than 994. In other words, it's 993 point something. If we wanted to, we could keep playing this game to guess more places after the decimal point. But I said we'd only go to the nearest integer, so one more guess settles it: 993.5*993.5 = 987,042.25. Still too small, so the root is past the halfway mark and rounds up to 994.
Let me check with the calculator now that we've settled on an answer: sqrt(987654) = 993.808, which rounds to 994. See? I was right.
I've just demonstrated something very much like bisection, a root-finding technique in the same family as Newton's Method of approximation. Instead of brute-forcing the solution by walking through every possible number, we started in a likely spot and quickly converged on an answer. Now, what insight does this give into debugging?
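The guessing game translates directly into code. Below is a minimal sketch, assuming a non-negative integer input and using only multiplication, integer division and comparison. isqrt_by_guessing halves a range that must contain the answer: strict bisection, a tidier version of the hand-picked 900, 950, 990, 995 guesses. For comparison, isqrt_newton applies Newton's Method itself, replacing a guess x with the average of x and n/x. All three function names are made up for this example.

```python
def isqrt_by_guessing(n: int) -> int:
    """Largest integer r with r*r <= n, found by halving a range of guesses."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # Educated upper bound: half as many digits as n, rounded up.
    # 987654 has 6 digits, so high starts at 1000 (and 1000*1000 > 987654).
    low, high = 0, 10 ** ((len(str(n)) + 1) // 2)
    # Invariant: low*low <= n < high*high, so the answer is in [low, high).
    while high - low > 1:
        guess = (low + high) // 2
        if guess * guess <= n:
            low = guess       # too small (or exact): answer is at least guess
        else:
            high = guess      # too big: answer is below guess
    return low


def nearest_sqrt(n: int) -> int:
    """Square root of n rounded to the nearest integer."""
    r = isqrt_by_guessing(n)
    # sqrt(n) >= r + 0.5 exactly when n >= r*r + r + 0.25, i.e. n > r*r + r.
    return r + 1 if n > r * r + r else r


def isqrt_newton(n: int, guess: int) -> tuple[int, list[int]]:
    """Integer square root by Newton's Method, starting from guess*guess >= n.

    Returns the root and the list of guesses visited along the way.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return 0, [0]
    if guess * guess < n:
        raise ValueError("start from an overestimate")
    x, steps = guess, [guess]
    while True:
        better = (x + n // x) // 2   # average of x and n/x, rounded down
        if better >= x:              # guesses stopped shrinking: x is the root
            return x, steps
        x = better
        steps.append(x)


print(isqrt_by_guessing(987654))    # 993
print(nearest_sqrt(987654))         # 994
print(isqrt_newton(987654, 1000))   # (993, [1000, 993])
```

Bisection needs about ten guesses to shrink the range from 1,000 candidates to one; Newton's Method, starting from the same educated guess of 1,000, lands on 993 in a single step here because it uses how far off each guess is, not just whether it's too big or too small.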
Figure out a likely entry point for your code. For instance, if your program errors out when you click the button that says "Display all prices" then search all your files for the exact string "Display all prices". If it only exists in one place, you've got a location to start looking, and you're off.
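If your editor has no project-wide search, a few lines of Python can do the job. This is a minimal sketch: find_string is a hypothetical helper that does a plain, case-sensitive substring match on every readable file under a directory and skips hidden directories such as .git. On Unix-like systems, grep with the -r (recursive), -n (line numbers) and -F (fixed string) options does the same thing from the command line.

```python
import os


def find_string(root: str, needle: str) -> list[tuple[str, int, str]]:
    """Return (path, line number, line) for every line under root containing needle.

    Plain substring match: case-sensitive, no regular expressions.
    """
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip hidden directories such as .git; editing dirnames in place
        # tells os.walk not to descend into them.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                # errors="ignore" lets binary files pass through without crashing
                with open(path, encoding="utf-8", errors="ignore") as f:
                    for lineno, line in enumerate(f, start=1):
                        if needle in line:
                            hits.append((path, lineno, line.rstrip("\n")))
            except OSError:
                continue  # unreadable file (permissions, broken link): skip it
    return hits
```

For example, find_string(".", "Display all prices") returns a list of (path, line number, line text) tuples. Exactly one hit means you have your starting point; several hits mean you need to narrow things down further.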
Quick sanity check. Once you've found this spot in the code, you might want to temporarily change the text to "Display all prices!!!" Then reload the program and see if the exclamation points show up. I can't tell you how often I've found what I thought was the right spot in the code, then wasted a bunch of time wondering "Why didn't THIS change do anything?" when I wasn't in the right spot at all. This hearkens back to the important principle from yesterday's post: programming is both theoretical and experimental. Don't just trust your guesses to be right; do the experiment!
So you know where your code started, now where is it going? Off the top of my head, there are three important tools you have for narrowing down the location of your bug.
- Use a debugger. Decent program development environments all have a debugger. Learn how yours works; it is your best friend. With the debugger, you can step through your code a line at a time, see exactly where it's going, spot-check your variables, and trace back where you've been. However, some types of development don't make debugging easy. For instance, if you're developing web pages, the program runs in your browser and not in your editing environment. Sometimes your code hands off to a separate program, and you have to switch to a different tool. So if you have no debugger, or it stops being useful, you have to go to your backup plan:
- Print statements. Lots of print statements. Print what you're doing: "X was set to 15!" or "Entering function..." and "Leaving function...". Just remember that debug statements in shipped code look really bad, so be sure to clean up all your print statements when you're done. A good habit is to put a distinctive string in front of all your prints so you don't miss any. For instance, sometimes I will make debugging statements like
print "RG Entering function foo";
The string "RG " rarely shows up in code, so I can search for all occurrences and delete them once I've fixed my problem. (Some languages use "ARGV", but if you include the space after "RG " then that won't show up in the search.) If you're willing to put in a little extra time, the preferred solution is to write a "printDebug" routine that only prints when a flag is set, so that you can turn off all debug messages globally if necessary. This isn't always worth the effort, though, when you're working across multiple files that don't share libraries.
- Comment out large blocks of code. Temporarily take them out of the program entirely. Don't underestimate this technique. If your problem is that the program is slow, and you comment out an entire function, and it STILL takes five minutes, you immediately know "Guess that function isn't causing the slowness!" Then you don't have to waste your time looking there.
A word about print statements: it's not always that easy. Sometimes you will add a "printf" command (C), "cout" (C++), or "System.out.println" (Java) and see nothing at all. "Print" in this case means "do whatever you can to make it visible." If you're generating a web page, write to the page's output stream. If you're writing JavaScript, use "alert" statements to pop up a window. If your program uses a log file, write to the log. If you're writing a windowed program, you might want to make a special panel for displaying debug messages. Just figure out what makes the most sense to make your messages visible.
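In Python, one way to get messages somewhere visible when print output is getting lost is to write them to a dedicated log file with the standard logging module. This is a minimal sketch: make_debug_logger and the "rg-debug" logger name are hypothetical, and you'd pick whatever file path your program is allowed to write. Each line carries the "RG " tag and the name of the function that wrote it.

```python
import logging


def make_debug_logger(path: str) -> logging.Logger:
    """Return a logger that appends tagged debug lines to the file at path.

    Call it once: every call adds another handler to the same logger,
    so repeated calls would duplicate each message.
    """
    logger = logging.getLogger("rg-debug")   # hypothetical logger name
    logger.setLevel(logging.DEBUG)
    handler = logging.FileHandler(path, encoding="utf-8")
    # "RG " tag, then the name of the function that logged the message
    handler.setFormatter(logging.Formatter("RG %(funcName)s: %(message)s"))
    logger.addHandler(handler)
    logger.propagate = False   # keep these lines out of any other configured logs
    return logger
```

For example, create it once with log = make_debug_logger("debug.log"), call log.debug("reached draw_flower") wherever you'd otherwise print, and watch the file (tail -f debug.log on Unix-like systems).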
Basically the objective here is to narrow down all the possible locations where the bug might be. If your program crashed, where did it crash? If it was supposed to display a picture of a flower and didn't, did it even reach the "drawFlower()" function? If so, why is drawFlower broken? If not, where did it make a wrong turn?
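One cheap way to answer "did it even reach drawFlower()?" in Python is a tracing decorator that announces every call. This is a minimal sketch: trace and draw_flower are hypothetical names, and the "RG " tag keeps the output easy to find and delete. If you see "entering" but never "leaving", the trouble is inside that function; if you see nothing at all, the program took a wrong turn before it got there.

```python
import functools
import sys


def trace(func):
    """Hypothetical helper: report entering, leaving, or raising for each call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        name = func.__name__
        print(f"RG entering {name} args={args!r} kwargs={kwargs!r}",
              file=sys.stderr, flush=True)
        try:
            result = func(*args, **kwargs)
        except Exception as exc:
            # "entering" with no "leaving" means the trouble is inside this call
            print(f"RG {name} raised {exc!r}", file=sys.stderr, flush=True)
            raise
        print(f"RG leaving {name} -> {result!r}", file=sys.stderr, flush=True)
        return result
    return wrapper


@trace
def draw_flower(petals: int) -> str:
    """Hypothetical stand-in for the post's drawFlower()."""
    return "*" * petals
```

Calling draw_flower(3) returns "***" and writes "RG entering draw_flower args=(3,) kwargs={}" followed by "RG leaving draw_flower -> '***'" to stderr.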
So you're a detective, casting a wide search net at first and then tightening it. From the million lines of code, we've found that the error must be happening in these 1,000 lines. Then we cut it down to 100 lines, then 10, then 1: this exact line is where it misbehaved.
As you tighten your search net, you will dig deeper into the code. If you get down to one line and discover that it calls a function you control, you're not done yet: you have to step into that function and keep going. For instance, I comment out a single line and discover that the Bad Thing no longer happens. I uncomment the line, step into the function, and comment out the entire function body. Same result. Good, my guess is confirmed. Now I uncomment half the body. Does it still do the Bad Thing? Is it getting inside this "if" block, or the "else" block? Why did it go here and not there? What values is it seeing when this decision is made?
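The "tighten the net" loop is the same halving trick as the square-root puzzle, applied to suspects instead of numbers. This is a minimal sketch, assuming you can list the suspects in order (lines, statements, commits, input records) and write an experiment is_broken that runs only a prefix of them and reports whether the Bad Thing happens; first_bad_step and is_broken are hypothetical names. Each experiment halves the search range, so 1,000 suspects need at most 10 experiments. Git's built-in git bisect command applies the same idea to a project's commit history.

```python
from typing import Callable, Sequence


def first_bad_step(steps: Sequence[str],
                   is_broken: Callable[[Sequence[str]], bool]) -> int:
    """Index of the first step whose inclusion makes the bug appear.

    Assumes: steps is non-empty, the bug does not appear with no steps,
    it does appear with all of them, and once it appears, adding more
    steps never makes it go away.
    """
    good, bad = 0, len(steps)   # steps[:good] works; steps[:bad] is broken
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_broken(steps[:mid]):   # run the experiment with half the suspects
            bad = mid
        else:
            good = mid
    return good                      # steps[good] tips it from working to broken
```

For example, with 1,000 steps where the culprit is step 737, first_bad_step returns 737 after no more than 10 calls to is_broken.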
Ultimately, fixing a bug in a million lines of code often comes down to changing one line. So finding the bug is actually 99% of the work, which makes this probably the single most important skill you can develop.