Castles of Air: interviewing techniques

Wednesday, September 14, 2011

A bit about data structures

I wanted to write another HackMaster post, but what I wanted to write about was the way I approached deciphering the data in the tables and converting them into data structures. Then I skimmed through some older posts looking for reference points about data structures, and it occurred to me that I've never written any. In order to provide a foundation for the rest of the HackMaster breakdown, I'll have to digress and talk about structures in the abstract.

Whenever you are presented with a problem of modeling some numbers in conceptual space, the first thing you have to figure out before you write a single line of behavioral code is what kind of data structures you are going to use. Going all the way back to the beginning of this blog, I've emphasized the importance of considering the efficiency of your design and the effect that it has on the Big-O performance of your program. Thinking about proper data structures can buy you a lot of speed, and it can also make it really easy to visualize your program in small chunks as the complexity increases.

So what's a data structure? The first thing programmers learn is how to use variables for individual chunks of information, like this:

int x = 3;
String str = "Hello world.";

(Technically, of course, a String object in Java is a whole bunch of characters, which makes it a data structure in itself. But the nice thing about object-oriented programming is that you don't have to think about it if you want to.)

To understand data structures, consider an array. An array is one of the first slightly more advanced concepts that a beginning programmer will run into. Instead of storing just one integer, it can store several. For example, here's a simple representation of part of the fibonacci sequence:

int[] fib = new int[10];
fib[0] = 1;
fib[1] = 1;
fib[2] = 2;
fib[3] = 3;
fib[4] = 5;
fib[5] = 8;
fib[6] = 13;
fib[7] = 21;
fib[8] = 34;
fib[9] = 55;

When you create a single "int," you're asking the program to set aside a chunk of space in memory, large enough to hold one number. When you create an array like this, you're asking the program instead of set aside a bigger chunk of memory ten times that size, plus (for some languages) a little bit of extra information about size constraints and such.

But arrays can be wasteful. What if you want to set aside space that sometimes houses a hundred numbers, and sometimes houses just a few? You could create an array of size 100, but most of the time that space would be wasted. That's when you want to use a linked list, where you ask for new memory only at the moment that you actually need it.

I'm not dedicating this whole post to the implementation fundamentals of lists, but interested beginners should go check out the Wikipedia article to find out how this works. (Sidebar: While relying on Wikipedia for information about controversial topics is often unwise, most of the technical topics that are covered are really good.)

Besides linked lists, there are lots of other data structures that you can use depending on your situation:

A tree (which may or may not be binary) will hierarchically organize information for you, much like the folder structure on your computer does, shortening the search time as long as you know where you are going.
A hash table or map is a structure which will find a value associated with a key, usually very quickly. An example would be a dictionary search: you supply a word, and the program would retrieve a definition.

You can write your own versions of these structures, or if your language supports it, use predefined classes that create common structures.

Understanding what purpose the various structures serve, and when to use each one, is a very key skill in programming interviews. Often when you are asked "How would you solve this problem?" the best answer is not to blurt the first notion that comes into your head, but to start applying data structures to model the problem space: lists (or specifically, stacks or queues), trees (binary or otherwise), tables (sometimes you can just assume the existence of a database, which is centered around associative tables).

When I hear a problem that lends itself to this, I usually make a beeline to the whiteboard and start thinking out loud: "You're asking about a list of items, so let's describe what's in an item first... then build a linked list out of items..." Then I'll be either writing code to illustrate what I'm thinking, or (if the interview is shorter) just sketch out diagrams so that the interviewer understands the description and will probably accept that I know how to implement it.

Software is built a piece at a time. If you start explaining how you visualize the problem in your head, you can give a much better insight into how you think than if you just start solving the problem directly. In fact, if you start off strong with this approach but then go off on the wrong track, often the interviewer will be eager to guide you towards his concept of the solution because he's being carried along with your thought process. This often changes the dynamic of the interview entirely. Instead of being a room with an interrogator and a suspect, the interviewer may start thinking of himself as your ally and not your judge. And that's exactly where you want to be when you're looking for work.

Digression's over. Next time I'll illustrate this when I get back to decoding HackMaster tables.

Thursday, March 12, 2009

Big-O Notation 3: Showing off how you think

Some job interviewers ask you to describe your career history in great detail. Some want you to to explain the most obscure facets of the language you'll be working with. And some will give you a puzzle and ask you to design a program that solves it.

This last category is by far my favorite sort of opportunity in an interview. I'm a programmer because I love solving puzzles, so showing off that love is far more likely to make me look good than struggling under pressure to remember whether it's ArrayList or Vector that is thread-safe. (Tip: It's Vector.)

When you get asked a puzzle problem, remember above all else that the interviewer is much more keen to know how you think than whether you get the "right" answer. There may be some specific technique that they are hoping you know. But even if the answer is not immediately obvious, don't panic. If you sit there and explain your reasoning, refining your technique out loud, you can make up a lot of ground by showing that you can think on your feet.

A good example is the question I got from the Director of Engineering at Digital Motorworks. Not only did I get the job, but that portion of the interview went so smoothly that I wrote down the question, my answer, and his reaction to save for posterity. Here is a recap of the problem:

You're responsible for implementing a method called "sumExists." This method should read two arguments: an integer, and an array of integers. The return value will be a boolean, which is true if and only if there exists some combination of two distinct elements in the array that sum to the original integer.

For instance, suppose you call the method in this way:
sumExists(5, [7, 1, 2, 6, 4])
The array contains the elements 1 and 4, and 1+4 = 5, so the method should return true.

I immediately jumped to the most obvious solution -- not because I expected it to be right, but because I wanted to find some answer quickly and then improve it. I said:

"If you wanted to use brute force, you could just inspect every element of the array, and compare it to every other element of the array, and see if they sum to the first number. But that would require you to perform O(n²) operations, which is dumb. So I would like to find a better solution.

"Suppose that we sort the array. That would take O(n*log n) time (QuickSort, remember?) to complete. Where does that get us? Well, then we could quickly find the largest number and the smallest number of the array and compare them to each other. We'll put counters on the left and right ends of the array, and add those values. If they add up to the target, then we're done. (Note: in the example, the sorted array is [1, 2, 4, 6, 7]. The high and low numbers are 1 and 7, which add up to 8. Too high.)

"If the resulting number is too high, then the biggest number must be too big. So we'll move the right hand counter to the left and try adding a smaller number to the smallest. On the other hand, if the sum is too low, then the smallest number must be too small. So we'll move the left hand counter to the right and try again.

"By doing this we'll be able to test pairs of high and low numbers until either we find the sum we need, or the counters meet in the middle. If the sum is correct then we return true. Otherwise, return false.

"Scanning the list in this way requires only one pass, which takes O(n) time. But the sort was already longer, it takes O(n*log n) time. The time to scan a large array once would be much smaller than the time to sort, so we ignore that factor. This solution takes O(n*log n) time."

As I said, I wrote down the response and saved it. So this is more or less a direct quote: "I've been doing this for a long time, and there is not one person in a hundred who has given as thorough and correct an answer as you just did."

There were other difficulties with getting this job, based on my short experience with specific technologies. I'm convinced, though, that this was the best part of my series of interviews, and it got my foot wedged pretty firmly in the door.

Writing software is fundamentally different from working on an assembly line. You don't solve the same problem twice. You don't write the same code twice. If you are repeating yourself, then you are in some way failing to leverage past experience. You should be moving the duplicated code into a common function, or a common library, or something, so you can stop working on problems that have already been solved, and get the business of making your program do something new and interesting.

Because of this non-repetitive approach, it's important to be able to think about every problem from multiple angles, to try and relate it to past experience and evaluate different kinds of solutions. Big O Notation is one tool for being able to evaluate how you're doing.

Castles of Air

Wednesday, September 14, 2011

A bit about data structures

Thursday, March 12, 2009

Big-O Notation 3: Showing off how you think

Followers

Stuff you should read