Wednesday, March 18, 2009

Recursive sorting 2: QuickSort

I got good feedback from my last post about recursive sorting, some of it by email and in person.  I also heard some serious concerns about my world domination scheme. You can all relax, because my Evil Generals are all away on extended vacation for the foreseeable future. This post probably won't be as amusing as the last one, as I've already explained the basic groundwork. But I'll try to give you the basics so you can do QuickSorting yourself.

Here are the fundamentals of how QuickSort works. In Merge Sort, we split up two decks of cards by size, and then sorted each half. How exactly you "sort each half" is up to you. If the remaining divided up decks are still large, you can just divide them in half again and run two more Merge Sorts. But if the decks are small, you can sort them in another way. Even Selection Sort might be a good choice if you have no more than, let's say 10 cards. OR, you can just keep dividing and dividing until you get down to two or three cards, and then perform a couple of swaps to get the cards in order.

QuickSort is similar, but a little different. We still split up the cards, but in this case we split them around a card value instead of a specific deck size. Here's an example. Start with unsorted card:
  • 8, 4, K, A, J, 5, 9
In QuickSort our plan is to start with the first card (8) and put it in the correct place.  You remember that in Merge Sort we did the recursive step -- the splitting up into smaller problems -- first, and then the action of actually moving cards around was saved until we reached the lowest level (merging the cards).  That's when we actually do something constructive... like telling an evil private "Go kill that guy."  In this one, we do the reverse.  The action step comes first, and then the recursive step second.  After the 8 is in the right place, we'll sort the cards around it, using another call to QuickSort.

When I say we want the 8 in "the right place,"  what I mean is that we want all the cards that are less than 8 to be on the left side of the 8, and we want all the cards that are greater than 8 on the right side.

How do we get there? The 8 starts in the low position, so put your left hand on the 8, then put your right hand on the right side of the list. That's 9. 9 is bigger than 8. No switch is necessary yet, so move your right hand one space to the left, toward the 8. You're now on 5. 5 is smaller than 8, so it's in the wrong position.  Do a swap and you get this:
  • 5, 4, K, A, J, 8, 9
So now the 8 is on the high side. It is correctly positioned with respect to some of the cards: it is between 5 and 9, which is where it should be. But you're not done yet. Now you have your left hand on 5, right hand on 8.  Move your left hand up one space, to the 4.

4 is less than 8. Keep going.

King is more than 8. It's in the wrong place.  Time to swap.
  • 5, 4, 8, A, J, K, 9
Now the 8 is correctly sorted with respect to the cards you already looked at. It is more than 5 and 4, but less than king and 9. Nearly done.

The 8 is now on the low side. With your right hand on the king, move backwards. The jack is in the right place. The ace is smaller than the 8. Swap.
  • 5, 4, A, 8, J, K, 9
...your hands now meet in the middle, and you're done. The 8 is in the precise position where it should be.

Now what? Here's the recursive step. You have the 8 in the right place, but you have three cards on the left that are totally scrambled, and three cards on the right that are also scrambled. How do you unscramble them? You QuickSort them, of course.

First you sort this array:
  • 5, 4, A
Then you sort this array:
  • J, K, 9
Do it in the same way as before. For the first set of three cards, put your left hand on the 5, and move it to the proper place (the third position). For the second set, put your left hand on the jack, and move it to the proper place (second position, between J and K). By the time you've finished, we discover that all the cards are sorted. Quick, wasn't it?

Seriously, get your own deck and try this on 20 cards. It's easy and fun once you get the hang of it. Lay them out in a line. As you work, you'll put a card at a time in its final location. When you do this, push the card up slightly so it will stand out from the others. Then work on the unsorted cards from left to right until you're finished.

Complexity of QuickSort

Starting with the first card, you made one complete pass through the array in order to put it in the right place. So this algorithm is at least O(n). Then you performed this pass through the array multiple times. How many?

In the best case, QuickSort does the same thing as Merge Sort: it splits the array almost exactly in half. (In the above case, our first number, 8, landed smack in the middle of the array, with three numbers on each side. That's a perfect split.) So basically, you're dividing by two each time, and after (log2 n) divisions, you're down to sorting one card -- just like what we did in a binary search. So we simplify this and say we examined "n" cards approximately "log n" times, so as I indicated earlier, this algorithm is O(n*log n).

If you're lucky!

But what happens if you start with an array that's already sorted?
  • A, 4, 5, 8, 9, J, K
Well, the first card is already the lowest. So you'll start from the king and start moving left, but eventually you'll get to the ace, having examined all the cards but not moved any.

Then you'll "split" around the ace... but there are no cards at all on the left, and (n-1) cards on the right. So you'll do almost as many comparisons as before against the 4. Then against the 5. And so on, until you've compared every card to every other card. This takes O(n2) time. Needless to say, that's bad.

So what we see here is that there's a difference between the complexity of a typical execution of an algorithm and the worst case scenario. Knowing that, we can do things to plan for the worst case.

Unfortunately, sooner or later it's highly likely that you will accidentally try to sort an array that is already sorted. So although it is counter-intuitive, we can guard against this by scrambling the array first and then sorting it. Scrambling is easy... all you do is go through the entire array (n operations), pick a random number (picking random numbers takes constant time, so it's O(1)), and swap the current number with the one at the random location.

This takes 1*n moves, so it's O(n) time. Then "scramble + sort" would take O(n log n + n) time But n is of lower complexity than n log n, and at high numbers it will be trivially small in comparison. So the "scramble + sort" routine is simply expressed as O(n log n).

Of course, you could be super unlucky and wind up with a scrambled array that happens to be perfectly sorted.  But as you get a very large number for n, the chances of that happening become extremely small.

Another way to randomize this thing is by picking random pivot point.  Instead of always trying to position the first card, pick a RANDOM card, move it to the leftmost location, and put that one in position.  This way you fix the problem of potentially starting with the smallest card every time.

No comments:

Post a Comment