Showing posts with label math. Show all posts
Showing posts with label math. Show all posts

Thursday, February 11, 2010

Angle math

I've got a new applet posted on my web page, which I wrote for my son Ben. He is seven. It is a demonstration of how angles work. You can drag the points of the triangle around, and it will continuously update the display of numbers showing what angle is formed at each point. It also gives a little readout showing that the three angles will, indeed, always add up to 180 degrees.

I like to do a little graphical application every once in a while just to stay in practice. There are a lot of concepts from trigonometry that have to be applied. Debugging graphics is sometimes a tricky affair, because if you don't do the right thing then you might wind up with nothing but a blank screen, or a line might appear wildly out of place. Finding the angles required remembering what sines and cotangents and such represent. (SOH CAH TOA! That's one that never leaves your memory, but I got sin and asin mixed up for a while.)

I also had to fudge the numbers a little bit. For instance, one angle might be 70.14 and another might be 50.34. The third angle should be 59.52. However, if you reduce those to one significant figure, you find that 70.1+50.3+59.5 = 179.9. So I had to fake the third angle (a3) as displaying 180-a1-a2.

The hardest challenge came after I decided to change "nearly" right angles into true right angles. Notice that if you drag one corner so that it forms an angle between 80 and 100 degrees, it will automatically snap to the correct position so that it is 90 degrees. I had to put some thought into making that work, and here's the solution I wound up with.

  1. Write an expression of the line segment opposite the point being moved.
  2. Find a vector perpendicular to that line.
  3. Project the moved point along that vector to find where it intersects the opposite segment.
  4. Find the actual distance that the point must be from the segment in order to make a 90 degree angle.
  5. Move the point there.

I had originally used only a "Point" class and a "Triangle" class to represent the problem space; I soon realized I needed a "Vector" class as well (one which could be used to offset a point, normalized, reversed, or made perpendicular to another vector). Then I had to make yet another class, "Line," in order to properly calculate where intersections can be found. I borrowed a lot from this page, and found out just how long it had been since I needed to do that math -- I had a totally incorrect idea of how a line formula is expressed.

Basically the key to correcting your math mistakes -- and I made a lot! -- is to create either a printout or a visual representation at every step. For instance: I think I've normalized the vector correctly, better print out the coordinates and make sure the length is really 1. I need to draw an extra line to make sure it really goes through this point. I need to draw an extra point to make sure it really intersects that line. Do these two lines LOOK perpendicular to me? And so on.

If I had remembered the math better then some of this rigor wouldn't have been necessary. But really, caution and constant testing while coding eliminates the need to be a perfect math whiz. Coding is both theoretical and experimental, you see.

Friday, October 16, 2009

Ugly numbers in a heap

As I was saying, a good solution to finding ugly numbers would be to somehow enumerate only numbers which are known to be ugly, removing them from memory after they have been read. I couldn't think of the way to do that. Internet to the rescue.

Here's what you do:
  1. Store the number 1 in a set.
  2. Consider the smallest item, "n" in the set. That is counted as the next ugly number.
  3. Add the numbers n*2, n*3, and n*5 to the set.
  4. Remove n from the set.
  5. Repeat steps 2-4 until you reach the desired ugly count (1500).
Each time you search for the next ugly number, you remove one number from the set but add three more, for a net gain of two. Thus, the set of numbers you are storing on each step i is (2*i-1). That's a manageable memory size. But there's still a snag.

Sets are unordered lists, and we want to always find the smallest number of the set as efficiently as possible. How do we do that? We can't just read every number in the set, or the problem scope quickly becomes O(n2). Can't have that.

Turns out there is an ideal data structure for handling this kind of thing, but it's something I haven't looked at since I was an undergrad. It's called a min-heap.

A heap is a special kind of binary tree, where every level of the tree is filled to maximum density. For instance, in a heap with 8 elements, you know that the top level has one node with exactly two children; each of those two children has two children (that makes 7), and the leftmost node at the third level has one child on the left.

The neat thing about using a heap is that because of its predictable structure, you can represent it using an expandable one-dimensional array. You don't have to mess with pointers or the other headaches of data structures. If you are looking for root.left.right, you just calculate the spot in the array and access element 4.

A min-heap is constructed in such a way that the children of each node are guaranteed to be larger than the parent node. In other words, the smallest value is guaranteed to be at the root, and each child of the root is itself a min-heap with the smallest element of that at the root.

(If I'm going too fast for you, speak up. I don't believe I've ever done a post specifically on binary trees, so maybe I should.)

In any case... you can do two things with a min-heap: add a new element, which is rapidly sorted to the right place; and remove the root element, which then moves the next smallest element into the root and fixes the rest of the heap so that it is still a min-heap.

This is exactly what we need. It keeps the cost of searching for the smallest known ugly number to O(log n), where n is the number of elements currently in the list.

So to recap:
  1. Write your own min-heap class, which is the fun part. ;)
  2. Add the number 1 to the heap.
  3. Inspect the root of the heap, and count that number ("r") as ugly.
  4. Add 2r, 3r, and 5r to the heap.
  5. Delete the root.
  6. Repeat steps 3-5 until you reach the desired ugly count (1500).
Actually it turned out that wasn't quite the answer, because you can get duplicate ugly numbers on the heap. For instance, in the second pass you add 2*3, while in the third pass you add 3*2, which results in multiple 6's on the heap. But that's easy to fix. All we have to do is change step 5 to: "Keep deleting the root until it is no longer equal to the last recorded ugly number."

And there you go. Small storage space, quick execution time, ignores non-ugly numbers, and finishes in 3 seconds easily.

SPOILER:

In case you wondered, the answer is 859,963,392.

Ugly numbers and prime numbers

The second wrong approach I tried taking to find ugly numbers was something called a Sieve of Erathosthanes. This was something that my dad taught me about in high school, and it helped me win a programming contest as a sophomore.

Here's where I should give a quick shout out to my favorite HS teacher, Thom Laeser, who I think might read this blog. Thom was kind of a Renaissance man as a teacher; across my four years in high school, I took one geometry class, two programming classes, and a journalism class from him. He also started the school's first Humanities course, which I did not attend. It wasn't until high school that started really programming (rather than just dabbling in BASIC), and I credit Mr. Laeser with giving me that extra boost.

When I was a sophomore, he gave all his programming students a challenge, which was to be the one who could write a program printing the most consecutive prime numbers within five minutes. This was 1990 and the computer lab probably was limited to 80386 processors or something.

It was clear early on that the contest was going to be between me and a senior. The next day, we both proudly came in with programs that printed all primes from 2 to MAXINT in well under 5 minutes. This was in Pascal, I think integers were 16 bit, so the highest number would be 65,536. Mr. Laeser shrugged and said that was very impressive, but we would have to keep going if we wanted to win.

So. Along the way, I had to learn to write my own large integer output, and also save space by representing numbers with bit manipulation instead of wasting entire "int" values on booleans. More importantly, I learned about the Sieve of Erathosthanes, which I mentioned before.

The sieve is a method for quickly weeding out prime numbers from a finite list of numbers. Here's how it works (simplified version): start with an array of booleans from 1 to, say, a million. The first true prime number is two, so cross out all multiples of 2 that are not 2; so, 4, 6, 8, 10, etc. Then we move up to three, crossing out 6 (again), 9, 12, etc. Next we encounter 4, but 4 is already crossed out, so we ignore it and move up to 5, eliminating 10, 15, 20, etc.

It turns out that you only have to filter on numbers up to the square root of the array size, which in this case is 1000 (sqrt(1000000)). After that, you know there are no multiples of 1001 because you would have had to multiply 1001 by something smaller, and hence already visited, to reach a million.

Long story short, I won the contest, but it was a grueling arms race for several weeks as we kept improving our programs and then "stealing" new ideas by observing how the other's program worked.

Sieve of Ugly

So that was a roundabout personal story, but I haven't told it here before so I thought I might as well get it out. And the point is that my second pass at solving the ugly number problem was to use a sieve.

Sort of a simple modification, really. I didn't just use a boolean array, because there are now three kinds of numbers: 1: prime; 2: ugly; 3: neither prime nor ugly. So I made it an integer array, where each integer stores one of three flags. All numbers start prime; then multiples of 2, 3, and 5 get marked as ugly. Then run another pass on all numbers above 5, marking their multiples as non-ugly.

Using integers was super-overkill, as an int is 32 bits, and I only needed 2 bits to represent my flags: 00=prime, 01=ugly, 10=non-ugly. What I found was that I couldn't sieve up to the 1500'th ugly number, because the array was so huge that I ran out of memory.

So first I changed it to an array of bytes (8 bits). Then, still not satisfied, I improved performance by writing some bit-manipulation routines to simulate an array of two-bit numbers using four numbers per byte.

It worked; I found the same number much faster than before, but still with a noticeable delay before the sieve finished running. 3 seconds was still way out of reach.

What next?

While I was coding the sieve, it occurred to me that it was a terrible waste of space to store ALL the numbers, both ugly and non-ugly, in an array until the entire program was finished. It seemed like there ought to be some kind of quicker way to figure out, given the current ugly number, what the next one in the sequence should be. At worst, I could maintain an array of only 1500 numbers to store only ugly numbers (instead of all numbers up to a trillion).

I wracked my brains for this method; I wrote down the factorization of the first 10 numbers or so, looking for a pattern, but couldn't find one. I'm pretty sure there isn't one, but I was on the right track at this point. I know because that's when I "cheated" by looking to see what other people had done.

And maybe they're smarter than me, or maybe they spent more time on it, but there is still a certain grim accomplishment in not having to reinvent the wheel.

There's a well known "hard" problem in computer science called the traveling salesman problem, which goes like this: You have a bunch of cities, separated by known distances. You have to visit all of them while spending as little time as possible in transit.

To solve this problem by brute force becomes impossible very quickly as you add more cities. So the best known approaches use short cuts; they use sneaky tricks that don't necessarily produce the "best" answer, but are guaranteed to produce a pretty good answer most of the time.

Anyway, I heard that some computer science genius was once asked how he would solve the traveling salesman problem, and he said: "I would put it on the back of a cereal box. 'Find the most efficient path through these points. Winner gets $100, mail your answer to the following address'..."

Well, that's one kind of divide and conquer solution right there. Hooray for the internet, though... now we don't even need cereal boxes.