Monday, April 13, 2009

Random numbers aren't random

In a previous post, I mentioned that you can add the appearance of intelligence to a game playing script by throwing in a random component. While the computer can always win tic-tac-toe by playing the same sequence of moves, it looks much less boring if the computer picks different winning moves in each game.

Following up on this, I got a special request to explain how random number generators work. I happen to know that the person who asked for this information is sort of a logical determinist, so she'll be delighted to hear that random numbers are not random at all, but generated by a very specific script.

When we think about random numbers, what we generally want is an unpredictable number in a range, with 0 or 1 at the low end, and some number "m" at the high end. How big m is depends on the application. For instance, to simulate a die roll, we want a random number from 1 to 6. To simulate drawing a card, we need a number from 1 to 52. For a roulette wheel, 1 to 38. For a game with two dice, like monopoly, we need to generate two random numbers and add them together. The rules for a role-playing game such as Dungeons & Dragons will often instruct you to roll 1d20 (roll a 20 sided die, i.e., generate a random from 1-20) or 3d4 (roll three four sided dice) to determine a monster's health, or whether they hit you on a particular round. Games like World of Warcraft have borrowed heavily from tabletop gaming rules in order to come up with event odds in the range that make sense for the game balance.

It's hard to generate a truly random number, since computer programs are fundamentally deterministic in the way they follow algorithms. So they do something slightly easier. Once we have one random number (or pseudo-random), we generate a sequence of randoms after that, where each one is based on the number before it.

There are many functions which can generate this sequence; one common example is this:
  • Xn+1 = (a*Xn+b) mod m
In this case, a and b are constants that you can pick, often large prime numbers. "m" is a maximum value for the resulting number.

Let's try a few examples. I'll pick arbitrary values a=107, b = 23, and m=101.

  • 1 (This comes before the first number. It's called the "seed," as it helps to determine which numbers will come up in the pattern.)
  • 107*1+23 = 130; 130 % 101 = 29
  • 107*29+23 = 3126; 3126 % 101 = 96
  • 107*96+23 = 10295; 10295 % 101 = 94
  • 107*94+23 = 10081; 10081 % 101 = 82
  • 107*82+23 = 8797; 8797 % 101 = 10
  • 107*10+23 = 1093; 1093 % 101 = 83
As you can see, this function produces a string of seemingly unconnected numbers from 0 to 100. However, they aren't unpredictable. If you know the formula, and the current number is 29, then you know the next number is always going to be 96.

Because of this, you wouldn't want to use 101 as your "m" value, if you were actually looking for a random number in the range of 0 to 100. You could always predict the next number in the sequence. If you're a company that writes video poker machines, and your random number generator is easy to reverse engineer in this way, you'd be in very big trouble.

So instead, you'd make all your constants, m, a, and b, much larger. Then you'd take the resulting random numbers and mod them by a smaller number to get program data, but you'd still retain the larger number to generate the sequence. That way, if people see the number "3", they have no way of knowing that the real number in your random sequence was generated by 5,495,110 mod 6 plus one, and therefore they can't deduce what the next number is going to be, even if they know the algorithm.

So the random sequence is masked from the program's end user by not telling them what the formula is. Nevertheless, a random number generator is still predictable based on the seed. If you start with a fixed seed number of 1, and you get the sequence of die rolls 4, 2, 6, 1, 3, then you will always get the same sequence after you shut down the program and restart it.

If an enterprising gambler were to discover that his video poker machine was programmed so foolishly, he could take advantage of it in the following way: First, unplug the machine and plug it back in to reset it. Then, bet small amounts of money while writing down the card sequences that you get. Then, reset the machine again, and bet large amounts of money against the cards that you know will turn up.

So predictability is the bane of random number generators. Clearly, it's important not only to have a disguised algorithm, but an unpredictable seed set at the beginning of your program.

How can you set the seed? It doesn't really matter, as long as it's not based on a known quantity. It needs to be a truly random number, but again, there is no such thing as a truly random number to a computer.

One approach is to use the internal clock, including milliseconds. A human can't really control the precise millisecond at which the program starts, so there's no way to predict what the seed will be.

There are wackier ways to handle it. For instance, an algorithm called "lavarand" operated by pointing a camera at a lava lamp, taking a digitized picture, and creating a hash from the bits. This is chaotic enough to provide something that appears to be truly random.

So to recap, in order to generate random numbers, you need
  1. An algorithm that is deterministic but unpredictable
  2. A random seed that is preferably discovered through some (also unpredictable) aspect of the external world.
There are, of course, also advantages to using predictable seeds. If you always use the same seed, then you can test the same set of data repeatedly, which you sometimes must do if you don't want to tear your hair out trying to nail down a problem with the algorithm that actually does something with the numbers. But in a real world environment, you need some kind of outside information in order to randomize the results and stymie the users.

5 comments:

  1. "If you start with a fixed seed number of 1, and you get the sequence of die rolls 4, 2, 6, 1, 3, then you will always get the same sequence after you shut down the program and restart it."

    I found this out when programming a "Guess the number" game on QBASIC. Winning all the time was frustrating.

    ReplyDelete
  2. Am I missing something? In the second step of your example, you have:
    107*1+23 = 130You then take the modulus of 126 % 101. Shouldn't that be 130 % 101?

    By the way, this entry is very informative.

    ReplyDelete
  3. Good catch, James. The error is because I twiddled around with the constants several times before I posted, and I apparently made all changes except for the first step. IT's all fixed now.

    ReplyDelete
  4. Daneel Olivaw: If I recall, you can place the line "RANDOMIZE TIMER" at the beginning of your QBasic program to seed the program with the internal clock's value.

    As an extension of your mentioning of the advantages of seeds generating predictable sequences is that they can be used to create very tiny representations of things like randomly generated level layouts for games. One of the best examples that comes to mind was the level editor for Worms 2. You could enter a string of digits and numbers into the generate level box, and it would always create the same level.

    There are of course drawbacks to this because the level generator only has as many initial states as there are bits in the pseudo-random number generator's (PRNG) seed.

    ReplyDelete
  5. Hmm technically aren't even real world random number generators (like dice) not really random as the out come is predetermined when you roll based on force, torque, potential energy, angle, elasticity of dice and table etc? So the only real difference is that physical dice use infinitely more outside stimuli in their 'algorithms' to simulate randomness?

    ReplyDelete