Thursday, March 5, 2009

A retrospective on the terrifying millennium bug

It's December 31, 1999.  I've got some neighbors over at my house watching movies, and we're all waiting for the rollover.  One of them asks me what time it is.  I check my watch.  "11:58," I say.  "I guess it's time for the world to end.  Want to go outside and watch the fireworks?"  Everybody agrees.

We go outside, and there are fireworks, all right.  My neighborhood is full of people who love fireworks, and they never fail to put on a good show twice a year, on New Year's Eve and Independence Day.

Glancing up, I wryly remark: "The street lights are still on.  Maybe it has to be Pacific Time."

It's the end of the world as we know it

For those of you who are too young to remember or who didn't pay attention to the news from about 1997 onward, January 1, 2000 was the day that civilization as we know it was supposed to collapse.  You see, many computer programs at that time stored the year as a two-digit number, which would confuse many computer systems at the rollover.  A year of "00" could just as easily mean "1900" as "2000."
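
To make the ambiguity concrete, here's a toy illustration of the kind of date arithmetic people were worried about.  This is my own sketch in Perl, not code from any real billing system:

    use strict;
    use warnings;

    # A bill issued in December 1999, examined in January 2000,
    # with both years stored as two digits:
    my $issued_yy  = 99;    # "99" -- meant as 1999
    my $current_yy = 0;     # "00" -- 2000?  Or 1900?

    # Naive subtraction decides the bill was issued 99 years in the future.
    my $age = $current_yy - $issued_yy;
    printf "This bill is %d year(s) old.\n", $age;    # prints -99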

Therefore, it was theoretically possible that numerous programs would break in a spectacular fashion, causing bills due in 2000 to be flagged as 100 years overdue.  With interest.  People spent several of the preceding years gloomily talking about how all the computers of the world were completely interconnected, and therefore any systems that did not address this problem might somehow corrupt all the other systems they were connected to.  The Simpsons even devoted a hilarious segment of their annual Halloween special to Y2K in 1999.

Here's what one high-profile Y2K alarmist, Gary North, had to say about Y2K (emphasis added):

Who knows where it will start? Only one thing seems certain: it WILL start. And when it does, every market institution and every government will suffer enormous setbacks.  ...

Banking is the obvious domino. Here's another: shipping. What happens to cities if gasoline is unavailable to truckers? If the computers that control train schedules break down? If rail freight cars cannot be located by defective computers? Think about your supermarket's shelves.

What happens to production when "just in time production" becomes "bottleneck production"?

Here's another: farming. Modern commercial farming is tied to hybrid seeds. The plants produced by hybrid seeds produce seeds that will not produce healthy plants if planted. Every year, almost every large farm on earth must re-order another batch of hybrid seeds. If, for any reason, the seed companies fail, or the banks fail, farmers will not be able to plant anything. This will lead to a famine. Let's not hedge our words: FAMINE. There is no way today to get enough non-hybrid seeds into production in order to avoid this problem. If this is one of the dominoes, the result will be widespread starvation.

Hey, everyone!  Remember that one time when the banks failed, governments collapsed, truckers were immobilized, people everywhere were starving, and civilization as we know it ended?  Good times, good times.

In the immortal words of John Cleese, "I got better..."

Disclosure: Gary North was and is an extremely weird dude.  He's a dominionist -- a very special strain of religious zealot who believes that democracy should be eliminated in favor of a Biblical variety of sharia law.  Complete with public stonings.  North has made a career out of predicting the end of the world at regular intervals, and in the late '90s he just happened to get more traction than usual.

Even so, panic over Y2K was widespread.  I was teaching programming classes at that time, and the course requirements forced me to give students an essay and presentation assignment.  Y2K was one of the possible topics I assigned.  Three or four students chose to write about it, and every single one of their papers unambiguously predicted doom.  Sometime in January 2000, I wrote a mass email to all the former students whose addresses I had, joking "Everyone who wrote about Y2K last year will now retroactively fail the class."

Funnily enough, I personally fixed a Y2K bug.  I was working at the time for Bike.com, one of the multitude of "dot com" companies that were ultimately doomed to fail, but hadn't gotten the memo yet.  I wasn't a full-time employee; I worked for Compuware, a development shop that farmed me and many other mercenaries out to various little companies on short-term contracts.  As such, I had a grand view of many such companies during the infamous tech industry crash, but I didn't go down with any of those ships -- at least not until Compuware ran out of work for me and then folded their Austin branch entirely.

Anyway, January 1 fell on a Saturday, so I didn't get back to work until the third.  "We have a problem," my supervisor told me.  "Some of the dates in the bike sales page say that they went online in the year 19100."

I hunted through the code for the offending page, and quickly figured out what was going on.  The code was written in Perl, whose localtime function represents the year as an integer counting up from 1900.  So the year "1998" comes back as "98," and so on.  Some bright developer who came before me had decided that the proper way to write the full year was "19 . $year".  The period is Perl's concatenation operator, so this bit of code translates to "write the number 19 and then write the year."  Obviously, when the year became "100," the long text representation became 19 followed by 100, or 19100.
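
Here's a minimal sketch of the pattern, reconstructed from memory; the variable names and surrounding text are illustrative, not the actual Bike.com source:

    use strict;
    use warnings;

    # localtime returns the year at index 5 as an offset from 1900:
    # 99 in 1999, 100 in 2000.
    my $year = (localtime)[5];

    # The buggy pattern: concatenate a literal 19 onto the front of the offset.
    # In 1999 this prints "Online since 1999".
    # In 2000 it prints "Online since 19100".
    print "Online since " . 19 . $year . "\n";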

That was easy to fix.  I replaced the period with two zeroes and a plus sign, turning the expression into "1900 + $year": addition instead of concatenation, and 1900 + 100 = 2000.  I searched all the code for variations of the string "19 ." and performed the same trick everywhere.  The whole operation took less than 20 minutes.  Then I went to my supervisor and announced "Your Y2K problems are all fixed.  That will be ten million dollars.  In tens and twenties, please."
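
In terms of the same sketch, the before and after:

    my $year = (localtime)[5];    # 100 in the year 2000

    # Before: string concatenation, broken from 2000 onward.
    print "Online since " . 19 . $year . "\n";        # "Online since 19100"

    # After: numeric addition, correct for any year.
    print "Online since " . (1900 + $year) . "\n";    # "Online since 2000"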

...And I feel fine

If there's a lesson to be drawn here, my take is that software systems are way more resilient than we give them credit for.  People tend to have a certain degree of unreasoning superstition about computer programs.  They are these mysterious and terrifying black boxes that are supposed to serve us, but occasionally fail in inscrutable ways for their own malicious reasons.

Minor tangent: Many people I talk to seem genuinely fearful that the scenario in Terminator or The Matrix really will play out someday; we'll wake up and discover that SkyNet has become intelligent, and also mad as hell at the human race, not to mention being a super-genius planner that can outsmart everyone on Earth.  I always have to point out to these people that it took human beings two billion years of evolution to become as smart as they are.  Even with the benefit of all that complex software in their brains, it takes every human child some eighteen years to explore the world, form connections and memories, and become a fully functioning adult.

I kind of think we'll notice robots becoming self-aware pretty far in advance, and it will probably be on purpose.  Plus, there's no reason to assume that intelligent robots will naturally turn on their creators, just as we don't assume that every child will eventually desire to kill his parents.  There are some outliers who become sociopaths and go on mass murder sprees, but most kids do actually grow up to be non-maniacs.

For the time being, all our efforts at what we laughably call "Artificial Intelligence" still result in programs that are fairly dumb, and extremely narrow in their scope.  Computers actually do what we tell them to do.  We programmers frequently tell computers to do the wrong things, in the wrong ways, or fail to be sufficiently precise.  But robust systems like banking are generally written in such a way as to be extremely modular, so that failures are local problems that don't destroy all the data, or accidentally transfer millions of dollars into your bank account.  (Sorry, fans of Office Space... it's really not that easy to rip off a bank.  They wouldn't stay in business very long if it were.)

Certainly there are cases where a misapplication of computers can do something so dangerous that it jeopardizes national security.  Here's an interesting recent story of government secrets being sent to Iran via BitTorrent.  But as you might expect, the computer itself is not the villain: like a deliberately written virus, stealing files in this manner is a purpose-built spy technique that exploits existing software.  It takes an intelligent agent to make computers do something harmful in a creative way.

So I remain highly skeptical of a catastrophic bug that will magically, accidentally, without deliberate effort, somehow cause a simultaneous collapse of millions of computer systems worldwide.  In the Simpsons episode I mentioned, all the computers in the world are fixed, but because Homer fails to fix his own particular system, that one failure somehow "infects" the rest of the internet.  Comedy!

But the truth is a lot more mundane: individual programs screw up all the time.  The better ones are written in a way that isolates and localizes the errors.  The worse ones either get their problems fixed or go obsolete.

Many companies were destroyed in the aftermath of 2000, but it wasn't the fault of the millennium bug; it was just another example of human error.  In this case, it was poor business planning.  Go to bike.com now if you want.  There's no impressive web site with a massive database of bicycle sales.  Instead, it's a small shell of a site that hosts a few third-party ads.  This isn't the company I worked for; those guys probably went bankrupt long ago because very few people actually want to buy bikes on the internet.  The domain has been taken over by a company whose business strategy is barely more ambitious than cybersquatting.

So worry about failures due to ordinary hubris, sure.  But don't worry about a massive robot uprising, or a simultaneous failure of all the world's software.  And if you want to rip off investors, there's no need to count on obscure computer glitches, just do it the old-fashioned way: lobby to roll back industry regulations, invest in super-risky schemes, and then push your losses on the taxpayers on the grounds that your company is too big to fail.

6 comments:

  1. You forgot to mention that in 2000, Yet Another Perl Conference was called YAPC 19100. I was glad Perl hackers could poke fun at themselves.

    software systems are way more resilient than we give them credit for.

    I think what's happened is that over time, programs have evolved from being monolithic self-contained universes, to being players in an ecology.

    By this I mean that way back in the 1950s and 1960s, when CPU time was expensive, programmer time cheap, and companies had computers to perform certain specific tasks, programs tended to be self-contained units that acted in a rigid and fragile way: they'd read punched cards with account information, and print invoices, or something. And if the account number didn't start in column 7, well, the program couldn't be bothered to check for operator mistakes. Basically, code tended to make a lot of assumptions about what was going on. Violate one of these assumptions, and the mistake would either propagate throughout the code, or the program would crash.

    These days, however, programs interact not only with untrained humans, but with other programs (e.g., when your browser runs a Flash app, or your editor invokes an external spelling checker, or when your PHP script gets data from MySQL, or even when your Minesweeper game uses an external graphics library). This means that programmers can no longer assume that their input is "clean", or that their program can monopolize the machine.

    This sea change can be seen, in part, in the number of security vulnerabilities involving buffer overruns that have cropped up over the last decade or so.

    Basically, these days a program is no longer The Only Thing, it's just a player in a wider "ecology". In these circumstances, I think programmers are driven to write more defensive code, and users are more likely to spot problems early on and work around them.

    ReplyDelete
  2. Russell, wait 'til our robotic masters take over. We'll see who's so smug then!

    Hail our cybernetic overlords!

    ReplyDelete
  3. BTW: http://www.theonion.com/content/video/in_the_know_are_we_giving_the

    ReplyDelete
  4. Arensb, great insights. I really appreciate it. As a high school student and undergrad some 15-20 years ago, I learned all about writing programs. When I went through grad school, it was assumed that you already knew all about writing programs, and many of the classes emphasized the interaction between programs -- learning to write software systems instead of just software.

    I'm not really sure how much of that was just more advanced course material, and how much of it was actually a sign of the changing times.

    ReplyDelete
  5. So Gary's prediction was in the late '90s? _After_ business/government had noticed there was an issue and started to do something? Although I think he's an idiot for many reasons, I had been holding out hope that he made his prediction in 1993, when the management level of the computer world was starting to make noise about the problem.

    Had that been the case, I would have had one less thing to call him a retard about.

    I make the analogy of someone walking towards a cliff. Another points out that the person is walking towards the cliff, so the walker turns and walks somewhere else. That doesn't mean that the cliff wasn't a problem. It means the problem was seen, understood and addressed, which is exactly what happened with Y2K. But all of the major fixing was in motion by 1997.

    // I worked as a trainer for a consulting company. All of our clients had proper patches/upgrades etc. performed by July of 1999. Some of the clients wanted our people at their sites "just in case". The owner said "Sure. $400/hour because it's ordinarily a holiday, plus overtime. And minimum billing time is 4 hours." For some reason, none of our clients took him up on his offer. We had no problems.

    ReplyDelete
  6. The big problem was the banks. It's a complex task to upgrade databases from 2-digit years to 4-digit years when the databases are live, the code is COBOL, and you have to modify the code in such a way that it can cope with either type of record. Lots of ways to get it wrong; few ways to get it right. Not rocket science, but not something you approach lightly and say "we can live without it for a couple of days if things go wrong."

    ReplyDelete