Tuesday, August 2, 2016

Tech talk about Pokemon Go

Quick background for people who don't already know: Pokemon Go is a mobile game that launched last month on Android, and more recently on iPhones. It is billed as an "augmented reality game," in that the gameplay incorporates a requirement to walk around in the real world and visit physical locations, and some virtual objects are treated within the game as if they physically exist at those locations. It was immediately popular, and has been plagued with ongoing problems with server response and performance. In this post I will discuss these issues on a level that laymen should be able to understand.

I've spent the last eight years of my career since grad school working on a variety of heavily trafficked, public-facing web applications. My most recent work has been on Blizzard's account management site, which receives tens of millions of hits each day, and now I work on the Hearthstone website. (Standard disclaimer: I am not a representative of Blizzard Entertainment, my writing is my own work, and this post has not been read or approved by anyone. Also, I do not work for Niantic or know anyone who does, so take this post as informed speculation but not fact.)

Anything that accesses online data and unexpectedly becomes very, very popular runs badly for a while, until the owners figure out what to do about it. This is the rule, not the exception. You may remember I wrote something similar about the rocky launch of the Affordable Care Act website -- it wasn't a conspiracy or massive incompetence, it was just normal growing pains for a website that millions of people tried to access. It is to be expected. Popular things to do about too much web traffic include:
  1. Throw more hardware at it. This is an expensive solution, but if your web traffic translates directly to revenue then it's a great problem to have, so you can always just do that.
  2. Wait it out. If you experience a big spike in web traffic because you did something momentarily interesting, but you expect your traffic to fall off to more reasonable levels quickly, then you don't want to be stuck with expensive hardware you no longer need. You could also just rent some cloud servers for a little while.
  3. Transmit less data.
#3 is the kind of problem computer science was designed to solve. You want to process a lot of information, but you want to do it in as efficient a manner as possible. Transmitting data over a network is an extremely expensive thing to do, so ideally you want to pass as little information back and forth as possible. You can do this in two ways. You can process a lot of information on the server, but generate a very small and efficient chunk of data consisting of only the things the client absolutely needs to know. Or you can send the client a very small chunk of raw data, and make the client do a whole lot of the heavy lifting by figuring out what to do with the information itself.

Pokemon Go consists of two separate programs. There is an app that you download from Google Play or Apple's App Store onto your phone. It's 165 megabytes, and you run it only when you need it. There is also a program running on Niantic's server 24/7. It has a ton of data about your account, about the location of every gym and pokestop in the world, about the individual stats of each of the up to 250 Pokemon that you and every other player own, and about the temporary world locations of all the wild Pokemon that pop up anywhere, any time.

That's seriously a lot of data -- at least on the order of terabytes, I'd expect, if not more. It's not cheap to buy enough disk space to store all that, but those disks are a one time cost. Where it REALLY gets tricky is when Niantic tries to send players the information that they need to know. Obviously if they had to send you ALL their data, it would take forever and it would cost both you and them a fortune (assuming you pay to download stuff by the gig). A more cost-effective solution is to send you the bare minimum information that you absolutely need to know. But how much information is that?

You may have noticed that the little read-out in the lower right corner of your game has stopped displaying useful information since the game launched. It used to show shadowed images of nearby Pokemon, along with one, two, or three footsteps indicating how close you were. But for the last couple of weeks, you got less detailed information: it always showed three footsteps, never fewer. With the last patch, they removed the footsteps display completely. A lot of people are angry about that. So what happened?

When Pokemon Go launched on July 6, its developers were victims of their own success. The hook for the game was so interesting, and word of mouth was so positive, that millions of people wanted to download and play it. But that was more communication than the server could handle, and for the first few days players found themselves frequently getting disconnected or unable to create an account, sometimes for hours at a time. Niantic eventually worked around this problem by drastically cutting down on the amount of data being transmitted, and one of the first things they cut was the usefulness of the footsteps panel.

The game designers want you to be able to find new Pokemon, but they don't want it to be too easy. They could just put a Pokemon on your map, "Hey, it's right here!" But they actually want it to be difficult and take time to find Pokemon. In a purely cynical sense, let's say that the more of your time they can make you spend, the longer they have to convince you to spend money on micro-transactions. A little more charitably, games need some kind of challenge to be fun, and just walking to the exact spot they tell you might be less interesting.

Just telling you where the Pokemon is would be too easy, but it would also be a cheap solution. The server spawns a Pokemon in a specific location. It has a numerical ID, a species (pidgey, rattata, etc.) and a location expressed as precise latitude and longitude. That's four numbers. If you're within, let's say, 100 feet of the Pokemon, then it sends those four numbers to your phone, the phone does the work of simulating you seeing it on the map, and can keep the information in local memory. No further server communication about that will be required for a long time.
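To make "four numbers" concrete, here is a minimal sketch in Python of what such a message could look like. The wire format, the field sizes, the coordinates, and the function names are all my own inventions for illustration; I have no idea what Niantic's real protocol looks like.

```python
import struct

# Hypothetical wire format (an assumption, not Niantic's real protocol):
#   I = 4-byte unsigned spawn ID
#   H = 2-byte unsigned species number
#   d = 8-byte float latitude, d = 8-byte float longitude
# The "<" prefix means little-endian with no padding bytes.
SPAWN_FORMAT = "<IHdd"


def encode_spawn(spawn_id, species, lat, lon):
    """Pack one wild Pokemon into a fixed-size binary message."""
    return struct.pack(SPAWN_FORMAT, spawn_id, species, lat, lon)


def decode_spawn(message):
    """Unpack the message back into (spawn_id, species, lat, lon)."""
    return struct.unpack(SPAWN_FORMAT, message)


# Made-up example values; species 16 is Pidgey's Pokedex number.
message = encode_spawn(123456, 16, 30.2672, -97.7431)
print(len(message))           # 22 bytes for the whole Pokemon
print(decode_spawn(message))
```

Under these assumptions the whole Pokemon fits in 22 bytes, sent once. After that the phone can draw it on the map without asking the server anything else.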

But they can't send you straightforward information about exactly where to find the Pokemon, because that information would detract from the game play. Now, you might think "No problem, they can send the location to your phone, but the app just won't show you exactly where it is on the map." Which is a good idea that would also save a lot of bandwidth, but it has another problem: People cheat at games. In particular, it's possible for a different app to read and decode information coming into your phone that was supposed to be caught by Pokemon Go. So now a devious programmer could write their own app that uses that information to tell you the exact location of every Pokemon, and if you download that app you will have an unfair advantage over other players who don't have it.
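Here is a toy illustration of why hiding the location inside the app doesn't really hide it. The message layout, the field names, and both "apps" are hypothetical; the point is only that any program that can read the incoming data is free to ignore the instructions that come with it.

```python
import json

# Hypothetical message: the server sends the real location plus a flag
# asking the app not to draw it. Field names are invented for illustration.
incoming = json.dumps({
    "species": "pidgey",
    "lat": 30.2672,
    "lon": -97.7431,
    "show_on_map": False,
})


def official_app(raw):
    """The real app obeys the flag and reveals nothing precise."""
    data = json.loads(raw)
    if not data["show_on_map"]:
        return "A pidgey is somewhere nearby"
    return f"{data['species']} at {data['lat']}, {data['lon']}"


def cheating_app(raw):
    """A third-party app reads the same data and simply ignores the flag."""
    data = json.loads(raw)
    return f"{data['species']} at {data['lat']}, {data['lon']}"


print(official_app(incoming))  # A pidgey is somewhere nearby
print(cheating_app(incoming))  # pidgey at 30.2672, -97.7431
```

The only reliable way to keep a secret from the cheating app is to never send the secret to the phone in the first place.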

So the game server didn't send you that information. Instead, it constantly monitored your position and updated the footsteps, in a sort of crude "warmer" and "colder" hint system, until you got close and the exact location of the Pokemon appeared. Now again, I don't work at Niantic so this may be pure speculation, but it seems like this system would require a lot more bandwidth. In order for the warmer and colder game to work, it has to constantly check your position and send you updates about whether you've moved nearer to or farther from the target. And since it hasn't sent you the real position of the Pokemon, it just has to keep sending you 1, 2, or 3 steps over and over again for each Pokemon that you might be near. The app can't just decide how many steps to show, because that information is hidden, so the server has to feed the app with constant updates about your progress.
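A rough sketch of that "warmer/colder" loop, as I imagine it, might look like the following. The distance thresholds, the coordinates, and the function names are guesses made up for this example, not anything Niantic has published. The distance calculation uses the standard haversine formula.

```python
import math

EARTH_RADIUS_M = 6_371_000


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters (haversine formula)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def footsteps(meters):
    """Bucket a distance into 1, 2, or 3 footsteps (thresholds are guesses)."""
    if meters < 50:
        return 1
    if meters < 100:
        return 2
    return 3


def hint_messages(player_path, pokemon):
    """Server side: one hint per position report, for a single Pokemon."""
    return [footsteps(distance_m(lat, lon, *pokemon)) for lat, lon in player_path]


pokemon = (30.2672, -97.7431)  # made-up spawn location
# The player walks north toward the Pokemon and reports position 6 times,
# roughly 22 meters apart.
path = [(30.2672 - 0.0002 * k, -97.7431) for k in range(6, 0, -1)]
hints = hint_messages(path, pokemon)
print(hints)  # [3, 3, 2, 2, 1, 1]
print(len(hints), "hint messages, versus 1 message if the location were sent")
```

That is six round trips for one player tracking one Pokemon over a short walk. Multiply by every nearby Pokemon and every player, and the hidden-location approach gets expensive fast.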

So that's wasteful, and because Pokemon Go's servers were so badly strained that people couldn't play at all, it had to go. The information was useful for players trying to track Pokemon, but you can still stumble on a Pokemon through luck, and most of the action in the game happens at Pokestops and Gyms -- landmarks with directly supplied coordinates, so they don't need to update as often unless you're standing right next to them.

Unfortunately, Niantic is a small company, with a few dozen employees and probably no dedicated customer service or full time PR manager. So they took a lot of flak for removing this feature, and they didn't explain the reason why for a long time. A few hours ago they put out this statement:

We have removed the ‘3-step’ display in order to improve upon the underlying design. The original feature, although enjoyed by many, was also confusing and did not meet our underlying product goals. We will keep you posted as we strive to improve this feature.

I can believe that the 3-step display was confusing and they can do better; from a gameplay perspective it was definitely tricky to understand and use effectively. But even though they didn't say so directly, the fact that they removed the feature so quickly almost certainly means it was contributing to the server problems. If and when they add a better feature, it will almost certainly be one that doesn't put a strain on their bandwidth.

They might also just add more hardware, of course. Reports say that Pokemon Go is super profitable, and I'm sure they can afford it. But growth like that doesn't just happen overnight, so developers need both a quick fix and a long-term solution.

1 comment:

  1. Thanks! I didn't think of it as a server data bandwidth and processing problem. I really thought it was more like: 'this doesn't work, causes problems, and it is so difficult to fix quickly that we should remove it to get rid of the headache'. In a sense, the second description is not inconsistent with your description. I was thinking that 'doesn't work' was more like the programming was simply not functioning accurately and making it function accurately was beyond their capabilities to fix quickly. That probably comes from my years of programming which always felt like me stumbling around in the dark until I finally hit on a solution, often an inelegant one.