Wednesday, October 9, 2013

Affordable Care Act Blues 2

Following up on my post from yesterday, a friend linked me to an article on Salon that attempted to diagnose the problems with the healthcare.gov website. The article is full of bad logic and half-baked assumptions that I couldn't resist correcting.

First of all, the article refers to a discussion on Reddit that was just ridiculous.

“The Redditors picking apart the client code have found some genuine issues with it, but healthcare.gov’s biggest problems are most likely not in the front-end code of the site’s Web pages, but in the back-end, server-side code that handles—or doesn’t handle—the registration process, which no one can see. Consequently, I would be skeptical of any outside claim to have identified the problem with the site.”

This is WAY too charitable. I’m not “skeptical.” I think the Redditors are dumb as rocks for wasting any time on this at all.

Server side (back end) code handles pretty much everything that is important about a web application's internal logic. The client side (front end) stuff -- which you can see printed out if you right-click on any page in your browser and select "View Source" -- handles cosmetic stuff: how the website looks, what kind of warning messages you get if you enter bad data, etc. It's possible for a site to break because of bad front end code, but if that were true then it would almost certainly be broken for everybody, not up and down sporadically.

I passed along the article to a friend who is the architect of our software at my job. His explanation is more thorough than mine, so I'm reprinting it with his permission.

Here’s how I would have written the same article but with my speculation….

I think the biggest and most important challenge is the overall architecture. From an architectural standpoint you need to ensure that your system
  1. Is secure
  2. Can scale appropriately
  3. Handles the scenario where it is overloaded
Secondly, for the end user experience, from a UI perspective you need to
  1. Be user friendly
  2. Ensure proper feedback
  3. Have as much client side validation as possible
Once you go live you are in a situation where you have to deal with all types of scenarios quickly. One issue that I see referenced a lot is the fact that the security question dropdowns aren’t populating. Maybe they load tested this but didn’t verify the contents of the actual HTML rendered. If you rely on older back ends then that needs to be part of your load testing, and you need to talk to those systems as little as possible.

As far as server load goes, there are 3 types of congestion you need to deal with: memory, CPU and information I/O (network and drive). The dropdown thing might be related to any one of those three.
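
To make the point about verifying the rendered HTML a bit more concrete, here is a minimal Python sketch of the kind of check a load test could run against each response. It is only an illustration: the function name, the class name and the sample field names are hypothetical, and nothing here is taken from the actual healthcare.gov code.

```python
from html.parser import HTMLParser


class SelectOptionCounter(HTMLParser):
    """Counts the <option> tags found inside each <select> on a page."""

    def __init__(self):
        super().__init__()
        self.counts = {}      # dropdown name -> number of options seen
        self._current = None  # name of the <select> we are currently inside

    def handle_starttag(self, tag, attrs):
        if tag == "select":
            self._current = dict(attrs).get("name") or "(unnamed)"
            self.counts[self._current] = 0
        elif tag == "option" and self._current is not None:
            self.counts[self._current] += 1

    def handle_endtag(self, tag):
        if tag == "select":
            self._current = None


def empty_dropdowns(html):
    """Return the names of dropdowns that rendered with no options."""
    parser = SelectOptionCounter()
    parser.feed(html)
    return [name for name, count in parser.counts.items() if count == 0]


# Hypothetical response body: one dropdown came back empty.
page = (
    '<select name="securityQuestion1"></select>'
    '<select name="state"><option>NC</option></select>'
)
print(empty_dropdowns(page))  # ['securityQuestion1']
```

A load test that only checks for an HTTP 200 would pass this page; a check like the one above would flag it.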

This, of course, is informed speculation about the nature of the problem itself, drawing on real experience about how websites are designed, and how to go about debugging the problem. The Salon author, David Auerbach, doesn't do this sort of thing. Instead, he casts about wildly to find a place to assign blame. He claims that he can identify it as an Oracle problem based on a single error message:

“Error from: https%3A//www.healthcare.gov/oberr.cgi%3Fstatus%253D500%2520errmsg%253DErrEngineDown%23signUpStepOne.” 
To translate, that’s an Oracle database complaining that it can’t do a signup because its “engine” server is down.

What? It is? How did he know that? I looked up “ErrEngineDown” to see if it might be a standard Oracle message. It is not. So my reading is that it’s simply the name that the developers themselves chose to assign to this particular error. There is literally nothing you can determine from this one status result, as far as identifying what kind of database they used, or why the database failed.
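
For anyone who wants to see what that string actually says, it is just a URL that has been percent-encoded twice. Here is a small Python sketch that unwraps it using only the standard library. I dropped the trailing period from the quote on the assumption that it is sentence punctuation rather than part of the URL.

```python
from urllib.parse import unquote, urlsplit

# The string quoted in the article, minus the trailing period (assumed
# to be punctuation, not part of the URL).
raw = (
    "https%3A//www.healthcare.gov/oberr.cgi"
    "%3Fstatus%253D500%2520errmsg%253DErrEngineDown%23signUpStepOne"
)

# First pass undoes the outer layer of encoding and yields a normal URL.
url = unquote(raw)
parts = urlsplit(url)

# The query string is still encoded once more, so decode it again.
query = unquote(parts.query)

print(parts.path)      # /oberr.cgi
print(query)           # status=500 errmsg=ErrEngineDown
print(parts.fragment)  # signUpStepOne
```

All this tells you is that a CGI script called oberr.cgi was handed an HTTP 500 status and an error label, on the first sign-up step. Nothing in it names a database vendor.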

After that, Auerbach goes on to state:

“That is, the front-end static website and the back-end servers (and possibly some dynamic components of the Web pages) were developed by two different contractors. Coordination between them appears to have been nonexistent, or else front-end architect Development Seed never would have given this interview to the Atlantic a few months back, in which they embrace open-source and envision a new world of government agencies sharing code with one another.”

It's true: apparently there are at least two different developers who've had their hands on the system, Development Seed and CGI Federal. This is not a legitimate criticism of the process. The federal government is big. The ACA is big. It is routine and normal to have multiple companies working with one set of data. After all, each individual state seemingly has its own website which has to connect to the ACA. In that case, you typically have a web service host on the back end, which processes requests, and the requests come from many different clients -- e.g., the state's web site, which was probably developed at least partially by someone in the state.

And that's all we know. There is no evidence I can find in any of those links that "coordination between them appears to have been nonexistent." Auerbach simply made that up. He also makes up a cute little fictional dialogue that has no grounding in reality whatsoever. His conclusion also does not follow from the fact that they use open source code and are enthusiastic about promoting open source principles, unless I’m missing some other piece of information cited. It may be the case, but it’s not demonstrated at all. Open source is just a model of developing something with transparency. It doesn't say anything at all about the process which the two developers followed.

To reiterate: We're talking about a website that is not working, for some people, some of the time. A friend posted just today that he successfully created his account and is done with the process. This is not a site that is fundamentally broken or flawed; it is a server that is sometimes overloaded by a very high volume of traffic. That's it. That's the whole problem.

4 comments:

  1. Well, yeah, but "web site groans under the weight of nine million users" doesn't sell as many newspapers as a good yarn about corruption and incompetence.

  2. The info tracks. Sometimes the website works just perfectly for me, sometimes I cannot even log in. It depends on the time of day that I am on it. Now I have run into a deeper problem where my profile and the credit agency responsible for proofing people's identities have proofed my identity and plugged into the website that I am who I say I am. Yet, when I get to the end of the application it does not connect to the ID proofing that has been done, and says that I will get info as to plans to choose from and tax credits and stuff once I have proven I am who I say I am (of course the website will not allow me to prove it is me because according to the website I have already proved to it that I am me...confused yet?). This issue has baffled the helpful and nice folks at the call centers and my issue is currently being "escalated" to higher levels. I started the process at about 3am on October 1st (I was really really excited!) and the server gave me a lot of errors while I tried to verify my identity. My guess is that trying to verify my ID, unsuccessfully at first and then successfully, at the time when the server was most vulnerable must have caused some sort of communication problem for my specific account between my profile and my application. I saw this type of thing all the time at CFNC when I worked there. The website had a certain error rate due to gremlins. If healthcare.gov has a 1 percent error rate due to "gremlins" like in my case...not just server overload, that is still 90,000 people having issues (assuming 9 million users which is what I have heard)...and that is enough people to cause an unfounded ruckus in the media.

  3. "This is not a site that is fundamentally broken or flawed; it is a server that is sometimes overloaded by a very high volume of traffic. That's it. That's the whole problem."

    This exactly. I recently watched a vlogbrothers video of John Green signing up using the government exchange website vs the old way using a private insurance website. Even with the errors and delays, he could still sign up for health insurance in about half the time it took with the private insurance website. Here's the link for anyone interested:

    http://www.youtube.com/watch?v=ql9RVy6FWkg

  4. http://ti.me/1kckGkc provides some more context to the whole set-up. Unfortunately, Time has now hidden the complete article behind a paywall :/
