September 18, 2005, 9:28 am

It is 9:07 AM MST and Wikipedia is down! There are two interesting things about this outage. First, I’m amazed at how important Wikipedia has become to my day-to-day activities. Second, all sysadmins can learn a lesson in good service from Wikipedia: it has a robust network design that lets users see the status of the system even during an outage.

I used to use Google all the time. But Google is time-consuming and sometimes yields no good answer. For a long while now I have used Wikipedia more and more as my “first search.” Wikipedia typically gives me a good general answer. I go to Google second in the hope of finding specific answers. Google isn’t good enough on its own because sometimes I want an answer, not the opportunity to search for and evaluate answers. Using Wikipedia is like asking a librarian. Using Google is like asking an oracle. Librarians are professional information agents. Oracles are… freaky but cool. You can have a certain level of trust in the former but must constantly question the latter. There is a place for both, but my current thinking is Wikipedia first and Google second.

Now, I have already mentioned that Wikipedia is down at the moment. How do I know that? Well, when I go to Wikipedia’s site, it tells me so. That in itself is quite unusual. Most sites, when they are down, result in a “could not connect to server” message from your browser. In this case, it is a Wikipedia system telling me that Wikipedia is down. How can the system be down and yet be “up enough” to tell me how down it is?

Wikipedia has servers all over the world. When you connect to Wikipedia, you are not talking directly to a web server but to a proxy: a server that talks to Wikipedia’s servers on your behalf. It works like this. You ask for a page on “wireless networks.” Your browser connects to the Wikipedia proxy and asks for the page. The proxy picks one of many Wikipedia servers and requests the page on “wireless networks.” If everything goes well, the Wikipedia server gives the page to the proxy and the proxy gives the page to your browser.

If things go wrong, the proxy can tell you about it. That is the case today. Something has gone wrong. Since the proxy is still working, it can tell me why it cannot give me Wikipedia pages. Not only that, it tells me where I can check the status of the Wikipedia system and where I can chat with others about the problem.
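
The flow described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Wikipedia's actual proxy: the names `proxy_fetch`, `BackendDown`, `healthy`, `broken`, and the wording of `STATUS_MESSAGE` are all made up for the example, and the backends are plain functions rather than real servers.

```python
import random
from typing import Callable, Sequence, Tuple

# Hypothetical names throughout; this is not Wikipedia's real proxy code.
Backend = Callable[[str], str]

# Assumed wording; the real message pointed to a status page and a chat channel.
STATUS_MESSAGE = (
    "The site is having technical difficulties. "
    "Check the offsite status page or the chat channel for updates."
)


class BackendDown(Exception):
    """Raised by a backend that cannot serve the request."""


def proxy_fetch(page: str, backends: Sequence[Backend]) -> Tuple[int, str]:
    """Try the backends in random order; fall back to a status message."""
    for backend in random.sample(list(backends), k=len(backends)):
        try:
            return 200, backend(page)
        except BackendDown:
            continue  # this server failed, so try the next one
    # Every backend failed, but the proxy itself is still up,
    # so it can explain the outage instead of dropping the connection.
    return 503, STATUS_MESSAGE


def healthy(page: str) -> str:
    """Stand-in for a working web server."""
    return f"<h1>{page}</h1>"


def broken(page: str) -> str:
    """Stand-in for a web server that is down."""
    raise BackendDown(page)


print(proxy_fetch("wireless networks", [broken, healthy]))
print(proxy_fetch("wireless networks", [broken, broken]))
```

The first call still returns the page because one backend works. The second returns the status message, which is the situation I am looking at this morning: the layer in front is alive even though everything behind it is not.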

There is a lesson here for all sysadmins. It is similar to the “defense in depth” strategy in security. Reliability in layers is the general lesson. In this case there is not simply one layer (the web server). There are several: the proxy, the web server(s), offsite status pages, and out-of-band channels for communicating problems promptly.
