Wikidot Outage

04 Mar 2009 22:22

Today Wikidot suffered a short outage. After more than an hour, we managed to get everything back to normal. The part that took the longest was (who would have guessed?) the filesystem check, which ran after 280 days without one.

Normally, starting up a machine takes a minute or two and is almost indistinguishable from a network outage or some other temporary failure. But with as many user-uploaded files as Wikidot stores, checking the filesystem takes a long time.

I must confess that, and not only because of today's crash, we have plans to decentralize the service and move it to a more distributed environment to make it (even) more reliable. Even counting the crash, our uptime is still very high, enough to satisfy just about anyone. But not us. We aim for 100% (or more ;-) ) uptime and want to make things totally fault-tolerant.

I must say we are really, really sorry for what happened today, but at the same time I must assure you that we really care about you, the Users, as many of you have surely noticed. I hope you still believe in us :).


Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License