Earlier today, at around 08:00 UTC on Sunday, 17 May 2015, ridley.fastlizard4.org suffered unexpected downtime.  At present, the problem appears to have been caused by a hardware issue of some kind, or some other problem with the hypervisor host that ran ridley.  CPU usage rose to 800% (all 8 cores at 100% load), seemingly due to iowait, while disk activity dropped to near zero.  This points to a disk I/O failure of some kind, which eventually bogged the server down so badly that the CPUs “stalled” and the entire system became completely unresponsive.  The server also did not respond to a shutdown command from the hypervisor; in the end, the only way to bring the system down was to issue a “destroy” command, which effectively “pulled the plug” on it.

Given the I/O problems ridley has been experiencing for a while now, I took advantage of this unscheduled downtime to also apply the free Linode upgrade that had been pending for ridley.  In addition to moving the system to a new hypervisor host, the upgrade doubled ridley’s RAM and increased its available bandwidth; however, the number of vCPUs has dropped from 8 to 4.

Ridley is now back up and running, and users may log in to restart any services or programs they had running.  I have checked the system, and everything appears to be functioning normally, with all daemons and services running.  The LizardIRC server daemon has also been restarted, so ridley.lizardirc.org has been relinked to the network and IRC services are available again.

Apologies for the inconvenience, and thanks for bearing with me!

About the author

Amateur radio operator (Technician-class), motorcycle rider, Wikipedia editor, computer programmer and tech, sysadmin, photographer, laser enthusiast, Minecraft addict and server operator.


Unexpected downtime post-mortem: ridley.fastlizard4.org, 17 May 2015 / LizardBlog by FastLizard4 is licensed under a Creative Commons Attribution-ShareAlike (CC BY-SA) license.