Wikimedia's Servers Declare Independence!

From LizardWiki, FastLizard4's wiki and website

The following logs from #wikimedia-tech show Wikimedia's servers failing epically, as recorded around 00:20, 5 July 2010 (UTC) (the 4th of July in the United States).

Logs

[17:10:31]      <nagios-wm>       PROBLEM - Disk free on lily is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[17:10:52]      PROBLEM - SSH on lily is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:11:51]      PROBLEM - Disk free on mchenry is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[17:12:21]      PROBLEM - Host srv167 is DOWN: PING CRITICAL - Packet loss = 100%
[17:12:22]      PROBLEM - Host srv163 is DOWN: PING CRITICAL - Packet loss = 100%
        PROBLEM - Host srv155 is DOWN: PING CRITICAL - Packet loss = 100%
        PROBLEM - Host srv154 is DOWN: PING CRITICAL - Packet loss = 100%
        PROBLEM - Host srv166 is DOWN: PING CRITICAL - Packet loss = 100%
        PROBLEM - Host srv164 is DOWN: PING CRITICAL - Packet loss = 100%
[17:12:31]      PROBLEM - Host srv175 is DOWN: PING CRITICAL - Packet loss = 100%
        PROBLEM - Host srv176 is DOWN: PING CRITICAL - Packet loss = 100%
        PROBLEM - Host srv179 is DOWN: PING CRITICAL - Packet loss = 100%
        PROBLEM - Host srv181 is DOWN: PING CRITICAL - Packet loss = 100%
        PROBLEM - Host srv178 is DOWN: PING CRITICAL - Packet loss = 100%
        PROBLEM - Host srv182 is DOWN: PING CRITICAL - Packet loss = 100%
[17:12:39]      |<-- darkoneko has left irc.freenode.net:7000 (Quit: it reads 'assume good faith', not 'be stupid')
[17:12:41]      <nagios-wm>       PROBLEM - Host srv183 is DOWN: PING CRITICAL - Packet loss = 100%
        PROBLEM - Host srv186 is DOWN: PING CRITICAL - Packet loss = 100%
        PROBLEM - Host srv184 is DOWN: PING CRITICAL - Packet loss = 100%
        PROBLEM - Host srv185 is DOWN: PING CRITICAL - Packet loss = 100%
[17:13:11]      PROBLEM - Host storage3 is DOWN: CRITICAL - Host Unreachable (208.80.152.169)
        PROBLEM - Host sanger is DOWN: CRITICAL - Host Unreachable (208.80.152.187)
        PROBLEM - Host mchenry is DOWN: CRITICAL - Host Unreachable (208.80.152.186)
        PROBLEM - Host sq76 is DOWN: PING CRITICAL - Packet loss = 100%
        PROBLEM - Host sq77 is DOWN: PING CRITICAL - Packet loss = 100%
[17:13:21]      PROBLEM - Host srv152 is DOWN: PING CRITICAL - Packet loss = 100%
        PROBLEM - Host srv151 is DOWN: PING CRITICAL - Packet loss = 100%
        PROBLEM - Host db5 is DOWN: PING CRITICAL - Packet loss = 100%
        PROBLEM - Host db7 is DOWN: PING CRITICAL - Packet loss = 100%
[17:13:22]      PROBLEM - Host db8 is DOWN: PING CRITICAL - Packet loss = 100%
[17:13:31]      PROBLEM - Host tridge is DOWN: CRITICAL - Host Unreachable (208.80.152.170)
        PROBLEM - Host hume is DOWN: CRITICAL - Host Unreachable (208.80.152.190)
        PROBLEM - Host lvs4 is DOWN: CRITICAL - Host Unreachable (208.80.152.123)
        PROBLEM - Host db9 is DOWN: PING CRITICAL - Packet loss = 100%
        PROBLEM - Host rr.pmtpa is DOWN: PING CRITICAL - Packet loss = 100%
        PROBLEM - Host sq75 is DOWN: PING CRITICAL - Packet loss = 100%
[17:13:33]      PROBLEM - Host sq72 is DOWN: PING CRITICAL - Packet loss = 100%
        PROBLEM - Host sq73 is DOWN: PING CRITICAL - Packet loss = 100%
[17:13:42]      PROBLEM - Host srv168 is DOWN: PING CRITICAL - Packet loss = 100%
        PROBLEM - Host srv153 is DOWN: PING CRITICAL - Packet loss = 100%
        PROBLEM - Host srv165 is DOWN: PING CRITICAL - Packet loss = 100%
        PROBLEM - Host srv156 is DOWN: PING CRITICAL - Packet loss = 100%
        PROBLEM - Host srv177 is DOWN: PING CRITICAL - Packet loss = 100%
        PROBLEM - Host srv180 is DOWN: PING CRITICAL - Packet loss = 100%
[17:13:51]      PROBLEM - Host locke is DOWN: CRITICAL - Host Unreachable (208.80.152.138)
        PROBLEM - Host sq74 is DOWN: PING CRITICAL - Packet loss = 100%
        PROBLEM - Host sq71 is DOWN: PING CRITICAL - Packet loss = 100%
[17:13:55]      <dungodung>       hmm, sites seem to be down for me
[17:14:01]      <FastLizard4>     Hmm.
        <nagios-wm>       PROBLEM - Host lvs2 is DOWN: PING CRITICAL - Packet loss = 100%
[17:14:05]      <FastLizard4>     I wonder why that would be... 9_9
[17:14:10]      <dungodung>       :P
[17:14:11]      <nagios-wm>       PROBLEM - check_all_memcacheds on spence is CRITICAL: MEMCACHED CRITICAL - Can not connect to 10.0.2.183:11000 (Connection timed out)
        PROBLEM - Host upload.pmtpa is DOWN: PING CRITICAL - Packet loss = 100%
        -->| Avery_Mason (~lucid@wikipedia/Fetchcomms) has joined #wikimedia-tech
[17:14:17]      <FastLizard4>     Yay, servers go boom :D
[17:14:34]      Big expensive fireworks. :P