Mar 28 2004

Dodged a bullet last night

Published by Andrew at 8:55 PM under Uncategorized

Our server almost died last night. I was SSH’d in and noticed a few defunct sendmail processes. I also noticed a few weird-looking processes I didn’t recognize, running as root.
I tried to shut down and restart Sendmail, which only produced a bunch more defunct sendmail processes. At first I thought the box had been hacked. Then I noticed NOTHING would terminate. Any process I tried to kill just became defunct. I couldn’t reboot. I tried “reboot”, “shutdown -r now”, “init 6”, and even sent SIGHUP, SIGUSR1 and SIGINT to init. Nothing happened (other than the standard wall message, from “shutdown -r now”, announcing that the system was going down for reboot).
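For anyone wondering how to spot these: “defunct” processes are zombies, which sit in state `Z` in the kernel’s process table (`ps` labels them `<defunct>`). Below is a minimal sketch, assuming a Linux box with the usual `/proc/<pid>/stat` layout, that lists them. The function names `parse_stat` and `defunct_processes` are hypothetical and just for illustration. This isn’t what I ran that night.

```python
import os

def parse_stat(line):
    """Split one /proc/<pid>/stat line into (pid, command, state).

    The command name sits in parentheses and may itself contain spaces
    or parentheses, so split on the LAST ')' instead of on whitespace.
    """
    left, _, right = line.rpartition(")")
    pid_str, _, comm = left.partition(" (")
    state = right.split()[0]  # first field after the command name
    return int(pid_str), comm, state

def defunct_processes(proc_root="/proc"):
    """Return (pid, command) for every process in state 'Z' (defunct).

    Assumes a Linux-style /proc; on other systems the directory is
    missing and os.listdir() will raise.
    """
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries like /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except OSError:
            # The process exited between listdir() and open(); skip it.
            continue
        if state == "Z":
            found.append((pid, comm))
    return found

if __name__ == "__main__":
    for pid, comm in defunct_processes():
        print(f"{pid}\t{comm} <defunct>")
```

A zombie has already exited; it lingers only until its parent collects its exit status, so killing the zombie itself does nothing.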
Then the contents of my home directory disappeared. I started to panic.
I called Gunilla to see if she was in the office and could go across the hall to power-cycle the box. I got her just in time. She and Thomas had left the office to go home and were less than 10 seconds away from driving out of cell phone range.
They turned around and headed back to got.net, and while they did that, I quickly copied the entire contents of /var/log to my home machine (thank God for 3 Mbps download speeds).
There was nothing in /var/log/dmesg, but running the dmesg command showed all sorts of I/O problems with /dev/hda. The weird part is that /dev/hda is our data drive. The system lives on /dev/md1, a SCSI RAID 5 array.
Once Thomas rebooted the machine, all was fine. My home directory reappeared and everything seems to be running normally. I’m doing post-mortem diagnostics now.