This is a story about a whale. No!! This is a story about making my computer happy. A few days ago, I started having weird problems on my main workstation, srv-1. My system would lock up completely. Only by pushing the reset button could I wake her from her slumber. I reseated all of the memory and cables, at Urbana’s suggestion, and that seemed to make srv-1 happy again. Just to be safe, I did a complete rsync to srv-50:
rsync -rlptvz --delete --exclude "/share/" --exclude "/mnt" --exclude "/share2/" --exclude "/proc/" --exclude "/dev/" / /share2/srv-1/
srv-50 is our old dualie 300 MHz Ultra2 Enterprise Sun box. It has an external StorEdge array that we set up in this article. The rsync went well. I then did the big test:
emerge sync
emerge -u world
This is a Gentoo command that brings all of my installed software up to date with the latest version by downloading the new sources automatically and recompiling. When my Gentoo emerge got to KDE base, my hard drive started making that horrible click sound. I tried to exit my shell to reboot, and I got a bus error. Power cycling appeared to help, but a reset did not. Anyway… severe hardware issues.
I really didn’t want to reinstall. True, I could reinstall the OS and use the rsync off of srv-50, but that would take a long time over the network. I figured, though, that I might still be able to read the data off of the hard disk. I plugged a new 120 gig drive into the CDROM IDE port, after setting the jumper to Master. IDE drives behave better if they are each their own master on their own separate channel. I then disabled DMA in the BIOS, along with anything else that looked like a performance-related setting for the hard drive. The theory was that if I just took it easy on the drive, I might be able to boot and read the data. I created matching partitions, and created a filesystem on the new drive. I mounted it on /mnt. From /, I used this command to migrate my damaged 40 gig drive to the new 120:
tar -cvpf /mnt/srv-1bak.tar --exclude=proc --exclude=mnt .
This can also be done with dump. See this article. Note that I did not use the z option to compress. I wanted to do this quickly. Also note that I forgot to exclude share, which I excluded in the rsync above. This is an NFS filesystem, exported from our big fileserver, Mondo. We have this set up with a hard mount, which means I can’t simply disconnect the network cable to stop files from being copied. I ran over to the fileserver, pulled the network cable out, went into single user mode with init 1, unmounted share, which was on a RAID device, created a directory called /share2, remounted the RAID device on /share2, plugged in the network cable again, and went back to init 3. This made an empty NFS share export. The tar command running on srv-1 continued merrily along.
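The exclusion step is easy to get wrong, so here is a minimal sketch of one way to wrap it. It is not the command that was actually run: backup_tree is a hypothetical helper, and it assumes GNU tar and directory names without whitespace. It writes each exclude as ./name, which, as far as I understand GNU tar's default matching, skips only the top-level directory of that name rather than anything called proc or mnt deeper in the tree.

```shell
#!/bin/sh
# Sketch only: backup_tree is a hypothetical helper, not the original command.
# Usage: backup_tree SRC_DIR ARCHIVE [TOP_LEVEL_DIR_TO_SKIP...]
backup_tree() {
    src=$1
    archive=$2
    shift 2
    excludes=""
    for d in "$@"; do
        # Member names are stored relative to "." (./proc, ./etc/fstab, ...),
        # so "./name" targets the top-level entry.
        excludes="$excludes --exclude=./$d"
    done
    # $excludes is deliberately unquoted so it splits into separate options.
    # Assumption: the directory names contain no whitespace.
    tar -cpf "$archive" -C "$src" $excludes .
}

# Example (not run here): archive / onto the new drive, skipping the pseudo
# filesystem, the target mount, and the NFS mount.
# backup_tree / /mnt/srv-1bak.tar proc mnt share
```

Listing the archive afterwards with tar -tf is a quick way to confirm that the skipped directories really are absent before relying on the backup.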
Restoring was easy. I went to /mnt, and just untarred the tarball:
tar -xvpf srv-1bak.tar
While it was untarring, I copied the files from /boot over to the new boot partition:
mkfs.ext2 /dev/hdc1
cd /boot
mkdir /newboot
mount -t ext2 /dev/hdc1 /newboot
cp -apr * /newboot/
I also ran mkswap against my swap partition on the new drive:
mkswap /dev/hdc2
After the tar finished, I just booted from my rescue floppy, much like the one I created in this article. I needed to get lilo set up correctly, and this is easier to do if you use a boot floppy and run lilo with the drives set up the way you want them. I haven’t had much luck getting the MBR correctly set up when I’m booted off of the old system. OK. Did you notice a step missing? I forgot to recreate proc, and this kept my new system from booting correctly. Proc is dynamically generated, but it needs the mount point. I excluded proc and mnt from my tar command. Nothing that my trusty Super Rescue CD couldn’t take care of. I just booted off of the CD, mounted /dev/hda3, and created a proc directory.
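Since any directory excluded from the tarball also loses its mount point, the fix can be folded into the restore. A minimal sketch, assuming the restored root filesystem is mounted somewhere writable; recreate_mount_points is a hypothetical helper and /mnt/newroot in the example is a placeholder path, not one used above.

```shell
#!/bin/sh
# Sketch only: recreate_mount_points is a hypothetical helper.
# Usage: recreate_mount_points RESTORE_ROOT DIR...
recreate_mount_points() {
    root=$1
    shift
    for d in "$@"; do
        # Only the empty directory is needed; /proc is populated by the
        # kernel once it is mounted at boot.
        mkdir -p "$root/$d"
    done
}

# Example (not run here), with the new root mounted at a placeholder path:
# recreate_mount_points /mnt/newroot proc mnt
```

Passing the same list of names that was given to the tar excludes keeps the two steps in sync.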
I was now able to boot my system just fine, but I had a problem starting X. I got an error “Failed to initialize the Nvidia kernel module”. I tried to recompile the Nvidia drivers, but GCC didn’t match the version my kernel was compiled with, since I had just upgraded it. I decided to upgrade the kernel to 2.4.26, as long as I had to recompile anyway. This turned out to be a fateful decision. I tried the drivers from Nvidia, but this didn’t help. I found that Gentoo had a package for these, so I installed the Gentoo version:
emerge nvidia-glx nvidia-kernel
I still had an error that /dev/nvidiactl didn’t exist. I recreated these:
mknod /dev/nvidia0 c 195 0
mknod /dev/nvidiactl c 195 255
I figured that my /dev directory must not have come over completely clean, so I also tested my camera. (For more info on how to configure a camera that has a USB filesystem access, see this article). My camera didn’t work, so I created some devices:
root@srv-1 dev # mknod sdb b 8 16
root@srv-1 dev # mknod sdb1 b 8 17
root@srv-1 dev # mknod sdb2 b 8 18
root@srv-1 dev # mknod sdb3 b 8 19
root@srv-1 dev # mknod sdb4 b 8 20
root@srv-1 dev # mknod sdb5 b 8 21
root@srv-1 dev # mknod sdb6 b 8 22
root@srv-1 dev # mknod sdb7 b 8 23
root@srv-1 dev # mknod sdb8 b 8 24
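The minor numbers in those commands follow a pattern: under block major 8, each SCSI disk gets a block of 16 minors, so sda starts at 0, sdb at 16, and partition n is the disk's base plus n. A minimal sketch of that arithmetic follows. sd_mknod_cmds is a hypothetical helper that only prints the commands, so it needs no root access, and it is only meant for disks sda through sdp and partitions 1 through 15.

```shell
#!/bin/sh
# Sketch only: sd_mknod_cmds is a hypothetical helper that prints mknod
# commands instead of running them.
# Usage: sd_mknod_cmds DISK_LETTER MAX_PARTITION
sd_mknod_cmds() {
    letter=$1
    max_part=$2
    # Position of the letter in the alphabet: a=0, b=1, ...
    # (printf '%d' "'x" yields the character code of x.)
    index=$(( $(printf '%d' "'$letter") - 97 ))
    # Each disk owns 16 consecutive minor numbers.
    base=$(( index * 16 ))
    echo "mknod /dev/sd$letter b 8 $base"
    part=1
    while [ "$part" -le "$max_part" ]; do
        echo "mknod /dev/sd${letter}$part b 8 $(( base + part ))"
        part=$(( part + 1 ))
    done
}

# Print the commands for sdb and its first eight partitions.
sd_mknod_cmds b 8
```

Reviewing the printed list before piping it to a root shell is the safer way to use something like this.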
I also created some USB devices for my Palm device:
root@srv-1 dev # mknod /dev/ttyUSB0 c 188 0
root@srv-1 dev # mknod /dev/ttyUSB1 c 188 1
root@srv-1 dev # mknod /dev/ttyUSB2 c 188 2
root@srv-1 dev # mknod /dev/ttyUSB3 c 188 3
I finally ended up figuring out that I was using devfs… doh!… and when I moved over to 2.4.26, the option DEVFS_MOUNT in the kernel was set to n. I changed it to y, and my devices were a lot more manageable. 🙂 The *last* time I configured devices was for the GIAGD systems, so I’m used to using mknod a lot. I figure what happened is that tar grabbed the devices that were currently created by devfs, and when I upgraded the kernel I missed that setting somehow. There is a good FAQ about devfs here. There is another good article by the architect of Gentoo here. Sigh. It turns out that in the 2.6 kernel, even devfs is old. The new way is with udev. More info on these changes here, and for a really good lowdown on the cause of this change, see this article. Regardless, my system is all happy now.
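A quick check of the kernel configuration before rebooting would have caught the devfs setting. A minimal sketch, assuming the option shows up in the kernel's .config file as CONFIG_DEVFS_MOUNT; devfs_automount_enabled is a hypothetical helper, and the path in the example is an assumption about where the kernel source lives.

```shell
#!/bin/sh
# Sketch only: devfs_automount_enabled is a hypothetical helper.
# Usage: devfs_automount_enabled KERNEL_CONFIG_FILE
# Succeeds (exit status 0) only if the option is set to y in the given file.
devfs_automount_enabled() {
    grep -q '^CONFIG_DEVFS_MOUNT=y' "$1"
}

# Example (not run here); the path is an assumption:
# devfs_automount_enabled /usr/src/linux/.config ||
#     echo "devfs will not be mounted automatically at boot"
```

A disabled option normally appears as a commented-out "is not set" line, which the anchored pattern does not match.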