The Great Digital Disaster of 2010
The Beginning of a Nightmare
One week ago the hard disk in my laptop, which is my primary computer, failed. It wasn’t a big deal to me at the time because I knew it was going to fail. I had noticed the signs that I know full well indicate imminent disk failure. But without a full failure I wouldn’t be able to get it repaired under warranty.
So I waited. Until one week ago. Apple didn’t have the right model HDD in stock, since I had ordered the largest size available at the time. Not a big deal, I would just make do for a couple of days.
My first mistake was that I had ignored my Time Machine errors, so when the disk failed my most recent backup was 12 days old. Not too bad. The only things I lost were a few notes and my completed tasks.
What I didn’t know at first was that my Time Machine backup was corrupted, though not by a lot. After doing a full system restore the only two symptoms I had were that 1) iChat would not save or remember my passwords, and 2) when attempting to view some (but not all) keychain items I would get an error message stating “Access to this item is restricted”. This drove me nuts for a couple of days because there seemed to be little to no help on the Internet about it. What was even more maddening was that none of my other computers exhibited the problem. I tried backing up and restoring my keychain database every which way I could and nothing helped. A keychain file that worked fine on one computer was useless on my laptop.
Now, I like Keychain quite a lot. But I began to lose faith in the entire system. If I couldn’t trust Keychain to keep my passwords safe then it was worthless as a password repository. But I still needed my applications to work. And I was beginning to worry about losing my 836 stored passwords.
I finally found a discussion on Apple’s mailing list with someone having the same problem and, thanks to Ken McLeod, an accurate description of the cause.
Codesigning
Codesigning, for the uninitiated, is a way for a software publisher to sign an application so that the system can verify it has not been modified since it was signed (either by malware or bit rot).
I checked Keychain Access and sure enough, it was damaged.
$ codesign -vvv /Applications/Utilities/Keychain\ Access.app
/Applications/Utilities/Keychain Access.app: code or signature modified
It may seem odd that finding data corruption is cause for rejoicing, but I began to see the light at the end of the tunnel. I copied Keychain Access from another computer and checked it again.
$ codesign -vvv /Applications/Utilities/Keychain\ Access.app
/Applications/Utilities/Keychain Access.app: valid on disk
/Applications/Utilities/Keychain Access.app: satisfies its Designated Requirement
Bingo. I once again had access to all of my keychain items, but iChat still kept prompting me for my passwords. I checked iChat itself but it was valid. Now, if you don’t know, iChat is really just a front end to the iChatAgent application, which does all of the real work. It’s built like this so that the menu bar status can be online while iChat itself is not running. I already knew this, so that’s where I checked next, and sure enough it was corrupted. Again, copying from a known good source fixed the problem and iChat would log in without interaction.
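In hindsight, checking applications one at a time was the slow way to find the damage. The check above can be looped over every application bundle in a directory. What follows is only a rough sketch, not something I ran at the time: check_bundles is a made-up helper name, and the CODESIGN variable is my own addition so that a different verifier command can be substituted for testing. It relies only on codesign exiting non-zero when verification fails.

```shell
# check_bundles DIR
# Print every .app bundle under DIR whose code signature fails to verify.
# CODESIGN is a hypothetical override; it defaults to the real codesign tool.
check_bundles() {
    dir=$1
    # -prune stops find from descending into a bundle once it has matched,
    # so helper apps nested inside another .app are not checked separately.
    find "$dir" -name '*.app' -prune -print | while IFS= read -r app; do
        if ! ${CODESIGN:-codesign} --verify "$app" 2>/dev/null; then
            printf 'FAILED: %s\n' "$app"
        fi
    done
}
```

Running check_bundles /Applications lists only the bundles that fail. Because nested bundles are skipped, a helper tucked away inside another application, which may be where iChatAgent lives, would still need to be checked by hand with its full path.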
The Nightmare Lands
I was in the middle of doing a system install and I had been working on this in my idle time as my coworkers racked and cabled the system. I needed to reboot to enable my serial port driver so I could configure the device.
It was at this point that my hard disk died. Again.
After the reboot, instead of seeing the Apple logo I got a flashing question mark. So close and yet so far. I was stunned. I had to finish the system configuration with a coworker’s computer. When I got back to my desk I ran Disk Utility, which told me that my disk was too damaged to repair and that I should re-initialize it. Knowing there was data corruption, I blamed this on Time Machine, so this time I did a fresh install and restored only my home directory. It took all night to get back to a usable state so I could actually work the next day. In the morning I started to notice the laptop acting very oddly. Applications were very slow to launch but would otherwise operate fine. Switching from one application to another would sometimes freeze my entire computer for 20 to 40 seconds. The problem was chronic, and the machine was essentially unusable. Another repair attempt with Disk Utility again reported that the disk was too far gone to fix.
The Nightmare Closes
Another trip to the Apple Store. Another wasted day restoring my system. This time, though, they did have the drive in stock, and they gave me priority in the repair queue so I had it back in about 40 minutes. After reinstalling the OS and all of my apps and restoring my data, everything seems to be back in order now.