calum.org:~#

Death of a hard drive

Tags: linux laptop hard drive

Added: 2011-09-18T21:59:27

A day ago, my laptop froze for about 15 seconds. When it recovered, I ran dmesg, and saw lots of lovely things like this:

[117003.557104] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[117003.557107] ata1.00: irq_stat 0x40000001
[117003.557110] ata1.00: failed command: READ DMA
[117003.557114] ata1.00: cmd c8/00:08:88:17:11/00:00:00:00:00/e9 tag 0 dma 4096 in
[117003.557115]          res 51/40:06:8a:17:11/00:00:09:00:00/e9 Emask 0x9 (media error)
[117003.557118] ata1.00: status: { DRDY ERR }
[117003.557120] ata1.00: error: { UNC }
[117003.558886] ata1.00: configured for UDMA/100
[117003.558894] sd 0:0:0:0: [sda] Unhandled sense code
[117003.558896] sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[117003.558898] sd 0:0:0:0: [sda] Sense Key : Medium Error [current] [descriptor]
[117003.558902] Descriptor sense data with sense descriptors (in hex):
[117003.558904]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 
[117003.558910]         09 11 17 8a 
[117003.558913] sd 0:0:0:0: [sda] Add. Sense: Unrecovered read error - auto reallocate failed
[117003.558917] sd 0:0:0:0: [sda] CDB: Read(10): 28 00 09 11 17 88 00 00 08 00
[117003.558924] end_request: I/O error, dev sda, sector 152115082
[117003.558943] ata1: EH complete

Now, I don't know what *all* that means, but I do know "sda" and "Unrecovered read error" in the same entry aren't generally what you want to see.
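
One part of the log that can be decoded without much effort is the sector number. The sketch below assumes a shell whose printf accepts hex constants (bash and dash both do); `lba_from_bytes` is a made-up helper name, not part of any tool. It joins four big-endian hex bytes from the log into a decimal sector number.

```shell
#!/usr/bin/env bash
# Join four big-endian hex bytes into a decimal sector number (LBA).
# lba_from_bytes is a hypothetical helper name used for illustration.
lba_from_bytes() {
    printf '%d\n' "0x$1$2$3$4"
}

# Last four bytes of the sense data (09 11 17 8a): the sector that failed.
lba_from_bytes 09 11 17 8a   # 152115082, same as the end_request line
# Bytes 2-5 of the Read(10) CDB (09 11 17 88): first sector of the request.
lba_from_bytes 09 11 17 88   # 152115080
```

The CDB also asks for 8 sectors (the `00 08` near its end), which at 512 bytes each matches the "dma 4096 in" on the cmd line. So the read covered sectors 152115080 to 152115087, and the third one is the sector that failed.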

I ran smartctl -t long /dev/sda, and smartctl -a /dev/sda then said that while there were some errors, the drive was OK.
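
An overall "OK" from smartctl can sit alongside non-zero bad-sector counters, so it may be worth looking at those directly. This is a rough sketch, not what was run here: `smart_sector_counts` is a made-up name, and the attribute names are the ones smartmontools commonly uses, which may differ on some drives. It assumes the usual `smartctl -A` table layout, with the raw value in the tenth column.

```shell
# Print the raw values of the SMART attributes that tend to go with bad sectors.
# Reads `smartctl -A` output on stdin. smart_sector_counts is a hypothetical name,
# and the attribute names below are an assumption about this drive's table.
smart_sector_counts() {
    awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable)$/ { print $2, $10 }'
}

# Usage (needs root and smartmontools):
#   smartctl -A /dev/sda | smart_sector_counts
```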

I'm not chancing it, and have resurrected (read restarted) my backup strategy. You do have a backup strategy too, don't you?

I'm expecting the drive to fail (no doubt at the most inconvenient time), and have had a look at hybrid SSDs. They seem to provide about 90% of the performance of a full SSD, but for 25% of the cost.

Update: 2011-09-23
I got annoyed with the constant 15-second slowdowns, so I ordered the Seagate 500GB Momentus XT 2.5" Hybrid SSD. Then I thought I'd have a look at which sector was causing the problem, and hey presto - it was the same one every time.
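
One way to check whether it really is the same sector each time is to count the sector numbers in the kernel log. This is a minimal sketch, not necessarily how it was checked here: `failing_sectors` is a made-up name, and it assumes the log lines contain "I/O error, dev sda, sector N" as in the dmesg output above.

```shell
# Count how often each sector shows up in kernel I/O errors for one device.
# Reads kernel log text on stdin. failing_sectors is a hypothetical name.
failing_sectors() {
    dev="${1:-sda}"
    grep "I/O error, dev $dev, sector" |
        sed 's/.*sector \([0-9][0-9]*\).*/\1/' |
        sort -n | uniq -c
}

# Usage:
#   dmesg | failing_sectors sda
```

A single line of output means every error hit the same sector. Several lines would suggest the damage is more widespread.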
I wanted to find out what the problem file was, so I ran:
find / -type f -exec cp {} /dev/null \; >/dev/null

Eventually, it showed the problem as an I/O error on /home/calum/.mozilla/firefox/qrtsapqll.default/places.sqlite.
I backed up (why isn't it backuped?) my bookmarks, and removed the file.
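
The find command above works by reading every file in full, so that any file sitting on the bad sector throws an error. A slightly tidier variant is sketched below, with `unreadable_files` as a made-up name. It adds -xdev so that find stays on one filesystem and skips /proc and /sys, and it prints only the names of files that could not be read.

```shell
# Read every regular file under a directory and print the ones that fail to read.
# -xdev keeps find on one filesystem, so /proc and /sys are skipped when scanning /.
# unreadable_files is a hypothetical name used for illustration.
unreadable_files() {
    find "${1:-/}" -xdev -type f -exec sh -c '
        for f in "$@"; do
            cat "$f" >/dev/null 2>&1 || printf "%s\n" "$f"
        done' sh {} +
}

# Usage (as root, and expect it to take a while):
#   unreadable_files /
```

This reports any read failure, including permission errors, so it is best run as root and checked against dmesg to confirm that a listed file really hit an I/O error.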

Bingo, no more problems. I'll run a new smartctl long test on it, and see what it thinks.

Problem fixed (for now), and new drive coming soon.

posted by Calum on 2011-09-18T22:00